Hyper-Converged Data Platform: Unifying Messaging, Compute, and Storage (Talk in English)

Track: Keynote

Domain:

Speaker: Sanjeev Kulkarni | Streamlio Co-Founder & CTO

Room: Grand Ballroom

Speaker Bio

Keynote Speaker: Sanjeev Kulkarni

Streamlio Co-Founder & CTO

Sanjeev Kulkarni is the co-founder of Streamlio, which focuses on building next-generation real-time processing engines. Before Streamlio, he was the technical lead for real-time analytics at Twitter, where he co-created Twitter Heron. Before that, he was at Locomatix, where he handled the engineering stack. Earlier, he worked on the AdSense team at Google, leading several initiatives. He holds an M.S. in computer science from the University of Wisconsin, Madison.

Session Abstract

Hyper-Converged Data Platform: Unifying Pub-Sub, Compute, and Storage

The decade of the Big Data revolution has seen data platforms evolve from batch-only systems to batch and real-time systems. The result is a set of data systems that are either batch-specific or real-time-specific; HDFS/MapReduce and Kafka/Storm are good examples. While a few efforts have been made to converge real-time and batch compute (most notably Apache Flink and Apache Spark), these have been piecemeal and do not take the data storage architecture into account.

Meanwhile, efforts are being made to define a truly event-driven system: the so-called Kappa architecture, with a single source of truth (the log) and a single way to compute.
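A minimal Python sketch of the Kappa idea, assuming an in-memory list stands in for the log. The names `Log`, `Processor`, and `count_clicks` and the click-counting workload are hypothetical illustrations, not part of any system named in the talk. One function handles live events, and reprocessing is simply a replay of the same log from offset 0.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical event shape: (user_id, clicks)
Event = Tuple[str, int]
State = Dict[str, int]


class Log:
    """Append-only, in-memory log standing in for the single source of truth."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset: int) -> List[Event]:
        return self._events[offset:]


def count_clicks(state: State, event: Event) -> State:
    """The single compute function, shared by live processing and replay."""
    user, clicks = event
    state[user] = state.get(user, 0) + clicks
    return state


class Processor:
    """Applies one function to the log, tracking how far it has read."""

    def __init__(self, log: Log, fn: Callable[[State, Event], State]) -> None:
        self.log = log
        self.fn = fn
        self.state: State = {}
        self.offset = 0  # next offset to read

    def poll(self) -> None:
        for event in self.log.read_from(self.offset):
            self.state = self.fn(self.state, event)
            self.offset += 1


log = Log()
live = Processor(log, count_clicks)
for event in [("alice", 2), ("bob", 1), ("alice", 3)]:
    log.append(event)
    live.poll()  # real-time path: handle each event as it arrives

replay = Processor(log, count_clicks)
replay.poll()  # reprocessing path: replay the whole log from offset 0
print(replay.state)  # {'alice': 5, 'bob': 1}
print(live.state == replay.state)  # True
```

Because reprocessing is just a replay, changing the logic in this sketch means starting a new `Processor` from offset 0 rather than maintaining a separate batch pipeline.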

However, these efforts focus on fitting existing systems into this architecture. The result is suboptimal, primarily because of the legacy designs of those existing systems.

This talk explores the requirements of a converged, event-driven architecture. We show how the concept of stream storage becomes the fundamental building block of such an architecture. We then describe how a single compute platform can sit on top of this stream storage, resulting in a converged data platform.
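As a rough sketch of what stream storage might look like underneath, assuming a segment-based layout (the abstract does not specify one): the hypothetical `StreamStorage` class splits a stream into fixed-size segments, seals each full segment, and resolves any offset to its segment with `bisect`. A compute job on top therefore sees one offset space covering both history and the live tail.

```python
import bisect
from typing import List


class Segment:
    """A contiguous slice of the stream, starting at base_offset."""

    def __init__(self, base_offset: int) -> None:
        self.base_offset = base_offset
        self.records: List[bytes] = []
        self.sealed = False  # sealed segments accept no more appends


class StreamStorage:
    """Hypothetical segmented stream: one offset space over history and tail."""

    def __init__(self, segment_size: int = 3) -> None:
        self.segment_size = segment_size
        self.segments: List[Segment] = [Segment(0)]
        self.next_offset = 0

    def append(self, record: bytes) -> int:
        active = self.segments[-1]
        if len(active.records) == self.segment_size:
            active.sealed = True  # roll over: seal the full segment
            active = Segment(self.next_offset)
            self.segments.append(active)
        active.records.append(record)
        self.next_offset += 1
        return self.next_offset - 1

    def read(self, offset: int) -> bytes:
        """Find the segment that holds `offset` and return that record."""
        if not 0 <= offset < self.next_offset:
            raise IndexError(f"offset {offset} out of range")
        bases = [seg.base_offset for seg in self.segments]
        seg = self.segments[bisect.bisect_right(bases, offset) - 1]
        return seg.records[offset - seg.base_offset]


store = StreamStorage(segment_size=3)
for i in range(7):
    store.append(f"event-{i}".encode())

print([(s.base_offset, s.sealed) for s in store.segments])
# [(0, True), (3, True), (6, False)]
print(store.read(4), store.read(6))  # b'event-4' b'event-6'
```

In a design like this, a sealed segment never changes, so history could be stored or replicated independently of the open tail segment while readers keep using the same `read(offset)` call.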
