Challenges and Solutions for Data Analytics Platforms

Track: Big Data Platform Architecture

Speaker: Zhaowei Hou | Facebook Tech Lead Manager

Room: Conference Hall 2B

Speaker Introduction

Track speaker: Zhaowei Hou

Facebook Tech Lead Manager

Zhaowei Hou is a tech lead manager on Data Engineering at Facebook. He leads a team working on a highly efficient, real-time data delivery platform. The platform lets users build reusable, unified query templates for similar business logic. Its core engine handles query parsing, parametrization, composition, and communication with data sources. A SQL caching layer built on top of Spark SQL can query up to terabytes of data efficiently. The platform's security layer collects common metadata from all other platforms and services and provides a unified security-checking interface. Previously, Zhaowei worked at Letv (乐视网) as a tech lead and architect, where he built a payment platform, and at Renren (人人网) as a tech lead on the Renren home page (人人网首页).

Talk Introduction

Talk: Challenges and Solutions for Data Analytics Platforms

Business intelligence and analytics play an important role in the tech industry, especially at data-driven companies like Facebook. As the company evolves, different business teams develop their own ways of building business reports. Because demand for data is also growing rapidly, these teams face similar strategic and technical challenges in data consistency, data delivery efficiency, and data security.
The data delivery platform is designed to address these challenges. We use query templates and metric definitions to solve the data consistency problem, and a Spark SQL-based caching service to solve the data delivery efficiency problem, with security integrated directly into the metric definitions.
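
To make the idea of a shared metric definition concrete, here is a minimal sketch of one way a query template, a metric definition, and an access rule could live in a single object. It is not the platform's actual design: the `Metric` class, the `METRICS` registry, the `run_metric` function, the role names, the `events` table, and the sample rows are all hypothetical, and SQLite stands in for the real data sources only so that the sketch can run on its own.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A metric defined once and shared by every report (hypothetical shape)."""
    name: str
    sql_template: str          # uses sqlite3 named placeholders such as :country
    allowed_roles: frozenset   # access rule stored together with the definition


class AccessDenied(Exception):
    """Raised when a role is not allowed to read a metric."""


# Hypothetical registry holding one agreed definition of "daily active users".
METRICS = {
    "daily_active_users": Metric(
        name="daily_active_users",
        sql_template=(
            "SELECT COUNT(DISTINCT user_id) FROM events "
            "WHERE day = :day AND country = :country"
        ),
        allowed_roles=frozenset({"analyst", "admin"}),
    )
}


def run_metric(conn, metric_name, params, role):
    """Check access, then bind parameters into the shared template and run it."""
    metric = METRICS[metric_name]
    if role not in metric.allowed_roles:
        raise AccessDenied(f"role {role!r} may not read {metric.name}")
    # Parameters are bound by the driver, never spliced into the SQL string.
    row = conn.execute(metric.sql_template, params).fetchone()
    return row[0]


def demo_connection():
    """Build a small in-memory table of made-up rows for the sketch."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, day TEXT, country TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [
            (1, "2024-01-01", "US"),
            (1, "2024-01-01", "US"),
            (2, "2024-01-01", "US"),
            (3, "2024-01-01", "CN"),
        ],
    )
    return conn
```

Because every dashboard would call `run_metric` with the same registered template, two teams cannot silently compute the same metric in two different ways, and the access check cannot be skipped by writing the query by hand.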

Talk Outline

  • Problems/Challenges
  1. The same metric is interpreted differently in different business domains
  2. Data queries are scattered throughout the codebase
  3. Surfacing up to TB of data to dashboards in real time
  4. Data security (access control to dashboards/reports)
  • The data delivery platform
  1. Overview
  2. The framework
  3. Query template & Metric engine
  4. SQL Caching 
  5. Data Security 
  • Summary
  • Q&A
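
As an illustration of the SQL caching item in the outline, the following is a rough sketch of a result cache keyed by query text and parameters. The talk describes a caching service built on Spark SQL; this in-process dictionary is only a stand-in for the general idea, and the `QueryCache` class, its `run_query` callback, and the time-to-live policy are assumptions rather than details from the talk.

```python
import time


class QueryCache:
    """Result cache keyed by query text and parameters, with a time-to-live."""

    def __init__(self, run_query, ttl_seconds=60.0, clock=time.monotonic):
        self._run_query = run_query   # callable(sql, params) -> result
        self._ttl = ttl_seconds
        self._clock = clock           # injectable so expiry can be tested
        self._entries = {}            # key -> (expires_at, result)
        self.hits = 0
        self.misses = 0

    def get(self, sql, params):
        """Return a cached result if still fresh, otherwise run the query."""
        # Parameter values must be hashable and keys sortable (e.g. strings).
        key = (sql, tuple(sorted(params.items())))
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        result = self._run_query(sql, params)
        self._entries[key] = (now + self._ttl, result)
        return result
```

A dashboard that refreshes often would call `cache.get(sql, params)` instead of querying the backend directly, so repeated requests for the same template and parameters within the time-to-live are served from memory.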

Reference translation (rendered from the Chinese):

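Business intelligence and analytics play an important role in the technology sector, especially at data-driven companies like Facebook. As the company keeps growing, different business teams produce business reports according to their own needs and methods. As demand for data continues to increase, these teams face similar strategic and technical challenges in data consistency, data query efficiency, and data security.

The Data Delivery Platform was built specifically to address these challenges. Our team uses query templates and metric definitions to solve the data consistency problem and Spark SQL to solve the data delivery efficiency problem, while security is built naturally into the metric definitions.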

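Talk Outline

  • Problems and challenges
  1. The same metric is interpreted differently in different business domains
  2. Data queries are scattered throughout the codebase
  3. Surfacing up to terabytes of data in data reports in real time
  4. Data security (access control for data reports)
  • The Data Delivery Platform
  1. Overview
  2. Overall framework
  3. Query templates & metric engine
  4. SQL caching
  5. Data security
  • Summary
  • Q&A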

Audience Takeaways

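  • Why data standardization matters at medium-to-large companies, and how to achieve it.
  • How to stay flexible and choose the right technology for each data problem.
  • How to use the right architecture to integrate the individual solution modules.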
