Processing Decisions for Trillion-Scale Heterogeneous Spatio-Temporal Data at Facebook

Track: Big Data Platform Architecture in Practice


Speaker: 宾理涵 | Facebook Software Engineer Tech Lead

Room: 爱晚亭

About the Speaker

Track speaker: 宾理涵

Facebook Software Engineer Tech Lead

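He holds a master's degree from the University of Florida and is the tech lead of the GeoAPI team at Facebook. He led the team that built the Geospatial Indexing platform, which ingests large-scale geographic data and real-time data streams into its index to provide real-time queries and computation; the platform already serves several user-facing products. He also took part in the design and development of geographic queries in the Facebook search engine.

Before that he worked at Qualcomm, where he represented Qualcomm at the standards organization Khronos Group and took part in developing the OpenCL standard.

ArchSummit interviewed 宾理涵; see 《聊聊Facebook的大规模时空数据处理技术栈》 ("A Chat About Facebook's Large-Scale Spatio-Temporal Data Processing Stack").

About the Talk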


Talk: Processing Decisions for Trillion-Scale Heterogeneous Spatio-Temporal Data at Facebook

Big Spatial Data @ Facebook

Big geospatial data has all the challenges of data at scale, along with some quirks very specific to spatio-temporal data. However, these very quirks (such as the bounds of latitude/longitude, Euclidean vs. great-circle distances, the "true" shape of the earth, and the extremely skewed distribution of geospatial features) can be leveraged into interesting and productive trade-offs that offset and address these challenges. With more and more mobile devices thrown into the mix (both as producers and consumers of spatio-temporal data), real-time, accurate lookup of points and polygons based on GPS locations, and k-nearest and Top-K queries based on geospatial context, are very common and relevant problems. At the same time, providing scalable offline aggregation and query capabilities over spatio-temporal data for analytics use cases becomes vital to making sense of it.

The Facebook Location Infrastructure team handles spatio-temporal data at Facebook scale, using a mix of in-house and open source technologies and pragmatic trade-offs and decisions. This presentation will cover the design decisions and architectural choices made to scale to trillions of operations per day on a heterogeneous mix of spatio-temporal data (for both online and analytics-oriented use cases).
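
To make the "Euclidean vs. great-circle" quirk and the k-nearest query mentioned in the abstract concrete, here is a minimal sketch. It is not the team's implementation: the spherical Earth model, the function names, and the brute-force search are all assumptions chosen for illustration.

```python
import heapq
import math

# Mean Earth radius; treating the Earth as a perfect sphere is an assumption.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to guard against rounding pushing the value slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def planar_km(lat1, lon1, lat2, lon2):
    """Equirectangular approximation: Euclidean distance on a locally flat map."""
    mean_phi = math.radians((lat1 + lat2) / 2)
    x = math.radians(lon2 - lon1) * math.cos(mean_phi)
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(x, y)


def k_nearest(query, points, k, dist=haversine_km):
    """Brute-force k-nearest lookup (hypothetical helper, no spatial index)."""
    return heapq.nsmallest(
        k, points, key=lambda p: dist(query[0], query[1], p[0], p[1])
    )
```

Over short distances the cheaper planar formula agrees closely with the great-circle one, which is one example of a trade-off that the bounded, well-understood geometry of latitude/longitude makes possible. Over long distances or at high latitudes the two diverge, so the approximation is only safe when the query radius is known to be small.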

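Beyond the challenge of sheer scale, large-scale spatial data poses some problems of its own: for example, latitude and longitude, the difference between straight-line and great-circle distance, and the fact that the same latitude/longitude difference covers different areas at the poles and at the equator. These properties are both challenges and opportunities for performance optimization. As more and more mobile and IoT devices produce massive amounts of spatio-temporal data, the problems we need to solve include how to store and retrieve that data efficiently, how to run real-time k-nearest queries and relevance ranking, and how to efficiently handle the Spatial Join that is common in large-scale offline spatial data analysis.

In handling large-scale spatio-temporal data, the Facebook Location Infrastructure team takes a pragmatic, trade-off-driven approach between in-house and open source technologies. This talk covers several proven design decisions and architectural choices behind processing trillions of operations per day on a heterogeneous mix of spatio-temporal data (for both online and analytics-oriented use cases).

Talk Outline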
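
The point that the same latitude/longitude difference covers very different areas at the equator and near the poles can be checked with a short calculation. The sketch below assumes a spherical Earth, on which a cell bounded by two parallels and two meridians has area R² · Δλ · (sin φ_north − sin φ_south). The function name and radius constant are illustrative, not taken from the talk.

```python
import math

# Mean Earth radius; the spherical model is an assumption.
EARTH_RADIUS_KM = 6371.0088


def cell_area_km2(lat_south, lat_north, lon_width_deg):
    """Area of a lat/lon-aligned cell on a sphere, with inputs in degrees."""
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * math.radians(lon_width_deg) * band


equator_cell = cell_area_km2(0, 1, 1)   # 1 degree x 1 degree cell at the equator
polar_cell = cell_area_km2(89, 90, 1)   # 1 degree x 1 degree cell touching the pole
print(f"equator: {equator_cell:.0f} km^2, pole: {polar_cell:.0f} km^2")
```

Under this model the equatorial cell is more than a hundred times larger than the polar one, which is why a fixed-degree grid gives very uneven buckets and why spatial indexes often use cells of roughly equal area instead.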

  • Challenges of handling geospatial data at Facebook scale
  • Facebook data warehouse overview
  • Deep dive into the Big Spatial Data @ Facebook
  • Security & Privacy Considerations


What Attendees Will Gain

  • Learn about Facebook's Big Spatial Data platform
  • Pick up the latest approaches to processing spatial data