Big Spatial Data @ Facebook
Geospatial data at scale brings all the usual challenges of big data, plus some quirks specific to spatio-temporal data. These same quirks (the bounds of latitude and longitude, Euclidean vs. great-circle distance, the "true" shape of the earth, and the extremely skewed distribution of geospatial features) can be turned into productive trade-offs that offset those challenges. With more and more mobile devices in the mix, both as producers and as consumers of spatio-temporal data, real-time, accurate lookup of points and polygons from GPS locations, along with k-nearest and top-K queries in a geospatial context, has become a very common problem. At the same time, scalable offline aggregation and querying of spatio-temporal data for analytics use cases is vital to making sense of it.
The Facebook Location Infrastructure team handles spatio-temporal data at Facebook scale, using a mix of in-house and open source technologies and pragmatic trade-offs. This presentation covers the design decisions and architectural choices made to scale to trillions of operations per day on a heterogeneous mix of spatio-temporal data, for both online and analytics-oriented use cases.
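To make the "Euclidean vs. great-circle" quirk concrete, here is a minimal Python sketch. It is not taken from the talk: the function names are hypothetical, and it assumes a spherical earth with a mean radius of 6371.0088 km, which is itself a simplification of the "true" shape of the earth.

```python
import math

# Assumption: spherical earth with the commonly used mean radius.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def naive_planar_km(lat1, lon1, lat2, lon2):
    """Euclidean distance on raw degrees, scaled by km per degree at the equator.

    Deliberately naive: it ignores that meridians converge toward the poles.
    """
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360
    return km_per_degree * math.hypot(lat2 - lat1, lon2 - lon1)


if __name__ == "__main__":
    for lat in (0, 30, 60, 80):
        great_circle = haversine_km(lat, 0, lat, 1)
        planar = naive_planar_km(lat, 0, lat, 1)
        print(f"lat {lat:>2}: great-circle {great_circle:7.2f} km, "
              f"naive planar {planar:7.2f} km")
```

For one degree of longitude the two measures agree at the equator, while at 60 degrees latitude the naive planar figure is roughly twice the great-circle distance. Whether a cheap approximation is acceptable for a given query is the kind of trade-off the abstract alludes to.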
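Beyond the challenges of sheer scale, large-scale spatial data poses problems of its own: latitude and longitude, the difference between straight-line and great-circle distance, and the fact that the same latitude/longitude difference covers very different areas near the poles and at the equator. These properties are both challenges and opportunities for performance optimization. As more and more mobile and IoT devices produce massive volumes of spatio-temporal data, the problems we need to solve include efficient storage and retrieval, real-time k-nearest queries, relevance ranking, and efficient execution of the spatial joins that are common in large-scale offline spatial analytics.
In handling large-scale spatio-temporal data, the Facebook Location Infrastructure team takes a pragmatic middle path between in-house and open source technologies. This talk covers proven design decisions and architecture choices behind processing trillions of heterogeneous spatio-temporal operations per day, for both online and analytics-oriented use cases.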
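The point that the same latitude/longitude difference covers very different areas near the poles and at the equator can be checked with a short calculation. The sketch below is an illustration only, not the team's method: `cell_area_km2` is a hypothetical helper, and a spherical earth is assumed. It uses the standard formula for the area of a latitude/longitude cell on a sphere, R² · Δλ · (sin φ_north − sin φ_south).

```python
import math

# Assumption: spherical earth with the commonly used mean radius.
EARTH_RADIUS_KM = 6371.0088


def cell_area_km2(lat_south, lat_north, lon_width_deg):
    """Area of a lat/lon-aligned cell on a sphere; all inputs in degrees."""
    dlambda = math.radians(lon_width_deg)
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * dlambda * band


if __name__ == "__main__":
    equatorial = cell_area_km2(0, 1, 1)   # 1x1 degree cell touching the equator
    polar = cell_area_km2(89, 90, 1)      # 1x1 degree cell touching the pole
    print(f"equatorial cell: {equatorial:.1f} km^2")
    print(f"polar cell:      {polar:.1f} km^2")
    print(f"ratio:           {equatorial / polar:.1f}x")
```

A grid of fixed-size degree cells therefore holds wildly different ground areas per cell, which is one reason spatial indexes often need latitude-aware or hierarchical cell schemes.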
Talk Outline
- Challenges of handling geospatial data at Facebook scale
- Facebook data warehouse overview
- Deep dive into Big Spatial Data @ Facebook
- Security & Privacy Considerations
What Attendees Will Learn
- Understand Facebook's Big Spatial Data platform
- Learn up-to-date approaches to processing spatial data