Key Technical Points of Pinterest's Highly Scalable Infrastructure Architecture (talk given in English)



Speaker: Yongsheng Wu (武永胜) | Pinterest, Head of Big Data and Machine Learning Platform

Room: Barcelona Hall




Yongsheng Wu leads the Big Data and Machine Learning Platform team at Pinterest, which provides a unified big data and machine learning platform that enables engineers to derive trustworthy, actionable insights and apply ML to solve complex problems with ease and confidence. Before that, Yongsheng was an early engineer on the infrastructure team, helping scale Pinterest from ~10M MAUs to 200+M MAUs. He also led the teams that spearheaded Pinterest's transition to a microservice-based architecture, replaced the Lucene/Solr-based search system with a highly scalable, efficient, performant, and extensible home-grown search infrastructure, and offered an asynchronous job processing system along with distributed caching, storage, and serving systems as services to all engineers at Pinterest.

Before Pinterest, Yongsheng worked at Twitter, Salesforce, Seven Networks, and Oracle. He holds a Master's degree in Computer Science from Stanford and is a USTC alumnus from China.




Talk: Key Technical Points of Pinterest's Highly Scalable Infrastructure Architecture (talk given in English)

Scaling Pinterest

In this talk, Yongsheng will cover how Pinterest has scaled its online infrastructure over the past 8 years to serve 200-300M MAUs. Over that time, the online infrastructure has evolved from a Python Django application on a single MySQL instance in 2010 into a modern microservice-based architecture. Yongsheng will cover the following key technologies that allow Pinterest to scale horizontally and serve hundreds of millions of users with a great user experience:

  • A microservice framework powered by Twitter Finagle, with highly resilient service discovery and real-time configuration management systems built on top of Apache ZooKeeper.
  • PinLater, an asynchronous job processing framework open sourced by Pinterest. It embraces idempotency and commutativity, and lets the team run non-critical parts of request processing asynchronously and reliably to deliver a delightful online user experience.
  • Distributed caching, storage, and serving systems:
    • Mcrouter, a caching proxy open sourced by Facebook, which Pinterest adopted to build a distributed caching system that is resilient to AZ failures, highly consistent, and highly performant.
    • Zen, a homegrown distributed storage system with an integrated caching layer, built on Apache HBase and sharded MySQL. It exposes a graph data model that enables rapid, self-service product innovation, and was inspired by TAO and Dragon from Facebook.
    • Scorpion Serving, a high-throughput, low-latency ML serving system powered by C++, Folly, and RocksDB, which scores tens of millions of (user, pin) pairs per second with P99 latency under 20 ms.
  • Muse, a homegrown search engine implemented in C++ that replaced Apache Solr and Apache Lucene. It delivers high-throughput search with low long-tail latency at scale, supports real-time indexing, and is offered as a service to the entire Pinterest engineering organization.
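
To illustrate why an asynchronous job framework would embrace idempotency and commutativity, here is a minimal Python sketch. It is not PinLater's API: `RetryingJobQueue`, `follow_board`, and `crash_after_first_write` are hypothetical names invented for this example, and the queue is a toy in-memory stand-in. The assumption is at-least-once delivery, where a job may run more than once and retried jobs may run out of order.

```python
from collections import deque


class RetryingJobQueue:
    """Toy in-memory queue with at-least-once delivery (hypothetical, not PinLater)."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self._pending = deque()

    def enqueue(self, handler, *args):
        # Each entry tracks which attempt it is on.
        self._pending.append((handler, args, 1))

    def run_all(self):
        """Drain the queue; a job that raises is re-enqueued at the back."""
        while self._pending:
            handler, args, attempt = self._pending.popleft()
            try:
                handler(*args)
            except Exception:
                if attempt < self.max_attempts:
                    self._pending.append((handler, args, attempt + 1))


followers = {}  # board_id -> set of user_ids


def follow_board(user_id, board_id):
    """Idempotent and commutative: adding to a set can repeat or reorder safely."""
    followers.setdefault(board_id, set()).add(user_id)


def crash_after_first_write(handler):
    """Simulate a worker that completes the write once, then dies before acking."""
    crashed = False

    def wrapper(*args):
        nonlocal crashed
        handler(*args)
        if not crashed:
            crashed = True
            raise RuntimeError("simulated crash before acknowledgement")

    return wrapper


queue = RetryingJobQueue()
queue.enqueue(crash_after_first_write(follow_board), "user_1", "board_9")
queue.enqueue(follow_board, "user_2", "board_9")
queue.run_all()
print(sorted(followers["board_9"]))  # ['user_1', 'user_2']
```

The first job runs twice and its retry executes after the second job, yet the final state is the same as if each job had run exactly once in order. A handler that appended to a list or incremented a counter would not have this property under retries.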

As the business grew, Pinterest needed to address the business continuity risk of a single geographic region outage. Yongsheng will also share the major changes the team had to make so that Pinterest could serve active-active across multiple geographic locations with strong consistency between the caching and persistent storage tiers.
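
The abstract does not describe how this consistency is achieved. As a rough illustration of the problem only, the sketch below shows one common pattern: look-aside caches per region, with every write updating storage first and then invalidating the key in each region's cache. All names (`Region`, `ReplicatedStore`, `read`, `write`) are hypothetical, and storage replication is assumed to be instantaneous, which a real multi-region deployment cannot assume.

```python
class Region:
    """One serving region with its own local cache (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.cache = {}


class ReplicatedStore:
    """Stand-in for persistent storage visible to all regions.

    Assumption: replication is instantaneous. Real systems must also
    handle replication lag and races between reads and invalidations.
    """

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


def read(region, store, key):
    """Look-aside read: serve from the local cache, else fill it from storage."""
    if key in region.cache:
        return region.cache[key]
    value = store.get(key)
    if value is not None:
        region.cache[key] = value
    return value


def write(regions, store, key, value):
    """Write to storage first, then invalidate the key in every region's cache."""
    store.put(key, value)
    for region in regions:
        region.cache.pop(key, None)


store = ReplicatedStore()
us_east, us_west = Region("us-east"), Region("us-west")
regions = [us_east, us_west]

write(regions, store, "pin:1", "v1")
first_reads = [read(r, store, "pin:1") for r in regions]   # both caches now hold v1
write(regions, store, "pin:1", "v2")                       # invalidates both caches
second_reads = [read(r, store, "pin:1") for r in regions]  # both regions see v2
```

Without the cross-region invalidation step, a write handled in one region would leave the other region's cache serving the stale value until it expired.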

Yongsheng will wrap up the talk with the team's future plans and the key lessons learned over the years of scaling Pinterest's online infrastructure to enable hundreds of millions of people to discover and do what they love.

