Venue: Grand Ballroom

Track: Keynote Speeches

by 郭蕾

Geekbang极客邦科技
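Editor-in-Chief

This is the seventh year of the ArchSummit Global Architect Summit. Viewing the conference as a content product, what knowledge do we want to deliver to our audience? How do we deliver it? And how should we think about content delivery itself?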

by 刘磊

Machine Learning Expert

Machine Learning on Multi-modal Sensor Data

Digital self-tracking technology has become more accessible to the public in recent years with the development of connected portable devices (such as smartphones, smart watches, smart bands, and other personal biological monitoring devices), human biosensors, and information management systems designed for monitoring, storing, and analyzing human self-tracking data. The proliferation of such technology has made it much easier than ever before to collect biological and physiological signals, such as the electrocardiogram (ECG), oxygen saturation (SpO2), heart rate (HR), the electroencephalogram (EEG), galvanic skin response (GSR), blood pressure/oxygen level, body temperature, etc. Once monitored and analyzed, these self-tracking data can help us better understand each individual’s health condition. As a result, mining such sensor data has been gaining significant attention from both industry and academia in recent years.

This presentation focuses on machine learning platform technologies for analyzing self-monitored data in real-world applications. As an example, it introduces a project on activity recognition from multi-modal sensor data. In this project, we first collect sensor data (from accelerometers, environmental sensors, and 3D vision sensors) in a smart-home setting in Bristol (UK), and then use machine learning to learn patterns of human behavior. The resulting system can predict activities of daily living as well as posture and ambulation. These predictions have practical value for identifying stroke, falls, and other emergencies and alerting clinical professionals and caregivers, for example in elderly care.

Outline:

  1. Background of Machine Learning on Multi-modal Sensor Data
    1. Sensors in real-world scenarios
    2. Machine Learning on Sensor Data
    3. Challenges of learning from multi-modal sensor data
  2. Machine learning platform and structure
    1. Overall System architecture
    2. Preprocessing and feature engineering
    3. Modeling and Offline evaluation
    4. Online deployment, evaluation, and model updating
  3. Challenges and insight discoveries
    1. Data sparsity and missing data
    2. Imbalanced data
    3. Sequential behavior pattern over time
  4. Summary and Application Discussion
    1. Application scenario #1: Sequential activity recognition for an automated monitoring system in a police station
    2. Application scenario #2: Machine learning on multi-modal sensor data for an emergency alert system for elderly people
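
To make the preprocessing and imbalance items in the outline concrete, here is a minimal sketch. It is not taken from the talk: the function names, the use of `None` to mark a missing reading, and the inverse-frequency weighting formula are all illustrative assumptions. It shows one common way to turn a sensor stream with gaps into fixed-size window features, and one common way to weight rare activity classes (such as falls) more heavily during training.

```python
from collections import Counter
from statistics import mean


def window_features(samples, size, step):
    """Slide a fixed-size window over a list of readings.

    Assumption for this sketch: a missing reading is represented by None.
    Each window yields the mean of the readings that are present and the
    fraction of readings that are missing.
    """
    features = []
    for start in range(0, len(samples) - size + 1, step):
        window = samples[start:start + size]
        present = [x for x in window if x is not None]
        features.append({
            "mean": mean(present) if present else None,
            "missing_ratio": 1 - len(present) / size,
        })
    return features


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Hypothetical data: one sensor channel with a gap, and skewed activity labels.
readings = [1.0, None, 3.0, 5.0]
labels = ["walk", "walk", "walk", "fall"]

features = window_features(readings, size=2, step=2)
weights = balanced_class_weights(labels)
```

Keeping the missing ratio as its own feature lets a downstream model learn that a window was only partly observed instead of silently trusting the mean.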

by Sanjeev Kulkarni

Streamlio
Co-Founder & CTO

Hyper-Converged Data Platform: Unification of pub-sub, compute and storage

The decade of the Big Data revolution has seen data platforms evolve from batch-only systems to batch and real-time systems. The result is a set of data systems that are specific to either batch or real-time processing; HDFS/MapReduce and Kafka/Storm are good examples. While a few efforts have been made to converge real-time and batch compute (most notably Apache Flink and Apache Spark), these have been piecemeal affairs that do not take the data storage architecture into account.

Meanwhile, efforts are being made to define a truly event-driven system, the so-called Kappa architecture, with a single source of truth (the log) and a single way to compute.

However, these efforts focus on fitting existing systems into this architecture. The result is an unoptimized architecture, primarily because of the legacy design of those existing systems.

This talk explores the requirements of a converged event-driven architecture. We see how the concept of stream storage becomes the fundamental building block of such an architecture. We then describe how a single compute platform can sit on top of this stream storage to form a converged data platform.
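
The idea of a log as the single source of truth with a single way to compute can be illustrated with a toy sketch. It is not Streamlio's design or any real system's API: `EventLog` and `CountByKey` are hypothetical names, and an in-memory list stands in for stream storage. The point it shows is that one piece of compute code serves both historical replay (the "batch" case) and newly arriving events (the "real-time" case).

```python
class EventLog:
    """Append-only in-memory log; a stand-in for real stream storage."""

    def __init__(self):
        self._events = []
        self._subscribers = []

    def append(self, event):
        """Store the event, push it to live subscribers, return its offset."""
        offset = len(self._events)
        self._events.append(event)
        for callback in self._subscribers:
            callback(offset, event)
        return offset

    def subscribe(self, callback, from_offset=0):
        """Replay history from from_offset, then keep delivering new events."""
        for offset in range(from_offset, len(self._events)):
            callback(offset, self._events[offset])
        self._subscribers.append(callback)


class CountByKey:
    """The single compute path: the same code handles replayed and live events."""

    def __init__(self):
        self.counts = {}

    def __call__(self, offset, event):
        key = event["key"]
        self.counts[key] = self.counts.get(key, 0) + 1


log = EventLog()
log.append({"key": "click"})
log.append({"key": "view"})

live = CountByKey()
log.subscribe(live)            # catches up on the two stored events first
log.append({"key": "click"})   # then receives this one as it arrives

rebuilt = CountByKey()
log.subscribe(rebuilt)         # a later consumer rebuilds state by replay alone
```

Because state is derived only from the log, a consumer that joins late ends up with the same counts as one that was live all along, which is what removes the need for a separate batch pipeline in this style of architecture.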

by Ian Gorton

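Northeastern University, Seattle Campus
Chair, Department of Computer Science

The rapid pace of change in software technologies, application domains, and data and system scale poses unprecedented challenges for building successful software products. The ability to change quickly in response to evolving requirements, and to scale at low cost to handle more requests and larger data sets, has become a common trait of successful systems. Such systems depend fundamentally on a flexible, scalable software architecture that can effectively support future business growth and expansion.

Software architecture has been an active research area in academia worldwide for nearly 30 years and remains a hot topic in software engineering. Modern architecture research now provides a foundation that helps engineering teams in many domains build increasingly complex software systems. For example, architecture analysis tools can identify and predict weak points in an architecture design from the structure of the code; performance modeling tools help architects explore how a design is likely to perform and scale in the future, informing design decisions early on; and intelligent, interactively queryable knowledge bases give architects scientific data to support design decisions and product selection. By collaborating with software architecture research teams, development organizations can gain a deep understanding of the persistent problems they face every day that determine the long-term success of their systems.

I will share our latest results in software architecture that have had a major impact on building successful systems, and describe how development teams and organizations can work with research teams to solve their most complex and intractable architecture problems.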

by 方国伟

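Ping An Technology
CTO and Chief Architect

Cloud computing has advanced greatly over the past decade or so. Thanks to economies of scale and deepening specialization, public cloud is increasingly accepted. However, as data privacy concerns and regulatory requirements grow more prominent, dedicated cloud, a special form of public cloud, has in recent years been adopted by more and more industry users. Ping An Cloud has its own distinctive origins and business goals, and the way it was built and grew offers a useful reference for the many fintech companies now being founded and for companies that want to build a cloud platform or adopt cloud services. This talk describes, from both business and technical perspectives, how Ping An built a dedicated cloud with its own character while serving customers both inside and outside the group.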
