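Column: 机器学习研究会 (Machine Learning Research Society)

The Machine Learning Research Society is a student organization under Peking University's Center for Big Data and Machine Learning Innovation. It aims to build a platform where machine learning practitioners can exchange ideas. Besides sharing timely news from the field, the society hosts talks by leading figures from industry and academia, salon-style sharing sessions with top researchers, real-data innovation competitions, and similar events.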

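[Registration] Mu Li (李沐) of AWS: Large-Scale Distributed Machine Learning Through System and Algorithm Co-design

机器学习研究会 · WeChat official account · AI · 2017-03-21 19:07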


Summary

Reposted from: 将门创投 (Jiangmen Ventures)

Event details:

Topic: Scaling Distributed Machine Learning with System and Algorithm Co-design

Time: March 22 (Wednesday), 13:00

Livestream: http://jiangmen.gensee.com/webcast/site/entry/join-6779d5df0b1a4eb2a7a2d9a16ba6d06f?winzoom=1

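About the speaker

Dr. Mu Li (李沐)

Senior Applied Scientist for machine learning at Amazon AWS

Before joining Amazon, Dr. Li was CTO of the AI startup Marianas Labs. Prior to that, he worked at Baidu as a Principal Research Architect at the Baidu Institute of Deep Learning (IDL).

Dr. Li holds a PhD in computer science from Carnegie Mellon University. His main research interest is large-scale machine learning, in particular the co-design of large-scale distributed systems and machine learning algorithms.

As first author, Dr. Li has published multiple papers at computer science conferences and in journals, spanning theory (FOCS), machine learning (NIPS, ICML), applications (CVPR, KDD), and operating systems (OSDI).

Talk outline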

Due to the rapid growth of data and ever-increasing model complexity, which often manifests itself in a large number of model parameters, many important machine learning problems today cannot be solved efficiently on a single machine. Distributed optimization and inference are becoming increasingly necessary for solving large-scale machine learning problems in both academia and industry. However, obtaining an efficient distributed implementation of an algorithm is far from trivial. Both the intensive computational workloads and the volume of data communication demand careful design of distributed computation systems and distributed machine learning algorithms. In this thesis, we focus on the co-design of distributed computing systems and distributed optimization algorithms that are specialized for large machine learning problems.

In the first part, we propose two distributed computing frameworks: Parameter Server, a distributed machine learning framework that features efficient data communication between machines, and MXNet, a multi-language library that aims to simplify the development of deep neural network algorithms. We have witnessed the wide adoption of the two proposed systems in the past two years. They have enabled, and will continue to enable, more people to harness the power of distributed computing to design efficient large-scale machine learning applications.
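To make the push/pull communication pattern behind a parameter server more concrete, here is a minimal single-process sketch. It is not the actual Parameter Server or MXNet API: the class `ToyParameterServer`, the function `worker_step`, the key names, and the learning rate are all hypothetical, and the "workers" run sequentially in one process rather than on separate machines. The point it illustrates is only that each worker pulls and pushes just the keys its data shard touches, instead of exchanging the whole model.

```python
from collections import defaultdict


class ToyParameterServer:
    """Hypothetical in-process stand-in for a server node holding shared weights."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate
        self.weights = defaultdict(float)  # key -> weight; unseen keys start at 0.0

    def pull(self, keys):
        # A worker fetches only the keys it needs, not the whole model.
        return {k: self.weights[k] for k in keys}

    def push(self, gradients):
        # A worker sends sparse gradients; the server applies a plain SGD step.
        for k, g in gradients.items():
            self.weights[k] -= self.learning_rate * g


def worker_step(server, examples):
    """One pass over a worker's shard for a linear model with squared loss."""
    keys = {k for features, _ in examples for k in features}
    w = server.pull(keys)
    grads = defaultdict(float)
    for features, target in examples:
        pred = sum(w[k] * v for k, v in features.items())
        err = pred - target
        for k, v in features.items():
            grads[k] += err * v / len(examples)
    server.push(grads)


server = ToyParameterServer(learning_rate=0.5)
shard_a = [({"x0": 1.0}, 2.0)]    # worker A only touches key "x0"
shard_b = [({"x1": 1.0}, -1.0)]   # worker B only touches key "x1"
for _ in range(50):
    worker_step(server, shard_a)
    worker_step(server, shard_b)
final_weights = dict(server.weights)
```

In a real deployment the `pull` and `push` calls would be network messages, possibly asynchronous, and how they are batched, filtered, and scheduled is exactly the kind of system and algorithm co-design the abstract refers to.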

In the second part, we examine a number of distributed optimization problems in machine learning, leveraging the two computing platforms. We present new methods to accelerate the training process, such as data partitioning with better locality properties, communication-friendly optimization methods, and more compact statistical models. We implement the new algorithms on the two systems and test them on large-scale real data sets. We demonstrate that careful co-design of computing systems and learning algorithms can greatly accelerate large-scale distributed machine learning.

Original article:

http://mp.weixin.qq.com/s/nNa1bKG3s4sqK7mbLEWAGg
