Column: Machine Learning Research Society
The Machine Learning Research Society is a student organization under the Peking University Innovation Center for Big Data and Machine Learning, aiming to build a platform where machine-learning practitioners can exchange ideas. Beyond sharing timely news from the field, the society hosts talks by industry and academic leaders, salon-style sharing sessions with senior researchers, real-data innovation competitions, and other events.

[Recommended] CMU Course: Deep Reinforcement Learning and Control (with videos)

Machine Learning Research Society · WeChat Official Account · AI · 2018-01-29 23:12

Main text


Reposted from: 爱可可-爱生活

Class goals

  • Implement and experiment with existing algorithms for learning control policies guided by reinforcement, expert demonstrations or self-trials.

  • Evaluate the sample complexity, generalization and generality of these algorithms.

  • Be able to understand research papers in the field of robotic learning.

  • Try out some ideas/extensions of your own, with a particular focus on incorporating true sensory signals from vision or tactile sensing and on exploring the synergy between learning from simulation and learning from real experience.
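The first two goals — implementing existing control-learning algorithms and evaluating them empirically — can be made concrete with a minimal tabular Q-learning agent (the topic of the 2/20 lecture). The toy 5-state chain MDP and all hyper-parameters below are illustrative choices, not course material:

```python
import random

# Toy 5-state chain MDP: action 0 moves left, action 1 moves right.
# Reaching the rightmost state (4) ends the episode with reward 1.
N_STATES, ACTIONS = 5, (0, 1)

def env_step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]

    def greedy(s):
        best = max(q[s])  # break ties uniformly so early exploration is unbiased
        return rng.choice([a for a in ACTIONS if q[s][a] == best])

    for _ in range(episodes):
        s, done = 0, False
        for _ in range(1000):  # safety cap on episode length
            a = rng.choice(ACTIONS) if rng.random() < eps else greedy(s)
            s2, r, done = env_step(s, a)
            # Q-learning update: bootstrap from the greedy value of the next state.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q

q = q_learning()
# After training, the greedy policy should move right in every non-terminal state.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Sample complexity — the second class goal — can then be probed by sweeping `episodes` or `eps` and measuring how quickly the greedy policy stabilizes.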

Schedule

The following schedule is tentative; it will change based on time constraints and the interests of the people in the class. Reading materials and lecture notes will be added as the lectures progress.

| Date | Topic (slides) | Lecturer | Readings |
| --- | --- | --- | --- |
| 1/18 | Introduction | Katerina | [1] |
| 1/23 | Markov decision processes (MDPs), POMDPs | Katerina | [SB, Ch 3] |
| 1/25 | Solving known MDPs: Dynamic Programming | Katerina | [SB, Ch 4] |
| 1/30 | Monte Carlo learning: value function (VF) estimation and optimization | Russ | [SB, Ch 5] |
| 2/1 | Temporal difference learning: VF estimation and optimization, Q-learning, SARSA | Russ | [SB, Ch 8] |
| 2/2 | Recitation: OpenAI Gym | Devin | |
| 2/6 | Planning and learning: Dyna, Monte Carlo tree search | Katerina | [SB, Ch 8; 2] |
| 2/8 | VF approximation; MC and TD with VF approximation; control with VF approximation | Russ | [SB, Ch 9] |
| 2/13 | VF approximation, deep learning, convnets, back-propagation | Russ | [GBC, Ch 6] |
| 2/15 | Deep learning, convnets, optimization tricks | Russ | [GBC, Ch 9] |
| 2/20 | Deep Q-learning: double Q-learning, replay memory | Russ | |
| 2/22, 2/27 | Policy Gradients I & II | Russ | [SB, Ch 13] |
| 2/28 | Recitation: Homework 2 overview (TensorFlow.org, Keras.io, Bridges User Guide; code snippets) | Devin | |
| 3/1 | Continuous actions, variational autoencoders, multimodal stochastic policies | Russ | |
| 3/6 | Imitation Learning I: behavior cloning, DAGGER, learning to search | Katerina | [5-13] |
| 3/8 | Imitation Learning II: inverse RL, MaxEnt IRL, adversarial imitation learning | Katerina | [14-20] |
| 3/20 | Sidd Srinivasa: Robotic manipulation | Guest | |
| 3/22 | Optimal control, trajectory optimization | Katerina | [21] |
| 3/27 | Manuela Veloso: Mobile collaborative robots (RoboCup) | Guest | |
| 3/29 | Imitation Learning III: imitating controllers, learning local models, GPS | Katerina | [22-26] |
| 4/3 | Chris Atkeson: What (D)RL ignores: state estimation, robustness, and alternative strategies | Guest | |
| 4/5 | End-to-end policy optimization through back-propagation | Katerina | [27-29] |
| 4/10 | Exploration and exploitation | Russ | [SB, Ch 2] |
| 4/12 | Hierarchical RL and transfer learning | Russ | |
| 4/13 | Recitation: Trajectory optimization, iterative LQR (10:00-11:30am, 8102 GHC) | Katerina | |
| 4/17 | Transfer learning (2): simulation to real world | Katerina | [30-37] |
| 4/19 | Maxim Likhachev: Learning in planning: experience graphs | Guest | |
| 4/24 | Memory-augmented RL | Russ | |
| 4/26 | Learning to learn, one-shot learning | Katerina | [38-42] |
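For the dynamic-programming end of the schedule (1/25, solving known MDPs), the Bellman optimality backup is easy to sketch. The deterministic 5-state chain MDP below is an illustrative example, not course material:

```python
# Value iteration on a known 5-state chain MDP: action 0 moves left, action 1
# moves right, and entering the rightmost (terminal) state yields reward 1.
N, GAMMA = 5, 0.9

def env_step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
    return s2, 1.0 if s2 == N - 1 else 0.0

def value_iteration(tol=1e-8):
    v = [0.0] * N
    while True:
        # Bellman optimality backup: V(s) <- max_a [r(s,a) + gamma * V(s')].
        # The terminal state is pinned at 0 so episodes end there.
        new = [
            0.0 if s == N - 1 else
            max(r + GAMMA * v[s2] for s2, r in (env_step(s, 0), env_step(s, 1)))
            for s in range(N)
        ]
        if max(abs(x - y) for x, y in zip(new, v)) < tol:
            return new
        v = new

v = value_iteration()
# Optimal values decay geometrically with distance from the goal:
# V(3) = 1, V(2) = 0.9, V(1) = 0.81, V(0) = 0.729.
```

Because the transitions are known, this converges in a handful of sweeps, in contrast to the sample-based methods (MC, TD, Q-learning) covered later in the schedule.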

Resources

Readings

  1. [SB] Sutton & Barto, Reinforcement Learning: An Introduction

  2. [GBC] Goodfellow, Bengio & Courville, Deep Learning

  3. Smith & Gasser, The Development of Embodied Cognition: Six Lessons from Babies

  4. Silver, Huang et al., Mastering the Game of Go with Deep Neural Networks and Tree Search

  5. Houthooft et al., VIME: Variational Information Maximizing Exploration

  6. Stadie et al., Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

  7. Bagnell, An Invitation to Imitation

  8. Nguyen, Imitation Learning with Recurrent Neural Networks

  9. Bengio et al., Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

  10. Daumé III et al., Searn in Practice

  11. Bojarski et al., End to End Learning for Self-Driving Cars

  12. Guo et al., Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning

  13. Rahmatizadeh et al., Learning real manipulation tasks from virtual demonstrations using LSTM

  14. Ross et al., Learning Monocular Reactive UAV Control in Cluttered Natural Environments

  15. Ross et al., A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

  16. Ziebart et al., Navigate Like a Cabbie: Probabilistic Reasoning from Observed Context-Aware Behavior

  17. Abbeel et al., Apprenticeship Learning via Inverse Reinforcement Learning

  18. Ho et al., Model-Free Imitation Learning with Policy Optimization

  19. Finn et al., Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

  20. Ziebart et al., Maximum Entropy Inverse Reinforcement Learning

  21. Ziebart et al., Human Behavior Modeling with Maximum Entropy Inverse Optimal Control

  22. Finn et al., Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models

  23. Tassa et al., Synthesis and Stabilization of Complex Behaviors through Online Trajectory Optimization

  24. Watter et al., Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images

  25. Levine et al., Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics

  26. Levine et al., Guided Policy Search

  27. Levine et al., End-to-End Training of Deep Visuomotor Policies

  28. Kumar et al., Learning Dexterous Manipulation Policies from Experience and Imitation

  29. Mishra et al., Prediction and Control with Temporal Segment Models

  30. Lillicrap et al., Continuous control with deep reinforcement learning

  31. Heess et al., Learning Continuous Control Policies by Stochastic Value Gradients

  32. Mordatch et al., Combining model-based policy search with online model learning for control of physical humanoids

  33. Rajeswaran et al., EPOpt: Learning Robust Neural Network Policies Using Model Ensembles

  34. Zoph et al., Neural Architecture Search with Reinforcement Learning

  35. Tzeng et al., Adapting Deep Visuomotor Representations with Weak Pairwise Constraints

  36. Ganin et al., Domain-Adversarial Training of Neural Networks

  37. Rusu et al., Sim-to-Real Robot Learning from Pixels with Progressive Nets

  38. Hanna et al., Grounded Action Transformation for Robot Learning in Simulation

  39. Christiano et al., Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model

  40. Xiong et al., Supervised Descent Method and its Applications to Face Alignment

  41. Duan et al., One-Shot Imitation Learning

  42. Lake et al., Building Machines That Learn and Think Like People

  43. Andrychowicz et al., Learning to learn by gradient descent by gradient descent

  44. Finn et al., Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks




