Column: Machine Learning Research Society
The Machine Learning Research Society is a student organization under the Peking University Innovation Center for Big Data and Machine Learning, aiming to build a platform where machine-learning practitioners can exchange ideas. Beyond sharing timely news from the field, the society hosts talks by industry and academic leaders, salon-style sharing sessions with senior researchers, real-data innovation competitions, and other events.

[Recommended] CMU Course: Deep Reinforcement Learning and Control (with videos)

Machine Learning Research Society · WeChat Official Account · AI · 2018-01-29 23:12

Main text


Reposted from: 爱可可-爱生活

Class goals

  • Implement and experiment with existing algorithms for learning control policies guided by reinforcement, expert demonstrations or self-trials.

  • Evaluate the sample complexity, generalization and generality of these algorithms.

  • Be able to understand research papers in the field of robotic learning.

  • Try out some ideas/extensions of your own, with a particular focus on incorporating true sensory signals from vision or tactile sensing and on exploring the synergy between learning from simulation and learning from real experience.
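The first two goals — implementing existing control-learning algorithms and evaluating them empirically — can be made concrete with a minimal tabular Q-learning agent (the topic of the 2/20 lecture). The toy 5-state chain MDP and all hyper-parameters below are illustrative choices, not course material:

```python
import random

# Toy 5-state chain MDP: action 0 moves left, action 1 moves right.
# Reaching the rightmost state (4) ends the episode with reward 1.
N_STATES, ACTIONS = 5, (0, 1)

def env_step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]

    def greedy(s):
        best = max(q[s])  # break ties uniformly so early exploration is unbiased
        return rng.choice([a for a in ACTIONS if q[s][a] == best])

    for _ in range(episodes):
        s, done = 0, False
        for _ in range(1000):  # safety cap on episode length
            a = rng.choice(ACTIONS) if rng.random() < eps else greedy(s)
            s2, r, done = env_step(s, a)
            # Q-learning update: bootstrap from the greedy value of the next state.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q

q = q_learning()
# After training, the greedy policy should move right in every non-terminal state.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
```

Sample complexity — the second class goal — can then be probed by sweeping `episodes` or `eps` and measuring how quickly the greedy policy stabilizes.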

Schedule

The following schedule is tentative; it will change based on time constraints and the interests of the people in the class. Reading materials and lecture notes will be added as the lectures progress.

| Date | Topic (slides) | Lecturer | Readings |
| --- | --- | --- | --- |
| 1/18 | Introduction | Katerina | [1] |
| 1/23 | Markov decision processes (MDPs), POMDPs | Katerina | [SB, Ch 3] |
| 1/25 | Solving known MDPs: Dynamic Programming | Katerina | [SB, Ch 4] |
| 1/30 | Monte Carlo learning: value function (VF) estimation and optimization | Russ | [SB, Ch 5] |
| 2/1 | Temporal difference learning: VF estimation and optimization, Q-learning, SARSA | Russ | [SB, Ch 8] |
| 2/2 | Recitation: OpenAI Gym | Devin | |
| 2/6 | Planning and learning: Dyna, Monte Carlo tree search | Katerina | [SB, Ch 8; 2] |
| 2/8 | VF approximation; MC and TD with VF approximation; control with VF approximation | Russ | [SB, Ch 9] |
| 2/13 | VF approximation, deep learning, convnets, back-propagation | Russ | [GBC, Ch 6] |
| 2/15 | Deep learning, convnets, optimization tricks | Russ | [GBC, Ch 9] |
| 2/20 | Deep Q-learning: double Q-learning, replay memory | Russ | |
| 2/22, 2/27 | Policy Gradients I & II | Russ | [SB, Ch 13] |
| 2/28 | Recitation: Homework 2 overview (TensorFlow.org, Keras.io, Bridges User Guide; code snippets) | Devin | |
| 3/1 | Continuous actions, variational autoencoders, multimodal stochastic policies | Russ | |
| 3/6 | Imitation Learning I: behavior cloning, DAGGER, learning to search | Katerina | [5-13] |
| 3/8 | Imitation Learning II: inverse RL, MaxEnt IRL, adversarial imitation learning | Katerina | [14-20] |
| 3/20 | Sidd Srinivasa: Robotic manipulation | Guest | |
| 3/22 | Optimal control, trajectory optimization | Katerina | [21] |
| 3/27 | Manuela Veloso: Mobile collaborative robots (RoboCup) | Guest | |
| 3/29 | Imitation Learning III: imitating controllers, learning local models, GPS | Katerina | [22-26] |
| 4/3 | Chris Atkeson: What (D)RL ignores: state estimation, robustness, and alternative strategies | Guest | |
| 4/5 | End-to-end policy optimization through back-propagation | Katerina | [27-29] |
| 4/10 | Exploration and exploitation | Russ | [SB, Ch 2] |
| 4/12 | Hierarchical RL and transfer learning | Russ | |
| 4/13 | Recitation: Trajectory optimization, iterative LQR (10:00-11:30am, 8102 GHC) | Katerina | |
| 4/17 | Transfer learning (2): simulation to real world | Katerina | [30-37] |
| 4/19 | Maxim Likhachev: Learning in planning: experience graphs | Guest | |
| 4/24 | Memory-augmented RL | Russ | |
| 4/26 | Learning to learn, one-shot learning | Katerina | [38-42] |
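For the dynamic-programming end of the schedule (1/25, solving known MDPs), the Bellman optimality backup is easy to sketch. The deterministic 5-state chain MDP below is an illustrative example, not course material:

```python
# Value iteration on a known 5-state chain MDP: action 0 moves left, action 1
# moves right, and entering the rightmost (terminal) state yields reward 1.
N, GAMMA = 5, 0.9

def env_step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N - 1, s + 1)
    return s2, 1.0 if s2 == N - 1 else 0.0

def value_iteration(tol=1e-8):
    v = [0.0] * N
    while True:
        # Bellman optimality backup: V(s) <- max_a [r(s,a) + gamma * V(s')].
        # The terminal state is pinned at 0 so episodes end there.
        new = [
            0.0 if s == N - 1 else
            max(r + GAMMA * v[s2] for s2, r in (env_step(s, 0), env_step(s, 1)))
            for s in range(N)
        ]
        if max(abs(x - y) for x, y in zip(new, v)) < tol:
            return new
        v = new

v = value_iteration()
# Optimal values decay geometrically with distance from the goal:
# V(3) = 1, V(2) = 0.9, V(1) = 0.81, V(0) = 0.729.
```

Because the transitions are known, this converges in a handful of sweeps, in contrast to the sample-based methods (MC, TD, Q-learning) covered later in the schedule.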

Resources

Readings

  1. [SB] Sutton & Barto, Reinforcement Learning: An Introduction

  2. [GBC] Goodfellow, Bengio & Courville, Deep Learning

  3. Smith & Gasser, The Development of Embodied Cognition: Six Lessons from Babies

  4. Silver, Huang et al., Mastering the Game of Go with Deep Neural Networks and Tree Search

  5. Houthooft et al., VIME: Variational Information Maximizing Exploration

  6. Stadie et al., Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models

  7. Bagnell, An Invitation to Imitation

  8. Nguyen, Imitation Learning with Recurrent Neural Networks

  9. Bengio et al., Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

  10. Daumé III et al., Searn in Practice

  11. Bojarski et al., End to End Learning for Self-Driving Cars

  12. Guo et al., Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning

  13. Rahmatizadeh et al., Learning real manipulation tasks from virtual demonstrations using LSTM

  14. Ross et al., Learning Monocular Reactive UAV Control in Cluttered Natural Environments

  15. Ross et al., A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning

  16. Ziebart et al., Navigate Like a Cabbie: Probabilistic Reasoning from Observed Context-Aware Behavior

  17. Abbeel et al., Apprenticeship Learning via Inverse Reinforcement Learning

  18. Ho et al., Model-Free Imitation Learning with Policy Optimization

  19. Finn et al., Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

  20. Ziebart et al., Maximum Entropy Inverse Reinforcement Learning

  21. Ziebart et al., Human Behavior Modeling with Maximum Entropy Inverse Optimal Control

  22. Finn et al., Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models

  23. Tassa et al., Synthesis and Stabilization of Complex Behaviors through Online Trajectory Optimization

  24. Watter et al., Embed to Control: A Locally Linear Latent Dynamics Model for Control from Raw Images

  25. Levine et al., Learning Neural Network Policies with Guided Policy Search under Unknown Dynamics

  26. Levine et al., Guided Policy Search

  27. Levine et al., End-to-End Training of Deep Visuomotor Policies

  28. Kumar et al., Learning Dexterous Manipulation Policies from Experience and Imitation

  29. Mishra et al., Prediction and Control with Temporal Segment Models

  30. Lillicrap et al., Continuous control with deep reinforcement learning

  31. Heess et al., Learning Continuous Control Policies by Stochastic Value Gradients

  32. Mordatch et al., Combining model-based policy search with online model learning for control of physical humanoids

  33. Rajeswaran et al., EPOpt: Learning Robust Neural Network Policies Using Model Ensembles

  34. Zoph et al., Neural Architecture Search with Reinforcement Learning

  35. Tzeng et al., Adapting Deep Visuomotor Representations with Weak Pairwise Constraints

  36. Ganin et al., Domain-Adversarial Training of Neural Networks

  37. Rusu et al., Sim-to-Real Robot Learning from Pixels with Progressive Nets

  38. Hanna et al., Grounded Action Transformation for Robot Learning in Simulation

  39. Christiano et al., Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model

  40. Xiong et al., Supervised Descent Method and its Applications to Face Alignment

  41. Duan et al., One-Shot Imitation Learning

  42. Lake et al., Building Machines That Learn and Think Like People

  43. Andrychowicz et al., Learning to learn by gradient descent by gradient descent

  44. Finn et al., Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks




