This lecture is part of the Machine Learning Masters program at the University of Tübingen. The course is run by the Autonomous Learning Group.
Dates:
Tue 14:15 - 15:45: lecture in N10 Morgenstelle 3 (Alte Botanik)
Tue 16:00 - 17:30: exercise sessions/recitation in N15 (C-Bau) and N09 (Hörsaalzentrum OG), Morgenstelle (starting on April 25th)
Course description:
The course will provide you with theoretical and practical knowledge of reinforcement learning, a field of machine learning concerned with decision-making and interaction with dynamical systems, such as robots. We start with a brief overview of supervised learning and spend most of the time on reinforcement learning. The exercises will help you get hands-on experience with the methods and deepen your understanding.
Qualification Goals:
Students gain an understanding of reinforcement learning formulations, problems, and algorithms on a theoretical and practical level. After this course, students should be able to implement and apply deep reinforcement learning algorithms to new problems.
People:
Instructor: Prof. Georg Martius
Teaching Assistants: Marco Bagatella, Pavel Kolev, Marin Vlastelica, Sebastian Blaes
Course materials:
Both slides and exercises are available on ILIAS.
Lectures:
- Lecture 1: Introduction to the course, Reinforcement Learning (RL) history, and the RL setup; Background reading: Sutton and Barto, Reinforcement Learning (for the next few lectures; for this lecture, parts of Chapter 3)
- Lecture 2: MDPs; Background reading: Sutton and Barto, Reinforcement Learning, Chapter 4
- Lecture 3: Model-free Prediction; Background reading: Sutton and Barto, Reinforcement Learning, first parts of Chapters 5, 6, 7, and 12
- Lecture 4: Model-free Control; Background reading: Sutton and Barto, Reinforcement Learning, Sections 5.2, 5.3, 5.5, 6.4, 6.5, and 12.7
- Lecture 5: Neural Networks and Imitation Learning; Background reading: C.M. Bishop, Pattern Recognition and Machine Learning, Chapter 5
- Lecture 6: Value Function Approximation; Background reading: Sutton and Barto, Reinforcement Learning, Sections 9.1-9.8, 10.1, 10.2, and 11.1-11.3. Supplementary: DQN paper 1, paper 2, NFQ paper
- Lecture 7: Policy Gradient; Background reading: Sutton and Barto, Reinforcement Learning, Chapter 13
- Lecture 8: Policy Gradient and Actor-Critic; Background reading: Natural Actor-Critic paper, TRPO paper, PPO paper
- Lecture 9: Q-learning-style Actor-Critic; Background reading: DPG paper, DDPG paper, TD3 paper, SAC paper
- Lecture 10: Exploration and Tricks to Improve Deep RL (with recent work from my group); Background reading: ICM paper, RND paper, Pink-Noise paper, HER paper
- Lecture 11: Model-based Methods: Dyna-Q, MBPO; Background reading: Sutton and Barto, Reinforcement Learning, Chapter 8 and the MBPO paper
- Lecture 12: Model-based Methods II: Online Planning (with recent work from my group): CEM, PETS, iCEM, CEE-US; Background reading: PETS paper, iCEM paper (video), CEE-US paper (videos)
- Lecture 13: AlphaGo and AlphaZero, Dreamer; Background reading: AlphaGo paper (also on ILIAS, because it is behind a paywall), AlphaZero paper, Dreamer paper
- Lecture 14: Offline RL; Background reading: CQL paper, CRR paper, Benchmarking paper
Exercises:
See ILIAS!
Related readings:
- Sutton & Barto, Reinforcement Learning: An Introduction
- Bertsekas, Dynamic Programming and Optimal Control, Vol. 1
- Bishop, Pattern Recognition and Machine Learning