Reinforcement Learning (RL) is an area of Machine Learning that has recently made large advances and has gained public visibility by reaching and surpassing human skill levels in games like Go and StarCraft. These successes show that RL has the potential to transform many areas of research and industry by automating the development of processes that once had to be engineered explicitly.
In contrast to other machine learning paradigms, which require the presence of (labeled or unlabeled) data, RL considers an agent that takes actions in an environment and learns from the resulting feedback. The agent maximizes a reward signal that it receives for desirable outcomes, while at the same time exploring the world in which it operates to find as-yet-unknown, potentially more rewarding action sequences, a dilemma known as the exploration-exploitation tradeoff. Recent advances based on deep learning have made RL methods particularly powerful, since they equip agents with highly expressive models of the world.
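To make this interaction loop concrete, here is a minimal sketch of tabular Q-learning with an ε-greedy policy. It is not part of the course material: the toy chain environment, its reward values, and all hyperparameters are illustrative assumptions.

```python
import random

# Hypothetical toy environment: a 1D chain of states 0..N_STATES-1.
# Moving right eventually reaches a rewarding terminal state;
# every other step incurs a small penalty.
N_STATES = 6       # illustrative size
ACTIONS = [0, 1]   # 0 = left, 1 = right

def step(state, action):
    """One environment transition: returns (next_state, reward, done)."""
    next_state = max(state - 1, 0) if action == 0 else min(state + 1, N_STATES - 1)
    if next_state == N_STATES - 1:       # goal reached
        return next_state, 1.0, True
    return next_state, -0.01, False      # small step penalty

# Tabular Q-learning with epsilon-greedy exploration: with probability
# epsilon the agent explores (random action), otherwise it exploits
# its current value estimates.
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # illustrative hyperparameters
Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

for episode in range(500):
    state, done = 0, False
    while not done:
        if random.random() < epsilon:                        # explore
            action = random.choice(ACTIONS)
        else:                                                # exploit
            action = max(ACTIONS, key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # TD update: move Q(s, a) towards the bootstrapped target.
        target = reward + (0.0 if done else gamma * max(Q[next_state]))
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

print(Q)  # learned action values; "right" should dominate in each state
```

The `epsilon` parameter directly embodies the exploration-exploitation tradeoff: larger values favor trying unknown actions, smaller values favor exploiting the current value estimates.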
Lecturers & Course Instructors
The lectures will be recorded and provided on YouTube before the respective lecture slot. The slot itself can be used as a consultation hour: if you give me a short ping, we can organise an ad-hoc Zoom meeting. The Q&A session for the lecture content will take place at the beginning of the exercise session.
Course Material
We have compiled a YouTube playlist for this summer term here. You can also access the videos directly below or, if you are curious, have a look at the previous year's edition, as we will only occasionally record new content. The exercises, however, will change a bit.
| # | Topic | Video | Duration | Material |
| --- | --- | --- | --- | --- |
| | 28.04.2022: Lecture 1: Introduction to Reinforcement Learning | | | |
| 1.00/01 | Opening Remarks & Introduction to RL | live/zoom | – | slides (pdf) |
| 1.02 | Markov Decision Processes | link | 37:49 | slides (pdf) |
| | 05.05.2022: Q&A on Lecture 2: Dynamic Programming (slides, video) | | 1:47:05 | |
| 2.01 | Introduction to Dynamic Programming | link | 24:48 | slides (pdf) |
| 2.02 | Dynamic Programming: Value Iteration | link | 28:14 | slides (pdf) |
| 2.03 | Dynamic Programming: Policy Iteration | link | 25:45 | slides (pdf) |
| 2.04 | Summary | link | 12:03 | slides (pdf) |
| 2.05 | Hands-On: DP on Frozen Lake | link | 16:15 | code (zip) |
| | 12.05.2022: Q&A on Lecture 3: Model-free Prediction (slides, video) | | 1:39:06 | |
| 3.01 | Introduction to Model-free Reinforcement Learning | link | 7:45 | slides (pdf) |
| 3.02 | Model-free Prediction with Monte Carlo | link | 31:19 | slides (pdf) |
| 3.03 | Model-free Prediction with Temporal-Difference Learning | link | 47:49 | slides (pdf) |
| 3.04 | Hands-On: TD Learning on Frozen Lake | link | 12:13 | code (zip) |
| | 19.05.2022: Q&A on Lecture 4: Model-free Control (slides, video) | | 1:51:17 | |
| 4.01 | Model-free Control | link | 58:33 | slides (pdf) |
| 4.02 | Hands-On: OpenAI Gym Intro | link | 14:37 | code (zip) |
| 4.03 | Hands-On: Q-Learning on Frozen Lake | link | 15:15 | code (zip) |
| 4.04 | Summary on Model-free Reinforcement Learning | link | 22:52 | slides (pdf) |
| | 26.05.2022: Holiday/Vacation | | | |
| | 02.06.2022: Q&A on Lecture 5: Value Function Approximation (slides, video) | | 2:04:39 | |
| 5.01 | Value Function Approximation | link | 14:11 | slides (pdf) |
| 5.02 | Linear Value Function Approximation | link | 34:26 | slides (pdf) |
| 5.03 | Deep Q-Networks (error on slide #15 corrected in pdf) | link | 44:13 | slides (pdf) |
| 5.04 | Hands-On: DQN on Cartpole | link | 22:51 | code (ipynb) |
| 5.05 | Summary | link | 8:58 | slides (pdf) |
| | 09.06.2022: Q&A on Lecture 6: Policy-based RL – Part 1 (slides, video) | | 2:03:41 | |
| 6.01 | Introduction to Policy-based RL | link | 54:44 | slides (pdf) |
| 6.02 | Policy Gradients | link | 54:49 | slides (pdf) |
| 6.03 | Hands-On: Monte-Carlo Policy Gradient | link | 14:08 | code (zip) |
| | 16.06.2022: Holiday/Vacation | | | |
| | 23.06.2022: Q&A on Lecture 7: Policy-based RL – Part 2 (slides, video) | | 2:01:05 | |
| 7.01 | Actor-Critics | link | 21:48 | slides (pdf) |
| 7.02 | Trust-Region Policy Optimization (TRPO) | link | 1:06:09 | slides (pdf) |
| 7.03 | Proximal Policy Optimization (PPO) | link | 17:50 | slides (pdf) |
| 7.04 | Deep Deterministic Policy Gradient (DDPG) | link | 15:18 | slides (pdf) |
| | 30.06.2022: Q&A on Lecture 8: Model-based RL – Part 1 (slides, video) | | 1:38:05 | |
| 8.01 | Introduction to Model-based RL | link | 33:01 | slides (pdf) |
| 8.02 | Background Planning | link | 19:22 | slides (pdf) |
| 8.03 | Online Planning with Discrete Actions | link | 45:42 | slides (pdf) |
| | 07.07.2022: Q&A on Lecture 9: Model-based RL – Part 2 (slides, video) | | 1:08:05 | |
| 9.01 | Online Planning with Continuous Actions | link | 22:09 | slides (pdf) |
| 9.02 | Real-World Application: Uncertainty in Model-based RL | link | 39:25 | slides (pdf) |
| 9.03 | Summary on Model-based RL | link | 6:31 | slides (pdf) |
| | 14.07.2022: Q&A on Lecture 10: Exploration Strategies (slides, video) | | 1:32:20 | |
| 10.01 | Motivation, Multi-Armed Bandits, Regret | link | 33:45 | slides (pdf) |
| 10.02 | Classic Exploration Strategies | link | 58:35 | slides (pdf) |
| | 21.07.2022: Q&A on Lecture 11: Exploration in Deep RL (slides, video) | | 2:07:43 | |
| 11.01 | Count-based Exploration | link | 32:32 | slides (pdf) |
| 11.02 | Prediction-based Exploration | link | 56:25 | slides (pdf) |
| 11.03 | Memory-based Exploration | link | 38:46 | slides (pdf) |
| | 28.07.2022: Course Wrap-Up (slides, video) | | | |
| | Summary on Content | | | |
| | Discussion of Course Evaluation Results | | | |
| | Q&A on the Exam | | | |
Exercises
| Date | # | Topic | Material | Due Date |
| --- | --- | --- | --- | --- |
| 28.04.2022 | 1 | Markov Decision Processes | ex1.pdf | 05.05.2022 |
| 05.05.2022 | 2 | Dynamic Programming | ex2.pdf, ex2_skeleton.zip | 12.05.2022 |
| 12.05.2022 | 3 | OpenAI Gym & TD-Learning | ex3.pdf, ex3_skeleton.zip | 19.05.2022 |
| 19.05.2022 | 4 | TD Control | ex4.pdf, ex4_skeleton.zip | 02.06.2022 |
| 19.05.2022 | 5 | PyTorch | ex5.pdf, ex5_skeleton.zip | 02.06.2022 |
| 26.05.2022 | | no exercises / holiday | | |
| 02.06.2022 | 6 | DQNs v0.2 on Cartpole | ex6.pdf, ex6_skeleton.zip | 09.06.2022 |
| 09.06.2022 | 7 | DDQNs on Atari | ex7.pdf (Update), ex7_skeleton.zip | 23.06.2022 |
| 09.06.2022 | 8 | VPG | ex8.pdf, ex8_skeleton.zip | 23.06.2022 |
| 16.06.2022 | | no exercises / holiday | | |
| 23.06.2022 | 9 | A2C | ex9.pdf, ex9_skeleton.zip | 07.07.2022 |
| 30.06.2022 | 10 | MCTS | ex10.pdf, ex10_skeleton.zip | 14.07.2022 |
| 07.07.2022 | | | | |
| 14.07.2022 | 11 | Bandits | ex11.pdf, ex11_skeleton.zip | 21.07.2022 |
| 21.07.2022 | 12 | RND & ICM on MountainCar | ex12.pdf, ex12_skeleton.zip | 28.07.2022 |
| 28.07.2022 | | | | |
Course Evaluation
The evaluation of the lecture can be found here and the evaluation of the exercises here.
Literature
- Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA. [link]
- Richard E. Bellman. 1957. Dynamic Programming. Princeton University Press, Princeton, NJ, USA. Republished 2003: Dover, ISBN 0-486-42809-5.
- UC Berkeley CS188: Intro to AI [link]
- University College London Course on RL [link]
- Advanced Deep Learning and Reinforcement Learning (UCL + DeepMind) [link]
- Interactive GridWorld demo for Dynamic Programming (REINFORCEjs): https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_dp.html
- Interactive GridWorld demo for Temporal-Difference Learning (REINFORCEjs): https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html
Interesting talks, articles, and blog-posts:
- Joelle Pineau: Reproducible, Reusable, and Robust Reinforcement Learning [youtube]
- David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning | AI Podcast #86 with Lex Fridman [youtube]
- Technion Research: Why does reinforcement learning not work (for you)? [link]
- RL algorithms quick overview [link]
Code examples and exercises:
- GitHub Repo of Denny Britz: https://github.com/dennybritz/reinforcement-learning/tree/master/DP