Reinforcement Learning (RL) is an area of Machine Learning that has recently made large advances and gained public visibility by reaching and surpassing human skill levels in games like Go and StarCraft. These successes show that RL has the potential to transform many areas of research and industry by automating the development of processes that once needed to be engineered explicitly.
In contrast to other machine learning paradigms, which require the presence of (labeled or unlabeled) data, RL considers an agent that takes actions in an environment and learns from the resulting feedback. The agent maximizes a reward signal that it receives for desirable outcomes, while at the same time trying to explore the world in which it operates to find as yet unknown, potentially more rewarding action sequences. This dilemma is known as the exploration-exploitation tradeoff. Recent advances in deep learning have made RL methods particularly powerful, since they allow for agents with well-performing models of the world.
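The exploration-exploitation tradeoff described above can be illustrated with a minimal sketch. It assumes a multi-armed bandit with Gaussian rewards of unit variance and uses epsilon-greedy action selection, which is only one simple strategy among many. The function name `run_epsilon_greedy` and all parameters are hypothetical and chosen for illustration; they are not taken from the course material.

```python
import random


def run_epsilon_greedy(true_means, epsilon=0.1, steps=5000, seed=0):
    """Play a Gaussian multi-armed bandit with an epsilon-greedy agent.

    true_means: expected reward of each arm (unknown to the agent).
    epsilon:    probability of picking a random arm (exploration).
    Returns the agent's reward estimates, pull counts, and total reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm has been pulled
    estimates = [0.0] * n_arms   # running mean of observed rewards per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a uniformly random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Environment feedback (assumption: Gaussian noise, unit variance).
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental update of the sample mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_epsilon_greedy([0.1, 0.5, 0.9])
    print("estimates:", [round(e, 2) for e in estimates])
    print("pull counts:", counts)
```

With `epsilon=0` the agent never explores and may lock onto a suboptimal arm; with `epsilon=1` it only explores and never uses what it has learned. Values in between trade off the two behaviors.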
Lecturers & Course Instructors
Course Material
The lecture will take place on Wednesdays from 8:15 to 9:45 in H9.
# | Date | Topic | Material/Information |
1 | 17.04. | Introduction to RL, Markov Decision Processes | 01 Intro RL, MDPs.pdf |
2 | 24.04. | Dynamic Programming | 02 Dynamic Programming.pdf |
| 01.05. | public holiday | |
3 | 08.05. | Model-free Prediction | 03 Model-free Prediction.pdf |
4 | 15.05. | Model-free Control | 04 Model-free Control.pdf |
| 22.05. | no lecture | We will use this slot for exercises! |
5 | 29.05. | Value Function Approximation, DQNs | 05 Value Function Approximation.pdf |
6 | 05.06. | Policy-based RL #1 | 06 Policy-based RL 1.pdf |
7 | 12.06. | Policy-based RL #2 | 07 Policy-based RL 2.pdf |
8 | 19.06. | Exploration-Exploitation, Regret, Bandits | 08 Exploration-Exploitation.pdf |
9 | 26.06. | Exploration in Deep RL, Intrinsic Motivation | 09 Exploration in DeepRL.pdf |
10 | 03.07. | Model-based RL #1 (Discrete Actions) | 10 Model-based RL.pdf |
11 | 10.07. | Offline Reinforcement Learning | 11 Offline RL.pdf |
12 | 17.07. | Course Wrap-Up, Discussion of Evaluation Results. Starting 9:00: “Reinforcement Learning for and with Foundation Models” (Guest Lecture by Dr. Georgios Kontes, Fraunhofer IIS) | 12 WrapUp.pdf, 12 RL for and with Foundation Models.pdf |
Exercises
The exercises will take place on Tuesdays from 10:15 to 11:45 in H10. Each exercise sheet will be available after the lecture in the preceding week. In the exercise sessions, we will recap the lecture contents and discuss the solutions of the exercises.
Week | Date | Topic | Material | Who? |
0 | | no exercises | | |
1 | 23.04. | MDPs (slides) | ex1.pdf | Nico |
2 | 30.04. | Dynamic Programming (slides) | ex2.pdf, ex2_skeleton.zip | Alex |
3 | 07.05. | OpenAI Gym, PyTorch-Intro (slides) | ex3.pdf, ex3_skeleton.zip | Alex |
4 | 14.05. | TD-Learning (slides) | ex4.pdf, ex4_skeleton.zip | Nico |
5 | 22.05. | TD-Control: Zoom Session (slides). Attention: Lecture Slot! | ex5.pdf, ex5_skeleton.zip | Nico + Alex |
| 28.05. | TD-Control (cont’d) | | Nico |
6 | 04.06. | DQN (slides) | ex6.pdf, ex6_skeleton.zip | Nico |
7 | 11.06. | VPG (slides) | ex7.pdf, ex7_skeleton.zip | Alex |
8 | 18.06. | A2C (slides) | ex8.pdf, ex8_skeleton.zip | Nico |
9 | 25.06. | Multi-armed Bandits (slides) | ex9.pdf, ex9_skeleton.zip | Alex |
10 | 02.07. | RND/ICM (slides) | ex10.pdf, ex10_skeleton.zip | Alex |
11 | 09.07. | MCTS (slides) | ex11.pdf, ex11_skeleton.zip | Alex |
12 | 16.07. | BCQ (slides) | ex12.pdf, ex12_skeleton.zip | Nico |
Course Evaluation
The evaluation of the lecture and the exercises will be made available here.
Literature
- Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA. [link]
- Bellman, R.E. 1957. Dynamic Programming. Princeton University Press, Princeton, NJ. Republished 2003: Dover, ISBN 0-486-42809-5.
- UC Berkeley CS188: Intro to AI [link]
- University College London Course on RL [link]
- Advanced Deep Learning and Reinforcement Learning (UCL + DeepMind) [link]
- https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_dp.html
- https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html
Interesting talks, articles, and blog posts:
- Joelle Pineau: Reproducible, Reusable, and Robust Reinforcement Learning [youtube]
- David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning | AI Podcast #86 with Lex Fridman [youtube]
- Technion Research: Why does reinforcement learning not work (for you)? [link]
- RL algorithms quick overview [link]
Code examples and exercises:
- GitHub Repo of Denny Britz: https://github.com/dennybritz/reinforcement-learning/tree/master/DP