Reinforcement Learning (RL) is an area of Machine Learning that has recently made major advances and gained public visibility by reaching and surpassing human skill levels in games such as Go and StarCraft. These successes show that RL has the potential to transform many areas of research and industry by automating the development of processes that once had to be engineered explicitly.

In contrast to other machine learning paradigms, which require (labeled or unlabeled) data, RL considers an agent that takes actions in an environment and learns from the resulting feedback. The agent maximizes a reward signal that it receives for desirable outcomes, while at the same time exploring the world in which it operates to find yet unknown, potentially more rewarding action sequences, a dilemma known as the exploration-exploitation tradeoff. Recent advances based on deep learning have made RL methods particularly powerful, since they allow for agents with well-performing models of the world.
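
The interaction cycle described above can be written down in a few lines. Below is a minimal sketch, using a hypothetical two-armed bandit as the environment and an epsilon-greedy rule to trade off exploration and exploitation; the reward probabilities and hyperparameters are illustrative choices and not taken from the course material.

import random

# Minimal sketch of the agent-environment loop with an epsilon-greedy rule.
# The two-armed bandit, its reward probabilities, and the hyperparameters
# below are illustrative assumptions, not part of the course material.

REWARD_PROB = [0.3, 0.7]   # hypothetical success probability of each action
EPSILON = 0.1              # exploration rate
N_STEPS = 10_000

q_values = [0.0, 0.0]      # running estimate of each action's value
counts = [0, 0]            # how often each action has been taken

for _ in range(N_STEPS):
    # Exploration vs. exploitation: occasionally try a random action,
    # otherwise pick the action currently believed to be best.
    if random.random() < EPSILON:
        action = random.randrange(len(q_values))
    else:
        action = max(range(len(q_values)), key=lambda a: q_values[a])

    # The environment returns a reward for the chosen action.
    reward = 1.0 if random.random() < REWARD_PROB[action] else 0.0

    # Incremental update of the value estimate from the received feedback.
    counts[action] += 1
    q_values[action] += (reward - q_values[action]) / counts[action]

print(q_values)  # the estimates should approach the true reward probabilities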

Lecturers & Course Instructors

Christopher Mutschler (course instructor)
Lukas Schmidt (teaching assistant / exercises)
Sebastian Rietsch (teaching assistant / exercises)

The lectures will be recorded and provided on YouTube before the lecture slot. The lecture slot can be used as a consultation hour, and if you give me a short ping we can organise an ad-hoc Zoom meeting. The Q&A session for the lecture content will take place at the beginning of the exercise session.

Course Material

Lecture

We have compiled a YouTube playlist here; you can also access the individual videos directly below.

Topic | Video | Duration | Material
15.04.2021: Lecture 1: Introduction to Reinforcement Learning (Q&A: 19.04.2021) (total: 1:44:21)
1.01 Opening Remarks | link | 14:33 | slides (pdf)
1.02 Introduction to RL | link | 51:59 | slides (pdf)
1.03 Markov Decision Processes | link | 37:49 | slides (pdf)
22.04.2021: Lecture 2: Dynamic Programming (Q&A: 26.04.2021) (total: 1:47:05)
2.01 Introduction to Dynamic Programming | link | 24:48 | slides (pdf)
2.02 Dynamic Programming: Value Iteration | link | 28:14 | slides (pdf)
2.03 Dynamic Programming: Policy Iteration | link | 25:45 | slides (pdf)
2.04 Summary | link | 12:03 | slides (pdf)
2.05 Hands-On: DP on Frozen Lake | link | 16:15 | code (zip)
29.04.2021: Lecture 3: Model-free Prediction (Q&A: 03.05.2021) (total: 1:39:06)
3.01 Introduction to Model-free Reinforcement Learning | link | 7:45 | slides (pdf)
3.02 Model-free Prediction with Monte Carlo | link | 31:19 | slides (pdf)
3.03 Model-free Prediction with Temporal-Difference Learning | link | 47:49 | slides (pdf)
3.04 Hands-On: TD Learning on Frozen Lake | link | 12:13 | code (zip)
06.05.2021: Lecture 4: Model-free Control (Q&A: 10.05.2021) (total: 1:51:17)
4.01 Model-free Control | link | 58:33 | slides (pdf)
4.02 Hands-On: OpenAI Gym Intro | link | 14:37 | code (zip)
4.03 Hands-On: Q-Learning on Frozen Lake | link | 15:15 | code (zip)
4.04 Summary on Model-free Reinforcement Learning | link | 22:52 | slides (pdf)
13.05.2021: Holiday/Vacation
20.05.2021: Lecture 5: Value Function Approximation (Q&A: 27.05.2021) (total: 2:04:39)
5.01 Value Function Approximation | link | 14:11 | slides (pdf)
5.02 Linear Value Function Approximation | link | 34:26 | slides (pdf)
5.03 Deep Q-Networks (error on slide #15 corrected in pdf) | link | 44:13 | slides (pdf)
5.04 Hands-On: DQN on Cartpole | link | 22:51 | code (ipynb)
5.05 Summary | link | 8:58 | slides (pdf)
27.05.2021: Lecture 6: Policy-based RL – Part 1 (Q&A: 31.05.2021) (total: 2:03:41)
6.01 Introduction to Policy-based RL | link | 54:44 | slides (pdf)
6.02 Policy Gradients | link | 54:49 | slides (pdf)
6.03 Hands-On: Monte-Carlo Policy Gradient | link | 14:08 | code (zip)
03.06.2021: Holiday/Vacation
10.06.2021: Lecture 7: Policy-based RL – Part 2 (Q&A: 14.06.2021) (total: 2:01:05)
7.01 Actor-Critics | link | 21:48 | slides (pdf)
7.02 Trust-Region Policy Optimization (TRPO) | link | 1:06:09 | slides (pdf)
7.03 Proximal Policy Optimization (PPO) | link | 17:50 | slides (pdf)
7.04 Deep Deterministic Policy Gradient (DDPG) | link | 15:18 | slides (pdf)
17.06.2021: Lecture 8: Model-based RL – Part 1 (Q&A: 21.06.2021) (total: 1:38:05)
8.01 Introduction to Model-based RL | link | 33:01 | slides (pdf)
8.02 Background Planning | link | 19:22 | slides (pdf)
8.03 Online Planning with Discrete Actions | link | 45:42 | slides (pdf)
24.06.2021: Lecture 9: Model-based RL – Part 2 (Q&A: 28.06.2021) (total: 1:08:05)
9.01 Online Planning with Continuous Actions | link | 22:09 | slides (pdf)
9.02 Real-World Application: Uncertainty in Model-based RL | link | 39:25 | slides (pdf)
9.03 Summary on Model-based RL | link | 6:31 | slides (pdf)
01.07.2021: Lecture 10: Exploration Strategies (Q&A: 05.07.2021) (total: 1:32:20)
10.01 Motivation, Multi-Armed Bandits, Regret | link | 33:45 | slides (pdf)
10.02 Classic Exploration Strategies | link | 58:35 | slides (pdf)
08.07.2021: Lecture 11: Exploration in Deep RL (Q&A: 12.07.2021) (total: 2:07:43)
11.01 Count-based Exploration | link | 32:32 | slides (pdf)
11.02 Prediction-based Exploration | link | 56:25 | slides (pdf)
11.03 Memory-based Exploration | link | 38:46 | slides (pdf)
15.07.2021: Course Wrap-Up (Zoom session on 15.07.2021, 8:30) | slides (pdf)
  Summary of Content
  Discussion of Course Evaluation Results
  Q&A on the Exam
Optional Lecture: Offline RL (T.B.D.)
Optional Lecture: Dependable RL (T.B.D.)


Exercises

Date | Material
19.04.2021 | ex1.pdf
26.04.2021 | ex2.pdf | ex2.zip
03.05.2021 | ex3.pdf | ex3.zip
10.05.2021 | ex4.pdf | ex4.zip
17.05.2021 | ex5.pdf | ex5.zip
24.05.2021 | no exercise on Monday due to the holiday; we will use Thursday morning (27.05.2021) this week instead
27.05.2021 / 31.05.2021 | ex6_7.pdf | ex6_7.zip
07.06.2021 / 14.06.2021 | ex8_9.pdf | ex8_9.zip
21.06.2021 / 28.06.2021 | ex10_11.pdf | ex10_11.zip
05.07.2021 | ex12.pdf | ex12.zip

Evaluation

The evaluation of the lecture and the exercises can be found here.

Literature

  • Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA. [link]
  • Richard E. Bellman. 1957. Dynamic Programming. Princeton University Press, Princeton, NJ. Republished 2003: Dover, ISBN 0-486-42809-5.
  • UC Berkeley CS188: Intro to AI [link]
  • University College London Course on RL [link]
  • Advanced Deep Learning and Reinforcement Learning (UCL + DeepMind) [link]
  • REINFORCEjs GridWorld demo (Dynamic Programming): https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_dp.html
  • REINFORCEjs GridWorld demo (Temporal-Difference Learning): https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html

Interesting talks, articles, and blog posts:

  • Joelle Pineau: Reproducible, Reusable, and Robust Reinforcement Learning [youtube]
  • David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning | AI Podcast #86 with Lex Fridman [youtube]
  • Technion Research: Why does reinforcement learning not work (for you)? [link]
  • RL algorithms quick overview [link]

Code examples and exercises:

  • GitHub Repo of Denny Britz: https://github.com/dennybritz/reinforcement-learning/tree/master/DP