Reinforcement Learning @FAU2022

Reinforcement Learning (RL) is an area of Machine Learning that has recently made large advances and gained public visibility by reaching and surpassing human skill levels in games such as Go and StarCraft. These successes show that RL has the potential to transform many areas of research and industry by automating the development of processes that once had to be engineered explicitly.

In contrast to other machine learning paradigms, which require the presence of (labeled or unlabeled) data, RL considers an agent that takes actions in an environment and learns from the resulting feedback. The agent maximizes a reward signal that it receives for desirable outcomes, while at the same time exploring the world in which it operates to find yet unknown, potentially more rewarding action sequences. Balancing these two goals is a dilemma known as the exploration-exploitation tradeoff. Recent advances in deep learning have made RL methods particularly powerful, since they allow agents to learn well-performing models of the world.
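To make the agent-environment loop and the exploration-exploitation tradeoff concrete, here is a minimal sketch of an epsilon-greedy agent on a toy multi-armed bandit (bandits and classic exploration strategies are covered in Lecture 10). The class `BernoulliBandit`, the function `epsilon_greedy`, and the arm probabilities are hypothetical and chosen only for illustration; epsilon-greedy is just one of several exploration strategies, not necessarily the one emphasized in the course.

```python
import random


class BernoulliBandit:
    """Hypothetical toy environment: arm a pays reward 1 with probability probs[a], else 0."""

    def __init__(self, probs, seed=0):
        self.probs = probs
        self.rng = random.Random(seed)  # separate RNG so results are reproducible

    def step(self, action):
        return 1.0 if self.rng.random() < self.probs[action] else 0.0


def epsilon_greedy(env, n_arms, steps=5000, epsilon=0.1, seed=1):
    """Hypothetical helper: run an epsilon-greedy agent against env for a number of steps."""
    rng = random.Random(seed)
    counts = [0] * n_arms      # how often each arm was pulled
    values = [0.0] * n_arms    # sample-average estimate of each arm's expected reward
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)  # explore: try a random arm
        else:
            # exploit: pick the arm with the highest current estimate (ties -> lowest index)
            action = max(range(n_arms), key=lambda a: values[a])
        reward = env.step(action)  # feedback from the environment
        counts[action] += 1
        # incremental sample average: Q <- Q + (r - Q) / n
        values[action] += (reward - values[action]) / counts[action]
        total += reward
    return values, counts, total


# Usage: three arms with (assumed) success probabilities 0.2, 0.5 and 0.8.
env = BernoulliBandit([0.2, 0.5, 0.8], seed=0)
values, counts, total = epsilon_greedy(env, n_arms=3, steps=5000, epsilon=0.1, seed=1)
print("estimates:", [round(v, 2) for v in values])
print("pulls per arm:", counts)
print("average reward:", round(total / 5000, 3))
```

With probability `epsilon` the agent explores a random arm; otherwise it exploits the arm with the highest estimated value. Setting `epsilon` to 0 can leave the agent stuck on whichever arm happened to pay off first, which is exactly the dilemma described above.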

Lecturers & Course Instructors

Christopher Mutschler
(course instructor)
Sebastian Rietsch
(teaching assistant / exercises)

The lectures will be recorded and uploaded to YouTube before the lecture slot. The lecture slot can be used as a consultation hour: if you give me a short ping, we can organise an ad-hoc Zoom meeting. The Q&A session for the lecture content will take place at the beginning of the exercise session.

Course Material

We compiled a YouTube playlist for this summer term here. You can also access the videos directly below or, if you are curious, have a look at the previous year's edition, as we will only occasionally record new content. The exercises, however, will change a bit.

  Topic Video Duration Material
28.04.2022: Lecture 1: Introduction to Reinforcement Learning
1.00/01 Opening Remarks & Introduction to RL live/zoom slides (pdf)
1.02 Markov Decision Processes link 37:49 slides (pdf)
05.05.2022: Q&A on Lecture 2: Dynamic Programming (slides, video) 1:47:05
2.01 Introduction to Dynamic Programming link 24:48 slides (pdf)
2.02 Dynamic Programming: Value Iteration link 28:14 slides (pdf)
2.03 Dynamic Programming: Policy Iteration link 25:45 slides (pdf)
2.04 Summary link 12:03 slides (pdf)
2.05 Hands-On: DP on Frozen Lake link 16:15 code (zip)
12.05.2022: Q&A on Lecture 3: Model-free Prediction (slides, video) 1:39:06
3.01 Introduction to Model-free Reinforcement Learning link 7:45 slides (pdf)
3.02 Model-free Prediction with Monte Carlo link 31:19 slides (pdf)
3.03 Model-free Prediction with Temporal-Difference Learning link 47:49 slides (pdf)
3.04 Hands-On: TD Learning on Frozen Lake link 12:13 code (zip)
19.05.2022: Q&A on Lecture 4: Model-free Control (slides, video) 1:51:17
4.01 Model-free Control link 58:33 slides (pdf)
4.02 Hands-On: OpenAI Gym Intro link 14:37 code (zip)
4.03 Hands-On: Q-Learning on Frozen Lake link 15:15 code (zip)
4.04 Summary on Model-free Reinforcement Learning link 22:52 slides (pdf)
26.05.2022: Holiday/Vacation
02.06.2022: Q&A on Lecture 5: Value Function Approximation (slides, video) 2:04:39
5.01 Value Function Approximation link 14:11 slides (pdf)
5.02 Linear Value Function Approximation link 34:26 slides (pdf)
5.03 Deep Q-Networks (error on slide #15 corrected in pdf) link 44:13 slides (pdf)
5.04 Hands-On: DQN on Cartpole link  22:51 code (ipynb)
5.05 Summary link 8:58 slides (pdf)
09.06.2022: Q&A on Lecture 6: Policy-based RL – Part 1 (slides, video) 2:03:41
6.01 Introduction to Policy-based RL link 54:44 slides (pdf)
6.02 Policy Gradients link 54:49 slides (pdf)
6.03 Hands-On: Monte-Carlo Policy Gradient link 14:08 code (zip)
16.06.2022: Holiday/Vacation
23.06.2022: Q&A on Lecture 7: Policy-based RL – Part 2 (slides, video) 2:01:05
7.01 Actor-Critics link 21:48 slides (pdf)
7.02 Trust-Region Policy Optimization (TRPO) link 1:06:09 slides (pdf)
7.03 Proximal Policy Optimization (PPO) link 17:50 slides (pdf)
7.04 Deep Deterministic Policy Gradient (DDPG) link 15:18 slides (pdf)
30.06.2022: Q&A on Model-based RL – Part 1 (slides, video) 1:38:05
8.01 Introduction to Model-based RL link 33:01 slides (pdf)
8.02 Background Planning link 19:22 slides (pdf)
8.03 Online Planning with Discrete Actions link 45:42 slides (pdf)
07.07.2022: Q&A on Model-based RL – Part 2 (slides, video) 1:08:05  
9.01 Online Planning with Continuous Actions link 22:09 slides (pdf)
9.02 Real-World Application: Uncertainty in Model-based RL link 39:25 slides (pdf)
9.03 Summary on Model-based RL link 6:31 slides (pdf)
14.07.2022: Q&A on Exploration Strategies (slides, video) 1:32:20  
10.01 Motivation, Multi-Armed Bandits, Regret link 33:45 slides (pdf)
10.02 Classic Exploration Strategies link 58:35 slides (pdf)
21.07.2022: Q&A on Exploration in Deep RL (slides, video) 2:07:43  
11.01 Count-based Exploration link 32:32 slides (pdf)
11.02 Prediction-based Exploration link 56:25 slides (pdf)
11.03 Memory-based Exploration link 38:46 slides (pdf)
28.07.2022: Course Wrap-Up (slides, video)    
  Summary on Content      
  Discussion of Course Evaluation Results      
  Q&A on the Exam

Exercises

  # Topic Material Due Date
28.04.2022 1 Markov Decision Processes ex1.pdf 05.05.2022
05.05.2022 2 Dynamic Programming ex2.pdf | ex2_skeleton.zip 12.05.2022
12.05.2022 3 OpenAI Gym & TD-Learning ex3.pdf | ex3_skeleton.zip 19.05.2022
19.05.2022 4 TD Control ex4.pdf | ex4_skeleton.zip 02.06.2022
19.05.2022 5 PyTorch ex5.pdf | ex5_skeleton.zip 02.06.2022
26.05.2022 no exercises / holiday
02.06.2022 6 DQNs v0.2 on Cartpole ex6.pdf | ex6_skeleton.zip 09.06.2022
09.06.2022 7 DDQNs on Atari ex7.pdf (Update) | ex7_skeleton.zip 23.06.2022
09.06.2022 8 VPG ex8.pdf | ex8_skeleton.zip 23.06.2022
16.06.2022 no exercises / holiday
23.06.2022 9 A2C ex9.pdf | ex9_skeleton.zip 07.07.2022
30.06.2022 10 MCTS ex10.pdf | ex10_skeleton.zip 14.07.2022
07.07.2022        
14.07.2022 11 Bandits ex11.pdf | ex11_skeleton.zip 21.07.2022
21.07.2022 12 RND & ICM on MountainCar ex12.pdf | ex12_skeleton.zip 28.07.2022
28.07.2022

Course Evaluation

The evaluation of the lecture can be found here and the evaluation of the exercises here.

Literature

  • Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA. [link]
  • Richard E. Bellman. 1957. Dynamic Programming. Princeton University Press, Princeton, NJ. Republished 2003: Dover, ISBN 0-486-42809-5.
  • UC Berkeley CS188: Intro to AI [link]
  • University College London Course on RL [link]
  • Advanced Deep Learning and Reinforcement Learning (UCL + DeepMind) [link]
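  • Andrej Karpathy: REINFORCEjs GridWorld demo with Dynamic Programming: https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_dp.html
  • Andrej Karpathy: REINFORCEjs GridWorld demo with Temporal-Difference Learning: https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html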

Interesting talks, articles, and blog-posts:

  • Joelle Pineau: Reproducible, Reusable, and Robust Reinforcement Learning [youtube]
  • David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning | AI Podcast #86 with Lex Fridman [youtube]
  • Technion Research: Why does reinforcement learning not work (for you)? [link]
  • RL algorithms quick overview [link]

Code examples and exercises:

  • GitHub Repo of Denny Britz: https://github.com/dennybritz/reinforcement-learning/tree/master/DP