Reinforcement Learning @FAU2021 – Christopher Mutschler

Reinforcement Learning (RL) is an area of Machine Learning that has made large advances in recent years and has become publicly visible by reaching and surpassing human skill levels in games such as Go and StarCraft. These successes suggest that RL has the potential to transform many areas of research and industry by automating the development of processes that once had to be engineered explicitly.

In contrast to other machine learning paradigms, which require a (labeled or unlabeled) dataset, RL considers an agent that takes actions in an environment and learns from the resulting feedback. The agent maximizes a reward signal that it receives for desirable outcomes, while at the same time exploring the world in which it operates to find yet unknown, potentially more rewarding action sequences. Balancing these two goals is a dilemma known as the exploration-exploitation tradeoff. Recent advances in deep learning have made RL methods particularly powerful, since they allow agents to use far more capable models of the world.
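The exploration-exploitation tradeoff can be made concrete with a minimal sketch, assuming the simplest possible setting: an ε-greedy agent on a stationary multi-armed bandit (the setting discussed in the Exploration Strategies session). The arm means, the unit-variance Gaussian reward noise, and the function name `run_epsilon_greedy` are illustrative assumptions, not course code.

```python
import random


def run_epsilon_greedy(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a stationary multi-armed bandit (illustrative sketch).

    true_means: hypothetical expected reward of each arm, hidden from the agent.
    Returns the agent's value estimates, pull counts, and total collected reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # Q(a): running average of observed rewards per arm
    counts = [0] * n_arms       # N(a): number of times each arm was pulled
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a uniformly random arm.
            action = rng.randrange(n_arms)
        else:
            # Exploit: pick an arm with the highest current estimate (random tie-break).
            best = max(estimates)
            action = rng.choice([a for a, q in enumerate(estimates) if q == best])

        # Environment feedback: assumed Gaussian reward around the arm's true mean.
        reward = rng.gauss(true_means[action], 1.0)

        # Incremental mean update: Q <- Q + (r - Q) / N.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_epsilon_greedy([0.2, 0.5, 1.0])
    print("estimates:", [round(q, 2) for q in estimates])
    print("pull counts:", counts)
```

With these assumed settings, most pulls should go to the third arm, while each arm still receives roughly ε/3 of the pulls through exploration. Setting `epsilon=0.0` removes exploration entirely and can leave the agent stuck on whichever arm happened to look best early on.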

Lecturers & Course Instructors

Christopher Mutschler
(course instructor)

Lukas Schmidt
(teaching assistant / exercises)

Sebastian Rietsch
(teaching assistant / exercises)

The lectures will be recorded and published on YouTube before the lecture slot. The lecture slot itself can be used as a consultation hour: if you send me a short message, we can organise an ad-hoc Zoom meeting. The Q&A session on the lecture content takes place at the beginning of the exercise session.

Course Material

Lecture

We compiled a YouTube playlist here. Alternatively, you can access the videos directly below.

	Topic	Video	Duration	Material
15.04.2021: Lecture 1: Introduction to Reinforcement Learning (Q&A: 19.04.2021)			1:44:21
1.01	Opening Remarks	link	14:33	slides (pdf)
1.02	Introduction to RL	link	51:59	slides (pdf)
1.03	Markov Decision Processes	link	37:49	slides (pdf)
22.04.2021: Lecture 2: Dynamic Programming (Q&A: 26.04.2021)			1:47:05
2.01	Introduction to Dynamic Programming	link	24:48	slides (pdf)
2.02	Dynamic Programming: Value Iteration	link	28:14	slides (pdf)
2.03	Dynamic Programming: Policy Iteration	link	25:45	slides (pdf)
2.04	Summary	link	12:03	slides (pdf)
2.05	Hands-On: DP on Frozen Lake	link	16:15	code (zip)
29.04.2021: Lecture 3: Model-free Prediction (Q&A: 03.05.2021)			1:39:06
3.01	Introduction to Model-free Reinforcement Learning	link	7:45	slides (pdf)
3.02	Model-free Prediction with Monte Carlo	link	31:19	slides (pdf)
3.03	Model-free Prediction with Temporal-Difference Learning	link	47:49	slides (pdf)
3.04	Hands-On: TD Learning on Frozen Lake	link	12:13	code (zip)
06.05.2021: Lecture 4: Model-free Control (Q&A: 10.05.2021)			1:51:17
4.01	Model-free Control	link	58:33	slides (pdf)
4.02	Hands-On: OpenAI Gym Intro	link	14:37	code (zip)
4.03	Hands-On: Q-Learning on Frozen Lake	link	15:15	code (zip)
4.04	Summary on Model-free Reinforcement Learning	link	22:52	slides (pdf)
13.05.2021: Holiday/Vacation
20.05.2021: Lecture 5: Value Function Approximation (Q&A: 27.05.2021)			2:04:39
5.01	Value Function Approximation	link	14:11	slides (pdf)
5.02	Linear Value Function Approximation	link	34:26	slides (pdf)
5.03	Deep Q-Networks (error on slide #15 corrected in pdf)	link	44:13	slides (pdf)
5.04	Hands-On: DQN on Cartpole	link	22:51	code (ipynb)
5.05	Summary	link	8:58	slides (pdf)
27.05.2021: Lecture 6: Policy-based RL – Part 1 (Q&A: 31.05.2021)			2:03:41
6.01	Introduction to Policy-based RL	link	54:44	slides (pdf)
6.02	Policy Gradients	link	54:49	slides (pdf)
6.03	Hands-On: Monte-Carlo Policy Gradient	link	14:08	code (zip)
03.06.2021: Holiday/Vacation
10.06.2021: Lecture 7: Policy-based RL – Part 2 (Q&A: 14.06.2021)			2:01:05
7.01	Actor-Critics	link	21:48	slides (pdf)
7.02	Trust-Region Policy Optimization (TRPO)	link	1:06:09	slides (pdf)
7.03	Proximal Policy Optimization (PPO)	link	17:50	slides (pdf)
7.04	Deep Deterministic Policy Gradient (DDPG)	link	15:18	slides (pdf)
17.06.2021: Lecture 8: Model-based RL – Part 1 (Q&A: 21.06.2021)			1:38:05
8.01	Introduction to Model-based RL	link	33:01	slides (pdf)
8.02	Background Planning	link	19:22	slides (pdf)
8.03	Online Planning with Discrete Actions	link	45:42	slides (pdf)
24.06.2021: Lecture 9: Model-based RL – Part 2 (Q&A: 28.06.2021)			1:08:05
9.01	Online Planning with Continuous Actions	link	22:09	slides (pdf)
9.02	Real-World Application: Uncertainty in Model-based RL	link	39:25	slides (pdf)
9.03	Summary on Model-based RL	link	6:31	slides (pdf)
01.07.2021: Lecture 10: Exploration Strategies (Q&A: 05.07.2021)			1:32:20
10.01	Motivation, Multi-Armed Bandits, Regret	link	33:45	slides (pdf)
10.02	Classic Exploration Strategies	link	58:35	slides (pdf)
08.07.2021: Lecture 11: Exploration in Deep RL (Q&A: 12.07.2021)			2:07:43
11.01	Count-based Exploration	link	32:32	slides (pdf)
11.02	Prediction-based Exploration	link	56:25	slides (pdf)
11.03	Memory-based Exploration	link	38:46	slides (pdf)
15.07.2021: Course Wrap-Up (Zoom session on 15.07.2021, 8:30)				slides (pdf)
	Summary on Content
	Discussion of Course Evaluation Results
	Q&A on the Exam
Optional Lecture: Offline RL (T.B.D.)
Optional Lecture: Dependable RL (T.B.D.)

Exercises

Date	Material
19.04.2021	ex1.pdf
26.04.2021	ex2.pdf | ex2.zip
03.05.2021	ex3.pdf | ex3.zip
10.05.2021	ex4.pdf | ex4.zip
17.05.2021	ex5.pdf | ex5.zip
27.05.2021 (instead of 24.05.2021)	No exercise on Monday due to the holiday; we will use Thursday morning of this week instead. ex6_7.pdf | ex6_7.zip
31.05.2021
07.06.2021	ex8_9.pdf | ex8_9.zip
14.06.2021	ex8_9.pdf | ex8_9.zip
21.06.2021	ex10_11.pdf | ex10_11.zip
28.06.2021	ex10_11.pdf | ex10_11.zip
05.07.2021	ex12.pdf | ex12.zip

Evaluation

The evaluation results for the lecture and the exercises can be found here.

Literature

Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA. [link]
Richard E. Bellman. 1957. Dynamic Programming. Princeton University Press, Princeton, NJ, USA. Republished 2003: Dover, ISBN 0-486-42809-5.
UC Berkeley CS188: Intro to AI [link]
University College London Course on RL [link]
Advanced Deep Learning and Reinforcement Learning (UCL + DeepMind) [link]
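Andrej Karpathy: REINFORCEjs GridWorld demo with Dynamic Programming: https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_dp.html
Andrej Karpathy: REINFORCEjs GridWorld demo with TD Learning: https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html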

Interesting talks, articles, and blog posts:

Joelle Pineau: Reproducible, Reusable, and Robust Reinforcement Learning [youtube]
David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning | AI Podcast #86 with Lex Fridman [youtube]
Technion Research: Why does reinforcement learning not work (for you)? [link]
RL algorithms quick overview [link]

Code examples and Exercises:

GitHub Repo of Denny Britz: https://github.com/dennybritz/reinforcement-learning/tree/master/DP