Reinforcement Learning (RL) is an area of Machine Learning that has recently made large advances and has gained public visibility by reaching and surpassing human skill levels in games like Go and StarCraft. These successes show that RL has the potential to transform many areas of research and industry by automating the development of processes that once had to be engineered explicitly.
In contrast to other machine learning paradigms, which require the presence of (labeled or unlabeled) data, RL considers an agent that takes actions in an environment and learns from the resulting feedback. The agent maximizes a reward signal that it receives for desirable outcomes, while at the same time exploring the world in which it operates to find as-yet-unknown, potentially more rewarding action sequences, a dilemma known as the exploration-exploitation tradeoff. Recent advances based on deep learning have made RL methods particularly powerful, since they equip agents with highly expressive models of the world.
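To make this interaction loop concrete, here is a minimal sketch of tabular Q-learning with an ε-greedy policy. It is not part of the course material: the toy chain environment, its reward values, and all hyperparameters are illustrative assumptions.

```python
import random

# Hypothetical toy environment: a 1D chain of states 0..N_STATES-1.
# Moving right eventually reaches a rewarding terminal state;
# every other step incurs a small penalty.
N_STATES = 6       # illustrative size
ACTIONS = [0, 1]   # 0 = left, 1 = right

def step(state, action):
    """One environment transition: returns (next_state, reward, done)."""
    next_state = max(state - 1, 0) if action == 0 else min(state + 1, N_STATES - 1)
    if next_state == N_STATES - 1:       # goal reached
        return next_state, 1.0, True
    return next_state, -0.01, False      # small step penalty

# Tabular Q-learning with epsilon-greedy exploration: with probability
# epsilon the agent explores (random action), otherwise it exploits
# its current value estimates.
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # illustrative hyperparameters
Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

for episode in range(500):
    state, done = 0, False
    while not done:
        if random.random() < epsilon:                        # explore
            action = random.choice(ACTIONS)
        else:                                                # exploit
            action = max(ACTIONS, key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # TD update: move Q(s, a) towards the bootstrapped target.
        target = reward + (0.0 if done else gamma * max(Q[next_state]))
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

print(Q)  # learned action values; "right" should dominate in each state
```

The `epsilon` parameter directly embodies the exploration-exploitation tradeoff: larger values favor trying unknown actions, smaller values favor exploiting the current value estimates.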
Lecturers & Course Instructors
The lectures will be recorded and provided on YouTube before the respective lecture slot. The slot itself can be used as a consultation hour: if you give me a short ping, we can organise an ad-hoc Zoom meeting. The Q&A session for the lecture content will take place at the beginning of the exercise session.
Course Material
We have compiled a YouTube playlist for this summer term here. You can also access the videos directly below or, if you are curious, have a look at the previous year's edition, as we will only occasionally record new content. The exercises, however, will change a bit.
| # | Topic | Video | Duration | Material |
| --- | --- | --- | --- | --- |
| | 28.04.2022: Lecture 1: Introduction to Reinforcement Learning | | | |
| 1.00/01 | Opening Remarks & Introduction to RL | live/zoom | – | slides (pdf) |
| 1.02 | Markov Decision Processes | link | 37:49 | slides (pdf) |
| | 05.05.2022: Q&A on Lecture 2: Dynamic Programming (slides, video) | | 1:47:05 | |
| 2.01 | Introduction to Dynamic Programming | link | 24:48 | slides (pdf) |
| 2.02 | Dynamic Programming: Value Iteration | link | 28:14 | slides (pdf) |
| 2.03 | Dynamic Programming: Policy Iteration | link | 25:45 | slides (pdf) |
| 2.04 | Summary | link | 12:03 | slides (pdf) |
| 2.05 | Hands-On: DP on Frozen Lake | link | 16:15 | code (zip) |
| | 12.05.2022: Q&A on Lecture 3: Model-free Prediction (slides, video) | | 1:39:06 | |
| 3.01 | Introduction to Model-free Reinforcement Learning | link | 7:45 | slides (pdf) |
| 3.02 | Model-free Prediction with Monte Carlo | link | 31:19 | slides (pdf) |
| 3.03 | Model-free Prediction with Temporal-Difference Learning | link | 47:49 | slides (pdf) |
| 3.04 | Hands-On: TD Learning on Frozen Lake | link | 12:13 | code (zip) |
| | 19.05.2022: Q&A on Lecture 4: Model-free Control (slides, video) | | 1:51:17 | |
| 4.01 | Model-free Control | link | 58:33 | slides (pdf) |
| 4.02 | Hands-On: OpenAI Gym Intro | link | 14:37 | code (zip) |
| 4.03 | Hands-On: Q-Learning on Frozen Lake | link | 15:15 | code (zip) |
| 4.04 | Summary on Model-free Reinforcement Learning | link | 22:52 | slides (pdf) |
| | 26.05.2022: Holiday/Vacation | | | |
| | 02.06.2022: Q&A on Lecture 5: Value Function Approximation (slides, video) | | 2:04:39 | |
| 5.01 | Value Function Approximation | link | 14:11 | slides (pdf) |
| 5.02 | Linear Value Function Approximation | link | 34:26 | slides (pdf) |
| 5.03 | Deep Q-Networks (error on slide #15 corrected in pdf) | link | 44:13 | slides (pdf) |
| 5.04 | Hands-On: DQN on Cartpole | link | 22:51 | code (ipynb) |
| 5.05 | Summary | link | 8:58 | slides (pdf) |
| | 09.06.2022: Q&A on Lecture 6: Policy-based RL – Part 1 (slides, video) | | 2:03:41 | |
| 6.01 | Introduction to Policy-based RL | link | 54:44 | slides (pdf) |
| 6.02 | Policy Gradients | link | 54:49 | slides (pdf) |
| 6.03 | Hands-On: Monte-Carlo Policy Gradient | link | 14:08 | code (zip) |
| | 16.06.2022: Holiday/Vacation | | | |
| | 23.06.2022: Q&A on Lecture 7: Policy-based RL – Part 2 (slides, video) | | 2:01:05 | |
| 7.01 | Actor-Critics | link | 21:48 | slides (pdf) |
| 7.02 | Trust-Region Policy Optimization (TRPO) | link | 1:06:09 | slides (pdf) |
| 7.03 | Proximal Policy Optimization (PPO) | link | 17:50 | slides (pdf) |
| 7.04 | Deep Deterministic Policy Gradient (DDPG) | link | 15:18 | slides (pdf) |
| | 30.06.2022: Q&A on Lecture 8: Model-based RL – Part 1 (slides, video) | | 1:38:05 | |
| 8.01 | Introduction to Model-based RL | link | 33:01 | slides (pdf) |
| 8.02 | Background Planning | link | 19:22 | slides (pdf) |
| 8.03 | Online Planning with Discrete Actions | link | 45:42 | slides (pdf) |
| | 07.07.2022: Q&A on Lecture 9: Model-based RL – Part 2 (slides, video) | | 1:08:05 | |
| 9.01 | Online Planning with Continuous Actions | link | 22:09 | slides (pdf) |
| 9.02 | Real-World Application: Uncertainty in Model-based RL | link | 39:25 | slides (pdf) |
| 9.03 | Summary on Model-based RL | link | 6:31 | slides (pdf) |
| | 14.07.2022: Q&A on Lecture 10: Exploration Strategies (slides, video) | | 1:32:20 | |
| 10.01 | Motivation, Multi-Armed Bandits, Regret | link | 33:45 | slides (pdf) |
| 10.02 | Classic Exploration Strategies | link | 58:35 | slides (pdf) |
| | 21.07.2022: Q&A on Lecture 11: Exploration in Deep RL (slides, video) | | 2:07:43 | |
| 11.01 | Count-based Exploration | link | 32:32 | slides (pdf) |
| 11.02 | Prediction-based Exploration | link | 56:25 | slides (pdf) |
| 11.03 | Memory-based Exploration | link | 38:46 | slides (pdf) |
| | 28.07.2022: Course Wrap-Up (slides, video) | | | |
| | Summary on Content | | | |
| | Discussion of Course Evaluation Results | | | |
| | Q&A on the Exam | | | |
Exercises
| Date | # | Topic | Material | Due Date |
| --- | --- | --- | --- | --- |
| 28.04.2022 | 1 | Markov Decision Processes | ex1.pdf | 05.05.2022 |
| 05.05.2022 | 2 | Dynamic Programming | ex2.pdf, ex2_skeleton.zip | 12.05.2022 |
| 12.05.2022 | 3 | OpenAI Gym & TD-Learning | ex3.pdf, ex3_skeleton.zip | 19.05.2022 |
| 19.05.2022 | 4 | TD Control | ex4.pdf, ex4_skeleton.zip | 02.06.2022 |
| 19.05.2022 | 5 | PyTorch | ex5.pdf, ex5_skeleton.zip | 02.06.2022 |
| 26.05.2022 | | no exercises / holiday | | |
| 02.06.2022 | 6 | DQNs v0.2 on Cartpole | ex6.pdf, ex6_skeleton.zip | 09.06.2022 |
| 09.06.2022 | 7 | DDQNs on Atari | ex7.pdf (Update), ex7_skeleton.zip | 23.06.2022 |
| 09.06.2022 | 8 | VPG | ex8.pdf, ex8_skeleton.zip | 23.06.2022 |
| 16.06.2022 | | no exercises / holiday | | |
| 23.06.2022 | 9 | A2C | ex9.pdf, ex9_skeleton.zip | 07.07.2022 |
| 30.06.2022 | 10 | MCTS | ex10.pdf, ex10_skeleton.zip | 14.07.2022 |
| 07.07.2022 | | | | |
| 14.07.2022 | 11 | Bandits | ex11.pdf, ex11_skeleton.zip | 21.07.2022 |
| 21.07.2022 | 12 | RND & ICM on MountainCar | ex12.pdf, ex12_skeleton.zip | 28.07.2022 |
| 28.07.2022 | | | | |
Course Evaluation
The evaluation of the lecture can be found here and the evaluation of the exercises here.
Literature
- Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA. [link]
- Richard E. Bellman. 1957. Dynamic Programming. Princeton University Press, Princeton, NJ, USA. Republished 2003: Dover, ISBN 0-486-42809-5.
- UC Berkeley CS188: Intro to AI [link]
- University College London Course on RL [link]
- Advanced Deep Learning and Reinforcement Learning (UCL + DeepMind) [link]
- Interactive GridWorld demo for Dynamic Programming (REINFORCEjs): https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_dp.html
- Interactive GridWorld demo for Temporal-Difference Learning (REINFORCEjs): https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html
Interesting talks, articles, and blog-posts:
- Joelle Pineau: Reproducible, Reusable, and Robust Reinforcement Learning [youtube]
- David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning | AI Podcast #86 with Lex Fridman [youtube]
- Technion Research: Why does reinforcement learning not work (for you)? [link]
- RL algorithms quick overview [link]
Code examples and exercises:
- GitHub Repo of Denny Britz: https://github.com/dennybritz/reinforcement-learning/tree/master/DP