Reinforcement Learning (RL) is an area of Machine Learning that has recently made large advances and gained public visibility by reaching and surpassing human skill levels in games like Go and StarCraft. These successes show that RL has the potential to transform many areas of research and industry by automating the development of processes that once had to be engineered explicitly.
In contrast to other machine learning paradigms, which require (labeled or unlabeled) data to be available, RL considers an agent that takes actions in an environment and learns from the resulting feedback. The agent maximizes a reward signal that it receives for desirable outcomes, while at the same time exploring the world it operates in to find yet unknown, potentially more rewarding action sequences, a dilemma known as the exploration-exploitation tradeoff. Recent advances based on deep learning have made RL methods particularly powerful, since they allow agents to build very well-performing models of the world.
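To make the agent-environment loop and the exploration-exploitation tradeoff more concrete, here is a minimal sketch of ε-greedy tabular Q-learning using the OpenAI Gym API that also appears in the exercises. The environment name (FrozenLake-v1), the hyperparameters, and the classic Gym reset/step signatures are illustrative assumptions, not part of the course material.

```python
import random
import gym

# Small tabular environment; name and hyperparameters are illustrative assumptions.
env = gym.make("FrozenLake-v1")

q_table = {}                 # maps state -> list of action-value estimates
epsilon = 0.1                # exploration rate (exploration-exploitation tradeoff)
alpha, gamma = 0.1, 0.99     # learning rate and discount factor


def q_values(state):
    # Unseen states start with all-zero value estimates.
    return q_table.setdefault(state, [0.0] * env.action_space.n)


for episode in range(1000):
    state = env.reset()      # classic Gym API; newer Gymnasium returns (obs, info)
    done = False
    while not done:
        # Exploration-exploitation: with probability epsilon take a random action,
        # otherwise act greedily with respect to the current value estimates.
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = max(range(env.action_space.n), key=lambda a: q_values(state)[a])

        # Classic Gym step signature; Gymnasium splits done into terminated/truncated.
        next_state, reward, done, info = env.step(action)

        # Q-learning update: move the estimate toward reward + discounted future value.
        target = reward + gamma * max(q_values(next_state)) * (not done)
        q_values(state)[action] += alpha * (target - q_values(state)[action])
        state = next_state
```

Increasing epsilon makes the agent explore more aggressively, while setting it to zero makes it purely exploit its current value estimates.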
Lecturers & Course Instructors
Course Material
We have slightly changed the lecture content this year. Please refer to the previous versions here (SS2021) and here (SS2022). The lecture will take place on Wednesdays from 8:15 to 9:45 in 11401.00.116 (H14 Bernhard-Ilschner-Hörsaal (0.61), Martensstraße 5-7, 91058 Erlangen, ER-Südgelände).
Week | Date | Topic | Material |
1 | 19.04. | Introduction to RL, Markov Decision Processes | 01 Intro RL, MDPs.pdf |
2 | 26.04. | Dynamic Programming | 02 Dynamic Programming.pdf |
3 | 03.05. | Model-free Prediction | 03 Model-free Prediction.pdf |
4 | 10.05. | Model-free Control | 04 Model-free Control.pdf |
5 | 17.05. | Value Function Approximation, DQNs | 05 Value Function Approximation.pdf |
6 | 24.05. | Policy-based RL #1 | 06 Policy-based RL 1.pdf |
7 | 31.05. | Policy-based RL #2 | 07 Policy-based RL 2.pdf |
8 | 07.06. | Guest Lecture: Quantum Reinforcement Learning (Nico Meyer, Fraunhofer IIS) | 08 Quantum Reinforcement Learning.pdf |
9 | 14.06. | Model-based RL #1 (Discrete Actions) | 09 Model-based RL 1.pdf |
10 | 21.06. | Model-based RL #2 (Continuous Actions) | 10 Model-based RL 2.pdf |
11 | 28.06. | Exploration-Exploitation, Regret, Bandits | 11 Exploration-Exploitation.pdf |
12 | 05.07. | Exploration in Deep RL, Intrinsic Motivation (2:07:34): 12.01 Count-based Exploration, 12.02 Prediction-based Exploration, 12.03 Memory-based Exploration | 12 Exploration in Deep RL.pdf, video (32:32), video (56:25), video (38:46) |
13 | 12.07. | Offline Reinforcement Learning (1:55:22): 13.01 Intro to Offline RL, 13.02 Challenges of Offline RL, 13.03 Policy-constrained Offline RL, 13.04 BEAR, 13.05 Conservative Policy Evaluation | 13 Offline RL.pdf, video (16:13), video (32:26), video (33:37), video (17:21), video (15:45) |
14 | 19.07. | Guest Lecture: ChatGPT (Georgios Kontes, Fraunhofer IIS); Course Wrap-Up, Discussion of Evaluation Results, Discussion of latest HW | 14 WrapUp.pdf, 14 ChatGPT_RL.pdf |
Exercises
The exercises will take place on Fridays from 10:15 to 11:45 in 11401.00.116 (H14 Bernhard-Ilschner-Hörsaal (0.61), Martensstraße 5-7, 91058 Erlangen, ER-Südgelände).
Week | Date | Topic | Material | Due Date (discussion of solution on…) |
1 | | no exercises | | |
2 | 28.04. | MDPs (slides) | ex1.pdf | 28.04. |
| 28.04. | Dynamic Programming (slides) | ex2.pdf, ex2_skeleton.zip | 05.05. |
3 | 05.05. | OpenAI Gym, TD-Learning (slides) | ex3.pdf, ex3_skeleton.zip | 12.05. |
4 | 12.05. | TD-Control (slides) | ex4.pdf, ex4_skeleton.zip | 19.05. |
5 | 19.05. | PyTorch, DQNs (slides) | ex5.pdf, ex5_skeleton.zip (live), ex6.pdf, ex6_skeleton.zip | 02.06. |
6 | 26.05. | PyTorch, DQNs (slides) | | |
7 | 02.06. | VPG, A2C, PPO (slides) | ex7.pdf, ex7_skeleton.zip | 16.06. |
8 | 09.06. | VPG, A2C, PPO (slides) | | |
9 | 16.06. | MCTS (slides) | ex8.pdf, ex8_skeleton.zip | 23.06. |
10 | 23.06. | CEM (slides) | ex9.pdf, ex9_skeleton.zip | 30.06. |
11 | 30.06. | Multi-armed Bandits (slides) | ex10.pdf, ex10_skeleton.zip | 07.07. |
12 | 07.07. | RND/ICM (slides) | ex11.pdf, ex11_skeleton.zip | 14.07. |
13 | 14.07. | BCQ (Nico) | ex12.pdf, ex12_skeleton.zip | 19.07. (lecture slot) |
Course Evaluation
The evaluation of the lecture and the exercises will be made available here.
Literature
- Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA. [link]
- Richard E. Bellman. 1957. Dynamic Programming. Princeton University Press, Princeton, NJ. Republished 2003: Dover, ISBN 0-486-42809-5.
- UC Berkeley CS188: Intro to AI [link]
- University College London Course on RL [link]
- Advanced Deep Learning and Reinforcement Learning (UCL + DeepMind) [link]
- GridWorld: Dynamic Programming demo (reinforcejs): https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_dp.html
- GridWorld: Temporal Difference learning demo (reinforcejs): https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html
Interesting talks, articles, and blog-posts:
- Joelle Pineau: Reproducible, Reusable, and Robust Reinforcement Learning [youtube]
- David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning | AI Podcast #86 with Lex Fridman [youtube]
- Technion Research: Why does reinforcement learning not work (for you)? [link]
- RL algorithms quick overview [link]
Code examples and exercises:
- GitHub Repo of Denny Britz: https://github.com/dennybritz/reinforcement-learning/tree/master/DP