Reinforcement Learning @LMU2020

Reinforcement Learning (RL) is an area of Machine Learning that has recently made large advances and gained public visibility by reaching and surpassing human skill levels in games like Go and StarCraft. These successes show that RL has the potential to transform many areas of research and industry by automating the development of processes that once needed to be engineered explicitly.

In contrast to other machine learning paradigms, which require the presence of (labeled or unlabeled) data, RL considers an agent that takes actions in an environment and learns from the resulting feedback. The agent tries to maximize a reward signal that it receives for desirable outcomes, while at the same time exploring the world in which it operates to find yet unknown, potentially more rewarding action sequences, a dilemma known as the exploration-exploitation tradeoff. Recent advances in deep learning have made RL methods particularly powerful, since deep networks give agents expressive models of the world in which they operate.
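To make the exploration-exploitation tradeoff concrete, here is a tiny, self-contained sketch of an epsilon-greedy agent on a three-armed bandit. This is an illustration, not course material: the arm rewards, the epsilon value, and all other numbers are made up.

```python
# Epsilon-greedy on a 3-armed bandit: mostly exploit the best arm seen
# so far, but sometimes explore a random arm. All numbers illustrative.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])  # unknown to the agent
estimates = np.zeros(3)                  # agent's running value estimates
counts = np.zeros(3)
epsilon = 0.1                            # exploration probability

for t in range(1000):
    if rng.random() < epsilon:
        arm = int(rng.integers(3))       # explore: try a random arm
    else:
        arm = int(np.argmax(estimates))  # exploit: best arm so far
    reward = rng.normal(true_means[arm], 1.0)
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean

print("estimated arm values:", estimates)
```

With epsilon = 0 the agent can get stuck on a suboptimal arm forever; with epsilon = 1 it never benefits from what it has learned, which is exactly the dilemma described above.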

The seminar will start with six introductory lectures on RL, in which we cover the foundations of RL (i.e., Markov decision processes and dynamic programming techniques) before moving on to model-free prediction and control algorithms such as TD-learning, SARSA, and Q-learning. We will also cover the general idea behind value function approximation techniques such as Deep Q-Networks (DQN) and study advanced policy-gradient and actor-critic methods.
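As a preview of where the control lectures lead, the following is a minimal sketch of tabular Q-learning on the FrozenLake environment used in the lecture examples. It assumes the 2020-era OpenAI Gym API (env.step() returning four values); the hyperparameters are illustrative, not tuned course settings.

```python
# Minimal tabular Q-learning sketch on FrozenLake (illustrative, not
# course code). Assumes the classic Gym API: step() -> (obs, r, done, info).
import numpy as np
import gym

env = gym.make("FrozenLake-v0")
n_states, n_actions = env.observation_space.n, env.action_space.n
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # step size, discount, exploration rate

for episode in range(5000):
    state = env.reset()
    done = False
    while not done:
        # epsilon-greedy behaviour policy
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done, _ = env.step(action)
        # Q-learning update: bootstrap from the greedy value of the next state
        target = reward + gamma * np.max(Q[next_state]) * (not done)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
```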

Students will then explore different intermediate to advanced areas of RL, for which they will prepare a presentation as well as code examples. The presentations will take place at the end of the semester (or at the beginning of the summer break) as a block seminar (we will poll dates via Doodle that fit the students and assign slots on a best-effort basis).

Lectures

All lecture videos are listed below with their durations; Q&A session dates are given in parentheses.

0.01 Opening Remarks (06:11)

Lecture 1: Introduction (Q&A Session: 21.04.2020 13:00) – total 1:23:06
  1.01 Introduction to Reinforcement Learning (46:21)
  1.02 Markov Decision Processes (36:45)

Lecture 2: Dynamic Programming (Q&A Session: 28.04.2020 13:00) – total 1:39:16
  2.01 Introduction to Dynamic Programming (18:21)
  2.02 Value Iteration (27:44)
  2.03 Policy Iteration (25:59)
  2.04 Example: Frozen Lake with Value Iteration (16:15)
  2.05 Summary (10:57)

Lecture 3: Model-free Prediction (Q&A Session: 05.05.2020 13:00, pdf) – total 1:25:00
  3.01 Introduction to Model-free Reinforcement Learning (05:18)
  3.02 Monte-Carlo Learning (25:38)
  3.03 Temporal-Difference Learning (41:52)
  3.04 Example: Frozen Lake with TD-Learning (12:12)

Lecture 4: Model-free Control (Q&A Session: 12.05.2020 13:00) – total 1:30:35
  4.01 Model-Free Control (37:38)
  4.02 Example: Intro to OpenAI Gym (14:36)
  4.03 Example: Frozen Lake with Q-Learning (15:14)
  4.04 Summary (23:07)

Lecture 5: Value Function Approximation (Q&A Session: 19.05.2020 13:00) – total 1:51:12
  5.01 Value Function Approximation (15:25)
  5.02 Linear Value Function Approximation (31:32)
  5.03 Deep Q-Networks (37:11)
  5.04 Example: CartPole with DQNs (22:50)
  5.05 Summary (04:14)

Lecture 6: Policy-based RL (Q&A Session: 26.05.2020 13:00) – total 2:13:57
  6.01 Introduction to Policy-based Reinforcement Learning (47:51)
  6.02 Policy Gradients (47:50)
  6.03 Example: Monte-Carlo Policy Gradient (14:07)
  6.04 Actor-Critics (24:19)

Additional Q&A Session: 23.06.2020 12:00 (pdf)

Notes/Remarks:

  • In Lecture 5.03 there is a small mistake: in DQN we sample the action to be taken by the agent from the evaluation network and *not* from the target network. The target network is solely used to generate the Q-targets! (See the sketch below.)
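To make the distinction explicit, here is a minimal PyTorch-style sketch, not the lecture code: the network sizes, the hard target-network copy, and helper names such as select_action and q_targets are assumptions for illustration.

```python
# Illustrative sketch: actions come from the evaluation (online) network,
# the target network only produces Q-targets. Sizes are made up.
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 2, 0.99

def make_net():
    return nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, n_actions))

eval_net = make_net()                               # evaluation (online) network
target_net = make_net()                             # frozen target network
target_net.load_state_dict(eval_net.state_dict())   # periodic hard copy

def select_action(state):
    # the agent acts (greedily here) w.r.t. the *evaluation* network
    with torch.no_grad():
        return int(eval_net(state).argmax())

def q_targets(rewards, next_states, dones):
    # Q-targets come from the *target* network only
    with torch.no_grad():
        max_next = target_net(next_states).max(dim=1).values
        return rewards + gamma * max_next * (1.0 - dones)

# tiny smoke test with random data
batch = torch.randn(8, obs_dim)
print(select_action(batch[0]))
print(q_targets(torch.zeros(8), batch, torch.zeros(8)))
```

Periodically copying eval_net into target_net (the load_state_dict call above) keeps the Q-targets stable between updates, which is the point of having two networks in the first place.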

Student Presentations

The student presentations should take approx. 30 minutes (including Q&A) and be held in English. We expect the students to present runnable code as part of their presentations; they should hence be familiar with Python. Furthermore, we assume that the students have a deeper understanding of linear algebra and analysis, multivariate statistics, and machine learning in general.

All presentations are held in our Zoom meeting slot from 12:00 – 14:00 (s.t. – we will start without delay!) on the respective dates.

30.06.
  Sven Lorenz: Multi-Agent Reinforcement Learning (literature)
  Sebastian Fischer: MAML: Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (literature)

07.07.
  Asmik Nalmpatian: Trust-Region Policy Optimization (TRPO) (literature)
  Tobias Weber: Batch RL (literature)
  Jonas Schweisthal: World Models (literature)

14.07.
  Amadeu Scheppach: Proximal Policy Optimization (PPO) (literature) + Example: DD-PPO: Near-Perfect PointGoal Navigators (literature)
  Matthias Gruber: Model-based Reinforcement Learning (literature)
  Hyeyoung Park: Explainable Reinforcement Learning (literature)

21.07.
  Tobias Altmiks: Deep Deterministic Policy Gradient (DDPG) & Twin-Delayed DDPG (TD3) (literature)

28.07.
  Alexander Pohl: Asynchronous Advantage Actor-Critic (A3C) (+ A2C) (literature) + Unsupervised Auxiliary Tasks
  Lennart Schneider: Option-Critics (literature) + Hierarchical Reinforcement Learning (literature)
  Stefan Depperschmidt: Curiosity (literature)

29.07.
  Rifat Amin: Imitation Learning (literature) + GAIL: Generative Adversarial Imitation Learning (literature)

Short Presentations

The short presentations should take approx. 10 minutes and focus on a paper from the recent ICLR conference.

Adelina Khoroshevskaya: Qureshi et al., Composing Task-Agnostic Policies with Deep Reinforcement Learning [paper]
Ilona Bamiller: Hafner et al., Dream to Control: Learning Behaviors by Latent Imagination [paper]
Julian Raith: Freeman et al., Learning to Predict Without Looking Ahead: World Models Without Forward Prediction [paper] (slides: pdf)
Sergio Antelo: Casanova et al., Reinforced Active Learning for Image Segmentation [paper]
Pranav Ragupathy: Freeman et al., Learning to Predict Without Looking Ahead: World Models Without Forward Prediction [paper]
Viet Tran: Casanova et al., Reinforced Active Learning for Image Segmentation [paper] (slides: pdf)
Xiao-Yin Janet To: Rajeswaran et al., EPOpt: Learning Robust Neural Network Policies Using Model Ensembles [paper]

Exams

You can find the scheduling of the exams here.

Evaluation

You can find the results of the lecture and seminar evaluation here: SoSe_2020-Reinforcement_Learning_–_Seminar

Literature

While specific references are given in the slides of each video, the following list serves as a general basis for getting into the topic, as well as for going deeper at particular points.

  • Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement Learning: An Introduction. A Bradford Book, Cambridge, MA, USA. [link]
  • Richard E. Bellman. 1957. Dynamic Programming. Princeton University Press, Princeton, NJ. Republished 2003: Dover, ISBN 0-486-42809-5.
  • UC Berkeley CS188: Intro to AI [link]
  • University College London Course on RL [link]
  • Advanced Deep Learning and Reinforcement Learning (UCL + DeepMind) [link]
  • Andrej Karpathy's REINFORCEjs GridWorld demo (dynamic programming): https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_dp.html
  • Andrej Karpathy's REINFORCEjs GridWorld demo (temporal-difference learning): https://cs.stanford.edu/people/karpathy/reinforcejs/gridworld_td.html

Interesting talks, articles, and blog-posts:

  • Joelle Pineau: Reproducible, Reusable, and Robust Reinforcement Learning [youtube]
  • David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning | AI Podcast #86 with Lex Fridman [youtube]
  • Technion Research: Why does reinforcement learning not work (for you)? [link]
  • RL algorithms quick overview [link]

Code examples and Exercises:

  • GitHub Repo of Denny Britz: https://github.com/dennybritz/reinforcement-learning/tree/master/DP