There is also a different way to proceed from here: we can estimate the distances between the mobile tag and the receiver units using two-way ranging (TWR). TWR builds on a very simple idea: we exchange messages between the mobile tag and a stationary receiver and measure the round trip time (RTT) of that exchange. From the RTT we can directly calculate the distance between the two. The hardware footprint is low because we do not need to precisely synchronize the clocks of the receiver and the mobile tag; we only need to coordinate channel access. However, this is also the general drawback of the approach: to estimate a single position we need to exchange two messages (poll and response) with each of the four receivers, i.e. eight messages per fix. Since the channel is potentially shared to localise several mobile objects, the positioning update rate quickly drops.
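The range estimate itself is a one-liner once the round trip time and the responder's reply delay are known. The sketch below is a minimal illustration of single-sided TWR in plain Python, assuming the receiver's reply delay is known to the tag; the function name, timing variables, and numeric values are illustrative assumptions and not taken from any particular UWB stack.

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def twr_distance(t_round: float, t_reply: float) -> float:
    """Estimate the tag-receiver distance from one poll/response exchange.

    t_round: time from sending the poll until receiving the response,
             measured on the tag's local clock (seconds).
    t_reply: the receiver's known processing delay between receiving the
             poll and sending its response (seconds).
    """
    time_of_flight = (t_round - t_reply) / 2.0  # one-way travel time
    return SPEED_OF_LIGHT * time_of_flight

# One position fix needs a poll + response per receiver, i.e. 2 * 4 = 8
# messages for four receivers; the resulting ranges then feed the
# trilateration step. The timing values here are hypothetical examples.
example_timings = [
    (6.70e-8, 2.0e-8),
    (7.35e-8, 2.0e-8),
    (8.01e-8, 2.0e-8),
    (6.02e-8, 2.0e-8),
]
ranges = [twr_distance(t_rnd, t_rep) for (t_rnd, t_rep) in example_timings]

Because the tag measures t_round on its own clock and only subtracts the responder's reply delay, no clock synchronization between tag and receiver is required; the price is the eight-message exchange per position fix mentioned above.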
Literature for Multi-Agent Reinforcement Learning:
Literature for Asynchronous Advantage Actor-Critic (A3C) (+ A2C):
Literature for DD-PPO: Near-Perfect PointGoal Navigators:
Literature for Explainable RL:
Literature for Curiosity:
Literature for Simulation-to-Reality Transfer:
Literature for MAML:
Literature for Hierarchical Reinforcement Learning:
Literature for Option Critics:
Literature for Batch Reinforcement Learning:
Literature for World Models:
Literature for Model-Based RL:
Literature for Generative Adversarial Reinforcement Learning:
Literature for Imitation Learning:
Literature for Deep Deterministic Policy Gradient:
Literature for Trust-Region Policy Optimization (TRPO):
Literature for Proximal Policy Optimization (PPO):