Student Project

Alexander Mattick: Beam Tracking as a Time-Varying Reinforcement Learning Problem

Posted by mugga, November 1, 2022. Posted in 5G & 6G, Reinforcement Learning, Running, Student Project
The envisioned transition to 5G and 6G technologies has started to transform the properties of established communication networks. At the core of this transformation lies the capability of Base Stations…

Abhinav Singh: Safe Imitation Learning for Beam Tracking

Posted by mugga, November 1, 2022. Posted in 5G & 6G, Reinforcement Learning, Running, Student Project
The envisioned transition to 5G and 6G technologies has started to transform the properties of established communication networks. At the core of this transformation lies the capability of Base Stations…

2021: Hyeyoung Park: Towards Interpretable (and Robust) Reinforcement Learning Policies through Local Lipschitzness and Randomization

Posted by mugga, May 2, 2022. Posted in Finished, Reinforcement Learning, Student Project
Reinforcement Learning is broadly applicable to diverse tasks across many domains. On many problems it has achieved superhuman performance [5]. However, the black-box neural networks used by modern RL algorithms…

There is also an alternative way to go from here: we can estimate the distances of the mobile tag to the receiver units using two-way ranging (TWR). TWR is based on a very simple idea: the mobile tag and a stationary receiver exchange messages and measure the round-trip time (RTT) of the exchange. From this RTT we can directly calculate the distance between the two. The hardware footprint is low, as we do not need to precisely synchronize the clocks of the receiver and the mobile tag; we only need to synchronize the channel access. However, here also lies the general drawback of this approach: to estimate a single position we need to send 2 messages (back and forth) between 4 receivers and the mobile tag (= 8 messages). As the channel is potentially shared to localise several mobile objects, the positioning update rate quickly drops.
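
To make the distance computation concrete, here is a minimal sketch in Python (the reply delay and the example numbers are illustrative assumptions, not values from the text): the one-way time of flight is half of the measured RTT minus the responder's reply delay, and multiplying by the speed of light gives the distance.

  # Minimal two-way ranging (TWR) sketch: estimate the tag-receiver distance
  # from a measured round-trip time. All numbers are illustrative.
  SPEED_OF_LIGHT = 299_792_458.0  # metres per second

  def twr_distance(rtt_s, reply_delay_s):
      # The signal travels out and back, so the one-way time of flight is
      # half of the RTT once the responder's processing delay is removed.
      time_of_flight = (rtt_s - reply_delay_s) / 2.0
      return SPEED_OF_LIGHT * time_of_flight

  # Example: an RTT of 1.2 us with a 1.0 us reply delay gives a time of
  # flight of 0.1 us, i.e. roughly 30 m between tag and receiver.
  print(twr_distance(rtt_s=1.2e-6, reply_delay_s=1.0e-6))

With such distance estimates to four receivers, the position itself follows by multilateration, which is why each position fix costs the eight messages mentioned above.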

Literature for Multi-Agent Reinforcement Learning:

  • https://rlss.inria.fr/files/2019/07/RLSS_Multiagent.pdf (this is a very good intro to the problem setting, the game-theoretic formulation, and the notation; but don’t look into the algorithms described there)
  • Multiagent Cooperation and Competition with Deep Reinforcement Learning: https://arxiv.org/abs/1511.08779
  • Emergent Complexity via Multi-Agent Competition: https://arxiv.org/abs/1710.03748
  • Emergent Tool Use From Multi-Agent Autocurricula: https://arxiv.org/abs/1909.07528

Literature for Asynchronous Advantage Actor-Critic (A3C) (+ A2C):

  • https://arxiv.org/abs/1602.01783
  • https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-8-asynchronous-actor-critic-agents-a3c-c88f72a5e9f2
  • https://arxiv.org/pdf/1611.05397.pdf

Literature for DD-PPO: Near-Perfect PointGoal Navigators:

  • https://arxiv.org/abs/1911.00357
  • https://ai.facebook.com/blog/near-perfect-point-goal-navigation-from-25-billion-frames-of-experience/

Literature for Explainable RL:

  • Gunning, D. (2017). Explainable Artificial Intelligence (XAI). Defense Advanced Research Projects Agency (DARPA), n.d. Web: https://www.darpa.mil/attachments/XAIProgramUpdate.pdf
  • Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1135-1144). ACM: https://arxiv.org/pdf/1602.04938.pdf
  • http://www.heatmapping.org
  • Bastani, O., Kim, C., & Bastani, H. (2017). Interpreting blackbox models via model extraction. arXiv preprint arXiv:1705.08504: https://arxiv.org/abs/1705.08504
  • Bastani, O., Pu, Y., & Solar-Lezama, A. (2018). Verifiable Reinforcement Learning via Policy Extraction. arXiv preprint arXiv:1805.08328: https://arxiv.org/pdf/1805.08328.pdf + https://obastani.github.io/docs/viper-presentation.pdf

Literature for Curiosity:

  • https://arxiv.org/abs/1705.05363
  • https://pathak22.github.io/large-scale-curiosity/resources/largeScaleCuriosity2018.pdf
  • https://pathak22.github.io/large-scale-curiosity/

Literature for Simulation-to-Reality Transfer:

  • Peng, X. B., Andrychowicz, M., Zaremba, W., & Abbeel, P. (2018, May). Sim-to-real transfer of robotic control with dynamics randomization. In 2018 IEEE International Conference on Robotics and Automation (ICRA) (pp. 1-8). IEEE: https://arxiv.org/pdf/1710.06537.pdf
  • Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., & Abbeel, P. (2017, September). Domain randomization for transferring deep neural networks from simulation to the real world. In Intelligent Robots and Systems (IROS), 2017 IEEE/RSJ International Conference on (pp. 23-30). IEEE: https://arxiv.org/pdf/1703.06907.pdf
  • Rusu, A. A., Vecerik, M., Rothörl, T., Heess, N., Pascanu, R., & Hadsell, R. (2016). Sim-to-real robot learning from pixels with progressive nets. arXiv preprint arXiv:1610.04286: https://arxiv.org/abs/1610.04286
  • Sadeghi, F., & Levine, S. (2016). CAD2RL: Real single-image flight without a single real image. arXiv preprint arXiv:1611.04201: https://arxiv.org/pdf/1611.04201.pdf
  • Bousmalis, K., Irpan, A., Wohlhart, P., Bai, Y., Kelcey, M., Kalakrishnan, M., … & Levine, S. (2018, May). Using simulation and domain adaptation to improve efficiency of deep robotic grasping. In 2018 IEEE International Conference on Robotics and Automation (ICRA) (pp. 4243-4250). IEEE: https://arxiv.org/pdf/1709.07857.pdf
  • Rajeswaran, A., Ghotra, S., Ravindran, B., & Levine, S. (2016). Epopt: Learning robust neural network policies using model ensembles. arXiv preprint arXiv:1610.01283: https://arxiv.org/pdf/1610.01283.pdf

Literature for MAML:

  • https://arxiv.org/abs/1703.03400
  • https://towardsdatascience.com/model-agnostic-meta-learning-maml-8a245d9bc4ac
  • https://medium.com/towards-artificial-intelligence/how-to-train-maml-model-agnostic-meta-learning-90aa093f8e46

Literature for Hierarchical Reinforcement Learning:

  • https://thegradient.pub/the-promise-of-hierarchical-reinforcement-learning/
  • https://papers.nips.cc/paper/6233-hierarchical-deep-reinforcement-learning-integrating-temporal-abstraction-and-intrinsic-motivation.pdf
  • https://arxiv.org/abs/1805.08296

Literature for Option Critics:

  • https://arxiv.org/pdf/1609.05140.pdf
  • https://alversafa.github.io/blog/2018/11/28/optncrtc.html
  • https://papers.nips.cc/paper/8243-learning-abstract-options.pdf

Literature for Batch Reinforcement Learning:

  • Lange, S., Gabel, T., & Riedmiller, M. (2012). Batch reinforcement learning. In Reinforcement learning (pp. 45-73). Springer, Berlin, Heidelberg: http://ml.informatik.uni-freiburg.de/former/_media/publications/langegabelriedmiller2011chapter.pdf
  • Lagoudakis, M. G., & Parr, R. (2003). Least-squares policy iteration. Journal of machine learning research, 4(Dec), 1107-1149:  http://www.jmlr.org/papers/volume4/lagoudakis03a/lagoudakis03a.pdf + http://www.intelligence.tuc.gr/~lagoudakis/DOCS/thesis.pdf
  • Ernst, D., Geurts, P., & Wehenkel, L. (2005). Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6(Apr), 503-556: http://www.jmlr.org/papers/volume6/ernst05a/ernst05a.pdf
  • Riedmiller, M. (2005, October). Neural fitted Q iteration–first experiences with a data efficient neural reinforcement learning method. In European Conference on Machine Learning (pp. 317-328). Springer, Berlin, Heidelberg: http://ml.informatik.uni-freiburg.de/former/_media/publications/rieecml05.pdf

Literature for World Models:

  1. David Ha, Jürgen Schmidhuber: Recurrent World Models Facilitate Policy Evolution. NeurIPS 2018: 2455-2467.
  2. https://arxiv.org/abs/1803.10122
  3. https://worldmodels.github.io
  4. https://medium.com/@SmartLabAI/world-models-a-reinforcement-learning-story-cdcc86093c5

Literature for Model-Based RL:

  • Deisenroth, M., & Rasmussen, C. E. (2011). PILCO: A model-based and data-efficient approach to policy search. In Proceedings of the 28th International Conference on machine learning (ICML-11) (pp. 465-472): https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Deisenroth_ICML_2011.pdf
  • Deisenroth, M. P., Fox, D., & Rasmussen, C. E. (2015). Gaussian processes for data-efficient learning in robotics and control. IEEE transactions on pattern analysis and machine intelligence, 37(2), 408-423: http://robotics.caltech.edu/wiki/images/d/d2/GPsDataEfficientLearning.pdf
  • Deisenroth, M. P. (2010). Efficient reinforcement learning using Gaussian processes (Vol. 9). KIT Scientific Publishing: https://pdfs.semanticscholar.org/c9f2/1b84149991f4d547b3f0f625f710750ad8d9.pdf
  • Gal, Y., McAllister, R., & Rasmussen, C. E. (2016, April). Improving PILCO with Bayesian neural network dynamics models. In Data-Efficient Machine Learning workshop, ICML: http://mlg.eng.cam.ac.uk/yarin/website/PDFs/DeepPILCO.pdf
  • Punjani, A., & Abbeel, P. (2015, May). Deep learning helicopter dynamics models. In Robotics and Automation (ICRA), 2015 IEEE International Conference on (pp. 3223-3230). IEEE: https://people.eecs.berkeley.edu/~pabbeel/papers/2015-ICRA-deep-learning-heli.pdf
  • Chua, K., Calandra, R., McAllister, R., & Levine, S. (2018). Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models. arXiv preprint arXiv:1805.12114: https://arxiv.org/abs/1805.12114

Literature for Generative Adversarial Reinforcement Learning:

  • https://arxiv.org/abs/1606.03476
  • https://hollygrimm.com/rl_gail
  • https://medium.com/@sanketgujar95/generative-adversarial-imitation-learning-266f45634e60
  • http://proceedings.mlr.press/v70/baram17a.html
  • https://arxiv.org/abs/1612.02179

Literature for Imitation Learning:

  • Pomerleau, D. A. (1989). Alvinn: An autonomous land vehicle in a neural network. In Advances in neural information processing systems (pp. 305-313): https://papers.nips.cc/paper/95-alvinn-an-autonomous-land-vehicle-in-a-neural-network
  • Giusti, A., Guzzi, J., Ciresan, D. C., He, F. L., Rodríguez, J. P., Fontana, F., … & Scaramuzza, D. (2016). A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots. IEEE Robotics and Automation Letters, 1(2), 661-667: http://rpg.ifi.uzh.ch/docs/RAL16_Giusti.pdf
  • Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., … & Zhang, X. (2016). End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316: https://arxiv.org/abs/1604.07316
  • Ross, S., Gordon, G., & Bagnell, D. (2011, June). A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics (pp. 627-635): https://arxiv.org/abs/1011.0686
  • Ross, S., Melik-Barkhudarov, N., Shankar, K. S., Wendel, A., Dey, D., Bagnell, J. A., & Hebert, M. (2013, May). Learning monocular reactive uav control in cluttered natural environments. In 2013 IEEE international conference on robotics and automation (pp. 1765-1772). IEEE: https://arxiv.org/abs/1211.1690

Literature for Deep Deterministic Policy Gradient:

  • https://arxiv.org/abs/1509.02971
  • http://proceedings.mlr.press/v32/silver14.pdf
  • https://spinningup.openai.com/en/latest/algorithms/ddpg.html
  • https://spinningup.openai.com/en/latest/algorithms/td3.html
  • https://arxiv.org/abs/1802.09477

Literature for Trust-Region Policy Optimization (TRPO):

  • https://arxiv.org/abs/1506.02438
  • https://arxiv.org/abs/1502.05477
  • https://www.youtube.com/watch?v=jcF-HaBz0Vw
  • https://spinningup.openai.com/en/latest/algorithms/trpo.html
  • https://medium.com/@jonathan_hui/rl-trust-region-policy-optimization-trpo-explained-a6ee04eeeee9

Literature for Proximal Policy Optimization (PPO):

  • https://arxiv.org/abs/1707.06347
  • https://openai.com/blog/openai-baselines-ppo/
  • https://www.youtube.com/watch?v=5P7I-xPq8u8
  • https://spinningup.openai.com/en/latest/algorithms/ppo.html
  • https://arxiv.org/abs/1506.02438
  • https://arxiv.org/abs/1707.02286