Abhinav Singh: Safe Imitation Learning for Beam Tracking

The envisioned transition to 5G and 6G technologies has started to transform the properties of established communication networks. At the core of this transformation lies the capability of Base Stations (BSs) to use directional beams for transmission towards the User Equipment (UE) served by the network. Here, a (large) library of pre-determined directional beams (called the codebook) is available, and the question is which beam should serve a specific UE at any point in time.

Of course, this problem becomes more complicated when the UEs move along given trajectories, in which case the BS needs to decide in real time which beam to use for tracking the UE, a problem called Beam Tracking [1]. Several approaches have been proposed, and among the most promising is Imitation Learning (IL) [2]. Here, we assume that an expert (i.e., a module that can determine the best serving beam by utilizing additional information) is available, but invoking it is expensive. Our goal is to distill the decision strategy of the expert into a student module (usually a neural network called a controller or a policy) that has low inference complexity.

This teacher-student training setup allows the student to make several mistakes that the expert can retrospectively correct, in an algorithm first referred to as DAgger in the literature [2]. This implies, though, that there is a performance loss until the student network starts to approximate the expert's performance, which could be detrimental to the performance of the communication network. To alleviate this problem, a number of safe imitation learning approaches have been proposed [3-6].
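
The teacher-student loop described above can be pictured with the following minimal sketch. It is only an illustration under stated assumptions, not the project's implementation: the codebook size `N_BEAMS`, the toy dynamics in `step`, the `expert` function, the `TablePolicy` stand-in for a neural network, and the schedule for `beta` are all hypothetical. The only part taken from the DAgger idea is the structure: roll out the current student, label every visited state with the expert's action, aggregate the data, and retrain.

```python
import random
from collections import Counter, defaultdict

N_BEAMS = 8  # hypothetical codebook size


def expert(state):
    """Hypothetical expert: the best beam is the one pointing at the UE's bin."""
    _, ue_bin = state
    return ue_bin


def step(state, action, rng):
    """Toy dynamics (assumption): the UE drifts by -1, 0 or +1 bins,
    and the beam that was just used becomes part of the next state."""
    _, ue_bin = state
    ue_bin = min(N_BEAMS - 1, max(0, ue_bin + rng.choice((-1, 0, 1))))
    return (action, ue_bin)


class TablePolicy:
    """Stand-in for a neural network: majority vote per state, beam 0 if unseen."""

    def __init__(self):
        self.table = {}

    def fit(self, dataset):
        votes = defaultdict(Counter)
        for state, action in dataset:
            votes[state][action] += 1
        self.table = {s: c.most_common(1)[0][0] for s, c in votes.items()}

    def __call__(self, state):
        return self.table.get(state, 0)


def dagger(n_iters=5, horizon=50, seed=0):
    rng = random.Random(seed)
    policy, dataset = TablePolicy(), []
    for i in range(n_iters):
        # Assumed schedule: the expert acts in the first iteration only.
        beta = 1.0 if i == 0 else 0.0
        state = (0, rng.randrange(N_BEAMS))
        for _ in range(horizon):
            expert_action = expert(state)
            # Every visited state is labelled with the expert's action.
            dataset.append((state, expert_action))
            # The action actually executed may be the student's (mistakes included).
            action = expert_action if rng.random() < beta else policy(state)
            state = step(state, action, rng)
        # Retrain the student on the aggregated dataset.
        policy.fit(dataset)
    return policy, dataset
```

Calling `dagger()` returns the trained lookup policy together with the aggregated dataset. From the second iteration on, the student's own (possibly wrong) beam choices determine which states are visited, which is exactly where the performance loss mentioned above comes from.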

The Master's project will consist of implementing a modular and extendable library of safe imitation learning algorithms and evaluating them on a Beam Tracking scenario defined within the 6G SENTINEL research project.

Main contact at the department: Dr.-Ing. Christopher Mutschler

External point of contact: Dr. Georgios Kontes (Self-Learning Systems Group, Fraunhofer IIS, Nuremberg)


  1. Giordani, M., Polese, M., Roy, A., Castor, D., & Zorzi, M. (2018). A tutorial on beam management for 3GPP NR at mmWave frequencies. IEEE Communications Surveys & Tutorials, 21(1), 173-196.
  2. Ross, S., Gordon, G., & Bagnell, D. (2011, June). A reduction of imitation learning and structured prediction to no-regret online learning. In Proceedings of the fourteenth international conference on artificial intelligence and statistics (pp. 627-635). JMLR Workshop and Conference Proceedings.
  3. Zhang, J., & Cho, K. (2016). Query-efficient imitation learning for end-to-end autonomous driving. arXiv preprint arXiv:1605.06450.
  4. Menda, K., Driggs-Campbell, K., & Kochenderfer, M. J. (2019, November). EnsembleDAgger: A Bayesian approach to safe imitation learning. In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 5041-5048). IEEE.
  5. Menda, K., Driggs-Campbell, K., & Kochenderfer, M. J. (2017). DropoutDAgger: A Bayesian approach to safe imitation learning. arXiv preprint arXiv:1709.06166.
  6. Cronrath, C., Jorge, E., Moberg, J., Jirstrand, M., & Lennartson, B. (2018). BAgger: A Bayesian algorithm for safe and query-efficient imitation learning. In Machine Learning in Robot Motion Planning–IROS 2018 Workshop.
  7. Raffin, A., Hill, A., Ernestus, M., Gleave, A., Kanervisto, A., & Dormann, N. (2019). Stable-Baselines3.

Work Packages

WP1 IL Basics: At the beginning of the thesis, familiarization with DAgger-like safe imitation learning algorithms takes place.
WP2 Safe IL Algorithms Implementation: An implementation of the basic DAgger algorithm [2], as well as of the most common safety-enabled extensions [3-6], is provided in PyTorch. The resulting library should:
  • be modular;
  • include helper functions for generating training datasets from any type of expert (defined as a function);
  • support both discrete and continuous actions;
  • support FFN, CNN and LSTM architectures for the policies; and
  • provide functionality to use the learned policy within the Stable-Baselines3 framework [7].
WP3 Safe IL for Beam Tracking: Application of the implemented algorithms to the Beam Tracking problem. Here, the simulator and the already available OpenAI Gym environment from the 6G SENTINEL project will be used. The most suitable policy type (FFN or CNN, with or without frame stacking, vs. LSTM) will be determined, and a comprehensive hyper-parameter search/tuning process will be performed to ensure the best final performance of the available algorithms.
WP4 Evaluation: The tuned algorithms will be evaluated on different UE trajectories and under different values of the safety threshold. A systematic analysis will indicate the best combinations of policy types and safe IL algorithms for different types of UE movements.
WP5 Documentation: The implementation details and assumptions will be clearly documented alongside the delivered PyTorch code. A comprehensive usage manual should also accompany the code. Finally, the entire work will be documented in a paper-style report and in a final presentation.
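
One way to picture the role of the safety threshold in WP4, together with the "expert defined as a function" requirement in WP2, is a gate that hands control to the expert whenever the student is too uncertain. The sketch below is a deliberately simplified stand-in for discrete beam indices and does not reproduce the decision rule of any of [3-6]: the functions `disagreement` and `safe_action`, the use of an ensemble vote as the uncertainty measure, and the `threshold` parameter are all assumptions made for illustration.

```python
from collections import Counter


def disagreement(votes):
    """Fraction of ensemble members that do not vote for the most common beam."""
    top_count = Counter(votes).most_common(1)[0][1]
    return 1.0 - top_count / len(votes)


def safe_action(state, ensemble, expert, threshold):
    """Return (beam, expert_queried).

    ensemble:  list of student policies, each a function state -> beam index
    expert:    function state -> beam index (assumed expensive to call)
    threshold: hypothetical safety threshold on the ensemble disagreement
    """
    votes = [member(state) for member in ensemble]
    if disagreement(votes) > threshold:
        # Students disagree too much: fall back to the expert.
        return expert(state), True
    # Students agree closely enough: use their majority vote.
    return Counter(votes).most_common(1)[0][0], False
```

With such a gate, a low threshold queries the expert often (safer, but more expensive), while a high threshold lets the student act on its own more frequently. Sweeping the threshold over a range of values is the kind of comparison WP4 asks for.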

Formal Criteria

  • Code (including documentation)
  • Report in the form of a paper
  • Final presentation