Meta Reinforcement Learning for Optimization of Electric Circuit Parameters

The design and optimization of electric circuits is still largely an experience-driven process. Especially in the case of resonant systems, the strongly non-linear system behavior requires many optimization iterations during the design process to maximize power transfer and efficiency. A simple example of such a system is the boost converter. Recently, methods using genetic algorithms (GAs) such as NSGA-II [1] have demonstrated the potential for optimizing and tuning such circuits [2]. They are, however, limited by the fact that newly discovered (optimal) parameters for one electric circuit are hardly, if at all, transferable to other circuits. Each electric circuit problem would therefore have to be solved individually. At the same time, circuit optimization methods based on Reinforcement Learning (RL) such as L2DC [3] have emerged, which show similar potential but do not solve the transfer problem directly either, and which often fail to converge to satisfactory solutions.
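
To make the GA-based approach concrete, below is a minimal sketch of a multi-objective parameter search for an idealized boost converter using NSGA-II as implemented in the pymoo library. The converter model, component bounds, and both objectives (output-voltage error and a ripple penalty) are simplified illustrations and not taken from the referenced work.

```python
import numpy as np
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize

class BoostConverterProblem(ElementwiseProblem):
    """Toy boost converter: tune duty cycle D, inductance L and capacitance C."""

    def __init__(self, v_in=12.0, v_target=48.0, f_sw=100e3, r_load=10.0):
        self.v_in, self.v_target, self.f_sw, self.r_load = v_in, v_target, f_sw, r_load
        super().__init__(n_var=3, n_obj=2,
                         xl=np.array([0.1, 1e-6, 1e-6]),   # bounds for D, L [H], C [F]
                         xu=np.array([0.9, 1e-3, 1e-3]))

    def _evaluate(self, x, out, *args, **kwargs):
        d, l, c = x
        v_out = self.v_in / (1.0 - d)                        # ideal CCM output voltage
        delta_il = self.v_in * d / (l * self.f_sw)           # inductor current ripple
        delta_v = v_out * d / (self.r_load * c * self.f_sw)  # output voltage ripple
        out["F"] = [abs(v_out - self.v_target), delta_v + 0.1 * delta_il]

res = minimize(BoostConverterProblem(), NSGA2(pop_size=50), ("n_gen", 100), seed=1)
print(res.F[:5])  # a few points on the resulting Pareto front
```

As the introduction notes, a Pareto front found this way is specific to one converter topology and parameter range; it does not transfer to a different circuit.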

Meta Learning, or learning-to-learn, aims to improve the learning algorithm itself based on past learning experience instead of solving each task from scratch [4]. Applied in the domain of RL (also known as Meta RL), it promises better task performance as well as the potential to alleviate the computational burden of training task-specific agents. In the domain of circuit optimization, this idea has yet to be investigated thoroughly.
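
As an illustration of the learning-to-learn idea, the sketch below shows one meta-update of first-order MAML, a common simplification of the second-order algorithm from [5], in PyTorch. For brevity it uses a supervised regression loss; a Meta RL variant would replace this with a policy-gradient objective. All names and hyperparameters are placeholders.

```python
import copy
import torch
import torch.nn.functional as F

def fomaml_step(model, tasks, inner_lr, meta_optimizer):
    """One first-order MAML meta-update over a batch of tasks (sketch)."""
    meta_grads = [torch.zeros_like(p) for p in model.parameters()]
    for support_x, support_y, query_x, query_y in tasks:
        learner = copy.deepcopy(model)  # "fast weights" adapted per task
        inner_opt = torch.optim.SGD(learner.parameters(), lr=inner_lr)
        # Inner loop: one adaptation step on the task's support data
        inner_opt.zero_grad()
        F.mse_loss(learner(support_x), support_y).backward()
        inner_opt.step()
        # Outer objective: loss of the adapted learner on held-out query data
        learner.zero_grad()
        F.mse_loss(learner(query_x), query_y).backward()
        for g, p in zip(meta_grads, learner.parameters()):
            g += p.grad / len(tasks)  # first-order approx.: inner Jacobian ignored
    meta_optimizer.zero_grad()
    for p, g in zip(model.parameters(), meta_grads):
        p.grad = g  # write averaged query gradients into the meta-parameters
    meta_optimizer.step()
```

In a circuit-optimization setting, each task in the batch would correspond to one circuit configuration sampled from a task distribution.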

Current RL agents fail to optimize across varying electric circuits. In this master thesis, the potential of integrating Meta Learning strategies into state-of-the-art RL approaches is to be investigated. Since it is difficult and time-consuming to train RL agents on simulated electric circuits, the first goal of the thesis is to develop a suitable abstraction of the optimization problem described above and to implement it as a Markov Decision Process (MDP) solvable by RL (see the sketch below). The second goal is to identify and apply a Meta RL approach that is capable of solving this newly formulated optimization problem. These strategies can range from learning from a distribution of tasks [5] to meta learning the cumulative reward of the RL agent [6]. The student is free to choose which category of strategies to pursue.
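
One possible abstraction, sketched here as a Gymnasium environment, frames parameter tuning of the idealized boost converter as an MDP: the observation exposes the current duty cycle and the normalized output-voltage error, the action is a small adjustment of the duty cycle, and the reward penalizes the remaining error. Everything in this sketch (state, action, reward shaping, episode limits) is a hypothetical design choice, not a prescribed formulation.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class BoostConverterEnv(gym.Env):
    """Toy MDP: nudge the duty cycle until the output voltage matches a target."""

    def __init__(self, v_in=12.0, v_target=48.0, max_steps=50):
        self.v_in, self.v_target, self.max_steps = v_in, v_target, max_steps
        self.action_space = spaces.Box(-0.05, 0.05, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(2,), dtype=np.float32)

    def _v_out(self):
        return self.v_in / (1.0 - self.d)  # ideal CCM boost equation

    def _obs(self):
        err = (self._v_out() - self.v_target) / self.v_target
        return np.array([self.d, err], dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.d = float(self.np_random.uniform(0.1, 0.9))  # random initial duty cycle
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        self.d = float(np.clip(self.d + action[0], 0.05, 0.95))
        self.steps += 1
        err = abs(self._v_out() - self.v_target) / self.v_target
        terminated = err < 0.01                  # close enough to the target
        truncated = self.steps >= self.max_steps
        return self._obs(), -err, terminated, truncated, {}
```

A task distribution for Meta RL could then be obtained by sampling v_target, or other circuit constants, per environment instance.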

To gain empirical insight into the advantages of the Meta Learning approach, the Meta RL agent is to be benchmarked against classical optimization algorithms such as stochastic gradient descent as well as against basic RL agents.
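
For the toy boost converter above, such a classical baseline could be as simple as the finite-difference gradient descent sketched below; the learning rate and step count are arbitrary illustrative values.

```python
import numpy as np

def sgd_baseline(v_in=12.0, v_target=48.0, lr=1e-5, steps=200, eps=1e-4):
    """Finite-difference gradient descent on the duty cycle (illustrative baseline)."""
    loss = lambda d: (v_in / (1.0 - d) - v_target) ** 2
    d = 0.5
    for _ in range(steps):
        grad = (loss(d + eps) - loss(d - eps)) / (2.0 * eps)  # central difference
        d = float(np.clip(d - lr * grad, 0.05, 0.95))
    return d, v_in / (1.0 - d)

print(sgd_baseline())  # converges toward d ≈ 0.75, i.e. v_out ≈ 48 V
```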

Required Skills:

  • Basic knowledge of reinforcement learning and machine learning
  • Python

Supervisors: Sebastian Rietsch, Georg Kruse, Christopher Mutschler

References

  1. https://www.tandfonline.com/doi/abs/10.1163/156939308784160703
  2. https://ieeexplore.ieee.org/document/8450100
  3. https://arxiv.org/pdf/1812.02734.pdf
  4. https://arxiv.org/pdf/2004.05439.pdf
  5. https://arxiv.org/pdf/1703.03400.pdf
  6. https://arxiv.org/pdf/1805.09801.pdf