Student Projects & Theses
We offer several opportunities for students to work with us on the latest research in Reinforcement Learning.
The projects are listed below; you are also encouraged to propose your own topic and work on it together with us.
If you are interested, please send us an email at nrprojects@informatik.uni-freiburg.de
Current Projects
If none of the listed projects is of interest to you, feel free to contact us about new projects via the address mentioned above! Please also include your fields of interest and prior knowledge, as this helps us find a suitable supervisor.
Data Evaluation on the Intraoperative Heart-Lung Machine in Paediatric and Adult Cardiac Surgery
Position type: wissenschaftliche Hilfskraft / student research assistant
Description: The Heart Centre at the University Hospital of Freiburg is conducting research to optimise heart-lung machine (HLM) therapy, which is crucial for peri- and post-operative patient care. Since 2022, more than 600 standardised intraoperative data sets have been collected and are available for evaluation. In collaboration with cardiovascular technology, paediatric cardiology and informatics, we are looking for a student research assistant to analyse these data. Your role will involve on-site processing at the University Hospital, including attending heart surgery alongside specialists.
More information: Deutsch, English
Contact: Lisa Graf
Status: open
Mitigating Extrapolation Error In Offline Inverse Reinforcement Learning
Description: Offline Inverse Reinforcement Learning aims to learn a reward function and a corresponding policy from previously collected expert demonstrations. Offline (deep) RL algorithms use neural networks to approximate the true value functions (Q(s,a) or V(s)) and are therefore prone to extrapolation error when estimating the value of out-of-distribution states (states not seen in the demonstrations). To combat this, two common groups of strategies are employed in the offline RL literature: 1) restrict the policy to stay close to the demonstrations; 2) restrict the value function to assign lower values to out-of-distribution states. In the inverse RL setting, however, we have control over the reward function. In this project, we would like to investigate modifications to the reward function such that the derived value functions and policies behave well for out-of-distribution states. This can be done by transferring the offline RL techniques used to restrict the policy or value function to the reward function, as well as by developing novel methods towards this goal.
Contact: Erfan Azad
Status: full
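To make strategy 2) above concrete, here is a minimal, self-contained sketch (our own illustration, not part of the project) of a pessimistic Q-update on a toy chain environment: the bootstrap target is penalised for state-action pairs that appear rarely in the demonstrations, so out-of-distribution actions receive lower value estimates. The names `ood_penalty` and `pessimistic_q_update`, and the count-based penalty itself, are illustrative assumptions.

```python
import math

# Toy demonstration data on a 1-D chain: (state, action, reward, next_state).
# Actions: 0 = left, 1 = right. The expert always moves right.
demos = [(0, 1, 0.0, 1), (1, 1, 0.0, 2), (2, 1, 1.0, 3)]

# Count how often each (state, action) pair occurs in the demonstrations.
counts = {}
for s, a, _, _ in demos:
    counts[(s, a)] = counts.get((s, a), 0) + 1

def ood_penalty(s, a, beta=1.0):
    """Count-based penalty: large for (s, a) pairs rarely seen in the data."""
    return beta / math.sqrt(counts.get((s, a), 0) + 1)

def pessimistic_q_update(q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One Q-learning step whose bootstrap target is penalised for OOD actions."""
    target = r + gamma * max(
        q.get((s_next, b), 0.0) - ood_penalty(s_next, b) for b in (0, 1)
    )
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (target - old)

q = {}
for _ in range(50):
    for s, a, r, s_next in demos:
        pessimistic_q_update(q, s, a, r, s_next)
```

In the project itself, the idea would instead be pushed into the learned reward function, so that value functions derived from it are automatically pessimistic off-distribution.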
Previous Projects
Context-aware Reinforcement Learning using Time-series Transformer
Description: Generalization to different tasks remains a challenge for Reinforcement Learning: even a small change in the environment can drastically degrade the performance of a learned policy. We therefore want to train an agent that can infer the context/environment change by itself and can thus solve a whole set of similar tasks. Transformers are widely used in the CV and NLP domains, and recently a few new models have also been designed for time-series data. Since RL also frequently deals with time-series data, using a Transformer model to infer the context of the environment is potentially beneficial.
Contact: Baohe Zhang
Status: full
Benchmarking Constrained Reinforcement Learning Algorithms
Description: Constrained Reinforcement Learning addresses tasks that specify not only a reward function but also a set of constraints to satisfy. As the field is young, there is not yet a benchmark that compares the performance of its algorithms in a scientific manner, so a new benchmark could push the community forward. In this project, you would re-implement several constrained RL algorithms and design a new environment with a set of tasks on which to compare them.
Contact: Baohe Zhang
Status: full
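As background, a common baseline in constrained RL is Lagrangian relaxation: maximise reward minus a multiplier times the constraint cost, and raise the multiplier by dual ascent whenever the policy exceeds its cost budget. The following two-armed-bandit sketch is our own illustration (all numbers are made up, not taken from the project): the high-reward arm also incurs cost, and the long-run policy settles at the cost budget.

```python
# Lagrangian relaxation on a toy constrained bandit. Arm 1 has higher
# reward but also higher constraint cost; the budget caps expected cost.
REWARD = {0: 0.5, 1: 1.0}  # expected reward per arm
COST = {0: 0.0, 1: 1.0}    # expected constraint cost per arm
BUDGET = 0.2               # maximum allowed expected cost

def solve(steps=5000, lr_dual=0.05):
    lam = 0.0      # Lagrange multiplier for the cost constraint
    avg_p1 = 0.0   # running average of how often the costly arm is pulled
    for t in range(1, steps + 1):
        # Primal step: best response to the current Lagrangian,
        # i.e. pull arm 1 iff its penalised reward still dominates.
        p1 = 1.0 if REWARD[1] - lam * COST[1] > REWARD[0] - lam * COST[0] else 0.0
        # Dual ascent: increase lam while the constraint is violated,
        # decrease it (down to 0) while there is slack.
        cost = p1 * COST[1] + (1.0 - p1) * COST[0]
        lam = max(0.0, lam + lr_dual * (cost - BUDGET))
        avg_p1 += (p1 - avg_p1) / t
    return avg_p1, lam
```

The averaged policy pulls the costly arm about 20% of the time, exactly saturating the budget; this oscillating primal/dual behaviour is one of the things a benchmark of constrained RL algorithms would measure.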
Vision Transformers for efficient policy learning
Description: Learning policies from raw videos is often infeasible in real-world robotics, as current approaches require large amounts of training data. Extracting object keypoints can make training significantly faster, unlocking a plethora of interesting tasks. However, current keypoint methods require specialized pretraining.
Using vision transformers can remove the need for this specialized pretraining and thus make the technique widely available.
In this project, the student(s) first evaluate the keypoint quality for state-of-the-art methods and then extend the technique to more challenging situations.
Hands-on policy learning on a real robot is possible and encouraged.
Contact: Jan Ole von Hartz
Keypoints for efficient policy learning
Description: As in the project above, we use object keypoints to learn policies more efficiently.
In this project, the student(s) combine object keypoints with the novel SAC-GMM algorithm for policy learning on a real robot.
Contact: Jan Ole von Hartz
Reinforcement Learning for Spatial Graph Design
Description: In this project, you will explore the development of a Reinforcement Learning agent for the design of spatial graphs (more details).
Status: full
Monte Carlo Tree Search for Antibody Design
Description: In this project, we want to utilize Monte Carlo Tree Search methods for the design of antibodies in a simulation (more details).
Status: full
Uncertainty-driven Offline model-based RL
Description: In this project, you will explore the development and use of world models in combination with uncertainty estimates for offline Reinforcement Learning (more details).
Status: full
Application of Recurrent Neural Network in Autonomous Driving
Description: In autonomous driving, the state observation is often noisy and only partially observable, which is challenging for standard RL architectures. A recurrent neural network (RNN) is a simple and promising representation for handling this partial observability. In this project, students are encouraged to explore the use of RNNs in autonomous driving applications.
Status: full
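To illustrate why recurrence helps with noisy, partial observations, here is a tiny hand-crafted sketch (ours, not part of the project): a fixed linear recurrent cell that carries a hidden state across time filters observation noise far better than any single frame can. A real RNN would learn the cell weights from data; here they are set by hand, and the zero-mean "noise" is deterministic purely to keep the example reproducible.

```python
def rnn_cell(h, x, decay=0.9):
    """One recurrent step: blend the carried hidden state with the new input."""
    return decay * h + (1.0 - decay) * x

def estimate(observations):
    """Unroll the recurrent cell over a sequence of noisy observations."""
    h = observations[0]
    for x in observations[1:]:
        h = rnn_cell(h, x)
    return h

true_position = 5.0  # latent state the agent cannot observe directly
# Deterministic zero-mean "noise" (+2 / -2) for reproducibility.
obs = [true_position + (2.0 if t % 2 == 0 else -2.0) for t in range(200)]

single_frame_error = abs(obs[-1] - true_position)     # 2.0
recurrent_error = abs(estimate(obs) - true_position)  # ~0.1
```

The single-frame estimate is off by the full noise magnitude, while the recurrent estimate stays close to the true latent state; the same principle lets an RNN policy cope with noisy, partially observed driving scenes.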
Autoinflammatory Disease Treatment Recommendation
Description: In cooperation with the foundation Rhumatismes-Enfants-Suisse, we develop algorithms for autoinflammatory disease treatment recommendation. The project mainly focuses on unsupervised deep learning and, depending on the progress, on basic deep reinforcement learning (more details).
Contact: Maria Huegle
Status: full
High-Level Decision Making in Autonomous Driving
Description: We develop deep reinforcement learning algorithms for autonomous lane changes using the open-source traffic simulator SUMO. We focus on various aspects, for example mixed action spaces, constraints, and the inclusion of predictions of other traffic participants.
Contact: Gabriel Kalweit and Maria Hügle
Status: full
Machine Learning for Disease Progression Prediction in Rheumatoid Arthritis
Description: In cooperation with the University Hospital in Lausanne, we develop algorithms to predict disease progression in arthritis based on the Swiss Clinical Quality Management (SCQM) database, including lab values, medication, clinical data and patient-reported outcomes.
Contact: Maria Huegle
Status: full
Unsupervised Skill Learning from Video
Description: In his thesis, Markus Merklinger introduces a model that leverages information from multiple label-free demonstrations in order to yield a meaningful embedding for unseen tasks. A distance measure in the learned embedding space can then be used as a reward function within a reinforcement learning system.
Contact: Oier Mees and Gabriel Kalweit
Unsupervised Learning for Early Seizure Detection
Description: In cooperation with the Epilepsy Center in Freiburg, we develop unsupervised learning algorithms to detect epileptic seizures based on intracranial EEG (ECoG) data.
Contact: Maria Huegle