Theses

Thesis and Lab Rotation Projects

Please check this page regularly; new topics will be added on a rolling basis. If no topics are listed here yet, feel free to contact our group directly.

Available Topics

Development of Novel Out-of-Distribution Detection Algorithms

Target group: MSc Students in Computer Science or related fields

Short description: This project will develop and implement novel techniques to detect out-of-distribution features in the input data.

Background: As machine learning and artificial intelligence methods are increasingly used in sensitive applications, a need for such methods to be interpretable to humans has arisen, leading to the formation of the field of “explainable AI” (XAI).
However, most XAI methods do not address a well-defined problem and are hence difficult to benchmark. The UNIML group has started to provide problem definitions, benchmarks and performance metrics for assessing “explanation performance”.
This project will propose novel techniques to detect out-of-distribution (OOD) features in the inputs of deep neural networks. In particular, this project aims to detect single features that are OOD, as opposed to single samples.
Existing synthetic non-linear classification problems from the group will be used and extended to benchmark the proposed approach.
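A minimal sketch of the feature-level (as opposed to sample-level) perspective, using a simple z-score against the training marginals; the data, threshold, and scoring rule are purely illustrative and much simpler than what the project targets:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "training" data: two standardized features.
X_train = rng.normal(size=(1000, 2))

# Marginal statistics per feature, estimated on the training set.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)

def feature_ood_scores(x, mu, sigma):
    """Absolute z-score of each feature under the training marginals.
    A large score flags that single feature as out-of-distribution,
    even if the sample as a whole may look ordinary."""
    return np.abs((x - mu) / sigma)

# Test sample: feature 0 is typical, feature 1 is far in the tail.
x = np.array([0.1, 8.0])
scores = feature_ood_scores(x, mu, sigma)
flags = scores > 4.0  # toy threshold; would be calibrated on held-out data
```

Here only feature 1 is flagged, while a sample-level detector might or might not raise an alarm for the whole input.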

Required skills: Python, machine learning, statistics
Optional skills: experience with deep learning frameworks
Anticipated duration: 6 months
Contact: Stefan Haufe

Breast cancer risk prediction

Target group: MSc Students in Computer Science or related fields

Short description: This project will investigate the adaptation of a recent deep neural network approach for breast cancer detection from digital mammography image data to the related task of breast cancer risk prediction.

Background: Breast cancer is one of the leading cancer-related causes of death among women. To reduce breast cancer mortality, mammography is successfully applied in practice to monitor the evolution of breast tissue over longer periods, up to direct cancer detection. In digital mammography, the patient's breast is compressed between paddles, and high-resolution X-ray images are acquired from multiple views. Patients with certain risk profiles (advanced age, family history, etc.) periodically undergo digital mammography (every 2-3 years) as part of a preventive medical care routine known as breast cancer screening.

Typically, at least two independent radiologists examine the mammography images, trying to identify suspicious regions within the breast and to classify these as benign or malignant. If a majority of radiologists agree on malignant findings, the patient undergoes a recall procedure, including a biopsy of the suspicious tissue to diagnose cancer (or rule it out). More recently, AI has been used to support image evaluation, showing on-par to superior performance in a multiple-reader scenario when one radiologist is replaced with AI, while simultaneously reducing the radiologists’ workload.

In [1], a deep convolutional neural network architecture is deployed that operates in two stages: First, a high-capacity model predicts benign or malignant status at the pixel level from local image patches, encoding local information. Then, a lower-capacity model takes the whole images as well as the ‘heat maps’ obtained from the first stage to predict cancer for the whole breast.
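A toy sketch of this two-stage data flow (stand-in functions only; the actual models in [1] are deep CNNs, and all numbers here are invented): a patch-level scorer produces a ‘heat map’, which a second, whole-image model consumes together with global image statistics:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "mammogram": 32x32 image with a bright lesion-like blob.
img = rng.normal(0.0, 0.1, size=(32, 32))
img[10:14, 10:14] += 2.0

def stage1_heatmap(image, patch=4):
    """Stand-in for the high-capacity patch-level model: scores each
    non-overlapping patch and returns a malignancy 'heat map'."""
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            block = image[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            heat[i, j] = block.mean()          # toy score; a CNN in [1]
    return heat

def stage2_predict(image, heat):
    """Stand-in for the lower-capacity whole-image model: combines
    global image statistics with the stage-1 heat map."""
    features = np.array([image.mean(), heat.max()])
    weights = np.array([0.2, 1.0])             # hypothetical learned weights
    logit = features @ weights - 0.5
    return 1.0 / (1.0 + np.exp(-logit))        # probability of malignancy

heat = stage1_heatmap(img)
p_cancer = stage2_predict(img, heat)
```

The same flow carries over to risk prediction: only the labels attached to the images (and hence the output layer and training targets) change.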

Task: While breast cancer screening as described above reduces mortality through early detection and treatment of breast cancer, we would like to investigate the related task of breast cancer risk prediction. Cancer risk prediction tackles the issue of cancer emerging between two episodes of breast cancer screening, known as ‘interval cancer’. An assessment of high cancer risk in a pre-cancer state may, e.g., result in an increased screening frequency for a patient on an individual level, to reduce the occurrence or severity of interval cancer.

Adapting the modeling approach of [1], in which the pixel-level model of the first stage is trained on proven cancer cases by matching the mammography images to their biopsy results, we propose to look into interval cancer cases instead, matching the mammography images of the last pre-cancer episode to the biopsy results from the post-cancer episode.

This master thesis is about elaborating the details of the adapted model for cancer risk prediction, including the necessary changes to the output layer as well as the training of the second-stage model, the adaptation of pre-training, and the investigation of alignment issues when combining the information of two detached screening episodes.

[1] Wu, Nan, et al. “Deep neural networks improve radiologists’ performance in breast cancer screening.” IEEE Transactions on Medical Imaging 39.4 (2019): 1184-1194.

Required skills: Python, machine learning, statistics

Optional skills: experience with deep learning frameworks, medical image analysis

Anticipated duration: 6 months

Contact: Danny Panknin 

Development of novel techniques to interpret nonlinear prediction models (MSc)

Target group: MSc Students in Computer Science or related fields

Short description: This project will develop and implement novel techniques to “explain” specific classes of non-linear prediction models.

Background: As machine learning and artificial intelligence methods are increasingly used in sensitive applications, a need for such methods to be interpretable to humans has arisen, leading to the formation of the field of “explainable AI” (XAI). However, most XAI methods do not address a well-defined problem and are hence difficult to benchmark. The UNIML group has started to provide problem definitions, benchmarks and performance metrics for assessing “explanation performance”. This project will propose novel techniques to derive explanations and interpretations from nonlinear models. In particular, we will be concerned with kernel methods and/or deep neural networks. Existing non-linear benchmark problems from the group will be used to benchmark the proposed approaches and guide their further refinement.

Required skills: Python, machine learning, statistics

Optional skills: experience with deep learning frameworks

Anticipated duration: 6 months

Contact: Stefan Haufe

Design of benchmark data to validate explainable artificial intelligence (MSc)

Target group: MSc Students in Computer Science or related fields

Short description: This project will develop synthetic ground-truth data to benchmark and validate explainable artificial intelligence methods using generative deep learning models.

Background: As machine learning and artificial intelligence methods are increasingly used in sensitive applications, a need for such methods to be interpretable to humans has arisen, leading to the formation of the field of “explainable AI” (XAI). However, most XAI methods do not address a well-defined problem and are hence difficult to benchmark. The UNIML group has started to provide problem definitions and performance metrics for assessing “explanation performance”. This project will design and validate realistic yet well-defined ground-truth data to benchmark XAI approaches according to the developed definitions and criteria. To this end, we will use state-of-the-art generative models such as generative adversarial and diffusion models. The focus will be on natural and medical images.

Required skills: Python, machine learning, statistics
Optional skills: experience with deep learning frameworks

Anticipated duration: 6 months

Contact: Stefan Haufe

Assigned Topics

Investigation of strategies to extend centroid-based deep clustering models to continuous learning. (MSc thesis, Wang Wang, 2024)

Short description: This project will investigate strategies to extend centroid-based deep clustering models to continuous learning.

Background: Clustering is one of the most fundamental yet difficult tasks in machine learning. Indeed, one expects a model to learn something meaningful to our human experience without any help or supervision. Gaussian mixture models (GMMs) are a classic framework that groups data based on the Euclidean distance to some anchor points. Advances in ML allow us to learn a GMM using a neural network (NN), which opens the door to parametric non-linear partitions of the input space. This thesis aims to extend the deep clustering framework to continuous learning, i.e., to study how to update a trained neural GMM so that it properly clusters new data without forgetting what it has already learned.

We will build upon the so-called Clustering Module (CM) [1], which learns a GMM using a 2-layer autoencoder. The model relies on a Dirichlet prior to control cluster assignments. The work will consist of developing strategies to ensure the stability of existing clusters and/or centroids and manipulating the Dirichlet distribution to “open” new clusters for the unseen data, if necessary.

[1] https://arxiv.org/abs/2012.03740
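As a simplified illustration of how a Dirichlet prior on the mixing weights can “close” superfluous clusters (a classical EM/MAP variant on 1-D data, not the autoencoder-based CM of [1]; all parameters are hypothetical): with concentration alpha < 1, the MAP update pi_k ∝ N_k + alpha - 1 shrinks weakly supported clusters toward zero weight.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 1-D data with two well-separated groups.
x = np.concatenate([rng.normal(-3, 0.5, 200), rng.normal(3, 0.5, 200)])

K, alpha = 4, 0.9          # more components than groups; Dirichlet alpha < 1
mu = rng.normal(0, 1, K)   # random initial centroids
var = np.ones(K)
pi = np.full(K, 1.0 / K)

for _ in range(50):
    # E-step: responsibilities under the current Gaussian mixture.
    log_p = (-0.5 * (x[:, None] - mu) ** 2 / var
             - 0.5 * np.log(2 * np.pi * var) + np.log(pi))
    log_p -= log_p.max(axis=1, keepdims=True)
    r = np.exp(log_p)
    r /= r.sum(axis=1, keepdims=True)

    # M-step: MAP update of the mixing weights under a Dirichlet(alpha)
    # prior; alpha < 1 pushes weak clusters toward zero ("closing" them).
    Nk = r.sum(axis=0)
    pi = np.maximum(Nk + alpha - 1.0, 1e-12)
    pi /= pi.sum()
    mu = (r * x[:, None]).sum(axis=0) / np.maximum(Nk, 1e-12)
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / np.maximum(Nk, 1e-12)
    var = np.maximum(var, 1e-6)

n_active = int((pi > 0.05).sum())   # clusters that survived
```

In the continual-learning setting targeted by the thesis, the converse manipulation of the same prior would “open” new clusters when unseen data arrives.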

Required skills: Python, machine learning, statistics

Optional skills: experience with deep learning frameworks, Bayesian statistics

Anticipated duration: 6 months

Contact: Ahcène Boubekki

Prediction of invasive blood pressure from non-invasive vital signs in critical care (MSc Hichem Dhouib, 2024)

Target group: MSc Students in Computer Science and related fields, possibly Medical PhD students

Short description: Prediction of invasive blood pressure from non-invasive vital signs in critical care

Background: In intensive care, timely access to high-resolution patient data is critical for optimizing patient outcomes. Currently, the monitoring of patient parameters such as vital signs in intensive care units (ICUs) often relies on manual assessments and technologies that provide data at relatively low resolutions. To address this limitation, Charité – Universitätsmedizin Berlin is in the process of implementing the Philips Data Warehouse Connect system across more than 1,000 ICU beds. This extension will enable the acquisition of vital signs – including heart rate, invasive blood pressure, oxygen saturation and electrocardiography (ECG) – at a high resolution of up to 500 Hz.

A major challenge in current practice is that reliable continuous blood pressure monitoring requires invasive methods, typically the placement of an arterial catheter. This project aims to explore alternative, less invasive techniques using high-fidelity data from photoplethysmography (PPG) or ECG devices. The goal is to develop a robust methodology that can accurately derive invasive blood pressure readings from these high-resolution data sets. This initiative not only promises to improve the safety and comfort of ICU patients by reducing the need for invasive procedures, but also aims to improve the accuracy of continuous non-invasive blood pressure monitoring.

The project will involve the application of advanced signal processing techniques and machine learning algorithms, and potentially the development of new calibration models that correlate PPG and ECG data with invasive blood pressure values. In addition, this research will contribute to the broader field of digital health by facilitating real-time, high-accuracy patient monitoring. The results could lead to significant advances in ICU management and treatment protocols, ultimately improving clinical outcomes for patients.
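As a minimal illustration of the calibration-model idea, the sketch below fits a ridge regression from a few hypothetical PPG-derived features to synthetic “arterial-line” systolic pressures; all features, coefficients, and data are invented for illustration and stand in for real waveform-derived predictors:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in dataset: per-beat PPG features for 500 patients.
n = 500
X = np.column_stack([
    rng.normal(70, 10, n),    # heart rate (bpm)
    rng.normal(1.0, 0.2, n),  # PPG pulse amplitude (a.u.)
    rng.normal(0.2, 0.05, n), # systolic rise time (s)
])

# Synthetic "ground-truth" systolic pressure with a known linear relation,
# standing in for invasive arterial-line measurements.
true_w = np.array([0.5, -20.0, -80.0])
y = 100.0 + X @ true_w + rng.normal(0, 2, n)

# Ridge regression, closed form: (X'X + lam*I)^-1 X'y on centered data.
# (In practice features would also be standardized before regularizing.)
lam = 0.1
Xc, yc = X - X.mean(axis=0), y - y.mean()
w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(3), Xc.T @ yc)
pred = Xc @ w + y.mean()
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
```

A real pipeline would replace the invented features with signal-processing outputs from the PPG/ECG waveforms and the linear model with a calibrated, likely nonlinear, learner.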

Required skills: Python programming, experience with machine learning/deep learning, strong interest in medical applications

Further desired skills: Experience with medical time series data, signal processing skills

Anticipated start date: July 2024

Anticipated duration: 6 months

Contact: Stefan Haufe, Akira-Sebastian Poncette

Investigating the claims and theoretical justification in Explainable Artificial Intelligence (XAI) literature. (MSc Charlotte Bodenmüller, 2024)

Target group: BSc and MSc Students in Computer Science or related fields

Short description: A systematic review into the claims made by authors in the field of XAI.

Background: The field of ‘explainable’ artificial intelligence (XAI) has produced highly acclaimed methods that seek to make the decisions of complex machine learning (ML) models ‘understandable’ to humans, for example by attributing ‘importance’ scores to input features. Yet, a lack of formal underpinning leaves it unclear what conclusions can safely be drawn from the results of a given XAI method, and has so far hindered the theoretical verification and empirical validation of XAI methods, delaying the promised deployment into high-stakes domains such as medicine, law, and finance. Not only has there been a large surge in research into new XAI methods, but also in secondary research reviewing many types of XAI methods, as well as in application papers attempting to deploy XAI in real-world scenarios. This project will comprise a systematic literature review of the field of XAI, analysing the claims made by authors of XAI methods, the theoretical justification they provide, and the empirical validation of the correctness of the produced explanations, if any. In addition, the project will review secondary literature to study how such claims and justifications propagate through the field.

Required skills: machine learning, statistics

Optional skills: experience with explainable AI, keenness and proficiency in writing

Anticipated duration: 6 months

Contact: Benedict Clark, Stefan Haufe

Investigating the relationship between power and functional connectivity of brain rhythms (Lab rotation, Godwin Tetteh, 2024)

Target group: MSc Students in Computational Neuroscience or related fields

Short description: This project will study the relationship between the power of rhythmic brain signals and the functional connectivity (coherence, Granger causality) between such signals through theoretical analyses and simulations.

Background: While it is often observed that the synchronization of brain rhythms correlates with their strength, the relationship between the two can be much more complex. Similarly, directed and undirected functional connectivity metrics can lead to seemingly inconsistent results. The purpose of this project is to derive simple examples that illustrate the complex ways in which power and connectivity can interact.
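One minimal simulation of this dissociation, using a Welch-style magnitude-squared coherence implemented from scratch (all signal parameters hypothetical): rescaling one signal changes its power arbitrarily while leaving its coherence with another signal untouched, since coherence is normalized by the auto-spectra.

```python
import numpy as np

rng = np.random.default_rng(4)

def coherence(x, y, nseg=50, nfft=128):
    """Magnitude-squared coherence, averaged over segments (Welch-style)."""
    Sxx = Syy = Sxy = 0.0
    for k in range(nseg):
        xs = np.fft.rfft(x[k*nfft:(k+1)*nfft])
        ys = np.fft.rfft(y[k*nfft:(k+1)*nfft])
        Sxx = Sxx + np.abs(xs) ** 2
        Syy = Syy + np.abs(ys) ** 2
        Sxy = Sxy + xs * np.conj(ys)
    return np.abs(Sxy) ** 2 / (Sxx * Syy)

n = 50 * 128
s = rng.normal(size=n)                  # shared "source" signal
x = s + 0.5 * rng.normal(size=n)        # two noisy observations of it
y = s + 0.5 * rng.normal(size=n)

c_orig = coherence(x, y).mean()
c_scaled = coherence(5.0 * x, y).mean()   # 25x the power in x, same coupling
power_ratio = np.var(5.0 * x) / np.var(x)
```

The project would construct such examples systematically, including cases where power and connectivity changes do co-occur or where directed and undirected metrics disagree.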

Required skills: Programming, signal processing

Optional skills: MATLAB, Python

Anticipated duration: 6 months

Contact: Stefan Haufe

Enumeration of modes in generated data (BSc thesis, Iaroslava Novoselova 2024)

Target group: BSc/MSc Students in Computer Science or related fields

Short description: Find patterns inside high-dimensional time series data to detect poorly performing generative models

Background:

The rise of generative models, such as Midjourney, has brought about significant advancements in the field of machine learning. These models have shown impressive capabilities in creating synthetic data with a wide range of possible applications, including data augmentation, data privacy, and data sharing. However, as generative models become more prevalent, they also raise important ethical and social issues that need to be carefully considered.

In this bachelor’s thesis, we aim to investigate one important aspect of generative modeling: the enumeration of possible synthetic data modes or prototypes. This process involves identifying and describing the different variations of synthetic data that can be generated by a given generative model. By enumerating these modes, we can gain a better understanding of the types of synthetic data that can be produced, which can be useful as a quality criterion for synthetic data.

The thesis will consist of a few components: writing an exposé that explores the existing literature for possible solutions, applying or adapting one of these solutions to time series data, and developing a method of one’s own.

Required skills: strong background in statistics/data analysis/machine learning

Anticipated duration: 3 months (or more, depending on deliverables)

Development and validation of an individual head modeling pipeline for MEG source localization (BSc thesis, Paul Eschenbach, 2024)

Target group: BSc Students in Computer Science or related fields

Short description: This project will develop an individual head modeling pipeline for magnetoencephalography (MEG), and will apply it for the purpose of localizing the sources of real MEG data.

Background: Electrical volume conductor modeling of the head is an important step when it comes to localizing brain sources from magnetoencephalographic (MEG) measurements. Here, it is important to take the individual anatomy of the subject’s head and its relative position in the MEG scanner into account. This project will develop an individual head modeling pipeline for the Yokogawa MEG system at the PTB and will test it using real MEG data.

Required skills: Programming experience

Optional skills: MATLAB, Python, basic linear algebra

Anticipated duration: 3 months

Contact: Stefan Haufe

Towards robust metrics of amplitude-amplitude coupling between brain areas (Lab rotation, Elsa-Henriette Harms, 2024)

Target group: MSc Students in Computational Neuroscience or related fields

Short description: This project will conduct simulations to study the influence of source mixing on estimates of amplitude-amplitude coupling (AAC) between neural time series.

Background: The analysis of electrophysiological recordings of brain activity using electroencephalography (EEG) or similar techniques promises to shed light on the working principles of the brain. In particular, measures of interaction between neural time series may provide insight on how communication between different regions is implemented in the brain. One mechanism that has been proposed is correlation between the envelopes of distinct brain rhythms (AAC). However, ubiquitous source mixing can induce spurious AAC. While remedies have been proposed, these can be demonstrated to fail on counterexamples. This project aims to characterize the ability of different AAC metrics to distinguish true from spurious across-site interaction. It will also aim to develop novel metrics based on antisymmetrized higher-order spectra.
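A minimal simulation of the mixing problem (all parameters hypothetical): two 10 Hz “sources” with independent amplitude envelopes show near-zero AAC, but after instantaneous linear mixing into two “sensors”, a strong spurious envelope correlation appears.

```python
import numpy as np

rng = np.random.default_rng(5)
n, fs, f0 = 40000, 200.0, 10.0
t = np.arange(n) / fs

def slow_envelope():
    """Slowly varying, strictly positive amplitude envelope."""
    e = np.convolve(rng.normal(size=n), np.ones(400) / 400, mode="same")
    return 1.5 + e

def envelope(x):
    """Amplitude envelope via the analytic signal (FFT-based Hilbert)."""
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = h[n // 2] = 1.0
    h[1:n // 2] = 2.0
    return np.abs(np.fft.ifft(X * h))

e1, e2 = slow_envelope(), slow_envelope()
s1 = e1 * np.cos(2 * np.pi * f0 * t)                # two independent
s2 = e2 * np.cos(2 * np.pi * f0 * t + np.pi / 4)    # 10 Hz "rhythms"

# Ground truth: the sources' envelopes are uncorrelated.
aac_true = np.corrcoef(envelope(s1), envelope(s2))[0, 1]

# Sensor level: instantaneous linear mixing of both sources.
x, y = s1, 0.7 * s1 + 0.7 * s2
aac_mixed = np.corrcoef(envelope(x), envelope(y))[0, 1]
```

Characterizing when candidate AAC metrics suppress `aac_mixed`-type artifacts without destroying genuine coupling is exactly the kind of benchmark this project will build.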

Required skills: Programming, signal processing

Optional skills: MATLAB, Python

Anticipated duration: 6 months

Contact: Stefan Haufe

Characterizing dementia types using normative models of functional brain connectivity (MSc thesis, Erfan Baradarantohidi, 2024)

Target group: MSc Students in Neuroscience or related fields

Short description: This project will analyze several large magnetoencephalography datasets comprising data of patients diagnosed with different stages and types of dementia. Robust functional connectivity estimation pipelines will be used to compare patients to previously established normative data from healthy subjects in order to identify clinically relevant clusters of patients.

Background: Several devastating aging-related neurological disorders, such as Alzheimer’s disease and other dementias, are currently incurable, and their pathophysiology is not well understood. Brain communication patterns in these disorders are likely disturbed, making functional brain connectivity (FC) analysis a promising tool to derive disease- and disease-stage-specific biomarkers. Ideally, such direct markers of brain functioning could even be of prognostic value and inspire novel interventions. In this project, we will apply validated robust pipelines for directed and undirected FC estimation to large patient MEG datasets. Comparisons to previously established normative data will be used to identify spatially and spectrally resolved FC markers that are specific to diseases and disease stages.

Required skills: Matlab, signal processing, basic statistics, interest in the pathophysiology of neurological disorders
Optional skills: Experience with M/EEG data analysis including source reconstruction and functional connectivity estimation

Anticipated duration: 6+ months

Contact: Stefan Haufe

Investigating the effect of whitening on "AI explanation performance" (MSc, Stoyan Karastoyanov, 2024)

Target group: MSc Students in Computer Science or related fields

Short description: This project will study the effects of various whitening and orthogonalization transforms of the input data on the “explanation performance” of so-called “explainable AI” methods.

Background: As machine learning and artificial intelligence methods are increasingly used in sensitive applications, a need for such methods to be interpretable to humans has arisen, leading to the formation of the field of “explainable AI” (XAI). However, most XAI methods do not address a well-defined problem and are hence difficult to benchmark. The UNIML group has started to provide problem definitions, benchmarks and performance metrics for assessing “explanation performance”. This project will explore the ability of whitening transforms to improve the performance of popular XAI methods.
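As a sketch of the kind of transform meant here (a standard ZCA whitening, not a method specific to this project), the empirical input covariance is mapped to the identity, removing correlations between input features before a model or XAI method sees them:

```python
import numpy as np

rng = np.random.default_rng(6)

# Correlated 2-D input data (correlation is what whitening removes).
A = np.array([[2.0, 1.2], [0.0, 0.8]])
X = rng.normal(size=(5000, 2)) @ A.T

# ZCA whitening: W = C^(-1/2) via eigendecomposition of the covariance.
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / (len(Xc) - 1)
vals, vecs = np.linalg.eigh(C)
W = vecs @ np.diag(vals ** -0.5) @ vecs.T
Xw = Xc @ W.T

# After whitening, the empirical covariance is (near) identity.
C_white = Xw.T @ Xw / (len(Xw) - 1)
```

Among all whitening transforms, ZCA keeps the whitened data closest to the original; the project would compare such choices (ZCA, PCA whitening, orthogonalization) with respect to explanation performance.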

Required skills: Python, machine learning

Optional skills: experience with deep learning and XAI frameworks

Anticipated duration: 6 months

Contact: Stefan Haufe

Comparison of FEM and BEM models for EEG forward and inverse modeling (BSc thesis, Héctor Castaños, 2024)

Target group: BSc Students in Computer Science or related fields

Short description: This project will integrate an existing finite element method (FEM) modeling pipeline (ROAST) into the open-source package Brainstorm for electroencephalographic (EEG) data analysis. This will make it possible to create accurate volume conductor models for brain source localization. The project will also quantitatively compare the obtained accuracy with that of the standard boundary element method (BEM) modeling implemented in Brainstorm.

Background: Electrical volume conductor modeling of the head is an important step when it comes to modeling the effect of transcranial electric brain stimulation (TES) as well as localizing brain sources from electroencephalographic (EEG) measurements. While TES modeling typically relies on detailed finite element method (FEM) solvers, software packages for EEG inverse modeling typically offer only less accurate boundary element method (BEM) solvers. This project will make an existing FEM code (ROAST) accessible for EEG inverse modeling by integrating it into the open-source package Brainstorm. This will allow for a direct quantitative comparison of FEM and BEM models in terms of EEG source localization accuracy.

Required skills: Programming experience

Optional skills: MATLAB, basic linear algebra

Anticipated duration: 3 months

Contact: Stefan Haufe

Gesture recognition and classification using wearable sensors (BSc thesis, Resit Berkay Bozkurt 2023)

Target group: BSc/MSc Students in Computer Science or related fields

Short description: Using sensors on your wearable device (e.g. an Android phone, or an Arduino with gyroscope and accelerometer), fuse the data from the gyroscope and accelerometer and create a speller.

Background: There is currently an increasing interest in healthy lifestyles. An accessible way to improve one’s lifestyle is to use mobile apps and the sophisticated sensors of smartphones to gather structured information about wellbeing. In this project, students will develop an application that uses wearable sensors for the recognition and tracking of human activities. The scope of the activities will depend on the project duration and the preparedness of the students. The simplest example would be a gesture speller operated by moving a handheld smartphone. A more complicated instance would be the tracking of behavioural patterns and activities (eating, sleeping, working, etc.). The project consists of three main parts: recording a dataset, data processing and analysis, and application delivery.
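A small illustration of gyroscope/accelerometer fusion, using a standard complementary filter on simulated data (all sensor parameters are made up): the gyroscope gives a smooth but drifting angle via integration, the accelerometer a drift-free but noisy angle, and blending the two beats either alone.

```python
import numpy as np

rng = np.random.default_rng(7)

fs, dur = 100.0, 5.0                              # 100 Hz samples, 5 s
t = np.arange(0, dur, 1 / fs)
true_angle = 0.5 * np.sin(2 * np.pi * 0.5 * t)    # simulated tilt (rad)

# Simulated sensors: gyroscope = angular rate + bias + noise,
# accelerometer-derived tilt = angle + strong noise.
gyro = np.gradient(true_angle, 1 / fs) + 0.05 + 0.02 * rng.normal(size=len(t))
acc_angle = true_angle + 0.2 * rng.normal(size=len(t))

def complementary_filter(gyro, acc_angle, dt, alpha=0.98):
    """Fuse gyro integration (smooth, drifts) with the accelerometer
    angle (noisy, drift-free) into a single tilt estimate."""
    est = np.zeros(len(gyro))
    for i in range(1, len(gyro)):
        est[i] = alpha * (est[i-1] + gyro[i] * dt) + (1 - alpha) * acc_angle[i]
    return est

fused = complementary_filter(gyro, acc_angle, 1 / fs)
err_fused = float(np.sqrt(np.mean((fused - true_angle) ** 2)))
err_acc = float(np.sqrt(np.mean((acc_angle - true_angle) ** 2)))
```

A gesture speller would feed such fused orientation (or raw multi-sensor features) into a gesture classifier; on a phone, the same fusion is typically provided by the platform's sensor APIs.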

Required skills: basic programming in an OOP language and basic knowledge of operating systems

Optional skills: Android programming, signal analysis

Anticipated duration: 3 months (or more, depending on deliverables)

Contact: Rustam Zhumagambetov

More assigned topics without description

  • Quantification and differentiation of magnetic nanoparticles in biological samples by magnetic particle spectroscopy using machine learning algorithms (Bachelor Jose Ordonez Muller, 2024)
  • Design of benchmark data to validate explainable artificial intelligence (Master Luca Matteo Cornils, 2024)
  • A Deep Reinforcement Learning Approach to Cryptocurrency Portfolio Optimization (Bachelor Silas Steger, 2024)
  • Extension of a decomposition method for bispectra and application to EEG data (Master Tien Dung Nguyen, 2024)
  • Unsupervised Methods for German Medical Information Retrieval: Datasets, Models and Evaluation (Master Maurice Walny, 2024)
  • Machine Learning Approaches to Identify Comorbid Psychiatric Disorders in Children and Adolescents Based on Structural Magnetic Resonance Imaging (Master Damian Jaspar, 2024)
  • Estimating Signal Time-Delays under Mixed Noise Influence with Novel Cross- and Bispectrum Methods (Master Tin Jurhar, 2023)
  • Spectral and functional connectivity patterns of focal seizures from subdural electroencephalography (Master Margarita Sison, 2023)
  • Translating connectivity measures into Python (lab rotation Thomas Binns, 2022)
  • Interpretability of linear models in machine learning (lab rotation Celine Budding, 2022)
  • Assessing the Effect of MRI Defacing on Individual Forward and Inverse Modelling (lab rotation Angela Mitrovska, 2022)
  • Analysing the effect of age, gender and handedness on Healthy Brain Network resting state-EEG dataset using normative modeling (Master Nikita Agarwal, 2021)
  • Evaluating interpretability methods on structural brain MRI data with synthetic lesions (Master Celine Budding, 2021)
  • Bispectral delay estimation (lab rotation Ano Toro, 2021)
  • Machine learning modeling for a decision support system providing health professionals with critical information about COVID-19 positive patients on intensive care units at the Charité Berlin (Master Niklas Giesa, 2020)
  • Effect of head volume conduction on graph measures in source-reconstructed EEG networks (Master Subhi Arafat, 2019)