Manuel Burger, MSc

"Questions you cannot answer are usually far better for you than answers you cannot question." - Yuval Noah Harari

PhD Student

E-Mail
manuel.burger@get-your-addresses-elsewhere.inf.ethz.ch
Address
ETH Zürich
Department of Computer Science
Biomedical Informatics Group
Universitätsstrasse 6
8092 Zürich
Room
CAB F53.1

With my research, I want to leverage the potential of machine learning to improve health care. My focus is on representation learning in the clinical and biomedical domains.

I obtained my Bachelor's degree in Computer Science, followed by a Master's degree in Data Science, at ETH Zürich. I have always been fascinated by the capabilities of modern computer hardware, which drew me into the world of HPC during my Bachelor's studies. There I discovered my fascination for machine learning and the powerful applications we can build with the computing capabilities available to us, and I began steering my path towards data science in the biomedical and health-care domain. Aiming to contribute to improved health care, I joined the Biomedical Informatics Group at ETH as a Ph.D. student in 2022.

My fields of interest are centered around representation learning. I am especially excited about structural priors for representation learning on biomedical data. Learning from structure enables more flexible and expressive machine learning solutions and, at the same time, helps us develop more interpretable and robust models.

As long as it's biomedical data, I like to think about:

  • Time-Series Modeling
  • Self-Supervised Learning; especially for GNNs and Time-Series
  • Knowledge Extraction and Representation Learning thereof
  • Transfer Learning in the Clinical Domain
  • Graph Neural Networks
  • Natural Language Processing

Find out more on my homepage manuelburger.ch

Abstract Electronic Health Record (EHR) datasets from Intensive Care Units (ICU) contain a diverse set of data modalities. While prior works have successfully leveraged multiple modalities in supervised settings, we apply advanced self-supervised multi-modal contrastive learning techniques to ICU data, specifically focusing on clinical notes and time-series for clinically relevant online prediction tasks. We introduce a loss function, the Multi-Modal Neighborhood Contrastive Loss (MM-NCL), together with a soft neighborhood function, and showcase the excellent linear-probe and zero-shot performance of our approach.

Authors Fabian Baldenweg, Manuel Burger, Gunnar Rätsch, Rita Kuznetsova

Submitted TS4H ICLR Workshop

Link DOI
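
To illustrate the general idea, here is a minimal sketch of a soft-neighborhood multi-modal contrastive loss in PyTorch. The Gaussian time-based neighborhood kernel, the temperature, and the function name are illustrative assumptions, not the exact MM-NCL formulation from the paper.

    # Illustrative sketch of an InfoNCE-style cross-modal loss with soft positives:
    # time steps close in time receive partial positive weight via a Gaussian kernel.
    import torch
    import torch.nn.functional as F

    def soft_neighborhood_contrastive_loss(ts_emb, note_emb, times, temperature=0.1, sigma=1.0):
        # ts_emb, note_emb: (N, d) embeddings of aligned time-series windows and clinical notes
        # times: (N,) float timestamps used to define the soft neighborhood weights
        ts = F.normalize(ts_emb, dim=-1)
        nt = F.normalize(note_emb, dim=-1)
        logits = ts @ nt.t() / temperature                   # (N, N) cross-modal similarities
        dt = (times[:, None] - times[None, :]).abs()
        weights = torch.exp(-(dt ** 2) / (2 * sigma ** 2))   # nearby steps count as soft positives
        weights = weights / weights.sum(dim=1, keepdim=True)
        log_probs = F.log_softmax(logits, dim=1)
        return -(weights * log_probs).sum(dim=1).mean()

    loss = soft_neighborhood_contrastive_loss(
        torch.randn(16, 32), torch.randn(16, 32), torch.arange(16, dtype=torch.float))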

Abstract The rapid expansion of genomic sequence data calls for new methods to achieve robust sequence representations. Existing techniques often neglect intricate structural details, emphasizing mainly contextual information. To address this, we developed k-mer embeddings that merge contextual and structural string information by enhancing De Bruijn graphs with structural similarity connections. Subsequently, we crafted a self-supervised method based on Contrastive Learning that employs a heterogeneous Graph Convolutional Network encoder and constructs positive pairs based on node similarities. Our embeddings consistently outperform prior techniques for Edit Distance Approximation and Closest String Retrieval tasks.

Authors Kacper Kapusniak, Manuel Burger, Gunnar Rätsch, Amir Joudaki

Submitted NeurIPS 2023 Workshop: Frontiers in Graph Learning

Link DOI
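
As a rough illustration of the graph construction behind these embeddings, the sketch below builds a small De Bruijn graph over k-mers and adds structural edges between k-mers at Hamming distance one. The use of Hamming distance as the structural-similarity criterion and all names here are assumptions for illustration, not the paper's actual construction.

    # Minimal sketch: De Bruijn (contextual) edges plus structural-similarity edges over
    # k-mer nodes, i.e. the two edge types a heterogeneous graph encoder could consume.
    from itertools import combinations

    def kmer_graph(seq, k=4):
        kmers = sorted({seq[i:i + k] for i in range(len(seq) - k + 1)})
        debruijn = {(a, b) for a in kmers for b in kmers if a != b and a[1:] == b[:-1]}
        structural = {(a, b) for a, b in combinations(kmers, 2)
                      if sum(x != y for x, y in zip(a, b)) == 1}
        return kmers, debruijn, structural

    nodes, ctx_edges, sim_edges = kmer_graph("ACGTACGGTACG", k=4)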

Abstract Recent advances in deep learning architectures for sequence modeling have not fully transferred to tasks handling time-series from electronic health records. In particular, for problems related to the Intensive Care Unit (ICU), the state of the art remains to tackle sequence classification in a tabular manner with tree-based methods. Recent deep learning methods for tabular data now surpass these classical approaches by better handling the severe heterogeneity of input features. Given the similar level of feature heterogeneity exhibited by ICU time-series, and motivated by these findings, we explore the impact of these novel methods on clinical sequence modeling tasks. Building on such advances in deep learning for tabular data, our primary objective is to underscore the importance of step-wise embeddings in time-series modeling, which remain unexplored in machine learning methods for clinical data. On a variety of clinically relevant tasks from two large-scale ICU datasets, MIMIC-III and HiRID, our work provides an exhaustive analysis of state-of-the-art methods for tabular time-series as time-step embedding models, showing overall performance improvements. In particular, we evidence the importance of feature grouping in clinical time-series, with significant performance gains when considering features within predefined semantic groups in the step-wise embedding module.

Authors Rita Kuznetsova, Alizée Pace, Manuel Burger, Hugo Yèche, Gunnar Rätsch

Submitted ML4H 2023 (PMLR)

Link DOI
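
The following is a hedged sketch of the feature-grouping idea in per-time-step embeddings; the layer sizes, the MLP encoders, and the example groups are illustrative assumptions rather than the configurations benchmarked in the paper.

    # Each predefined semantic group of input features gets its own small encoder; the
    # concatenated group embeddings form the per-time-step token for a sequence model.
    import torch
    import torch.nn as nn

    class GroupedStepEmbedding(nn.Module):
        def __init__(self, groups, d_group=32):
            # groups: dict mapping group name -> list of feature column indices (assumed)
            super().__init__()
            self.groups = groups
            self.encoders = nn.ModuleDict({
                name: nn.Sequential(nn.Linear(len(cols), d_group), nn.ReLU())
                for name, cols in groups.items()
            })

        def forward(self, x):                        # x: (batch, time, features)
            parts = [self.encoders[name](x[..., cols]) for name, cols in self.groups.items()]
            return torch.cat(parts, dim=-1)          # (batch, time, d_group * num_groups)

    emb = GroupedStepEmbedding({"vitals": [0, 1, 2], "labs": [3, 4]})
    tokens = emb(torch.randn(8, 48, 5))              # ready for a GRU/Transformer over time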

Abstract Clinicians are increasingly looking towards machine learning to gain insights about patient evolutions. We propose a novel approach named Multi-Modal UMLS Graph Learning (MMUGL) for learning meaningful representations of medical concepts using graph neural networks over knowledge graphs based on the Unified Medical Language System (UMLS). These representations are aggregated to represent entire patient visits and then fed into a sequence model to perform predictions at the granularity of multiple hospital visits of a patient. We improve performance by incorporating prior medical knowledge and considering multiple modalities. We compare our method to existing architectures proposed to learn representations at different granularities on the MIMIC-III dataset and show that our approach outperforms these methods. The results demonstrate the significance of multi-modal medical concept representations based on prior medical knowledge.

Authors Manuel Burger, Gunnar Rätsch, Rita Kuznetsova

Submitted ML4H 2023 (PMLR)

Link DOI
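
As a conceptual sketch only, the snippet below shows how per-concept representations could be aggregated into visit representations and passed to a sequence model over hospital visits. The plain embedding table stands in for the graph neural network over the UMLS knowledge graph, and the mean aggregation and GRU are simplifying assumptions, not the MMUGL architecture itself.

    import torch
    import torch.nn as nn

    class ConceptToVisitModel(nn.Module):
        def __init__(self, num_concepts, d=64, num_classes=2):
            super().__init__()
            self.concept_emb = nn.Embedding(num_concepts, d)  # stand-in for GNN-refined concept embeddings
            self.visit_rnn = nn.GRU(d, d, batch_first=True)
            self.head = nn.Linear(d, num_classes)

        def forward(self, visits):                   # visits: (batch, visits, concepts_per_visit) concept ids
            concept_vecs = self.concept_emb(visits)  # (B, V, C, d)
            visit_vecs = concept_vecs.mean(dim=2)    # aggregate concepts into a visit representation
            h, _ = self.visit_rnn(visit_vecs)        # sequence model over the patient's hospital visits
            return self.head(h[:, -1])               # prediction after the most recent visit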

Abstract Intensive Care Units (ICU) require comprehensive integration of patient data for enhanced clinical outcome predictions, which are crucial for assessing patient conditions. Recent deep learning advances have utilized patient time series data, and fusion models have incorporated unstructured clinical reports, improving predictive performance. However, integrating established medical knowledge into these models has not yet been explored. The medical domain's data, rich in structural relationships, can be harnessed through knowledge graphs derived from clinical ontologies like the Unified Medical Language System (UMLS) for better predictions. Our proposed methodology integrates this knowledge with ICU data, improving clinical decision modeling. It combines graph representations with vital signs and clinical reports, enhancing performance, especially when data is missing. Additionally, our model includes an interpretability component to understand how knowledge graph nodes affect predictions.

Authors Samyak Jain, Manuel Burger, Gunnar Rätsch, Rita Kuznetsova

Submitted ML4H 2023 (Findings Track)

Link DOI
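
A minimal sketch of one way attention over knowledge-graph node embeddings can both fuse modalities and expose node-level attributions; the single-head attention, the concatenation, and all names are illustrative assumptions rather than the model described in the paper.

    import torch
    import torch.nn as nn

    class KnowledgeAttentionFusion(nn.Module):
        def __init__(self, d=64, num_classes=2):
            super().__init__()
            self.attn = nn.MultiheadAttention(d, num_heads=1, batch_first=True)
            self.head = nn.Linear(2 * d, num_classes)

        def forward(self, patient_state, kg_nodes):
            # patient_state: (B, d) fused time-series/notes representation (assumed given)
            # kg_nodes: (B, N, d) embeddings of knowledge-graph nodes linked to the patient
            q = patient_state.unsqueeze(1)
            ctx, weights = self.attn(q, kg_nodes, kg_nodes)
            logits = self.head(torch.cat([patient_state, ctx.squeeze(1)], dim=-1))
            return logits, weights.squeeze(1)        # weights indicate which graph nodes drive the prediction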

Abstract In research areas with scarce data, representation learning plays a significant role. This work aims to enhance representation learning for clinical time series by deriving universal embeddings for clinical features, such as heart rate and blood pressure. We use self-supervised training paradigms for language models to learn high-quality clinical feature embeddings, achieving a finer granularity than existing time-step and patient-level representation learning. We visualize the learnt embeddings via unsupervised dimension reduction techniques and observe a high degree of consistency with prior clinical knowledge. We also evaluate the model performance on the MIMIC-III benchmark and demonstrate the effectiveness of using clinical feature embeddings. We publish our code online for replication.

Authors Yurong Hu, Manuel Burger, Gunnar Rätsch, Rita Kuznetsova

Submitted NeurIPS 2023 Workshop: Self-Supervised Learning - Theory and Practice

Link DOI
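
To illustrate one language-model-style training paradigm the abstract alludes to, here is a minimal masked-feature-prediction sketch; the set-mean encoder, the mask-token handling, and all names are assumptions for illustration, not the paper's training setup.

    import torch
    import torch.nn as nn

    class MaskedFeatureModel(nn.Module):
        def __init__(self, num_features, d=64):
            super().__init__()
            self.emb = nn.Embedding(num_features + 1, d)     # last index acts as the mask token
            self.out = nn.Linear(d, num_features)

        def forward(self, feature_ids, mask_pos):
            # feature_ids: (B, L) ids of features observed at a time step; mask_pos: (B,) position to hide
            x = feature_ids.clone()
            x[torch.arange(x.size(0)), mask_pos] = self.emb.num_embeddings - 1
            h = self.emb(x).mean(dim=1)                      # simple set encoder over the step
            return self.out(h)                               # logits over the masked feature's identity

    # After training, model.emb.weight[:-1] provides one embedding vector per clinical feature.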