Manuel Burger, MSc

"Questions you cannot answer are usually far better for you than answers you cannot question." - Yuval Noah Harari

PhD Student

E-Mail
manuel.burger@get-your-addresses-elsewhere.inf.ethz.ch
Address
ETH Zürich
Department of Computer Science
Biomedical Informatics Group
Universitätsstrasse 6
8092 Zürich
Room
CAB F53.1

Driven to solve challenging healthcare problems with machine learning.

My research centers on developing foundation models for healthcare and creating the large-scale data resources necessary to train them.

I am particularly interested in how foundation models can generalize across diverse clinical settings and enable new capabilities for understanding complex patient data. As part of this work, I have led the development of ICareFM, a foundation model for intensive care trained on hundreds of thousands of patient stays from hospitals worldwide.

A core aspect of my work involves bridging the gap between cutting-edge machine learning research and real-world clinical applications. I am actively involved as a machine learning lead in prospective clinical studies at Inselspital Bern, including the BEACON interventional trial, where we are deploying AI systems directly into ICUs to support clinical decision-making.

Looking forward, I am exploring how to make foundation models more accessible and interactive for clinicians through conversational AI interfaces — working toward digital patient twins that enable intuitive exploration of complex temporal health data.

I hold a Bachelor's degree in Computer Science and a Master's degree in Data Science from ETH Zürich, where I joined the Biomedical Informatics Group as a PhD student in 2022.

Find out more on my homepage manuelburger.ch

Abstract Acute hypoxemic respiratory failure (RF) occurs frequently in critically ill patients and is associated with substantial morbidity, mortality, and resource use. We developed a comprehensive machine-learning-based monitoring system to support ICU physicians in managing RF through early detection, continuous monitoring, assessment of extubation readiness, and prediction of extubation failure (EF). In study patients, the model predicted 80% of RF events with 45% precision, identifying 65% of events more than 10 hours in advance and significantly outperforming standard clinical monitoring based on the oxygenation index. The model was successfully validated in an external ICU cohort. We also demonstrated how predicted EF risk could help prevent extubation failure and unnecessarily prolonged ventilation. Lastly, we illustrated how prediction of RF risk, together with ventilator need and extubation readiness, supported ICU resource planning for mechanical ventilation: our model predicted ICU-level ventilator demand 8–16 hours ahead, with a mean absolute error of 0.4 ventilators per 10 patients.

Authors Matthias Hüser, Xinrui Lyu, Martin Faltys, Alizée Pace, David Berger, Marine Hoche, Stephanie L. Hyland, Hugo Yèche, Manuel Burger, Tobias M. Merz, Gunnar Rätsch

Submitted npj Digital Medicine

Link

Abstract Clinical time series data are critical for patient monitoring and predictive modeling. These time series are typically multivariate and often comprise hundreds of heterogeneous features from different data sources. The grouping of features based on similarity and relevance to the prediction task has been shown to enhance the performance of deep learning architectures. However, defining these groups a priori using only semantic knowledge is challenging, even for domain experts. To address this, we propose a novel method that learns feature groups by clustering weights of feature-wise embedding layers. This approach seamlessly integrates into standard supervised training and discovers the groups that directly improve downstream performance on clinically relevant tasks. We demonstrate that our method outperforms static clustering approaches on synthetic data and achieves performance comparable to expert-defined groups on real-world medical data. Moreover, the learned feature groups are clinically interpretable, enabling data-driven discovery of task-relevant relationships between variables.

Authors Fedor Sergeev, Manuel Burger, Polina Leshetkina, Vincent Fortuin, Gunnar Rätsch, Rita Kuznetsova

Submitted ML4H 2025 (PMLR)

Link
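The core idea above, discovering feature groups by clustering the weights of feature-wise embedding layers, can be sketched in a few lines. This is a simplified illustration, not the paper's method: the paper learns groups during supervised training, while here a plain k-means over a hypothetical (n_features, embed_dim) weight matrix stands in for the clustering step, and all names and shapes are assumptions.

```python
import numpy as np

def cluster_feature_embeddings(weights: np.ndarray, k: int, n_iter: int = 50, seed: int = 0):
    """Group features by k-means clustering of their embedding weight vectors.

    weights: (n_features, embed_dim) array, one row per feature-wise embedding layer.
    Returns one cluster label per feature.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from randomly chosen distinct features.
    centroids = weights[rng.choice(len(weights), size=k, replace=False)]
    labels = np.zeros(len(weights), dtype=int)
    for _ in range(n_iter):
        # Assign each feature to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(weights[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster empties.
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = weights[labels == c].mean(axis=0)
    return labels

# Two synthetic "feature groups": embedding rows drawn around distinct centres.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.0, scale=0.1, size=(5, 8))   # e.g. vitals-like features
group_b = rng.normal(loc=5.0, scale=0.1, size=(5, 8))   # e.g. labs-like features
labels = cluster_feature_embeddings(np.vstack([group_a, group_b]), k=2)
```

With well-separated weight vectors, the two synthetic groups are recovered exactly; the interesting case in the paper is that such separation emerges from supervised training alone.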

Abstract Intensive care departments generate vast multivariate time series data capturing the dynamic physiological states of critically ill patients. Despite advances in AI-driven clinical decision support, existing models remain limited. They are tailored to specific conditions or single institutions and require extensive adaptation for new settings. To make such generalization feasible, we introduce ICareFM, a novel foundation model for intensive care, trained on a harmonized dataset of unprecedented scale. The dataset contains 650,000 patient stays, accumulating more than 4,000 patient years of data, and over one billion measurements from hospitals in the US, several European countries, and China. ICareFM employs a novel self-supervised time-to-event objective that extracts robust patient representations from noisy, irregular, multivariate time series. As a result, ICareFM can generalize to new tasks and beyond its training distribution, a property we demonstrate through evaluations in a range of out-of-distribution scenarios, including transfer to unseen hospitals and zero-shot inference on previously unobserved tasks. ICareFM consistently outperforms conventional machine learning models and recent foundation model baselines, demonstrating strong generalization, improved data efficiency, and the ability to generate interpretable forecasts. These results establish ICareFM as a scalable and adaptable foundation model for critical care time series, enabling zero-shot clinical prediction and working towards the development of digital patient twins for precision medicine.

Authors Manuel Burger, Daphné Chopard, Malte Londschien, Fedor Sergeev, Hugo Yèche, Rita Kuznetsova, Martin Faltys, Eike Gerdes, Polina Leshetkina, Peter Bühlmann, Gunnar Rätsch

Submitted medRxiv

Link DOI

Abstract Notable progress has been made in generalist medical large language models across various healthcare areas. However, large-scale modeling of in-hospital time series data - such as vital signs, lab results, and treatments in critical care - remains underexplored. Existing datasets are relatively small, but combining them can enhance patient diversity and improve model robustness. To effectively utilize these combined datasets for large-scale modeling, it is essential to address the distribution shifts caused by varying treatment policies, necessitating the harmonization of treatment variables across the different datasets. This work aims to establish a foundation for training large-scale multi-variate time series models on critical care data and to provide a benchmark for machine learning models in transfer learning across hospitals to study and address distribution shift challenges. We introduce a harmonized dataset for sequence modeling and transfer learning research, representing the first large-scale collection to include core treatment variables. Future plans involve expanding this dataset to support further advancements in transfer learning and the development of scalable, generalizable models for critical healthcare applications.

Authors Manuel Burger, Fedor Sergeev, Malte Londschien, Daphné Chopard, Hugo Yèche, Eike Gerdes, Polina Leshetkina, Alexander Morgenroth, Zeynep Babür, Jasmina Bogojeska, Martin Faltys, Rita Kuznetsova, Gunnar Rätsch

Submitted Best Paper @ NeurIPS AIM-FM Workshop 2024

Link DOI

Abstract Machine learning applications hold promise to aid clinicians in a wide range of clinical tasks, from diagnosis to prognosis, treatment, and patient monitoring. These potential applications are accompanied by a surge of ethical concerns surrounding the use of Machine Learning (ML) models in healthcare, especially regarding fairness and non-discrimination. While there is an increasing number of regulatory policies to ensure the ethical and safe integration of such systems, the translation from policies to practices remains an open challenge. Algorithmic frameworks aiming to bridge this gap should be tailored to the application, enabling the translation of fundamental human-rights principles into accurate statistical analysis that captures the inherent complexity and risks associated with the system. In this work, we propose a set of impartial fairness checks especially adapted to ML early-warning systems in the medical context, comprising, on top of standard fairness metrics, an analysis of clinical outcomes and a screening of potential sources of bias in the pipeline. Our analysis is further fortified by the inclusion of event-based and prevalence-corrected metrics, as well as statistical tests to measure biases. Additionally, we emphasize the importance of considering subgroups beyond the conventional demographic attributes. Finally, to facilitate operationalization, we present FAMEWS, an open-source tool for generating comprehensive fairness reports. These reports address the diverse needs and interests of the stakeholders involved in integrating ML into medical practice. The use of FAMEWS has the potential to reveal critical insights that might otherwise remain obscured, leading to improved model design and, in turn, enhanced health outcomes.

Authors Marine Hoche, Olga Mineeva, Manuel Burger, Alessandro Blasimme, Gunnar Rätsch

Submitted Proceedings of Machine Learning Research

Link

Abstract This study advances Early Event Prediction (EEP) in healthcare through Dynamic Survival Analysis (DSA), offering a novel approach by integrating risk localization into alarm policies to enhance clinical event metrics. By adapting and evaluating DSA models against traditional EEP benchmarks, our research demonstrates their ability to match EEP models on a time-step level and significantly improve event-level metrics through a new alarm prioritization scheme (up to 11% AuPRC difference). This approach represents a significant step forward in predictive healthcare, providing a more nuanced and actionable framework for early event prediction and management.

Authors Hugo Yèche, Manuel Burger, Dinara Veshchezerova, Gunnar Rätsch

Submitted CHIL 2024

Link
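The idea of localizing risk within a dynamic survival model's prediction horizon can be illustrated with a small sketch. The function names and the threshold policy below are illustrative assumptions, not the paper's exact prioritization scheme: the sketch turns per-bin hazard predictions into an overall event risk plus the bin where the event is most likely, which an alarm policy can use to prioritize nearer-term risks.

```python
import numpy as np

def event_risk_and_localization(hazards: np.ndarray):
    """Turn per-bin hazard predictions into (a) the probability of an event
    within the horizon and (b) the bin where the event is most likely.

    hazards: (n_bins,) discrete-time hazards, h_t = P(event in bin t | survived to t).
    """
    survival = np.cumprod(1.0 - hazards)               # S_t = P(no event through bin t)
    prior_survival = np.concatenate(([1.0], survival[:-1]))
    event_density = prior_survival * hazards           # P(event occurs exactly in bin t)
    total_risk = 1.0 - survival[-1]                    # P(event within the horizon)
    return float(total_risk), int(event_density.argmax())

def alarm_policy(hazards, threshold=0.5):
    """Raise an alarm when horizon risk crosses the threshold, and report the
    most likely event bin so nearer-term risks can be prioritised."""
    risk, peak_bin = event_risk_and_localization(np.asarray(hazards, dtype=float))
    return risk >= threshold, peak_bin

fire, peak = alarm_policy([0.05, 0.1, 0.4, 0.2])  # risk ~0.59, peak in bin 2
```

Note that a conventional EEP model would only output the scalar risk; the per-bin density is the extra localization signal a DSA model provides.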

Abstract Electronic Health Record (EHR) datasets from Intensive Care Units (ICU) contain a diverse set of data modalities. While prior works have successfully leveraged multiple modalities in supervised settings, we apply advanced self-supervised multi-modal contrastive learning techniques to ICU data, focusing specifically on clinical notes and time series for clinically relevant online prediction tasks. We introduce the Multi-Modal Neighborhood Contrastive Loss (MM-NCL), built on a soft neighborhood function, and showcase the excellent linear-probe and zero-shot performance of our approach.

Authors Fabian Baldenweg, Manuel Burger, Gunnar Rätsch, Rita Kuznetsova

Submitted TS4H ICLR Workshop

Link DOI
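A rough sketch of a soft-neighborhood contrastive loss in the spirit of MM-NCL: instead of a single hard positive per anchor, samples close in time act as soft positives. The Gaussian time kernel, all function names, and the toy embeddings are assumptions for illustration; the paper's exact neighborhood function and loss formulation differ.

```python
import numpy as np

def soft_neighborhood_contrastive_loss(z_text, z_ts, times, tau=0.1, sigma=1.0):
    """Contrast note embeddings (z_text) against time-series embeddings (z_ts).

    Samples close in time are weighted as soft positives via a Gaussian
    kernel over their time distance, rather than a single hard positive.
    """
    # Temperature-scaled cosine similarities between all cross-modal pairs.
    a = z_text / np.linalg.norm(z_text, axis=1, keepdims=True)
    b = z_ts / np.linalg.norm(z_ts, axis=1, keepdims=True)
    sims = (a @ b.T) / tau
    # Soft neighborhood weights: Gaussian in time distance, rows sum to 1.
    dt = times[:, None] - times[None, :]
    w = np.exp(-0.5 * (dt / sigma) ** 2)
    w = w / w.sum(axis=1, keepdims=True)
    # Cross-entropy between the soft targets and the similarity softmax.
    log_p = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return float(-(w * log_p).sum(axis=1).mean())

rng = np.random.default_rng(0)
z_notes = rng.normal(size=(4, 8))    # 4 clinical-note embeddings
z_series = rng.normal(size=(4, 8))   # 4 matching time-series embeddings
loss = soft_neighborhood_contrastive_loss(z_notes, z_series, np.arange(4.0))
```

With sigma shrunk toward zero the weights collapse onto the diagonal and the loss reduces to a standard InfoNCE objective.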

Abstract The rapid expansion of genomic sequence data calls for new methods to achieve robust sequence representations. Existing techniques often neglect intricate structural details, emphasizing mainly contextual information. To address this, we developed k-mer embeddings that merge contextual and structural string information by enhancing De Bruijn graphs with structural similarity connections. Subsequently, we crafted a self-supervised method based on Contrastive Learning that employs a heterogeneous Graph Convolutional Network encoder and constructs positive pairs based on node similarities. Our embeddings consistently outperform prior techniques for Edit Distance Approximation and Closest String Retrieval tasks.

Authors Kacper Kapusniak, Manuel Burger, Gunnar Rätsch, Amir Joudaki

Submitted NeurIPS 2023 Workshop: Frontiers in Graph Learning

Link DOI
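The graph construction described above, a De Bruijn graph enhanced with structural similarity connections, can be sketched minimally. The edge definitions here (consecutive k-mers for context edges, Hamming distance 1 for similarity edges) are simplified assumptions, and the heterogeneous GCN encoder and contrastive training on top of the graph are omitted.

```python
from itertools import combinations

def build_augmented_debruijn(sequences, k=3):
    """Build a De Bruijn-style k-mer graph with two edge types:
    'context' edges between consecutive k-mers within a sequence, and
    'similarity' edges between k-mers at Hamming distance 1.
    """
    nodes = set()
    context, similarity = set(), set()
    for seq in sequences:
        # Slide a window of length k to extract overlapping k-mers.
        kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        nodes.update(kmers)
        context.update(zip(kmers, kmers[1:]))
    # Connect structurally similar k-mers (exactly one mismatching position).
    for u, v in combinations(sorted(nodes), 2):
        if sum(a != b for a, b in zip(u, v)) == 1:
            similarity.add((u, v))
    return nodes, context, similarity

nodes, ctx, sim = build_augmented_debruijn(["ACGTA", "ACGTT"], k=3)
```

The two edge types make the graph heterogeneous, which is what motivates a heterogeneous graph encoder downstream.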

Abstract Recent advances in deep learning architectures for sequence modeling have not fully transferred to tasks handling time series from electronic health records. In particular, for problems related to the Intensive Care Unit (ICU), the state of the art remains tree-based methods that tackle sequence classification in a tabular manner. Recent findings in deep learning for tabular data now surpass these classical methods by better handling the severe heterogeneity of input features. Given the similar level of feature heterogeneity exhibited by ICU time series, and motivated by these findings, we explore the impact of these novel methods on clinical sequence modeling tasks. By jointly leveraging such advances in deep learning for tabular data, our primary objective is to underscore the importance of step-wise embeddings in time-series modeling, which remain unexplored in machine learning methods for clinical data. On a variety of clinically relevant tasks from two large-scale ICU datasets, MIMIC-III and HiRID, our work provides an exhaustive analysis of state-of-the-art methods for tabular time series as time-step embedding models, showing overall performance improvements. In particular, we demonstrate the importance of feature grouping in clinical time series, with significant performance gains when features are considered within predefined semantic groups in the step-wise embedding module.

Authors Rita Kuznetsova, Alizée Pace, Manuel Burger, Hugo Yèche, Gunnar Rätsch

Submitted ML4H 2023 (PMLR)

Link DOI
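The step-wise embedding with predefined semantic feature groups can be sketched as follows. The shapes, group indices, and fixed projection matrices are hypothetical illustrations; in a real model the projections are learned jointly with the downstream sequence model.

```python
import numpy as np

def stepwise_group_embedding(x, groups, projections):
    """Embed each time step of a multivariate series by projecting each
    semantic feature group separately and concatenating the results.

    x: (T, n_features) time series; groups: list of feature-index arrays;
    projections: one (len(group), d) matrix per group.
    """
    parts = [x[:, idx] @ W for idx, W in zip(groups, projections)]
    return np.concatenate(parts, axis=1)  # shape (T, d * n_groups)

T, d = 6, 4
groups = [np.array([0, 1]), np.array([2, 3, 4])]  # e.g. vitals vs. lab values
projs = [np.ones((2, d)), np.ones((3, d))]        # fixed stand-ins for learned weights
emb = stepwise_group_embedding(np.ones((T, 5)), groups, projs)
```

Embedding groups separately keeps related features in a shared subspace before the sequence model mixes them, which is the grouping effect the abstract reports gains from.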

Abstract Clinicians are increasingly looking towards machine learning to gain insights about patient evolutions. We propose a novel approach named Multi-Modal UMLS Graph Learning (MMUGL) for learning meaningful representations of medical concepts using graph neural networks over knowledge graphs based on the unified medical language system. These representations are aggregated to represent entire patient visits and then fed into a sequence model to perform predictions at the granularity of multiple hospital visits of a patient. We improve performance by incorporating prior medical knowledge and considering multiple modalities. We compare our method to existing architectures proposed to learn representations at different granularities on the MIMIC-III dataset and show that our approach outperforms these methods. The results demonstrate the significance of multi-modal medical concept representations based on prior medical knowledge.

Authors Manuel Burger, Gunnar Rätsch, Rita Kuznetsova

Submitted ML4H 2023 (PMLR)

Link DOI

Abstract Intensive Care Units (ICU) require comprehensive patient data integration for enhanced clinical outcome predictions, crucial for assessing patient conditions. Recent deep learning advances have utilized patient time series data, and fusion models have incorporated unstructured clinical reports, improving predictive performance. However, integrating established medical knowledge into these models has not yet been explored. The medical domain's data, rich in structural relationships, can be harnessed through knowledge graphs derived from clinical ontologies like the Unified Medical Language System (UMLS) for better predictions. Our proposed methodology integrates this knowledge with ICU data, improving clinical decision modeling. It combines graph representations with vital signs and clinical reports, enhancing performance, especially when data is missing. Additionally, our model includes an interpretability component to understand how knowledge graph nodes affect predictions.

Authors Samyak Jain, Manuel Burger, Gunnar Rätsch, Rita Kuznetsova

Submitted ML4H 2023 (Findings Track)

Link DOI

Abstract In research areas with scarce data, representation learning plays a significant role. This work aims to enhance representation learning for clinical time series by deriving universal embeddings for clinical features, such as heart rate and blood pressure. We use self-supervised training paradigms for language models to learn high-quality clinical feature embeddings, achieving a finer granularity than existing time-step and patient-level representation learning. We visualize the learnt embeddings via unsupervised dimension reduction techniques and observe a high degree of consistency with prior clinical knowledge. We also evaluate the model performance on the MIMIC-III benchmark and demonstrate the effectiveness of using clinical feature embeddings. We publish our code online for replication.

Authors Yurong Hu, Manuel Burger, Gunnar Rätsch, Rita Kuznetsova

Submitted NeurIPS 2023 Workshop: Self-Supervised Learning - Theory and Practice

Link DOI