Fedor Sergeev, MSc

"With four parameters I can fit an elephant, and with five I can make him wiggle his trunk." - John von Neumann

PhD Student

E-Mail
fedor.sergeev@get-your-addresses-elsewhere.inf.ethz.ch
Address
Department of Computer Science
Biomedical Informatics Group
Universitätstrasse 6
8092 Zürich
Room
CAB F53

Informing machine learning models with human insights for better performance, reliability, and interpretability

I did my BSc in Applied Mathematics and Physics at MIPT, where I developed computational methods for physics simulations under the supervision of Igor Petrov and Nikolay Khokhlov. In parallel, I worked on applications of deep learning in high-energy physics at GSI and LAMBDA.

In my MSc I studied Computational Science and Engineering at EPFL, combining my interests in numerical and data-driven modeling. My thesis, supervised by Pascal Fua and Jonathan Donier, was on physics-informed neural networks for modeling fluid flow. During my studies, I interned at the startups Spiden and Neural Concept, working on synthetic data generation for medical spectroscopy and 3D computer vision for advanced engineering, respectively.

I joined the BMI lab in July 2023 to work on multimodal, representation, and Bayesian deep learning on intensive care unit (ICU) data. I am also an ELLIS PhD student, co-supervised by Vincent Fortuin.

Abstract Clinical time series data are critical for patient monitoring and predictive modeling. These time series are typically multivariate and often comprise hundreds of heterogeneous features from different data sources. The grouping of features based on similarity and relevance to the prediction task has been shown to enhance the performance of deep learning architectures. However, defining these groups a priori using only semantic knowledge is challenging, even for domain experts. To address this, we propose a novel method that learns feature groups by clustering weights of feature-wise embedding layers. This approach seamlessly integrates into standard supervised training and discovers the groups that directly improve downstream performance on clinically relevant tasks. We demonstrate that our method outperforms static clustering approaches on synthetic data and achieves performance comparable to expert-defined groups on real-world medical data. Moreover, the learned feature groups are clinically interpretable, enabling data-driven discovery of task-relevant relationships between variables.

Authors Fedor Sergeev, Manuel Burger, Polina Leshetkina, Vincent Fortuin, Gunnar Rätsch, Rita Kuznetsova

Submitted ML4H 2025 (PMLR)


Abstract Intensive care departments generate vast multivariate time series data capturing the dynamic physiological states of critically ill patients. Despite advances in AI-driven clinical decision support, existing models remain limited. They are tailored to specific conditions or single institutions and require extensive adaptation for new settings. To make such generalization feasible, we introduce ICareFM, a novel foundation model for intensive care, trained on a harmonized dataset of unprecedented scale. The dataset contains 650,000 patient stays, accumulating more than 4,000 patient years of data, and over one billion measurements from hospitals in the US, several European countries, and China. ICareFM employs a novel self-supervised time-to-event objective that extracts robust patient representations from noisy, irregular, multivariate time series. As a result, ICareFM can generalize to new tasks and beyond its training distribution, a property we demonstrate through evaluations in a range of out-of-distribution scenarios, including transfer to unseen hospitals and zero-shot inference on previously unobserved tasks. ICareFM consistently outperforms conventional machine learning models and recent foundation model baselines, demonstrating strong generalization, improved data efficiency, and the ability to generate interpretable forecasts. These results establish ICareFM as a scalable and adaptable foundation model for critical care time series, enabling zero-shot clinical prediction and working towards the development of digital patient twins for precision medicine.

Authors Manuel Burger, Daphné Chopard, Malte Londschien, Fedor Sergeev, Hugo Yèche, Rita Kuznetsova, Martin Faltys, Eike Gerdes, Polina Leshetkina, Peter Bühlmann, Gunnar Rätsch

Submitted medRxiv


Abstract Notable progress has been made in generalist medical large language models across various healthcare areas. However, large-scale modeling of in-hospital time series data - such as vital signs, lab results, and treatments in critical care - remains underexplored. Existing datasets are relatively small, but combining them can enhance patient diversity and improve model robustness. To effectively utilize these combined datasets for large-scale modeling, it is essential to address the distribution shifts caused by varying treatment policies, necessitating the harmonization of treatment variables across the different datasets. This work aims to establish a foundation for training large-scale multivariate time series models on critical care data and to provide a benchmark for machine learning models in transfer learning across hospitals to study and address distribution shift challenges. We introduce a harmonized dataset for sequence modeling and transfer learning research, representing the first large-scale collection to include core treatment variables. Future plans involve expanding this dataset to support further advancements in transfer learning and the development of scalable, generalizable models for critical healthcare applications.

Authors Manuel Burger, Fedor Sergeev, Malte Londschien, Daphné Chopard, Hugo Yèche, Eike Gerdes, Polina Leshetkina, Alexander Morgenroth, Zeynep Babür, Jasmina Bogojeska, Martin Faltys, Rita Kuznetsova, Gunnar Rätsch

Submitted Best Paper @ NeurIPS AIM-FM Workshop 2024


Abstract Knowing which features of a multivariate time series to measure and when is a key task in medicine, wearables, and robotics. Better acquisition policies can reduce costs while maintaining or even improving the performance of downstream predictors. Inspired by the maximization of conditional mutual information, we propose an approach to train acquirers end-to-end using only the downstream loss. We show that our method outperforms a random acquisition policy and matches a model with an unrestrained budget, but does not yet overtake a static acquisition strategy. We highlight the underlying assumptions and outline avenues for future work.

Authors Fedor Sergeev, Paola Malsot, Gunnar Rätsch, Vincent Fortuin

Submitted SPIGM Workshop @ ICML
