Fedor Sergeev, MSc

"With four parameters I can fit an elephant, and with five I can make him wiggle his trunk." - John von Neumann

PhD Student

E-Mail
fedor.sergeev@inf.ethz.ch
Address
Department of Computer Science
Biomedical Informatics Group
Universitätstrasse 6
8092 Zürich
Room
CAB F53

Informing machine learning models with human insights for better performance, reliability, and interpretability

I did my BSc in Applied Mathematics and Physics at MIPT. I developed computational methods for physics simulations under the supervision of Igor Petrov and Nikolay Khoklov. In parallel, I worked on applications of deep learning in high-energy physics at GSI and LAMBDA.

In my MSc, I studied Computational Science and Engineering at EPFL, combining my interests in numerical and data-driven modeling. My thesis, supervised by Pascal Fua and Jonathan Donier, was on physics-informed neural networks for modeling fluid flow. During my studies, I interned at the startups Spiden and Neural Concept, working on synthetic data generation for medical spectroscopy and on 3D computer vision for advanced engineering, respectively.

I joined the BMI lab in July 2023 to work on multimodal, representation, and Bayesian deep learning on intensive care unit (ICU) data. I am also an ELLIS PhD student, co-supervised by Vincent Fortuin.


Abstract Clinical time series data are critical for patient monitoring and predictive modeling. These time series are typically multivariate and often comprise hundreds of heterogeneous features from different data sources. The grouping of features based on similarity and relevance to the prediction task has been shown to enhance the performance of deep learning architectures. However, defining these groups a priori using only semantic knowledge is challenging, even for domain experts. To address this, we propose a novel method that learns feature groups by clustering weights of feature-wise embedding layers. This approach seamlessly integrates into standard supervised training and discovers the groups that directly improve downstream performance on clinically relevant tasks. We demonstrate that our method outperforms static clustering approaches on synthetic data and achieves performance comparable to expert-defined groups on real-world medical data. Moreover, the learned feature groups are clinically interpretable, enabling data-driven discovery of task-relevant relationships between variables.

Authors Fedor Sergeev, Manuel Burger, Polina Leshetkina, Vincent Fortuin, Gunnar Rätsch, Rita Kuznetsova

Submitted ML4H 2025 (PMLR)

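To make the grouping idea above concrete, here is a minimal, hypothetical PyTorch-style sketch (not the paper's implementation): each feature gets its own small embedding, and feature groups are obtained by clustering the rows of the embedding weight matrix with k-means. The class names, dimensions, and the choice of k-means are illustrative assumptions.

import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class FeatureWiseEmbedding(nn.Module):
    """One small linear embedding per scalar feature (assumed design)."""
    def __init__(self, n_features: int, emb_dim: int):
        super().__init__()
        self.weight = nn.Parameter(0.02 * torch.randn(n_features, emb_dim))
        self.bias = nn.Parameter(torch.zeros(n_features, emb_dim))

    def forward(self, x):
        # x: (batch, time, n_features) -> (batch, time, n_features, emb_dim)
        return x.unsqueeze(-1) * self.weight + self.bias

def cluster_feature_groups(embedding: FeatureWiseEmbedding, n_groups: int):
    """Assign each feature to a group via k-means over its embedding weights."""
    weights = embedding.weight.detach().cpu().numpy()  # (n_features, emb_dim)
    return KMeans(n_clusters=n_groups, n_init=10).fit_predict(weights)

# After (or during) standard supervised training of a model that contains
# the feature-wise embedding, the group assignments can be read off:
embedding = FeatureWiseEmbedding(n_features=12, emb_dim=8)
print(cluster_feature_groups(embedding, n_groups=3))  # one group id per feature

In the paper the grouping is learned jointly with the downstream task; the sketch only indicates where the group assignments would come from.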

Abstract Notable progress has been made in generalist medical large language models across various healthcare areas. However, large-scale modeling of in-hospital time series data - such as vital signs, lab results, and treatments in critical care - remains underexplored. Existing datasets are relatively small, but combining them can enhance patient diversity and improve model robustness. To effectively utilize these combined datasets for large-scale modeling, it is essential to address the distribution shifts caused by varying treatment policies, necessitating the harmonization of treatment variables across the different datasets. This work aims to establish a foundation for training large-scale multivariate time series models on critical care data and to provide a benchmark for machine learning models in transfer learning across hospitals to study and address distribution shift challenges. We introduce a harmonized dataset for sequence modeling and transfer learning research, representing the first large-scale collection to include core treatment variables. Future plans involve expanding this dataset to support further advancements in transfer learning and the development of scalable, generalizable models for critical healthcare applications.

Authors Manuel Burger, Fedor Sergeev, Malte Londschien, Daphné Chopard, Hugo Yèche, Eike Gerdes, Polina Leshetkina, Alexander Morgenroth, Zeynep Babür, Jasmina Bogojeska, Martin Faltys, Rita Kuznetsova, Gunnar Rätsch

Submitted Best Paper @ NeurIPS AIM-FM Workshop 2024

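The harmonization step mentioned in the abstract can be pictured with a small, purely illustrative pandas sketch: dataset-specific treatment columns are mapped onto a shared vocabulary and converted to a common unit so that records from different hospitals become comparable. The column names, drug, units, and conversion factors below are assumptions, not taken from the released dataset.

import pandas as pd

# Hypothetical per-dataset mapping: source column -> (shared name, factor to mg/h)
TREATMENT_MAP = {
    "hospital_a": {"norepi_mcg_min": ("norepinephrine_rate", 0.06)},
    "hospital_b": {"noradrenaline_mg_h": ("norepinephrine_rate", 1.0)},
}

def harmonize(df: pd.DataFrame, source: str) -> pd.DataFrame:
    """Rename treatment columns to shared names and convert to a common unit."""
    out = df[["patient_id", "time"]].copy()
    for col, (shared_name, factor) in TREATMENT_MAP[source].items():
        out[shared_name] = df[col] * factor  # e.g. mcg/min -> mg/h
    return out

combined = pd.concat([
    harmonize(pd.DataFrame({"patient_id": [1], "time": [0], "norepi_mcg_min": [5.0]}), "hospital_a"),
    harmonize(pd.DataFrame({"patient_id": [2], "time": [0], "noradrenaline_mg_h": [0.4]}), "hospital_b"),
], ignore_index=True)
print(combined)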

Abstract Knowing which features of a multivariate time series to measure and when is a key task in medicine, wearables, and robotics. Better acquisition policies can reduce costs while maintaining or even improving the performance of downstream predictors. Inspired by the maximization of conditional mutual information, we propose an approach to train acquirers end-to-end using only the downstream loss. We show that our method outperforms a random acquisition policy and matches a model with an unrestrained budget, but does not yet overtake a static acquisition strategy. We highlight the assumptions and outline avenues for future work.

Authors Fedor Sergeev, Paola Malsot, Gunnar Rätsch, Vincent Fortuin

Submitted SPIGM ICML Workshop

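As a rough illustration of training an acquirer end-to-end with only the downstream loss (an assumed architecture, not the paper's implementation), the sketch below scores candidate features, draws a relaxed binary acquisition mask via a Gumbel-softmax straight-through estimator, and backpropagates the downstream prediction loss plus a soft budget penalty through both networks.

import torch
import torch.nn as nn
import torch.nn.functional as F

n_features, hidden, budget = 16, 64, 4

acquirer = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                         nn.Linear(hidden, n_features))   # per-feature scores
predictor = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))           # downstream task

def acquire_and_predict(x_observed, x_full, tau=1.0):
    # Score features from what is already observed, then draw a relaxed
    # binary acquisition mask (Gumbel-softmax with straight-through gradients).
    logits = acquirer(x_observed)
    mask = F.gumbel_softmax(torch.stack([logits, -logits], -1),
                            tau=tau, hard=True)[..., 0]
    x_acquired = x_observed + mask * x_full        # reveal selected features
    budget_penalty = F.relu(mask.sum(-1) - budget).mean()
    return predictor(x_acquired), budget_penalty

# One training step driven only by the downstream loss (plus the budget term)
x_full = torch.randn(32, n_features)
x_observed = torch.zeros_like(x_full)              # nothing measured yet
y = torch.randn(32, 1)
pred, penalty = acquire_and_predict(x_observed, x_full)
loss = F.mse_loss(pred, y) + 0.1 * penalty
loss.backward()

The relaxation and the soft budget term are stand-ins chosen for the sketch; the conditional-mutual-information motivation from the abstract is not reflected in this toy objective.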