Rita Kuznetsova, Ph.D.

Postdoc


My main research interests are representation learning across domains and the structure of the resulting representations.

I did my PhD at the Moscow Institute of Physics and Technology (MIPT), where I had also earned my bachelor's and master's degrees. During this period I also worked at Antiplagiat, the largest and best-known plagiarism detection company in Russia and the CIS. Throughout this time I worked on deep learning applications for natural language understanding and machine translation, as well as on combining metric learning with probabilistic generative models. In 2018 I joined IBM Research Zurich as a postdoc, where my main work was developing unsupervised approaches to Named Entity Recognition and Relationship Extraction. In October 2020 I joined the Rätsch lab.

Abstract: Models that can predict adverse events ahead of time with low false-alarm rates are critical to the acceptance of decision support systems in the medical community. This challenging machine learning task is still typically treated as simple binary classification, with few bespoke methods proposed to leverage temporal dependency across samples. We propose Temporal Label Smoothing (TLS), a novel learning strategy that modulates smoothing strength as a function of proximity to the event of interest. This regularization technique reduces model confidence at the class boundary, where the signal is often noisy or uninformative, thus allowing training to focus on clinically informative data points away from this boundary region. From a theoretical perspective, we also show that our method can be framed as an extension of multi-horizon prediction, a learning heuristic proposed in other early prediction work. TLS empirically matches or outperforms the competing methods considered on various early prediction benchmark tasks. In particular, our approach significantly improves performance on clinically relevant metrics such as event recall at low false-alarm rates.

Authors: Hugo Yèche, Alizée Pace, Gunnar Rätsch, Rita Kuznetsova

Link: DOI
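The core idea of TLS (replacing the hard "event within horizon h" label with a soft target whose smoothing is strongest at the horizon boundary) can be sketched in plain Python. This is not the paper's formulation: the logistic decay, the width parameter `tau`, the example horizons and all function names are illustrative assumptions. The sketch also computes the averaged multi-horizon label, which is a staircase in time-to-event; a smooth decay can loosely be read as its continuous counterpart, in the spirit of the relationship the abstract mentions.

```python
import math


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hard_label(time_to_event: float, horizon: float) -> float:
    """Standard early-prediction target: 1 if the event occurs within `horizon`."""
    return 1.0 if time_to_event <= horizon else 0.0


def multi_horizon_label(time_to_event: float, horizons: list[float]) -> float:
    """Average of the hard labels over several horizons (a staircase in time-to-event)."""
    return sum(hard_label(time_to_event, h) for h in horizons) / len(horizons)


def temporal_smooth_label(time_to_event: float, horizon: float, tau: float) -> float:
    """Hypothetical TLS-style target: exactly 0.5 at the horizon boundary,
    approaching the hard 0/1 label as the sample moves away from it.
    `tau` (an assumed parameter) sets the width of the smoothed region."""
    if math.isinf(time_to_event):  # no event occurs in this stay
        return 0.0
    return _sigmoid((horizon - time_to_event) / tau)


def soft_bce(p: float, y: float) -> float:
    """Binary cross-entropy of prediction p against a soft target y in [0, 1]."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


if __name__ == "__main__":
    horizon, tau = 12.0, 2.0  # hours; illustrative values only
    horizons = [8.0, 10.0, 12.0, 14.0, 16.0]
    for tte in [0.0, 6.0, 11.0, 12.0, 13.0, 18.0, 48.0, math.inf]:
        print(f"tte={tte:>5}: hard={hard_label(tte, horizon):.0f} "
              f"multi={multi_horizon_label(tte, horizons):.2f} "
              f"smooth={temporal_smooth_label(tte, horizon, tau):.3f}")
```

At the boundary (time-to-event equal to the horizon) the soft target is 0.5, so the cross-entropy is minimized by an unconfident prediction there, while samples far from the boundary keep near-hard targets. This mirrors, in simplified form, the abstract's point about reducing confidence at the class boundary.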

Abstract: The recent success of machine learning methods applied to time series collected from Intensive Care Units (ICU) exposes the lack of standardized machine learning benchmarks for developing and comparing such methods. While raw datasets such as MIMIC-IV or eICU can be freely accessed on PhysioNet, tasks and pre-processing are often chosen ad hoc for each publication, limiting comparability across publications. In this work, we aim to improve this situation by providing a benchmark covering a large spectrum of ICU-related tasks. Using the HiRID dataset, we define multiple clinically relevant tasks in collaboration with clinicians. In addition, we provide a reproducible end-to-end pipeline to construct both data and labels. Finally, we provide an in-depth analysis of current state-of-the-art sequence modeling methods, highlighting some limitations of deep learning approaches for this type of data. With this benchmark, we hope to give the research community a basis for fair comparison of their work.

Authors: Hugo Yèche, Rita Kuznetsova, Marc Zimmermann, Matthias Hüser, Xinrui Lyu, Martin Faltys, Gunnar Rätsch

Submitted: NeurIPS 2021 (Datasets and Benchmarks)
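To give a rough idea of what "constructing labels" for an ICU early-warning task can involve, the sketch below assigns a per-time-step binary label on a regular time grid: 1 if an event starts within a fixed horizon, 0 otherwise, and no label while an event is already ongoing. The function name, the (start, end) event encoding and the masking rule are assumptions for illustration only; they are not the benchmark's actual task definitions or pipeline.

```python
from typing import List, Optional, Tuple


def early_warning_labels(
    n_steps: int,
    events: List[Tuple[int, int]],
    horizon_steps: int,
) -> List[Optional[int]]:
    """Per-time-step labels for a hypothetical early-warning task.

    events: (start, end) index pairs on a regular time grid, end exclusive.
    A step is labelled 1 if some event starts within the next `horizon_steps`
    steps, None while an event is already ongoing (excluded), and 0 otherwise.
    """
    labels: List[Optional[int]] = []
    for t in range(n_steps):
        if any(start <= t < end for start, end in events):
            labels.append(None)  # already in the event: nothing left to predict
            continue
        upcoming = any(t < start <= t + horizon_steps for start, _ in events)
        labels.append(1 if upcoming else 0)
    return labels


if __name__ == "__main__":
    # Ten grid steps, one event covering steps 6 and 7, three-step horizon.
    print(early_warning_labels(10, [(6, 8)], horizon_steps=3))
    # [0, 0, 0, 1, 1, 1, None, None, 0, 0]
```

Excluding steps inside an event (rather than labelling them positive) keeps the task about anticipating onset; whether a given benchmark task does this is a design choice that depends on its clinical definition.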