Cristóbal Esteban, MSc.

Post Doc

+41 44 632 23 74
ETH Zürich
Department of Computer Science
Biomedical Informatics Group
Universitätsstrasse 6
8092 Zürich
CAB F52.1

I am interested in helping to cure diseases, improve people's health and extend the human lifespan by developing intelligent systems that can find and exploit complex dependencies in biomedical datasets.

I was a PhD candidate under the supervision of Prof. Volker Tresp at Ludwig Maximilians University of Munich, and I also belonged to the Machine Intelligence group at Siemens AG in Munich. During this period I was part of the EU Marie Curie International Training Network on machine learning for personalized medicine. Thanks to this network, I had the opportunity to visit multiple research groups, including the Rätsch laboratory when it was based at Memorial Sloan Kettering Cancer Center in New York.

I have worked on several machine learning projects applied to medicine. I am especially interested in developing deep learning models, such as recurrent neural networks, and in analyzing how they can be used to find complex relationships in sequential multivariate biomedical datasets.

Abstract Generative Adversarial Networks (GANs) have shown remarkable success as a framework for training models to produce realistic-looking data. In this work, we propose a Recurrent GAN (RGAN) and Recurrent Conditional GAN (RCGAN) to produce realistic real-valued multi-dimensional time series, with an emphasis on their application to medical data. RGANs make use of recurrent neural networks in the generator and the discriminator. In the case of RCGANs, both of these RNNs are conditioned on auxiliary information. We demonstrate our models in a set of toy datasets, where we show visually and quantitatively (using sample likelihood and maximum mean discrepancy) that they can successfully generate realistic time-series. We also describe novel evaluation methods for GANs, where we generate a synthetic labelled training dataset, and evaluate on a real test set the performance of a model trained on the synthetic data, and vice-versa. We illustrate with these metrics that RCGANs can generate time-series data useful for supervised training, with only minor degradation in performance on real test data. This is demonstrated on digit classification from 'serialised' MNIST and by training an early warning system on a medical dataset of 17,000 patients from an intensive care unit. We further discuss and analyse the privacy concerns that may arise when using RCGANs to generate realistic synthetic medical time series data.

Authors Stephanie L Hyland, Cristobal Esteban, Gunnar Rätsch

Submitted arXiv
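
The evaluation idea described in the abstract (train a model on synthetic labelled data, test it on real data, and vice versa) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the `make_data` helper is hypothetical and merely stands in for both the real dataset and the output of a trained generator, and a nearest-centroid classifier replaces whatever model the paper actually uses.

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid 'model': one mean vector per class label."""
    labels = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in labels])
    return labels, centroids

def predict(model, X):
    labels, centroids = model
    # Squared Euclidean distance from every sample to every centroid.
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return labels[d.argmin(axis=1)]

def accuracy(model, X, y):
    return float((predict(model, X) == y).mean())

def make_data(rng, n, noise):
    """Hypothetical two-class toy series of length 8 (sine vs. cosine)."""
    y = rng.integers(0, 2, size=n)
    t = np.linspace(0.0, 1.0, 8)
    base = np.where(y[:, None] == 1, np.sin(2 * np.pi * t), np.cos(2 * np.pi * t))
    return base + noise * rng.normal(size=(n, 8)), y

rng = np.random.default_rng(0)
X_real_train, y_real_train = make_data(rng, 200, noise=0.3)
X_real_test, y_real_test = make_data(rng, 200, noise=0.3)
# Assumption: noisier samples stand in for the output of a trained generator.
X_syn, y_syn = make_data(rng, 200, noise=0.5)

# Train on synthetic, test on real.
tstr = accuracy(fit_centroids(X_syn, y_syn), X_real_test, y_real_test)
# Baseline: train on real, test on real.
trtr = accuracy(fit_centroids(X_real_train, y_real_train), X_real_test, y_real_test)
# Train on real, test on synthetic.
trts = accuracy(fit_centroids(X_real_train, y_real_train), X_syn, y_syn)

print(f"synthetic->real: {tstr:.3f}  real->real: {trtr:.3f}  real->synthetic: {trts:.3f}")
```

If the synthetic-to-real score is close to the real-to-real baseline, the synthetic data carries enough of the label-relevant structure to be useful for supervised training, which is the sense of "only minor degradation" in the abstract.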