Vincent Fortuin, MSc
"The scientist is not a person who gives the right answers, he's one who asks the right questions." - Claude Lévi-Strauss
- fortuin@ inf.ethz.ch
- +41 44 633 66 87
Department of Computer Science
Biomedical Informatics Group
CAB F 39
- CAB F 39
I am interested in the interface between deep learning and probabilistic modeling. I am particularly keen to develop models that are more interpretable and data efficient, since these are two major requirements in the field of health care.
I did my undergraduate studies in Molecular Life Sciences at the University of Hamburg, where I worked on phylogeny inference for quickly mutating virus strains with Andrew Torda. I then went to ETH Zürich to study Computational Biology and Bioinformatics, in a joint program with the University of Zürich, with a focus on systems biology and machine learning. My master's thesis was about the application of deep learning to gene regulatory network inference under supervision of Manfred Claassen. During my studies I also spent some time in Jacob Hanna's group at the Weizmann Institute of Science, working on multiomics data analysis in stem cell research. Before joining the Biomedical Informatics group as a PhD student, I worked on deep learning applications in natural language understanding at Disney Research. During my PhD studies, I visited Stephan Mandt at the UC Irvine and completed an internship with Katja Hofmann at Microsoft Research Cambridge. I am supported by a PhD fellowship from the Swiss Data Science Center. Further information about my research activities can be found here.
Abstract Meta-learning can successfully acquire useful inductive biases from data, especially when a large number of meta-tasks are available. Yet, its generalization properties to unseen tasks are poorly understood. Particularly if the number of meta-tasks is small, this raises concerns for potential overfitting. We provide a theoretical analysis using the PAC-Bayesian framework and derive novel generalization bounds for meta-learning with unbounded loss functions and Bayesian base learners. Using these bounds, we develop a class of PAC-optimal meta-learning algorithms with performance guarantees and a principled meta-regularization. When instantiating our PAC-optimal hyper-posterior (PACOH) with Gaussian processes as base learners, the resulting approach consistently outperforms several popular meta-learning methods, both in terms of predictive accuracy and the quality of its uncertainty estimates.
Authors Jonas Rothfuss, Vincent Fortuin, Andreas Krause
Submitted arXiv Preprints
Abstract Generating interpretable visualizations from complex data is a common problem in many applications. Two key ingredients for tackling this issue are clustering and representation learning. However, current methods do not yet successfully combine the strengths of these two approaches. Existing representation learning models which rely on latent topological structure such as self-organising maps, exhibit markedly lower clustering performance compared to recent deep clustering methods. To close this performance gap, we (a) present a novel way to fit self-organizing maps with probabilistic cluster assignments (PSOM), (b) propose a new deep architecture for probabilistic clustering (DPSOM) using a VAE, and (c) extend our architecture for time-series clustering (T-DPSOM), which also allows forecasting in the latent space using LSTMs. We show that DPSOM achieves superior clustering performance compared to current deep clustering methods on MNIST/Fashion-MNIST, while maintaining the favourable visualization properties of SOMs. On medical time series, we show that T-DPSOM outperforms baseline methods in time series clustering and time series forecasting, while providing interpretable visualizations of patient state trajectories and uncertainty estimation.
Authors Laura Manduchi, Matthias Hüser, Julia Vogt, Gunnar Rätsch, Vincent Fortuin
Submitted arXiv Preprints
Abstract With a mortality rate of 5.4 million lives worldwide every year and a healthcare cost of more than 16 billion dollars in the USA alone, sepsis is one of the leading causes of hospital mortality and an increasing concern in the ageing western world. Recently, medical and technological advances have helped re-define the illness criteria of this disease, which is otherwise poorly understood by the medical society. Together with the rise of widely accessible Electronic Health Records, the advances in data mining and complex nonlinear algorithms are a promising avenue for the early detection of sepsis. This work contributes to the research effort in the field of automated sepsis detection with an open-access labelling of the medical MIMIC-III data set. Moreover, we propose MGP-AttTCN: a joint multitask Gaussian Process and attention-based deep learning model to early predict the occurrence of sepsis in an interpretable manner. We show that our model outperforms the current state-of-the-art and present evidence that different labelling heuristics lead to discrepancies in task difficulty.
Authors Margherita Rosnati, Vincent Fortuin
Submitted arXiv Preprints
Abstract Clustering high-dimensional data, such as images or biological measurements, is a long-standing problem and has been studied extensively. Recently, Deep Clustering gained popularity due to its flexibility in fitting the specific peculiarities of complex data. Here we introduce the Mixture-of-Experts Similarity Variational Autoencoder (MoE-Sim-VAE), a novel generative clustering model. The model can learn multi-modal distributions of high-dimensional data and use these to generate realistic data with high efficacy and efficiency. MoE-Sim-VAE is based on a Variational Autoencoder (VAE), where the decoder consists of a Mixture-of-Experts (MoE) architecture. This specific architecture allows for various modes of the data to be automatically learned by means of the experts. Additionally, we encourage the lower dimensional latent representation of our model to follow a Gaussian mixture distribution and to accurately represent the similarities between the data points. We assess the performance of our model on the MNIST benchmark data set and a challenging real-world task of defining cell subpopulations from mass cytometry (CyTOF) measurements on hundreds of different datasets. MoE-Sim-VAE exhibits superior clustering performance on all these tasks in comparison to the baselines as well as competitor methods and we show that the MoE architecture in the decoder reduces the computational cost of sampling specific data modes with high fidelity.
Authors Andreas Kopf, Vincent Fortuin, Vignesh Ram Somnath, Manfred Claassen
Submitted arXiv Preprints
Abstract Obtaining high-quality uncertainty estimates is essential for many applications of deep neural networks. In this paper, we theoretically justify a scheme for estimating uncertainties, based on sampling from a prior distribution. Crucially, the uncertainty estimates are shown to be conservative in the sense that they never underestimate a posterior uncertainty obtained by a hypothetical Bayesian algorithm. We also show concentration, implying that the uncertainty estimates converge to zero as we get more data. Uncertainty estimates obtained from random priors can be adapted to any deep network architecture and trained using standard supervised learning pipelines. We provide experimental evaluation of random priors on calibration and out-of-distribution detection on typical computer vision tasks, demonstrating that they outperform deep ensembles in practice.
Authors Kamil Ciosek, Vincent Fortuin, Ryota Tomioka, Katja Hofmann, Richard Turner
Submitted ICLR 2020
Abstract Metagenomic studies have increasingly utilized sequencing technologies in order to analyze DNA fragments found in environmental samples.One important step in this analysis is the taxonomic classification of the DNA fragments. Conventional read classification methods require large databases and vast amounts of memory to run, with recent deep learning methods suffering from very large model sizes. We therefore aim to develop a more memory-efficient technique for taxonomic classification. A task of particular interest is abundance estimation in metagenomic samples. Current attempts rely on classifying single DNA reads independently from each other and are therefore agnostic to co-occurence patterns between taxa. In this work, we also attempt to take these patterns into account. We develop a novel memory-efficient read classification technique, combining deep learning and locality-sensitive hashing. We show that this approach outperforms conventional mapping-based and other deep learning methods for single-read taxonomic classification when restricting all methods to a fixed memory footprint. Moreover, we formulate the task of abundance estimation as a Multiple Instance Learning (MIL) problem and we extend current deep learning architectures with two different types of permutation-invariant MIL pooling layers: a) deepsets and b) attention-based pooling. We illustrate that our architectures can exploit the co-occurrence of species in metagenomic read sets and outperform the single-read architectures in predicting the distribution over taxa at higher taxonomic ranks.
Authors Andreas Georgiou, Vincent Fortuin, Harun Mustafa, Gunnar Rätsch
Submitted arXiv Preprints
Abstract Multivariate time series with missing values are common in areas such as healthcare and finance, and have grown in number and complexity over the years. This raises the question whether deep learning methodologies can outperform classical data imputation methods in this domain. However, naive applications of deep learning fall short in giving reliable confidence estimates and lack interpretability. We propose a new deep sequential latent variable model for dimensionality reduction and data imputation. Our modeling assumption is simple and interpretable: the high dimensional time series has a lower-dimensional representation which evolves smoothly in time according to a Gaussian process. The non-linear dimensionality reduction in the presence of missing data is achieved using a VAE approach with a novel structured variational approximation. We demonstrate that our approach outperforms several classical and deep learning-based data imputation methods on high-dimensional data from the domains of computer vision and healthcare, while additionally improving the smoothness of the imputations and providing interpretable uncertainty estimates.
Authors Vincent Fortuin, Dmitry Baranchuk, Gunnar Rätsch, Stephan Mandt
Submitted AISTATS 2020
Abstract When fitting Bayesian machine learning models on scarce data, the main challenge is to obtain suitable prior knowledge and encode it into the model. Recent advances in meta-learning offer powerful methods for extracting such prior knowledge from data acquired in related tasks. When it comes to meta-learning in Gaussian process models, approaches in this setting have mostly focused on learning the kernel function of the prior, but not on learning its mean function. In this work, we explore meta-learning the mean function of a Gaussian process prior. We present analytical and empirical evidence that mean function learning can be useful in the meta-learning setting, discuss the risk of overfitting, and draw connections to other meta-learning approaches, such as model agnostic meta-learning and functional PCA.
Authors Vincent Fortuin, Heiko Strathmann, Gunnar Rätsch
Submitted arXiv Preprints
Abstract High-dimensional time series are common in many domains. Since human cognition is not optimized to work well in high-dimensional spaces, these areas could benefit from interpretable low-dimensional representations. However, most representation learning algorithms for time series data are difficult to interpret. This is due to non-intuitive mappings from data features to salient properties of the representation and non-smoothness over time. To address this problem, we propose a new representation learning framework building on ideas from interpretable discrete dimensionality reduction and deep generative modeling. This framework allows us to learn discrete representations of time series, which give rise to smooth and interpretable embeddings with superior clustering performance. We introduce a new way to overcome the non-differentiability in discrete representation learning and present a gradient-based version of the traditional self-organizing map algorithm that is more performant than the original. Furthermore, to allow for a probabilistic interpretation of our method, we integrate a Markov model in the representation space. This model uncovers the temporal transition structure, improves clustering performance even further and provides additional explanatory insights as well as a natural representation of uncertainty. We evaluate our model in terms of clustering performance and interpretability on static (Fashion-)MNIST data, a time series of linearly interpolated (Fashion-)MNIST images, a chaotic Lorenz attractor system with two macro states, as well as on a challenging real world medical time series application on the eICU data set. Our learned representations compare favorably with competitor methods and facilitate downstream tasks on the real world data.
Authors Vincent Fortuin, Matthias Hüser, Francesco Locatello, Heiko Strathmann, Gunnar Rätsch
Submitted ICLR 2019
Abstract Neural Processes (NPs) are a class of neural latent variable models that combine desirable properties of Gaussian Processes (GPs) and neural networks. Like GPs, NPs define distributions over functions and are able to estimate the uncertainty in their predictions. Like neural networks, NPs are computationally efficient during training and prediction time. In this paper, we establish an explicit theoretical connection between NPs and GPs. In particular, we show that, under certain conditions, NPs are mathematically equivalent to GPs with deep kernels. This result further elucidates the relationship between GPs and NPs and makes previously derived theoretical insights about GPs applicable to NPs. Furthermore, it suggests a novel approach to learning expressive GP covariance functions applicable across different prediction tasks by training a deep kernel GP on a set of datasets.
Authors Tim G. J. Rudner, Vincent Fortuin, Yee Whye Teh, Yarin Gal
Submitted Bayesian Deep Learning workshop @NeurIPS 2018
Abstract Kernel methods on discrete domains have shown great promise for many challenging tasks, e.g., on biological sequence data as well as on molecular structures. Scalable kernel methods like support vector machines offer good predictive performances but they often do not provide uncertainty estimates. In contrast, probabilistic kernel methods like Gaussian Processes offer uncertainty estimates in addition to good predictive performance but fall short in terms of scalability. We present the first sparse Gaussian Process approximation framework on discrete input domains. Our framework achieves good predictive performance as well as uncertainty estimates using different discrete optimization techniques. We present competitive results comparing our framework to support vector machine and full Gaussian Process baselines on synthetic data as well as on challenging real-world DNA sequence data.
Authors Vincent Fortuin, Gideon Dresdner, Heiko Strathmann, Gunnar Rätsch
Submitted arXiv Preprints
Abstract The reconstruction of gene regulatory networks from time resolved gene expression measurements is a key challenge in systems biology with applications in health and disease. While the most popular network inference methods are based on unsupervised learning approaches, supervised learning methods have proven their potential for superior reconstruction performance. However, obtaining the appropriate volume of informative training data constitutes a key limitation for the success of such methods. Here, we introduce a supervised learning approach to detect gene-gene regulation based on exclusively synthetic training data, termed surrogate learning, and show its performance for synthetic and experimental time-series. We systematically investigate different simulation configurations of biologically representative time-series of transcripts and augmentation of the data with a measurement model. We compare the resulting synthetic datasets to experimental data, and evaluate classifiers trained on them for detection of gene-gene regulation from experimental time-series. For classifiers, we consider hybrid convolutional recurrent neural networks, random forests and logistic regression, and evaluate the reconstruction performance of different simulation settings, data pre-processing and classifiers. When training and test time-courses are generated from the same distribution, we find that the largest tested neural network architecture achieves the best performance of 0.448 +/- 0.047 (mean +/- std) in maximally achievable F1 score over all datasets outperforming random forests by 32.4 % +/- 14 % (mean +/- std). Reconstruction performance is sensitive to discrepancies between synthetic training and test data, highlighting the importance of matching training and test data domains. For an experimental gene expression dataset from E.coli, we find that training data generated with measurement model, multi-gene perturbations, but without data standardization is best suited for training classifiers for network reconstruction from the experimental test data. We further demonstrate superiority to multiple unsupervised, state-of-the-art methods for networks comprising 20 genes of the experimental data from E.coli (average AUPR best supervised = 0.22 vs best unsupervised = 0.07). We expect the proposed surrogate learning approach to be broadly applicable. It alleviates the requirement for large, difficult to attain volumes of experimental training data and instead relies on easily accessible synthetic data. Successful application for new experimental conditions and other data types is only limited by the automatable and scalable process of designing simulations which generate suitable synthetic data.
Authors Stefan Ganscha, Vincent Fortuin, Max Horn, Eirini Arvaniti, Manfred Claassen
Submitted bioRxiv Preprints
Abstract We present a novel approach to modeling stories using recurrent neural networks. Different story features are extracted using natural language processing techniques and used to encode the stories as sequences. These sequences can be learned by deep neural networks, in order to predict the next story events. The predictions can be used as an inspiration for writers who experience a writer's block. We further assist writers in their creative process by generating visualizations of the character interactions in the story. We show that suggestions from our model are rated as highly as the real scenes from a set of films and that our visualizations can help people in gaining deeper story understanding.
Authors Vincent Fortuin, Romann M. Weber, Sasha Schriber, Diana Wotruba, Markus Gross
Submitted AAAI 2018