Matthias Hüser, MSc. ETH Computer Science
"Reality is that which, when you stop believing in it, doesn’t go away." -- Philip K. Dick (1928-1982)
- matthias.hueser@ inf.ethz.ch
- +41 44 632 23 71
Department of Computer Science
Biomedical Informatics Group
- CAB F53.1
I am broadly interested in Machine Learning and Signal Processing for Healthcare, in particular the application of Deep Learning and Bayesian non-parametric methods to EHR and genomic data.
Before joining the Rätsch Lab, I studied Computer Science at ETH Zürich (MSc) and Computing at Imperial College London (BEng). Previously, I worked on forecasting intracranial hypertension and cerebral hypoxia events using high-frequency ICU data.
Abstract Intensive-care clinicians are presented with large quantities of measurements from multiple monitoring systems. The limited ability of humans to process complex information hinders early recognition of patient deterioration, and high numbers of monitoring alarms lead to alarm fatigue. We used machine learning to develop an early-warning system that integrates measurements from multiple organ systems using a high-resolution database with 240 patient-years of data. It predicts 90% of circulatory-failure events in the test set, with 82% identified more than 2 h in advance, resulting in an area under the receiver operating characteristic curve of 0.94 and an area under the precision-recall curve of 0.63. On average, the system raises 0.05 alarms per patient and hour. The model was externally validated in an independent patient cohort. Our model provides early identification of patients at risk for circulatory failure with a much lower false-alarm rate than conventional threshold-based systems.
Authors Stephanie L. Hyland, Martin Faltys, Matthias Hüser, Xinrui Lyu, Thomas Gumbsch, Cristóbal Esteban, Christian Bock, Max Horn, Michael Moor, Bastian Rieck, Marc Zimmermann, Dean Bodenham, Karsten Borgwardt, Gunnar Rätsch & Tobias M. Merz
Submitted Nature Medicine
Abstract Objective: Acute intracranial hypertension is an important risk factor of secondary brain damage after traumatic brain injury. Hypertensive episodes are often diagnosed reactively, leading to late detection and lost time for intervention planning. A pro-active approach that predicts critical events several hours ahead of time could assist in directing attention to patients at risk. Approach: We developed a prediction framework that forecasts onsets of acute intracranial hypertension in the next 8 hours. It jointly uses cerebral auto-regulation indices, spectral energies and morphological pulse metrics to describe the neurological state of the patient. One-minute base windows were compressed by computing signal metrics, and then stored in a multi-scale history, from which physiological features were derived. Main results: Our model predicted events up to 8 hours in advance with alarm recall rates of 90% at a precision of 30% in the MIMIC-III waveform database, improving upon two baselines from the literature. We found that features derived from high-frequency waveforms substantially improved the prediction performance over simple statistical summaries of low-frequency time series, and each of the three feature classes contributed to the performance gain. The inclusion of long-term history up to 8 hours was especially important. Significance: Our results highlight the importance of information contained in high-frequency waveforms in the neurological intensive care unit. They could motivate future studies on pre-hypertensive patterns and the design of new alarm algorithms for critical events in the injured brain.
Authors Matthias Hüser, Adrian Kündig, Walter Karlen, Valeria De Luca, Martin Jaggi
Submitted Physiological Measurement
Abstract Generating interpretable visualizations from complex data is a common problem in many applications. Two key ingredients for tackling this issue are clustering and representation learning. However, current methods do not yet successfully combine the strengths of these two approaches. Existing representation learning models which rely on latent topological structure such as self-organising maps, exhibit markedly lower clustering performance compared to recent deep clustering methods. To close this performance gap, we (a) present a novel way to fit self-organizing maps with probabilistic cluster assignments (PSOM), (b) propose a new deep architecture for probabilistic clustering (DPSOM) using a VAE, and (c) extend our architecture for time-series clustering (T-DPSOM), which also allows forecasting in the latent space using LSTMs. We show that DPSOM achieves superior clustering performance compared to current deep clustering methods on MNIST/Fashion-MNIST, while maintaining the favourable visualization properties of SOMs. On medical time series, we show that T-DPSOM outperforms baseline methods in time series clustering and time series forecasting, while providing interpretable visualizations of patient state trajectories and uncertainty estimation.
Authors Laura Manduchi, Matthias Hüser, Julia Vogt, Gunnar Rätsch, Vincent Fortuin
Submitted arXiv Preprints
Abstract High-dimensional time series are common in many domains. Since human cognition is not optimized to work well in high-dimensional spaces, these areas could benefit from interpretable low-dimensional representations. However, most representation learning algorithms for time series data are difficult to interpret. This is due to non-intuitive mappings from data features to salient properties of the representation and non-smoothness over time. To address this problem, we propose a new representation learning framework building on ideas from interpretable discrete dimensionality reduction and deep generative modeling. This framework allows us to learn discrete representations of time series, which give rise to smooth and interpretable embeddings with superior clustering performance. We introduce a new way to overcome the non-differentiability in discrete representation learning and present a gradient-based version of the traditional self-organizing map algorithm that is more performant than the original. Furthermore, to allow for a probabilistic interpretation of our method, we integrate a Markov model in the representation space. This model uncovers the temporal transition structure, improves clustering performance even further and provides additional explanatory insights as well as a natural representation of uncertainty. We evaluate our model in terms of clustering performance and interpretability on static (Fashion-)MNIST data, a time series of linearly interpolated (Fashion-)MNIST images, a chaotic Lorenz attractor system with two macro states, as well as on a challenging real world medical time series application on the eICU data set. Our learned representations compare favorably with competitor methods and facilitate downstream tasks on the real world data.
Authors Vincent Fortuin, Matthias Hüser, Francesco Locatello, Heiko Strathmann, Gunnar Rätsch
Submitted ICLR 2019
Abstract In this work, we investigate unsupervised representation learning on medical time series, which bears the promise of leveraging copious amounts of existing unlabeled data in order to eventually assist clinical decision making. By evaluating on the prediction of clinically relevant outcomes, we show that in a practical setting, unsupervised representation learning can offer clear performance benefits over end-to-end supervised architectures. We experiment with using sequence-to-sequence (Seq2Seq) models in two different ways, as an autoencoder and as a forecaster, and show that the best performance is achieved by a forecasting Seq2Seq model with an integrated attention mechanism, proposed here for the first time in the setting of unsupervised learning for medical time series.
Authors Xinrui Lyu, Matthias Hüser, Stephanie L. Hyland, George Zerveas, Gunnar Rätsch
Submitted Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 - Spotlight
Abstract Our comprehensive analysis of alternative splicing across 32 The Cancer Genome Atlas cancer types from 8,705 patients detects alternative splicing events and tumor variants by reanalyzing RNA and whole-exome sequencing data. Tumors have up to 30% more alternative splicing events than normal samples. Association analysis of somatic variants with alternative splicing events confirmed known trans associations with variants in SF3B1 and U2AF1 and identified additional trans-acting variants (e.g., TADA1, PPP2R1A). Many tumors have thousands of alternative splicing events not detectable in normal samples; on average, we identified ≈930 exon-exon junctions (“neojunctions”) in tumors not typically found in GTEx normals. From Clinical Proteomic Tumor Analysis Consortium data available for breast and ovarian tumor samples, we confirmed ≈1.7 neojunction- and ≈0.6 single nucleotide variant-derived peptides per tumor sample that are also predicted major histocompatibility complex-I binders (“putative neoantigens”).
Authors Andre Kahles, Kjong-Van Lehmann, Nora C. Toussaint, Matthias Hüser, Stefan Stark, Timo Sachsenberg, Oliver Stegle, Oliver Kohlbacher, Chris Sander, Gunnar Rätsch, The Cancer Genome Atlas Research Network
Submitted Cancer Cell
Abstract The deterioration of organ function in ICU patients requires swift response to prevent further damage to vital systems. Focusing on the circulatory system, we build a model to predict if a patient’s state will deteriorate in the near future. We identify circulatory system dysfunction using the combination of excess lactic acid in the blood and low mean arterial blood pressure or the presence of vasoactive drugs. Using an observational cohort of 45,000 patients from a Swiss ICU, we extract and process patient time series and identify periods of circulatory system dysfunction to develop an early warning system. We train a gradient boosting model to perform binary classification every five minutes on whether the patient will deteriorate during an increasingly large window into the future, up to the duration of a shift (8 hours). The model achieves an AUROC between 0.952 and 0.919 across the prediction windows, and an AUPRC between 0.223 and 0.384 for events with positive prevalence between 0.014 and 0.042. We also show preliminary results from a recurrent neural network. These results show that contemporary machine learning approaches combined with careful preprocessing of raw data collected during routine care yield clinically useful predictions in near real time. [Workshop Abstract]
Authors Stephanie Hyland, Matthias Hüser, Xinrui Lyu, Martin Faltys, Tobias Merz, Gunnar Rätsch
Submitted Proceedings of the First Joint Workshop on AI in Health
Authors Valeria De Luca, Matthias Hüser, Martin Jaggi, Walter Karlen, Emanuela Keller
Submitted 16th International Symposium on Intracranial Pressure and Neuromonitoring, Cambridge, MA, USA
Authors Matthias Hüser, Valeria De Luca, Martin Jaggi, Walter Karlen, Emanuela Keller
Submitted Vasospasm - 13th International Conference on Neurovascular Events after Subarachnoid Hemorrhage