Welcome to the Biomedical Informatics Lab of Prof. Dr. Gunnar Rätsch

The research in our group lies at the interface between methods research in Machine Learning, Genomics and Medical Informatics and relevant applications in biology and medicine.

We develop new analysis techniques that are capable of dealing with large amounts of medical and genomic data. These techniques aim to provide accurate predictions on the phenomenon at hand and to comprehensibly provide reasons for their prognoses, and thereby assist in gaining new biomedical insights.

Current research includes a) Machine Learning related to time-series analysis and iterative optimization algorithms, b) methods for transcriptome analyses to study transcriptome alterations in cancer, c) developing clinical decision support systems, in particular, for time series data from intensive care units, d) new graph genome algorithms to store and analyze very large sets of genomic sequences, and e) developing methods and resources for international sharing of genomic and clinical data, for instance, about variants in BRCA1/2.

Abstract Intensive-care clinicians are presented with large quantities of measurements from multiple monitoring systems. The limited ability of humans to process complex information hinders early recognition of patient deterioration, and high numbers of monitoring alarms lead to alarm fatigue. We used machine learning to develop an early-warning system that integrates measurements from multiple organ systems using a high-resolution database with 240 patient-years of data. It predicts 90% of circulatory-failure events in the test set, with 82% identified more than 2 h in advance, resulting in an area under the receiver operating characteristic curve of 0.94 and an area under the precision-recall curve of 0.63. On average, the system raises 0.05 alarms per patient and hour. The model was externally validated in an independent patient cohort. Our model provides early identification of patients at risk for circulatory failure with a much lower false-alarm rate than conventional threshold-based systems.

Authors Stephanie L. Hyland, Martin Faltys, Matthias Hüser, Xinrui Lyu, Thomas Gumbsch, Cristóbal Esteban, Christian Bock, Max Horn, Michael Moor, Bastian Rieck, Marc Zimmermann, Dean Bodenham, Karsten Borgwardt, Gunnar Rätsch & Tobias M. Merz

Submitted Nature Medicine


Abstract Transcript alterations often result from somatic changes in cancer genomes. Various forms of RNA alterations have been described in cancer, including overexpression, altered splicing and gene fusions; however, it is difficult to attribute these to underlying genomic changes owing to heterogeneity among patients and tumour types, and the relatively small cohorts of patients for whom samples have been analysed by both transcriptome and whole-genome sequencing. Here we present, to our knowledge, the most comprehensive catalogue of cancer-associated gene alterations to date, obtained by characterizing tumour transcriptomes from 1,188 donors of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). Using matched whole-genome sequencing data, we associated several categories of RNA alterations with germline and somatic DNA alterations, and identified probable genetic mechanisms. Somatic copy-number alterations were the major drivers of variations in total gene and allele-specific expression. We identified 649 associations of somatic single-nucleotide variants with gene expression in cis, of which 68.4% involved associations with flanking non-coding regions of the gene. We found 1,900 splicing alterations associated with somatic mutations, including the formation of exons within introns in proximity to Alu elements. In addition, 82% of gene fusions were associated with structural variants, including 75 of a new class, termed ‘bridged’ fusions, in which a third genomic location bridges two genes. We observed transcriptomic alteration signatures that differ between cancer types and have associations with variations in DNA mutational signatures. This compendium of RNA alterations in the genomic context provides a rich resource for identifying genes and mechanisms that are functionally implicated in cancer.

Authors PCAWG Transcriptome Core Group, Claudia Calabrese, Natalie R. Davidson, Deniz Demircioğlu, Nuno A. Fonseca, Yao He, André Kahles, Kjong-Van Lehmann, Fenglin Liu, Yuichi Shiraishi, Cameron M. Soulette, Lara Urban, Liliana Greger, Siliang Li, Dongbing Liu, Marc D. Perry, Qian Xiang, Fan Zhang, Junjun Zhang, Peter Bailey, Serap Erkek, Katherine A. Hoadley, Yong Hou, Matthew R. Huska, Helena Kilpinen, Jan O. Korbel, Maximillian G. Marin, Julia Markowski, Tannistha Nandi, Qiang Pan-Hammarström, Chandra Sekhar Pedamallu, Reiner Siebert, Stefan G. Stark, Hong Su, Patrick Tan, Sebastian M. Waszak, Christina Yung, Shida Zhu, Philip Awadalla, Chad J. Creighton, Matthew Meyerson, B. F. Francis Ouellette, Kui Wu, Huanming Yang, PCAWG Transcriptome Working Group, Alvis Brazma, Angela N. Brooks, Jonathan Göke, Gunnar Rätsch, Roland F. Schwarz, Oliver Stegle, Zemin Zhang & PCAWG Consortium- Show fewer authors Nature volume 578, pages129–136(2020)Cite this article

Submitted Nature

Link DOI

Abstract Objective: Acute intracranial hypertension is an important risk factor of secondary brain damage after traumatic brain injury. Hypertensive episodes are often diagnosed reactively, leading to late detection and lost time for intervention planning. A pro-active approach that predicts critical events several hours ahead of time could assist in directing attention to patients at risk. Approach: We developed a prediction framework that forecasts onsets of acute intracranial hypertension in the next 8 hours. It jointly uses cerebral auto-regulation indices, spectral energies and morphological pulse metrics to describe the neurological state of the patient. One-minute base windows were compressed by computing signal metrics, and then stored in a multi-scale history, from which physiological features were derived. Main results: Our model predicted events up to 8 hours in advance with alarm recall rates of 90% at a precision of 30% in the MIMIC- III waveform database, improving upon two baselines from the literature. We found that features derived from high-frequency waveforms substantially improved the prediction performance over simple statistical summaries of low-frequency time series, and each of the three feature classes contributed to the performance gain. The inclusion of long-term history up to 8 hours was especially important. Significance: Our results highlight the importance of information contained in high-frequency waveforms in the neurological intensive care unit. They could motivate future studies on pre-hypertensive patterns and the design of new alarm algorithms for critical events in the injured brain.

Authors Matthias Hüser, Adrian Kündig, Walter Karlen, Valeria De Luca, Martin Jaggi

Submitted Physiological Measurement

Link DOI

Abstract In this paper, we propose the first practical algorithm to minimize stochastic composite optimization problems over compact convex sets. This template allows for affine constraints and therefore covers stochastic semidefinite programs (SDPs), which are vastly applicable in both machine learning and statistics. In this setup, stochastic algorithms with convergence guarantees are either not known or not tractable. We tackle this general problem and propose a convergent, easy to implement and tractable algorithm. We prove $\mathcal{O}(k^{-1/3})$ convergence rate in expectation on the objective residual and $\mathcal{O}(k^{-5/12})$ in expectation on the feasibility gap. These rates are achieved without increasing the batchsize, which can contain a single sample. We present extensive empirical evidence demonstrating the superiority of our algorithm on a broad range of applications including optimization of stochastic SDPs.

Authors Francesco Locatello, Alp Yurtsever, Olivier Fercoq, Volkan Cevher

Submitted NeurIPS 2019

Link DOI

Abstract Generating interpretable visualizations from complex data is a common problem in many applications. Two key ingredients for tackling this issue are clustering and representation learning. However, current methods do not yet successfully combine the strengths of these two approaches. Existing representation learning models which rely on latent topological structure such as self-organising maps, exhibit markedly lower clustering performance compared to recent deep clustering methods. To close this performance gap, we (a) present a novel way to fit self-organizing maps with probabilistic cluster assignments (PSOM), (b) propose a new deep architecture for probabilistic clustering (DPSOM) using a VAE, and (c) extend our architecture for time-series clustering (T-DPSOM), which also allows forecasting in the latent space using LSTMs. We show that DPSOM achieves superior clustering performance compared to current deep clustering methods on MNIST/Fashion-MNIST, while maintaining the favourable visualization properties of SOMs. On medical time series, we show that T-DPSOM outperforms baseline methods in time series clustering and time series forecasting, while providing interpretable visualizations of patient state trajectories and uncertainty estimation.

Authors Laura Manduchi, Matthias Hüser, Julia Vogt, Gunnar Rätsch, Vincent Fortuin

Submitted arXiv Preprints