Welcome to the Biomedical Informatics Lab of Prof. Dr. Gunnar Rätsch

The research in our group lies at the interface between methods research in Machine Learning, Genomics and Medical Informatics, and relevant applications in biology and medicine.

We develop new analysis techniques that are capable of handling large amounts of medical and genomic data. These techniques aim to make accurate predictions about the phenomenon at hand, to give comprehensible reasons for those predictions, and thereby to help gain new biomedical insights.

Current research includes a) Machine Learning related to time-series analysis and iterative optimization algorithms, b) methods for transcriptome analysis to study transcriptome alterations in cancer, c) developing clinical decision support systems, in particular for time-series data from intensive care units, d) new graph genome algorithms to store and analyze very large sets of genomic sequences, and e) developing methods and resources for the international sharing of genomic and clinical data, for instance about variants in BRCA1/2.

Abstract We call upon the research community to standardize efforts to use daily self-reported data about COVID-19 symptoms in the response to the pandemic and to form a collaborative consortium to maximize global gain while protecting participant privacy. The rapid and global spread of COVID-19 led the World Health Organization to declare it a pandemic on 11 March 2020. One factor contributing to the spread of the pandemic is the lack of information about who is infected, in large part because of the lack of testing. This facilitated the silent spread of the causative coronavirus (SARS-CoV-2), which led to delays in public-health and government responses and an explosion in cases. In countries that have tested more aggressively and that had the capacity to transparently share these data, such as South Korea and Singapore, the spread of disease has been greatly slowed¹. Although efforts are underway around the world to substantially ramp up testing capacity, technology-driven approaches to collecting self-reported information can fill an immediate need and complement official diagnostic results. This type of approach has been used for tracking other diseases, notably influenza². The information collected may include health status that is self-reported through surveys, including those from mobile apps; results of diagnostic laboratory tests; and other static and real-time geospatial data. The collection of privacy-protected information from volunteers about health status over time may enable researchers to leverage these data to predict, respond to and learn about the spread of COVID-19. Given the global nature of the disease, we aim to form an international consortium, tentatively named the ‘Coronavirus Census Collective’, to serve as a hub for amassing this type of data and to create a unified platform for global epidemiological data collection and analysis.

Authors Segal E, Zhang F, Lin X, King G, Shalem O, Shilo S, Allen WE, Alquaddoomi F, Altae-Tran H, Anders S, Balicer R, Bauman T, Bonilla X, Booman G, Chan AT, Cohen O, Coletti S, Davidson N, Dor Y, Drew DA, Elemento O, Evans G, Ewels P, Gale J, Gavrieli A, Geiger B, Grad YH, Greene CS, Hajirasouliha I, Jerala R, Kahles A, Kallioniemi O, Keshet A, Kocarev L, Landua G, Meir T, Muller A, Nguyen LH, Oresic M, Ovchinnikova S, Peterson H, Prodanova J, Rajagopal J, Rätsch G, Rossman H, Rung J, Sboner A, Sigaras A, Spector T, Steinherz R, Stevens I, Vilo J, Wilmes P.

Submitted Nature Medicine

Abstract Intensive-care clinicians are presented with large quantities of measurements from multiple monitoring systems. The limited ability of humans to process complex information hinders early recognition of patient deterioration, and high numbers of monitoring alarms lead to alarm fatigue. We used machine learning to develop an early-warning system that integrates measurements from multiple organ systems using a high-resolution database with 240 patient-years of data. It predicts 90% of circulatory-failure events in the test set, with 82% identified more than 2 h in advance, and achieves an area under the receiver operating characteristic curve of 0.94 and an area under the precision-recall curve of 0.63. On average, the system raises 0.05 alarms per patient per hour. The model was externally validated in an independent patient cohort. Our model provides early identification of patients at risk for circulatory failure with a much lower false-alarm rate than conventional threshold-based systems.

Authors Stephanie L. Hyland, Martin Faltys, Matthias Hüser, Xinrui Lyu, Thomas Gumbsch, Cristóbal Esteban, Christian Bock, Max Horn, Michael Moor, Bastian Rieck, Marc Zimmermann, Dean Bodenham, Karsten Borgwardt, Gunnar Rätsch & Tobias M. Merz

Submitted Nature Medicine
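
The early-warning abstract above summarizes performance with three numbers: the share of circulatory-failure events that were predicted, the share flagged more than 2 h ahead, and an alarm rate per patient per hour. The following is a minimal sketch of how event-based metrics of this kind can be computed from alarm and event timestamps. Several parts are illustrative assumptions and not the paper's evaluation protocol: the `PatientStay` record, the `evaluate_alarms` helper, the 8-hour look-back window, and the choice of all events as the denominator for the early-warning share.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PatientStay:
    hours: float                                       # monitored time in hours
    events: List[float] = field(default_factory=list)  # onset times of circulatory failure (h)
    alarms: List[float] = field(default_factory=list)  # times at which an alarm was raised (h)


def evaluate_alarms(stays, horizon_h=8.0, early_h=2.0):
    """Event recall, early-warning share and alarm rate (hypothetical definitions)."""
    n_events = caught = caught_early = 0
    for stay in stays:
        for onset in stay.events:
            n_events += 1
            # lead times of alarms raised at most horizon_h hours before this onset
            leads = [onset - t for t in stay.alarms if 0.0 <= onset - t <= horizon_h]
            if leads:
                caught += 1
                if max(leads) > early_h:  # earliest qualifying alarm came more than early_h ahead
                    caught_early += 1
    total_alarms = sum(len(s.alarms) for s in stays)
    total_hours = sum(s.hours for s in stays)
    return {
        "event_recall": caught / n_events,
        "early_recall": caught_early / n_events,  # assumed denominator: all events
        "alarms_per_patient_hour": total_alarms / total_hours,
    }


stays = [
    PatientStay(hours=100.0, events=[50.0], alarms=[45.0, 49.0]),  # warned 5 h ahead
    PatientStay(hours=100.0, events=[30.0], alarms=[29.5]),        # warned only 0.5 h ahead
    PatientStay(hours=50.0, alarms=[10.0]),                         # false alarm, no event
]
print(evaluate_alarms(stays))
```

Because the rate divides total alarms by total monitored hours, false alarms raised during long, uneventful stays are penalized. This is what makes a low figure such as 0.05 alarms per patient per hour meaningful next to a high event recall.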

Abstract Transcript alterations often result from somatic changes in cancer genomes. Various forms of RNA alterations have been described in cancer, including overexpression, altered splicing and gene fusions; however, it is difficult to attribute these to underlying genomic changes owing to heterogeneity among patients and tumour types, and the relatively small cohorts of patients for whom samples have been analysed by both transcriptome and whole-genome sequencing. Here we present, to our knowledge, the most comprehensive catalogue of cancer-associated gene alterations to date, obtained by characterizing tumour transcriptomes from 1,188 donors of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). Using matched whole-genome sequencing data, we associated several categories of RNA alterations with germline and somatic DNA alterations, and identified probable genetic mechanisms. Somatic copy-number alterations were the major drivers of variations in total gene and allele-specific expression. We identified 649 associations of somatic single-nucleotide variants with gene expression in cis, of which 68.4% involved associations with flanking non-coding regions of the gene. We found 1,900 splicing alterations associated with somatic mutations, including the formation of exons within introns in proximity to Alu elements. In addition, 82% of gene fusions were associated with structural variants, including 75 of a new class, termed ‘bridged’ fusions, in which a third genomic location bridges two genes. We observed transcriptomic alteration signatures that differ between cancer types and have associations with variations in DNA mutational signatures. This compendium of RNA alterations in the genomic context provides a rich resource for identifying genes and mechanisms that are functionally implicated in cancer.

Authors PCAWG Transcriptome Core Group, Claudia Calabrese, Natalie R. Davidson, Deniz Demircioğlu, Nuno A. Fonseca, Yao He, André Kahles, Kjong-Van Lehmann, Fenglin Liu, Yuichi Shiraishi, Cameron M. Soulette, Lara Urban, Liliana Greger, Siliang Li, Dongbing Liu, Marc D. Perry, Qian Xiang, Fan Zhang, Junjun Zhang, Peter Bailey, Serap Erkek, Katherine A. Hoadley, Yong Hou, Matthew R. Huska, Helena Kilpinen, Jan O. Korbel, Maximillian G. Marin, Julia Markowski, Tannistha Nandi, Qiang Pan-Hammarström, Chandra Sekhar Pedamallu, Reiner Siebert, Stefan G. Stark, Hong Su, Patrick Tan, Sebastian M. Waszak, Christina Yung, Shida Zhu, Philip Awadalla, Chad J. Creighton, Matthew Meyerson, B. F. Francis Ouellette, Kui Wu, Huanming Yang, PCAWG Transcriptome Working Group, Alvis Brazma, Angela N. Brooks, Jonathan Göke, Gunnar Rätsch, Roland F. Schwarz, Oliver Stegle, Zemin Zhang & PCAWG Consortium. Nature, volume 578, pages 129–136 (2020).

Submitted Nature
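
The transcriptome abstract above introduces ‘bridged’ fusions, in which a third genomic location bridges two genes. The sketch below classifies a fusion under a deliberately simplified model, which assumes that each structural variant is a single junction between two breakpoints. Under that model, a fusion of genes A and B is ‘direct’ if one junction joins the two genes. It is ‘bridged’ if one junction leads from gene A to some location and another junction leads from near that location into gene B. The data types, the `classify_fusion` helper and the 100 kb `max_bridge` distance are hypothetical; the PCAWG analysis itself is considerably more involved.

```python
from typing import List, Tuple

Locus = Tuple[str, int]         # (chromosome, position)
Gene = Tuple[str, int, int]     # (chromosome, start, end)
Junction = Tuple[Locus, Locus]  # two breakpoints joined by one structural variant


def in_gene(locus: Locus, gene: Gene) -> bool:
    chrom, pos = locus
    return chrom == gene[0] and gene[1] <= pos <= gene[2]


def classify_fusion(gene_a: Gene, gene_b: Gene, junctions: List[Junction],
                    max_bridge: int = 100_000) -> str:
    """Label a fusion A-B as 'direct', 'bridged' or 'unexplained' (simplified model)."""
    # consider every junction in both orientations
    ends = list(junctions) + [(q, p) for p, q in junctions]
    if any(in_gene(p, gene_a) and in_gene(q, gene_b) for p, q in ends):
        return "direct"
    # partner breakpoints of junctions leaving gene A, and of junctions entering gene B
    from_a = [q for p, q in ends if in_gene(p, gene_a)]
    into_b = [p for p, q in ends if in_gene(q, gene_b)]
    # bridged: both partners lie close together at a third location
    for x in from_a:
        for y in into_b:
            if x[0] == y[0] and abs(x[1] - y[1]) <= max_bridge:
                return "bridged"
    return "unexplained"


gene_a = ("chr1", 1_000, 2_000)
gene_b = ("chr5", 5_000, 6_000)
bridge = [(("chr1", 1_500), ("chr9", 100_000)), (("chr9", 100_400), ("chr5", 5_500))]
print(classify_fusion(gene_a, gene_b, bridge))  # bridged
```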

Abstract Objective: Acute intracranial hypertension is an important risk factor for secondary brain damage after traumatic brain injury. Hypertensive episodes are often diagnosed reactively, leading to late detection and lost time for intervention planning. A proactive approach that predicts critical events several hours ahead of time could help direct attention to patients at risk. Approach: We developed a prediction framework that forecasts onsets of acute intracranial hypertension in the next 8 hours. It jointly uses cerebral auto-regulation indices, spectral energies and morphological pulse metrics to describe the neurological state of the patient. One-minute base windows were compressed by computing signal metrics and then stored in a multi-scale history, from which physiological features were derived. Main results: Our model predicted events up to 8 hours in advance with alarm recall rates of 90% at a precision of 30% in the MIMIC-III waveform database, improving upon two baselines from the literature. We found that features derived from high-frequency waveforms substantially improved the prediction performance over simple statistical summaries of low-frequency time series, and each of the three feature classes contributed to the performance gain. The inclusion of long-term history up to 8 hours was especially important. Significance: Our results highlight the importance of information contained in high-frequency waveforms in the neurological intensive care unit. They could motivate future studies on pre-hypertensive patterns and the design of new alarm algorithms for critical events in the injured brain.

Authors Matthias Hüser, Adrian Kündig, Walter Karlen, Valeria De Luca, Martin Jaggi

Submitted Physiological Measurement
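
The intracranial-hypertension abstract above describes a two-stage feature pipeline. One-minute base windows are first compressed into signal metrics, which are then stored in a multi-scale history from which features are derived. A minimal sketch of that pattern follows. Several choices are illustrative assumptions: the per-minute metrics (mean, standard deviation, maximum), the 10-, 60- and 480-minute horizons, and the derived features. The paper's actual metrics are cerebral auto-regulation indices, spectral energies and morphological pulse metrics.

```python
from statistics import mean, pstdev


def compress_minute(samples):
    """Compress one minute of raw waveform samples into a few metrics (hypothetical choice)."""
    return {"mean": mean(samples), "std": pstdev(samples), "max": max(samples)}


def multiscale_features(history, scales=(10, 60, 480)):
    """Derive features from the last `scale` compressed minutes, for each scale in minutes."""
    feats = {}
    for scale in scales:
        window = history[-scale:]  # shorter than `scale` early in a recording
        means = [m["mean"] for m in window]
        feats[f"mean_{scale}m"] = mean(means)
        feats[f"peak_{scale}m"] = max(m["max"] for m in window)
        feats[f"trend_{scale}m"] = means[-1] - means[0]
        feats[f"sd_{scale}m"] = mean([m["std"] for m in window])
    return feats


# Synthetic example: 8 hours of minutes, each alternating between i and i + 2
history = [compress_minute([float(i) + 2.0 * (j % 2) for j in range(60)]) for i in range(480)]
features = multiscale_features(history)
print(features["mean_10m"], features["trend_480m"])
```

Each minute of raw waveform is compressed only once. The longer horizons then reuse the stored metrics, which keeps an 8-hour history cheap to maintain.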

Abstract In this paper, we propose the first practical algorithm to minimize stochastic composite optimization problems over compact convex sets. This template allows for affine constraints and therefore covers stochastic semidefinite programs (SDPs), which are widely applicable in both machine learning and statistics. In this setup, stochastic algorithms with convergence guarantees are either not known or not tractable. We tackle this general problem and propose a convergent, easy-to-implement and tractable algorithm. We prove an $\mathcal{O}(k^{-1/3})$ convergence rate in expectation on the objective residual and $\mathcal{O}(k^{-5/12})$ in expectation on the feasibility gap. These rates are achieved without increasing the batch size, which can contain a single sample. We present extensive empirical evidence demonstrating the superiority of our algorithm on a broad range of applications, including optimization of stochastic SDPs.

Authors Francesco Locatello, Alp Yurtsever, Olivier Fercoq, Volkan Cevher

Submitted NeurIPS 2019
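
The abstract above does not spell out the algorithm. As a hedged illustration of one standard way to attack this template, which is not necessarily the authors' method, here is a toy stochastic conditional-gradient (Frank-Wolfe) loop over the probability simplex. It makes two design choices:

- It averages single-sample stochastic gradients into a running estimate.
- It handles an affine constraint $a^\top x = b$ through a quadratic penalty $\frac{1}{2\beta_k}(a^\top x - b)^2$ whose smoothing parameter $\beta_k$ shrinks over iterations.

Several elements are assumptions chosen for illustration: the schedules for $\rho_k$, $\beta_k$ and $\eta_k$, the toy objective, and the names `stochastic_homotopy_cg` and `lmo_simplex`.

```python
import random


def lmo_simplex(v):
    """Linear minimization oracle on the probability simplex: the vertex e_i with the smallest v_i."""
    i = min(range(len(v)), key=lambda j: v[j])
    s = [0.0] * len(v)
    s[i] = 1.0
    return s


def stochastic_homotopy_cg(grad_sample, a, b, n, iters=20_000, beta0=1.0, seed=0):
    """Toy stochastic conditional gradient for min E[f(x, sample)] s.t. a.x = b, x in the simplex."""
    rng = random.Random(seed)
    x = [1.0] + [0.0] * (n - 1)  # start at a vertex; iterates stay convex combinations of vertices
    d = [0.0] * n                # averaged gradient estimate
    for k in range(iters):
        rho = 4.0 / (k + 8) ** (2.0 / 3.0)  # averaging weight (assumed schedule, starts near 1)
        beta = beta0 / (k + 2) ** 0.5        # penalty smoothing shrinks, so the constraint tightens
        eta = 2.0 / (k + 2)                  # classic Frank-Wolfe step size
        g = grad_sample(x, rng)              # a single stochastic gradient (batch size 1)
        d = [(1.0 - rho) * dj + rho * gj for dj, gj in zip(d, g)]
        residual = sum(aj * xj for aj, xj in zip(a, x)) - b
        v = [dj + aj * residual / beta for dj, aj in zip(d, a)]  # gradient of penalized objective
        s = lmo_simplex(v)
        x = [(1.0 - eta) * xj + eta * sj for xj, sj in zip(x, s)]
    return x


# Toy problem: f(x, sample) = 0.5 * ||x - sample||^2 with sample = c + Gaussian noise, s.t. x_0 = x_1
c = [0.6, 0.3, 0.1]


def grad_sample(x, rng):
    return [xj - (cj + rng.gauss(0.0, 0.05)) for xj, cj in zip(x, c)]


x = stochastic_homotopy_cg(grad_sample, a=[1.0, -1.0, 0.0], b=0.0, n=3)
print([round(t, 3) for t in x])  # should approach the constrained minimizer [0.45, 0.45, 0.1]
```

With one sample per iteration, the averaged estimate `d` damps gradient noise without growing the batch, which is the property the abstract emphasizes. The toy does not reproduce the paper's analysis or its convergence rates.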