## Welcome to the Biomedical Informatics Lab of Prof. Dr. Gunnar Rätsch

Research in our group lies at the interface between methods research in machine learning, genomics, and medical informatics and its applications in biology and medicine.

We develop new analysis techniques that can handle large amounts of medical and genomic data. These techniques aim to predict the phenomenon at hand accurately and to explain their predictions in a comprehensible way, thereby helping to gain new biomedical insights.

Current research includes (a) machine learning for time-series analysis and iterative optimization algorithms, (b) methods for transcriptome analysis to study transcriptome alterations in cancer, (c) clinical decision support systems, in particular for time-series data from intensive care units, (d) new graph genome algorithms to store and analyze very large sets of genomic sequences, and (e) methods and resources for international sharing of genomic and clinical data, for instance about variants in BRCA1/2.

#### Francesco Locatello, Alp Yurtsever, Olivier Fercoq, Volkan Cevher. Stochastic Conditional Gradient Method for Composite Convex Minimization. NeurIPS 2019

Abstract In this paper, we propose the first practical algorithm to minimize stochastic composite optimization problems over compact convex sets. This template allows for affine constraints and therefore covers stochastic semidefinite programs (SDPs), which are vastly applicable in both machine learning and statistics. In this setup, stochastic algorithms with convergence guarantees are either not known or not tractable. We tackle this general problem and propose a convergent, easy to implement and tractable algorithm. We prove $\mathcal{O}(k^{-1/3})$ convergence rate in expectation on the objective residual and $\mathcal{O}(k^{-5/12})$ in expectation on the feasibility gap. These rates are achieved without increasing the batchsize, which can contain a single sample. We present extensive empirical evidence demonstrating the superiority of our algorithm on a broad range of applications including optimization of stochastic SDPs.

Authors Francesco Locatello, Alp Yurtsever, Olivier Fercoq, Volkan Cevher

Submitted NeurIPS 2019

#### Laura Manduchi, Matthias Hüser, Gunnar Rätsch, Vincent Fortuin. Variational PSOM: Deep Probabilistic Clustering with Self-Organizing Maps. arXiv Preprints

Abstract Generating visualizations and interpretations from high-dimensional data is a common problem in many fields. Two key approaches for tackling this problem are clustering and representation learning. There are very performant deep clustering models on the one hand and interpretable representation learning techniques, often relying on latent topological structures such as self-organizing maps, on the other hand. However, current methods do not yet successfully combine these two approaches. We present a new deep architecture for probabilistic clustering, VarPSOM, and its extension to time series data, VarTPSOM. We show that they achieve superior clustering performance compared to current deep clustering methods on static MNIST/Fashion-MNIST data as well as medical time series, while inducing an interpretable representation. Moreover, on the medical time series, VarTPSOM successfully predicts future trajectories in the original data space.

Authors Laura Manduchi, Matthias Hüser, Gunnar Rätsch, Vincent Fortuin

Submitted arXiv Preprints

#### Demircioğlu D, Cukuroglu E, Kindermans M, Nandi T, Calabrese C, Fonseca NA, Kahles A, Lehmann KV, Stegle O, Brazma A, Brooks AN, Rätsch G, Tan P, Göke J. A Pan-cancer Transcriptome Analysis Reveals Pervasive Regulation through Alternative Promoters. Cell

Abstract Most human protein-coding genes are regulated by multiple, distinct promoters, suggesting that the choice of promoter is as important as its level of transcriptional activity. However, while a global change in transcription is recognized as a defining feature of cancer, the contribution of alternative promoters still remains largely unexplored. Here, we infer active promoters using RNA-seq data from 18,468 cancer and normal samples, demonstrating that alternative promoters are a major contributor to context-specific regulation of transcription. We find that promoters are deregulated across tissues, cancer types, and patients, affecting known cancer genes and novel candidates. For genes with independently regulated promoters, we demonstrate that promoter activity provides a more accurate predictor of patient survival than gene expression. Our study suggests that a dynamic landscape of active promoters shapes the cancer transcriptome, opening new diagnostic avenues and opportunities to further explore the interplay of regulatory mechanisms with transcriptional aberrations in cancer.

Authors Demircioğlu D, Cukuroglu E, Kindermans M, Nandi T, Calabrese C, Fonseca NA, Kahles A, Lehmann KV, Stegle O, Brazma A, Brooks AN, Rätsch G, Tan P, Göke J.

Submitted Cell

#### Vincent Fortuin, Gunnar Rätsch, Stephan Mandt. Multivariate Time Series Imputation with Variational Autoencoders. arXiv Preprints

Abstract Multivariate time series with missing values are common in many areas, for instance in healthcare and finance. To face this problem, modern data imputation approaches should (a) be tailored to sequential data, (b) deal with high dimensional and complex data distributions, and (c) be based on the probabilistic modeling paradigm for interpretability and confidence assessment. However, many current approaches fall short in at least one of these aspects. Drawing on advances in deep learning and scalable probabilistic modeling, we propose a new deep sequential variational autoencoder approach for dimensionality reduction and data imputation. Temporal dependencies are modeled with a Gaussian process prior and a Cauchy kernel to reflect multi-scale dynamics in the latent space. We furthermore use a structured variational inference distribution that improves the scalability of the approach. We demonstrate that our model exhibits superior imputation performance on benchmark tasks and challenging real-world medical data.

Authors Vincent Fortuin, Gunnar Rätsch, Stephan Mandt

Submitted arXiv Preprints

#### Singh K, Lin J, Zhong Y, Burčul A, Mohan P, Jiang M, Sun L, Yong-Gonzalez V, Viale A, Cross JR, Hendrickson RC, Rätsch G, Ouyang Z, Wendel HG. c-MYC regulates mRNA translation efficiency and start-site selection in lymphoma. J Exp Med.

Abstract The oncogenic c-MYC (MYC) transcription factor has broad effects on gene expression and cell behavior. We show that MYC alters the efficiency and quality of mRNA translation into functional proteins. Specifically, MYC drives the translation of most protein components of the electron transport chain in lymphoma cells, and many of these effects are independent from proliferation. Specific interactions of MYC-sensitive RNA-binding proteins (e.g., SRSF1/RBM42) with 5'UTR sequence motifs mediate many of these changes. Moreover, we observe a striking shift in translation initiation site usage. For example, in low-MYC conditions, lymphoma cells initiate translation of the CD19 mRNA from a site in exon 5. This results in the truncation of all extracellular CD19 domains and facilitates escape from CD19-directed CAR-T cell therapy. Together, our findings reveal MYC effects on the translation of key metabolic enzymes and immune receptors in lymphoma cells.

Authors Singh K, Lin J, Zhong Y, Burčul A, Mohan P, Jiang M, Sun L, Yong-Gonzalez V, Viale A, Cross JR, Hendrickson RC, Rätsch G, Ouyang Z, Wendel HG.

Submitted J Exp Med.

Date 12 Jun 2019

Date 24 Apr 2019

Date 10 Jan 2019