## Welcome to the Biomedical Informatics Lab of Prof. Dr. Gunnar Rätsch

The research in our group lies at the interface between methods research in Machine Learning, Genomics and Medical Informatics and relevant applications in biology and medicine.

We develop new analysis techniques that are capable of dealing with large amounts of medical and genomic data. These techniques aim to predict the phenomenon at hand accurately and to explain their predictions in a comprehensible way, thereby helping to gain new biomedical insights.

Current research includes a) Machine Learning related to time-series analysis and iterative optimization algorithms, b) methods for transcriptome analyses to study transcriptome alterations in cancer, c) developing clinical decision support systems, in particular, for time series data from intensive care units, d) new graph genome algorithms to store and analyze very large sets of genomic sequences, and e) developing methods and resources for international sharing of genomic and clinical data, for instance, about variants in BRCA1/2.

#### Francesco Locatello, Anant Raj, Sai Praneeth Reddy, Gunnar Rätsch, Bernhard Schölkopf, Sebastian U Stich, Martin Jaggi. Revisiting First-Order Convex Optimization Over Linear Spaces. ICML 2018

Abstract Two popular examples of first-order optimization methods over linear spaces are coordinate descent and matching pursuit algorithms, with their randomized variants. While the former targets the optimization by moving along coordinates, the latter considers a generalized notion of directions. Exploiting the connection between the two algorithms, we present a unified analysis of both, providing affine invariant sublinear $\mathcal{O}(1/t)$ rates on smooth objectives and linear convergence on strongly convex objectives. As a byproduct of our affine invariant analysis of matching pursuit, our rates for steepest coordinate descent are the tightest known. Furthermore, we show the first accelerated convergence rate $\mathcal{O}(1/t^2)$ for matching pursuit on convex objectives.

Authors Francesco Locatello, Anant Raj, Sai Praneeth Reddy, Gunnar Rätsch, Bernhard Schölkopf, Sebastian U Stich, Martin Jaggi

Submitted ICML 2018
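To make the connection between coordinate descent and matching pursuit described in the abstract above more concrete, here is a minimal sketch. It is not the paper's algorithm or analysis: it assumes a toy strongly convex quadratic $f(x) = \tfrac{1}{2} x^\top A x - b^\top x$, an exact line search along the chosen direction, and the hypothetical helper names `matching_pursuit`, `matvec` and `dot`. With the standard basis as the set of atoms, the greedy rule reduces to steepest coordinate descent; with a larger set of atoms it becomes a matching-pursuit-style method.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for lists of lists."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(A, b, atoms, steps=200):
    """Greedily minimize f(x) = 0.5 x^T A x - b^T x along the given atoms.

    Illustrative assumption: A is symmetric positive definite and the
    atoms span the space, so every step has positive curvature.
    """
    x = [0.0] * len(b)
    for _ in range(steps):
        grad = [g - bi for g, bi in zip(matvec(A, x), b)]
        # Steepest rule: pick the atom most aligned with the gradient.
        z = max(atoms, key=lambda a: abs(dot(grad, a)))
        # Exact line search along z for a quadratic objective.
        gamma = -dot(grad, z) / dot(z, matvec(A, z))
        x = [xi + gamma * zi for xi, zi in zip(x, z)]
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]

# Atoms = standard basis: steepest coordinate descent.
basis = [[1.0, 0.0], [0.0, 1.0]]
x_cd = matching_pursuit(A, b, basis)

# Atoms = basis plus one extra unit direction: generalized directions.
r = 1.0 / math.sqrt(2.0)
x_mp = matching_pursuit(A, b, basis + [[r, r]])
```

Both runs approach the minimizer of the toy problem, which solves $Ax = b$. The only difference between them is the set of directions the greedy rule may choose from.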

#### Alp Yurtsever, Olivier Fercoq, Francesco Locatello, Volkan Cevher. A Conditional Gradient Framework for Composite Convex Minimization with Applications to Semidefinite Programming. ICML 2018

Abstract We propose a conditional gradient framework for a composite convex minimization template with broad applications. Our approach combines the notions of smoothing and homotopy under the CGM framework, and provably achieves the optimal $\mathcal{O}(1/\sqrt{k})$ convergence rate. We demonstrate that the same rate holds if the linear subproblems are solved approximately with additive or multiplicative error. Specific applications of the framework include non-smooth minimization, semidefinite programming, and minimization with linear inclusion constraints over a compact domain. We provide numerical evidence to demonstrate the benefits of the new framework.

Authors Alp Yurtsever, Olivier Fercoq, Francesco Locatello, Volkan Cevher

Submitted ICML 2018
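For readers unfamiliar with the conditional gradient method (CGM, also known as Frank-Wolfe) that the abstract above builds on, here is a minimal sketch of the classical method only. It does not implement the smoothing and homotopy framework of the paper. The probability simplex as the domain, the toy objective $\tfrac{1}{2}\|x - \text{target}\|^2$, the step size $2/(k+2)$ and the name `frank_wolfe_simplex` are all illustrative assumptions.

```python
def frank_wolfe_simplex(grad_f, dim, iters=2000):
    """Classical conditional gradient method over the probability simplex.

    Each iteration solves a linear subproblem (here: pick the vertex with
    the smallest gradient entry) and moves toward that vertex, so the
    iterate stays feasible without any projection.
    """
    x = [1.0] + [0.0] * (dim - 1)  # start at a vertex of the simplex
    for k in range(iters):
        g = grad_f(x)
        # Linear minimization oracle over the simplex: best vertex e_i.
        i = min(range(dim), key=lambda j: g[j])
        gamma = 2.0 / (k + 2.0)  # standard open-loop step size
        # Convex combination x <- (1 - gamma) * x + gamma * e_i.
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x


# Toy smooth objective f(x) = 0.5 * ||x - target||^2, target in the simplex.
target = [0.2, 0.5, 0.3]
x_fw = frank_wolfe_simplex(
    lambda x: [xj - tj for xj, tj in zip(x, target)], dim=3
)
```

The appeal of this scheme is that it needs only a linear minimization step per iteration. The abstract above notes that this step may even be solved approximately.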

#### Francesco Locatello, Damien Vincent, Ilya Tolstikhin, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf. Clustering Meets Implicit Generative Models. arXiv

Abstract Clustering is a cornerstone of unsupervised learning which can be thought of as disentangling the multiple generative mechanisms underlying the data. In this paper we introduce an algorithmic framework to train mixtures of implicit generative models which we instantiate for variational autoencoders. Relying on an additional set of discriminators, we propose a competitive procedure in which the models only need to approximate the portion of the data distribution from which they can produce realistic samples. As a byproduct, each model is simpler to train, and a clustering interpretation arises naturally from the partitioning of the training points among the models. We empirically show that our approach splits the training distribution in a reasonable way and increases the quality of the generated samples.

Authors Francesco Locatello, Damien Vincent, Ilya Tolstikhin, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf

Submitted arXiv

#### Francesco Locatello, Rajiv Khanna, Joydeep Ghosh, Gunnar Rätsch. Boosting Variational Inference: an Optimization Perspective. AISTATS 2018

Abstract Variational Inference is a popular technique to approximate a possibly intractable Bayesian posterior with a more tractable one. Recently, Boosting Variational Inference has been proposed as a new paradigm to approximate the posterior by a mixture of densities by greedily adding components to the mixture. In the present work, we study the convergence properties of this approach from a modern optimization viewpoint by establishing connections to the classic Frank-Wolfe algorithm. Our analysis yields novel theoretical insights on the Boosting of Variational Inference regarding the sufficient conditions for convergence, explicit sublinear/linear rates, and algorithmic simplifications.

Authors Francesco Locatello, Rajiv Khanna, Joydeep Ghosh, Gunnar Rätsch

Submitted AISTATS 2018

#### Claudia Calabrese, Kjong-Van Lehmann, Lara Urban, Fenglin Liu, Serap Erkek, Nuno Fonseca, Andre Kahles, Leena Helena Kilpinen-Barrett, Julia Markowski, PCAWG-3, Sebastian Waszak, Jan Korbel, Zemin Zhang, Alvis Brazma, Gunnar Raetsch, Roland Schwarz, Oliver Stegle. Assessing the Gene Regulatory Landscape in 1,188 Human Tumors. bioRxiv

Abstract Cancer is characterised by somatic genetic variation, but the effect of the majority of non-coding somatic variants and the interface with the germline genome are still unknown. We analysed the whole genome and RNA-seq data from 1,188 human cancer patients as provided by the Pan-cancer Analysis of Whole Genomes (PCAWG) project to map cis expression quantitative trait loci of somatic and germline variation and to uncover the causes of allele-specific expression patterns in human cancers. The availability of the first large-scale dataset with both whole genome and gene expression data enabled us to uncover the effects of the non-coding variation on cancer. In addition to confirming known regulatory effects, we identified novel associations between somatic variation and expression dysregulation, in particular in distal regulatory elements. Finally, we uncovered links between somatic mutational signatures and gene expression changes, including TERT and LMO2, and we explained the inherited risk factors in APOBEC-related mutational processes. This work represents the first large-scale assessment of the effects of both germline and somatic genetic variation on gene expression in cancer and creates a valuable resource cataloguing these effects.

Authors Claudia Calabrese, Kjong-Van Lehmann, Lara Urban, Fenglin Liu, Serap Erkek, Nuno Fonseca, Andre Kahles, Leena Helena Kilpinen-Barrett, Julia Markowski, PCAWG-3, Sebastian Waszak, Jan Korbel, Zemin Zhang, Alvis Brazma, Gunnar Raetsch, Roland Schwarz, Oliver Stegle

Submitted bioRxiv

Date 28 Nov 2017

Date 23 Nov 2017

Date 22 Nov 2017