Gunnar Rätsch, Prof. Dr.

Interdisciplinary research is about forging collaborations across disciplinary and geographic borders.

Head

E-Mail
raetsch@get-your-addresses-elsewhere.inf.ethz.ch
Phone
+41 44 632 2036
Address
ETH Zürich
Department of Computer Science
Biomedical Informatics Group
Universitätsstrasse 6
CAB F53.2
8092 Zürich
Room
CAB F53.2
Twitter
@gxr

Data scientist Gunnar Rätsch develops and applies advanced data analysis and modeling techniques to data from deep molecular profiling, medical and health records, as well as images.

He earned his Ph.D. at the German National Laboratory for Information Technology under the supervision of Klaus-Robert Müller and was a postdoc with Bob Williamson and Bernhard Schölkopf. He received the Max Planck Young and Independent Investigator award and led the group on Machine Learning in Genome Biology at the Friedrich Miescher Laboratory in Tübingen (2005-2011). In 2012, he joined Memorial Sloan Kettering Cancer Center as Associate Faculty. In May 2016, he and his lab moved to Zürich to join the Computer Science Department of ETH Zürich.


The Rätsch laboratory focuses on bridging medicine and biology with computer science. The group’s research interests are relatively broad, ranging from algorithmic computer science to biomedical application fields. On the one hand, this includes work on algorithms that can learn or extract insights from data; on the other hand, it involves developing tools that we and others employ for the analysis of large genomic or medical data sets, often in collaboration with biologists and physicians. These tools aim to solve real-world biomedical problems. In short, the group advances the state of the art in data science algorithms, turns them into commonly usable tools for specific applications, and then collaborates with biologists and physicians on life science problems. Along the way, we learn more and can go back to improve the algorithms.

Abstract Most human protein-coding genes are regulated by multiple, distinct promoters, suggesting that the choice of promoter is as important as its level of transcriptional activity. While the role of promoters as driver elements in cancer has been recognized, the contribution of alternative promoters to regulation of the cancer transcriptome remains largely unexplored. Here we show that active promoters can be identified using RNA-Seq data, enabling the analysis of promoter activity in more than 1,000 cancer samples with matched whole genome sequencing data. We find that alternative promoters are a major contributor to tissue-specific regulation of isoform expression and that alternative promoters are frequently deregulated in cancer, affecting known cancer genes and novel candidates. Noncoding passenger mutations are enriched at promoters of genes with lower regulatory complexity, whereas noncoding driver mutations occur at genes with multiple promoters, often affecting the promoter that shows the highest level of activity. Together, our study demonstrates that the landscape of active promoters shapes the cancer transcriptome, opening many opportunities to further explore the interplay of regulatory mechanisms and noncoding somatic mutations with transcriptional aberrations in cancer.

Authors Deniz Demircioğlu, Martin Kindermans, Tannistha Nandi, Engin Cukuroglu, Claudia Calabrese, Nuno A. Fonseca, Andre Kahles, Kjong Lehmann, Oliver Stegle, PCAWG-3, PCAWG-Network, Alvis Brazma, Angela Brooks, Gunnar Rätsch, Patrick Tan, Jonathan Göke

Submitted bioRxiv

Link DOI

Abstract Variational Inference is a popular technique to approximate a possibly intractable Bayesian posterior with a more tractable one. Recently, Boosting Variational Inference has been proposed as a new paradigm to approximate the posterior by a mixture of densities by greedily adding components to the mixture. In the present work, we study the convergence properties of this approach from a modern optimization viewpoint by establishing connections to the classic Frank-Wolfe algorithm. Our analyses yield novel theoretical insights on the Boosting of Variational Inference regarding the sufficient conditions for convergence, explicit sublinear/linear rates, and algorithmic simplifications.

Authors Francesco Locatello, Rajiv Khanna, Joydeep Ghosh, Gunnar Rätsch

Submitted submitted

Link DOI

Abstract During rheumatoid arthritis (RA), Tumor Necrosis Factor (TNF) activates fibroblast-like synoviocytes (FLS) inducing in a temporal order a constellation of genes, which perpetuate synovial inflammation. Although the molecular mechanisms regulating TNF-induced transcription are well characterized, little is known about the impact of mRNA stability on gene expression and the impact of TNF on decay rates of mRNA transcripts in FLS. To address these issues we performed RNA sequencing and genome-wide analysis of the mRNA stabilome in RA FLS. We found that TNF induces a biphasic gene expression program: initially, the inducible transcriptome consists primarily of unstable transcripts but progressively switches and becomes dominated by very stable transcripts. This temporal switch is due to: a) TNF-induced prolonged stabilization of previously unstable transcripts that enables progressive transcript accumulation over days and b) sustained expression and late induction of very stable transcripts. TNF-induced mRNA stabilization in RA FLS occurs during the late phase of TNF response, is MAPK-dependent, and involves several genes with pathogenic potential such as IL6, CXCL1, CXCL3, CXCL8/IL8, CCL2, and PTGS2. These results provide the first insights into genome-wide regulation of mRNA stability in RA FLS and highlight the potential contribution of dynamic regulation of the mRNA stabilome by TNF to chronic synovitis.

Authors Loupasakis K, Kuo D, Sokhi UK, Sohn C, Syracuse B, Giannopoulou EG, Park SH, Kang H, Rätsch G, Ivashkiv LB, Kalliolias GD

Submitted PLoS One

Link DOI

Abstract Generative Adversarial Networks (GANs) have shown remarkable success as a framework for training models to produce realistic-looking data. In this work, we propose a Recurrent GAN (RGAN) and Recurrent Conditional GAN (RCGAN) to produce realistic real-valued multi-dimensional time series, with an emphasis on their application to medical data. RGANs make use of recurrent neural networks in the generator and the discriminator. In the case of RCGANs, both of these RNNs are conditioned on auxiliary information. We demonstrate our models in a set of toy datasets, where we show visually and quantitatively (using sample likelihood and maximum mean discrepancy) that they can successfully generate realistic time-series. We also describe novel evaluation methods for GANs, where we generate a synthetic labelled training dataset, and evaluate on a real test set the performance of a model trained on the synthetic data, and vice-versa. We illustrate with these metrics that RCGANs can generate time-series data useful for supervised training, with only minor degradation in performance on real test data. This is demonstrated on digit classification from 'serialised' MNIST and by training an early warning system on a medical dataset of 17,000 patients from an intensive care unit. We further discuss and analyse the privacy concerns that may arise when using RCGANs to generate realistic synthetic medical time series data.

Authors Stephanie L Hyland, Cristobal Esteban, Gunnar Rätsch

Submitted arXiv

Link

Abstract Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolfe (FW) algorithms regained popularity in recent years due to their simplicity, effectiveness and theoretical guarantees. MP and FW address optimization over the linear span and the convex hull of a set of atoms, respectively. In this paper, we consider the intermediate case of optimization over the convex cone, parametrized as the conic hull of a generic atom set, leading to the first principled definitions of non-negative MP algorithms for which we give explicit convergence rates and demonstrate excellent empirical performance. In particular, we derive sublinear (O(1/t)) convergence on general smooth and convex objectives, and linear convergence (O(e^{-t})) on strongly convex objectives, in both cases for general sets of atoms. Furthermore, we establish a clear correspondence of our algorithms to known algorithms from the MP and FW literature. Our novel algorithms and analyses target general atom sets and general objective functions, and hence are directly applicable to a large variety of learning settings.

Authors Francesco Locatello, Michael Tschannen, Gunnar Rätsch, Martin Jaggi

Submitted NIPS 2017

Link DOI

Abstract To understand the population genetics of structural variants and their effects on phenotypes, we developed an approach to mapping structural variants that segregate in a population sequenced at low coverage. We avoid calling structural variants directly. Instead, the evidence for a potential structural variant at a locus is indicated by variation in the counts of short-reads that map anomalously to that locus. These structural variant traits are treated as quantitative traits and mapped genetically, analogously to a gene expression study. Association between a structural variant trait at one locus, and genotypes at a distant locus indicate the origin and target of a transposition. Using ultra-low-coverage (0.3×) population sequence data from 488 recombinant inbred Arabidopsis thaliana genomes, we identified 6502 segregating structural variants. Remarkably, 25% of these were transpositions. While many structural variants cannot be delineated precisely, we validated 83% of 44 predicted transposition breakpoints by polymerase chain reaction. We show that specific structural variants may be causative for quantitative trait loci for germination and resistance to infection by the fungus Albugo laibachii, isolate Nc14. Further we show that the phenotypic heritability attributable to read-mapping anomalies differs from, and, in the case of time to germination and bolting, exceeds that due to standard genetic variation. Genes within structural variants are also more likely to be silenced or dysregulated. This approach complements the prevalent strategy of structural variant discovery in fewer individuals sequenced at high coverage. It is generally applicable to large populations sequenced at low-coverage, and is particularly suited to mapping transpositions.

Authors Martha Imprialou, André Kahles, Joshua G. Steffen, Edward J. Osborne, Xiangchao Gan, Janne Lempe, Amarjit Bhomra, Eric Belfield, Anne Visscher, Robert Greenhalgh, Nicholas P Harberd, Richard Goram, Jotun Hein, Alexandre Robert-Seilaniantz, Jonathan Jones, Oliver Stegle, Paula Kover, Miltos Tsiantis, Magnus Nordborg, Gunnar Rätsch, Richard M. Clark and Richard Mott

Submitted Genetics

Link DOI

Authors Natalie R. Davidson, PanCancer Analysis of Whole Genomes 3 (PCAWG-3) for ICGC, Alvis Brazma, Angela N. Brooks, Claudia Calabrese, Nuno A. Fonseca, Jonathan Goke, Yao He, Xueda Hu, Andre Kahles, Kjong-Van Lehmann, Fenglin Liu, Gunnar Rätsch, Siliang Li, Roland F. Schwarz, Mingyu Yang, Zemin Zhang, Fan Zhang and Liangtao Zheng

Submitted Proceedings of the American Association for Cancer Research Annual Meeting 2017

Link DOI

Abstract We present SplashRNA, a sequential classifier to predict potent microRNA-based short hairpin RNAs (shRNAs). Trained on published and novel data sets, SplashRNA outperforms previous algorithms and reliably predicts the most efficient shRNAs for a given gene. Combined with an optimized miR-E backbone, >90% of high-scoring SplashRNA predictions trigger >85% protein knockdown when expressed from a single genomic integration. SplashRNA can significantly improve the accuracy of loss-of-function genetics studies and facilitates the generation of compact shRNA libraries.

Authors Pelossof R, Fairchild L, Huang CH, Widmer C, Sreedharan VT, Sinha N, Lai DY, Guan Y, Premsrirut PK, Tschaharganeh DF, Hoffmann T, Thapar V, Xiang Q, Garippa RJ, Rätsch G, Zuber J, Lowe SW, Leslie CS, Fellmann C

Submitted Nature Biotechnology

Link DOI

Abstract MOTIVATION: Deep sequencing based ribosome footprint profiling can provide novel insights into the regulatory mechanisms of protein translation. However, the observed ribosome profile is fundamentally confounded by transcriptional activity. In order to decipher principles of translation regulation, tools that can reliably detect changes in translation efficiency in case-control studies are needed. RESULTS: We present a statistical framework and an analysis tool, RiboDiff, to detect genes with changes in translation efficiency across experimental treatments. RiboDiff uses generalized linear models to estimate the over-dispersion of RNA-Seq and ribosome profiling measurements separately, and performs a statistical test for differential translation efficiency using both mRNA abundance and ribosome occupancy. AVAILABILITY AND IMPLEMENTATION: RiboDiff webpage: http://bioweb.me/ribodiff. Source code, including scripts for preprocessing the FASTQ data, is available at http://github.com/ratschlab/ribodiff. CONTACTS: zhongy@cbio.mskcc.org or raetsch@inf.ethz.ch. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Authors Zhong Y, Karaletsos T, Drewe P, Sreedharan VT, Kuo D, Singh K, Wendel HG, Rätsch G.

Submitted Bioinformatics

Link DOI

Abstract Plants use light as source of energy and information to detect diurnal rhythms and seasonal changes. Sensing changing light conditions is critical to adjust plant metabolism and to initiate developmental transitions. Here, we analyzed transcriptome-wide alterations in gene expression and alternative splicing (AS) of etiolated seedlings undergoing photomorphogenesis upon exposure to blue, red, or white light. Our analysis revealed massive transcriptome reprogramming as reflected by differential expression of ∼20% of all genes and changes in several hundred AS events. For more than 60% of all regulated AS events, light promoted the production of a presumably protein-coding variant at the expense of an mRNA with nonsense-mediated decay-triggering features. Accordingly, AS of the putative splicing factor REDUCED RED-LIGHT RESPONSES IN CRY1CRY2 BACKGROUND1, previously identified as a red light signaling component, was shifted to the functional variant under light. Downstream analyses of candidate AS events pointed at a role of photoreceptor signaling only in monochromatic but not in white light. Furthermore, we demonstrated similar AS changes upon light exposure and exogenous sugar supply, with a critical involvement of kinase signaling. We propose that AS is an integration point of signaling pathways that sense and transmit information regarding the energy availability in plants.

Authors Hartmann L, Drewe-Boß P, Wießner T, Wagner G, Geue S, Lee HC, Obermüller DM, Kahles A, Behr J, Sinz FH, Rätsch G, Wachter A

Submitted Plant Cell

Link DOI

Abstract Personal genomes carry inherent privacy risks, and protecting privacy poses major social and technological challenges. We consider the case where a user searches for genetic information (e.g. an allele) on a server that stores a large genomic database and aims to receive allele-associated information. The user would like to keep the query and result private, and the server would like to keep the database private.

Authors Kana Shimizu, Koji Nuida, Gunnar Rätsch

Submitted Bioinformatics (Oxford, England)

Link Pubmed DOI

Abstract Understanding the occurrence and regulation of alternative splicing (AS) is a key task towards explaining the regulatory processes that shape the complex transcriptomes of higher eukaryotes. With the advent of high-throughput sequencing of RNA (RNA-Seq), the diversity of AS transcripts could be measured at an unprecedented depth. Although the catalog of known AS events has grown ever since, novel transcripts are commonly observed when working with less well annotated organisms, in the context of disease, or within large populations. Whereas an identification of complete transcripts is technically challenging and computationally expensive, focusing on single splicing events as a proxy for transcriptome characteristics is fruitful and sufficient for a wide range of analyses.

Authors Andre Kahles, Cheng Soon Ong, Yi Zhong, Gunnar Rätsch

Submitted Bioinformatics (Oxford, England)

Link Pubmed DOI

Abstract Mapping high-throughput sequencing data to a reference genome is an essential step for most analysis pipelines aiming at the computational analysis of genome and transcriptome sequencing data. Breaking ties between equally well mapping locations poses a severe problem not only during the alignment phase but also has significant impact on the results of downstream analyses. We present the multi-mapper resolution (MMR) tool that infers optimal mapping locations from the coverage density of other mapped reads.

Authors Andre Kahles, Jonas Behr, Gunnar Rätsch

Submitted Bioinformatics (Oxford, England)

Link Pubmed DOI

Authors Stephanie L Hyland, Theofanis Karaletsos, Gunnar Rätsch

Submitted NIPS Workshop on Machine Learning for Healthcare, 2015

Link

Authors M Tauber, T Darrell, Marius Kloft, M Pontil, Gunnar Rätsch, E Rodner, C Lengauer, M Bolten, R D Falgout, O Schenk

Link

Authors Julia Vogt, Marius Kloft, Stefan Stark, S S Raman, S Prabhakaran, V Roth, Gunnar Rätsch

Submitted Machine Learning

Link DOI

Abstract We report a mechanism of translational control that is determined by a requirement for eIF4A RNA helicase activity and underlies the anticancer effects of Silvestrol and related compounds. Briefly, activation of cap-dependent translation contributes to T-cell leukemia (T-ALL) development and maintenance. Accordingly, inhibition of translation initiation factor eIF4A with Silvestrol produces powerful therapeutic effects against T-ALL in vivo. We used transcriptome-scale ribosome footprinting on Silvestrol-treated T-ALL cells to identify Silvestrol-sensitive transcripts and the hallmark features of eIF4A-dependent translation. These include a long 5' UTR and a 12-mer sequence motif that encodes a guanine quartet (CGG)4. RNA folding algorithms as well as experimental evidence pinpoint the (CGG)4 motif as a common site of RNA G-quadruplex structures within the 5' UTR. In T-ALL these structures mark approximately eighty highly Silvestrol-sensitive transcripts that include key oncogenes and transcription factors and contribute to the drug's anti-leukemic action. Hence, the eIF4A-dependent translation of G-quadruplex containing transcripts emerges as a gene-specific and therapeutically targetable mechanism of translational control.

Authors Kamini Singh, Andrew L Wolfe, Yi Zhong, Gunnar Rätsch, Hans Guido Wendel

Link DOI

Abstract Identifying discriminative motifs underlying the functionality and evolution of organisms is a major challenge in computational biology. Machine learning approaches such as support vector machines (SVMs) achieve state-of-the-art performances in genomic discrimination tasks, but, due to their black-box character, the motifs underlying their decision functions are largely unknown. As a remedy, positional oligomer importance matrices (POIMs) allow us to visualize the significance of position-specific subsequences. Although they are a major step towards the explanation of trained SVM models, they suffer from the fact that their size grows exponentially in the length of the motif, which renders their manual inspection feasible only for comparably small motif sizes, typically k ≤ 5. In this work, we extend the work on positional oligomer importance matrices by presenting a new machine-learning methodology, entitled motifPOIM, to extract the truly relevant motifs, regardless of their length and complexity, underlying the predictions of a trained SVM model. Our framework thereby considers the motifs as free parameters in a probabilistic model, a task which can be phrased as a non-convex optimization problem. The exponential dependence of the POIM size on the oligomer length poses a major numerical challenge, which we address by an efficient optimization framework that allows us to find possibly overlapping motifs consisting of up to hundreds of nucleotides. We demonstrate the efficacy of our approach on a synthetic data set as well as a real-world human splice site data set.

Authors Marina M C Vidovic, Nico Görnitz, Klaus Robert Müller, Gunnar Rätsch, Marius Kloft

Submitted PloS one

Link Pubmed DOI

Abstract Interferon-γ (IFN-γ) primes macrophages for enhanced microbial killing and inflammatory activation by Toll-like receptors (TLRs), but little is known about the regulation of cell metabolism or mRNA translation during this priming. We found that IFN-γ regulated the metabolism and mRNA translation of human macrophages by targeting the kinases mTORC1 and MNK, both of which converge on the selective regulator of translation initiation eIF4E. Physiological downregulation of mTORC1 by IFN-γ was associated with autophagy and translational suppression of repressors of inflammation such as HES1. Genome-wide ribosome profiling in TLR2-stimulated macrophages showed that IFN-γ selectively modulated the macrophage translatome to promote inflammation, further reprogram metabolic pathways and modulate protein synthesis. These results show that IFN-γ-mediated metabolic reprogramming and translational regulation are key components of classical inflammatory macrophage activation.

Authors Xiaodi Su, Yingpu Yu, Yi Zhong, Eugenia G Giannopoulou, Xiaoyu Hu, Hui Liu, Justin R Cross, Gunnar Rätsch, Charles M Rice, Lionel B Ivashkiv

Submitted Nature immunology

Link Pubmed DOI

Abstract Epigenome modulation potentially provides a mechanism for organisms to adapt, within and between generations. However, neither the extent to which this occurs, nor the mechanisms involved are known. Here we investigate DNA methylation variation in Swedish Arabidopsis thaliana accessions grown at two different temperatures. Environmental effects were limited to transposons, where CHH methylation was found to increase with temperature. Genome-wide association studies (GWAS) revealed that the extensive CHH methylation variation was strongly associated with genetic variants in both cis and trans, including a major trans-association close to the DNA methyltransferase CMT2. Unlike CHH methylation, CpG gene body methylation (GBM) was not affected by growth temperature, but was instead correlated with the latitude of origin. Accessions from colder regions had higher levels of GBM for a significant fraction of the genome, and this was associated with increased transcription for the genes affected. GWAS revealed that this effect was largely due to trans-acting loci, many of which showed evidence of local adaptation.

Authors Manu J Dubin, Pei Zhang, Dazhe Meng, Marie Stanislas Remigereau, Edward J Osborne, Francesco Paolo Casale, Philipp Drewe, Andre Kahles, Geraldine Jean, Bjarni Vilhjalmsson, Joanna Jagoda, Selen Irez, Viktor Voronin, Qiang Song, Quan Long, Gunnar Rätsch, Oliver Stegle, Richard M Clark, Magnus Nordborg

Submitted eLife

Link Pubmed DOI

Abstract We present a genome-wide analysis of splicing patterns of 282 kidney renal clear cell carcinoma patients in which we integrate data from whole-exome sequencing of tumor and normal samples, RNA-seq and copy number variation. We proposed a scoring mechanism to compare splicing patterns in tumor samples to normal samples in order to rank and detect tumor-specific isoforms that have a potential for new biomarkers. We identified a subset of genes that show introns only observable in tumor but not in normal samples, ENCODE and GEUVADIS samples. In order to improve our understanding of the underlying genetic mechanisms of splicing variation we performed a large-scale association analysis to find links between somatic or germline variants with alternative splicing events. We identified 915 cis- and trans-splicing quantitative trait loci (sQTL) associated with changes in splicing patterns. Some of these sQTL have previously been associated with being susceptibility loci for cancer and other diseases. Our analysis also allowed us to identify the function of several COSMIC variants showing significant association with changes in alternative splicing. This demonstrates the potential significance of variants affecting alternative splicing events and yields insights into the mechanisms related to an array of disease phenotypes.

Authors Kjong Van Lehmann, Andre Kahles, Cyriac Kandoth, William Lee, Nikolaus Schultz, Oliver Stegle, Gunnar Rätsch

Submitted Pacific Symposium on Biocomputing

Link Pubmed

Authors Xinghua Lou, Marius Kloft, Gunnar Rätsch, F A Hamprecht

Link

Abstract Analysis of microscopy images can provide insight into many biological processes. One particularly challenging problem is cellular nuclear segmentation in highly anisotropic and noisy 3D image data. Manually localizing and segmenting each and every cellular nucleus is very time-consuming, which remains a bottleneck in large-scale biological experiments. In this work, we present a tool for automated segmentation of cellular nuclei from 3D fluorescent microscopic data. Our tool is based on state-of-the-art image processing and machine learning techniques and provides a user-friendly graphical user interface. We show that our tool is as accurate as manual annotation and greatly reduces the time for the registration.

Authors Christian K Widmer, Stephanie Heinrich, Philipp Drewe, Xinghua Lou, Shefali Umrania, Gunnar Rätsch

Submitted Signal, image and video processing

Link Pubmed DOI

Abstract Alternative splicing is an essential mechanism for increasing transcriptome and proteome diversity in eukaryotes. Particularly in multicellular eukaryotes, this mechanism is involved in the regulation of developmental and physiological processes like growth, differentiation and signal transduction.

Authors Arash Kianianmomeni, Cheng Soon Ong, Gunnar Rätsch, Armin Hallmann

Submitted BMC genomics

Link Pubmed DOI

Abstract Intraspecific genetic incompatibilities prevent the assembly of specific alleles into single genotypes and influence genome- and species-wide patterns of sequence variation. A common incompatibility in plants is hybrid necrosis, characterized by autoimmune responses due to epistatic interactions between natural genetic variants. By systematically testing thousands of F1 hybrids of Arabidopsis thaliana strains, we identified a small number of incompatibility hot spots in the genome, often in regions densely populated by nucleotide-binding domain and leucine-rich repeat (NLR) immune receptor genes. In several cases, these immune receptor loci interact with each other, suggestive of conflict within the immune system. A particularly dangerous locus is a highly variable cluster of NLR genes, DM2, which causes multiple independent incompatibilities with genes that encode a range of biochemical functions, including NLRs. Our findings suggest that deleterious interactions of immune receptors limit the combinations of favorable disease resistance alleles accessible to plant genomes.

Authors Eunyoung Chae, Kirsten Bomblies, Sang Tae Kim, Darya Karelina, Maricris Zaidem, Stephan Ossowski, Carmen Martin Pizarro, Roosa A E Laitinen, Beth A Rowan, Hezi Tenenboim, Sarah Lechner, Monika Demar, Anette Habring Müller, Christa Lanz, Gunnar Rätsch, Detlef Weigel

Submitted Cell

Link Pubmed DOI

Abstract Transcription factor (TF) DNA sequence preferences direct their regulatory activity, but are currently known for only ∼1% of eukaryotic TFs. Broadly sampling DNA-binding domain (DBD) types from multiple eukaryotic clades, we determined DNA sequence preferences for >1,000 TFs encompassing 54 different DBD classes from 131 diverse eukaryotes. We find that closely related DBDs almost always have very similar DNA sequence preferences, enabling inference of motifs for ∼34% of the ∼170,000 known or predicted eukaryotic TFs. Sequences matching both measured and inferred motifs are enriched in chromatin immunoprecipitation sequencing (ChIP-seq) peaks and upstream of transcription start sites in diverse eukaryotic lineages. SNPs defining expression quantitative trait loci in Arabidopsis promoters are also enriched for predicted TF binding sites. Importantly, our motif "library" can be used to identify specific TFs whose binding may be altered by human disease risk alleles. These data present a powerful resource for mapping transcriptional networks across eukaryotes.

Authors Matthew T Weirauch, Ally Yang, Mihai Albu, Atina G Cote, Alejandro Montenegro Montero, Philipp Drewe, Hamed S Najafabadi, Samuel A Lambert, Ishminder Mann, Kate Cook, Hong Zheng, Alejandra Goity, Harm van Bakel, Jean Claude Lozano, Mary Galli, Mathew G Lewsey, Eryong Huang, Tuhin Mukherjee, Xiaoting Chen, John S Reece Hoyes, Sridhar Govindarajan, Gad Shaulsky, Albertha J M Walhout, Francois Yves Bouget, Gunnar Rätsch, Luis F Larrondo, Joseph R Ecker, Timothy R Hughes

Submitted Cell

Link Pubmed DOI

Abstract The translational control of oncoprotein expression is implicated in many cancers. Here we report an eIF4A RNA helicase-dependent mechanism of translational control that contributes to oncogenesis and underlies the anticancer effects of silvestrol and related compounds. For example, eIF4A promotes T-cell acute lymphoblastic leukaemia development in vivo and is required for leukaemia maintenance. Accordingly, inhibition of eIF4A with silvestrol has powerful therapeutic effects against murine and human leukaemic cells in vitro and in vivo. We use transcriptome-scale ribosome footprinting to identify the hallmarks of eIF4A-dependent transcripts. These include 5' untranslated region (UTR) sequences such as the 12-nucleotide guanine quartet (CGG)4 motif that can form RNA G-quadruplex structures. Notably, among the most eIF4A-dependent and silvestrol-sensitive transcripts are a number of oncogenes, superenhancer-associated transcription factors, and epigenetic regulators. Hence, the 5' UTRs of select cancer genes harbour a targetable requirement for the eIF4A RNA helicase.

Authors Andrew L Wolfe, Kamini Singh, Yi Zhong, Philipp Drewe, Vinagolu K Rajasekhar, Viraj R Sanghvi, Konstantinos J Mavrakis, Man Jiang, Justine E Roderick, Joni Van der Meulen, Jonathan H Schatz, Christina M Rodrigo, Chunying Zhao, Pieter Rondou, Elisa de Stanchina, Julie Teruya Feldstein, Michelle A Kelliher, Frank Speleman, John A Porco, Jerry Pelletier, Gunnar Rätsch, Hans Guido Wendel

Submitted Nature

Link Pubmed DOI

Abstract We present Oqtans, an open-source workbench for quantitative transcriptome analysis, that is integrated in Galaxy. Its distinguishing features include customizable computational workflows and a modular pipeline architecture that facilitates comparative assessment of tool and data quality. Oqtans integrates an assortment of machine learning-powered tools into Galaxy, which show superior or equal performance to state-of-the-art tools. Implemented tools comprise a complete transcriptome analysis workflow: short-read alignment, transcript identification/quantification and differential expression analysis. Oqtans and Galaxy facilitate persistent storage, data exchange and documentation of intermediate results and analysis workflows. We illustrate how Oqtans aids the interpretation of data from different experiments in easy to understand use cases. Users can easily create their own workflows and extend Oqtans by integrating specific tools. Oqtans is available as (i) a cloud machine image with a demo instance at cloud.oqtans.org, (ii) a public Galaxy instance at galaxy.cbio.mskcc.org, (iii) a git repository containing all installed software (oqtans.org/git); most of which is also available from (iv) the Galaxy Toolshed and (v) a share string to use along with Galaxy CloudMan.

Authors Vipin T Sreedharan, Sebastian J Schultheiss, Geraldine Jean, Andre Kahles, Regina Bohnert, Philipp Drewe, Pramod Mudrakarta, Nico Görnitz, Georg Zeller, Gunnar Rätsch

Submitted Bioinformatics (Oxford, England)

Link Pubmed DOI

Abstract The intestinal microbiota is a microbial ecosystem of crucial importance to human health. Understanding how the microbiota confers resistance against enteric pathogens and how antibiotics disrupt that resistance is key to the prevention and cure of intestinal infections. We present a novel method to infer microbial community ecology directly from time-resolved metagenomics. This method extends generalized Lotka-Volterra dynamics to account for external perturbations. Data from recent experiments on antibiotic-mediated Clostridium difficile infection is analyzed to quantify microbial interactions, commensal-pathogen interactions, and the effect of the antibiotic on the community. Stability analysis reveals that the microbiota is intrinsically stable, explaining how antibiotic perturbations and C. difficile inoculation can produce catastrophic shifts that persist even after removal of the perturbations. Importantly, the analysis suggests a subnetwork of bacterial groups implicated in protection against C. difficile. Due to its generality, our method can be applied to any high-resolution ecological time-series data to infer community structure and response to external stimuli.

Authors Richard R Stein, Vanni Bucci, Nora C Toussaint, Charlie G Buffie, Gunnar Rätsch, Eric G Pamer, Chris Sander, Joao B Xavier

Submitted PLoS computational biology

Link Pubmed DOI
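
The abstract of the entry above describes extending generalized Lotka-Volterra dynamics with a term for external perturbations. The following is a minimal sketch of what such a model can look like, not the authors' implementation. It assumes the commonly used form dx_i/dt = x_i * (mu_i + sum_j M_ij * x_j + eps_i * u(t)), with a single perturbation signal u(t) and plain forward-Euler integration. All names (`glv_step`, `simulate`, `mu`, `M`, `eps`, `u_of_t`) and all parameter values are hypothetical and chosen only for illustration.

```python
def glv_step(x, mu, M, eps, u, dt):
    """One forward-Euler step of the assumed perturbed gLV model.

    x   : current abundances, one per taxon
    mu  : intrinsic growth rates
    M   : interaction matrix, M[i][j] = effect of taxon j on taxon i
    eps : susceptibility of each taxon to the perturbation
    u   : perturbation strength at the current time (e.g. antibiotic on/off)
    """
    n = len(x)
    new_x = []
    for i in range(n):
        per_capita = mu[i] + sum(M[i][j] * x[j] for j in range(n)) + eps[i] * u
        # Clamp at zero so the Euler step cannot produce negative abundances.
        new_x.append(max(x[i] + dt * x[i] * per_capita, 0.0))
    return new_x


def simulate(x0, mu, M, eps, u_of_t, dt=0.01, steps=2000):
    """Integrate the model from x0 and return the full trajectory."""
    x = list(x0)
    trajectory = [x]
    for k in range(steps):
        x = glv_step(x, mu, M, eps, u_of_t(k * dt), dt)
        trajectory.append(x)
    return trajectory


if __name__ == "__main__":
    # Toy single-taxon example: logistic growth, with and without a
    # constant perturbation that lowers the effective growth rate.
    unperturbed = simulate([0.1], [1.0], [[-1.0]], [-0.5], lambda t: 0.0)
    perturbed = simulate([0.1], [1.0], [[-1.0]], [-0.5], lambda t: 1.0)
    print("final abundance without perturbation:", unperturbed[-1][0])
    print("final abundance with perturbation:", perturbed[-1][0])
```

In this toy setting the steady state of a single taxon is (mu + eps * u) / (-M), so a constant perturbation shifts the equilibrium. Inferring mu, M and eps from time-resolved data, as the abstract describes, is a separate estimation problem that this sketch does not cover.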

Abstract High-throughput RNA sequencing is an increasingly accessible method for studying gene structure and activity on a genome-wide scale. A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. To assess the performance of current mapping software, we invited developers of RNA-seq aligners to process four large human and mouse RNA-seq data sets. In total, we compared 26 mapping protocols based on 11 programs and pipelines and found major performance differences between methods on numerous benchmarks, including alignment yield, basewise accuracy, mismatch and gap placement, exon junction discovery and suitability of alignments for transcript reconstruction. We observed concordant results on real and simulated RNA-seq data, confirming the relevance of the metrics employed. Future developments in RNA-seq alignment methods would benefit from improved placement of multimapped reads, balanced utilization of existing gene annotation and a reduced false discovery rate for splice junctions.

Authors Par G Engstrom, Tamara Steijger, Botond Sipos, Gregory R Grant, Andre Kahles, Gunnar Rätsch, Nick Goldman, Tim J Hubbard, Jennifer Harrow, Roderic Guigo, Paul Bertone

Submitted Nature methods

Link Pubmed DOI

Abstract The nonsense-mediated decay (NMD) surveillance pathway can recognize erroneous transcripts and physiological mRNAs, such as precursor mRNA alternative splicing (AS) variants. Currently, information on the global extent of coupled AS and NMD remains scarce and even absent for any plant species. To address this, we conducted transcriptome-wide splicing studies using Arabidopsis thaliana mutants in the NMD factor homologs UP FRAMESHIFT1 (UPF1) and UPF3 as well as wild-type samples treated with the translation inhibitor cycloheximide. Our analyses revealed that at least 17.4% of all multi-exon, protein-coding genes produce splicing variants that are targeted by NMD. Moreover, we provide evidence that UPF1 and UPF3 act in a translation-independent mRNA decay pathway. Importantly, 92.3% of the NMD-responsive mRNAs exhibit classical NMD-eliciting features, supporting their authenticity as direct targets. Genes generating NMD-sensitive AS variants function in diverse biological processes, including signaling and protein modification, for which NaCl stress-modulated AS-NMD was found. Besides mRNAs, numerous noncoding RNAs and transcripts derived from intergenic regions were shown to be NMD responsive. In summary, we provide evidence for a major function of AS-coupled NMD in shaping the Arabidopsis transcriptome, having fundamental implications in gene regulation and quality control of transcript processing.

Authors Gabriele Drechsel, Andre Kahles, Anil K Kesarwani, Eva Stauffer, Jonas Behr, Philipp Drewe, Gunnar Rätsch, Andreas Wachter

Submitted The Plant cell

Link Pubmed DOI

Abstract High-throughput sequencing of mRNA (RNA-Seq) has led to tremendous improvements in the detection of expressed genes and reconstruction of RNA transcripts. However, the extensive dynamic range of gene expression, technical limitations and biases, as well as the observed complexity of the transcriptional landscape, pose profound computational challenges for transcriptome reconstruction.

Authors Jonas Behr, Andre Kahles, Yi Zhong, Vipin T Sreedharan, Philipp Drewe, Gunnar Rätsch

Submitted Bioinformatics (Oxford, England)

Link Pubmed DOI

Abstract Deep transcriptome sequencing (RNA-Seq) has become a vital tool for studying the state of cells in the context of varying environments, genotypes and other factors. RNA-Seq profiling data enable identification of novel isoforms, quantification of known isoforms and detection of changes in transcriptional or RNA-processing activity. Existing approaches to detect differential isoform abundance between samples either require a complete isoform annotation or fall short in providing statistically robust and calibrated significance estimates. Here, we propose a suite of statistical tests to address these open needs: a parametric test that uses known isoform annotations to detect changes in relative isoform abundance and a non-parametric test that detects differential read coverages and can be applied when isoform annotations are not available. Both methods account for the discrete nature of read counts and the inherent biological variability. We demonstrate that these tests compare favorably to previous methods, both in terms of accuracy and statistical calibrations. We use these techniques to analyze RNA-Seq libraries from Arabidopsis thaliana and Drosophila melanogaster. The identified differential RNA processing events were consistent with RT-qPCR measurements and previous studies. The proposed toolkit is available from http://bioweb.me/rdiff and enables in-depth analyses of transcriptomes, with or without available isoform annotation.

Authors Philipp Drewe, Oliver Stegle, Lisa Hartmann, Andre Kahles, Regina Bohnert, Andreas Wachter, Karsten Borgwardt, Gunnar Rätsch

Submitted Nucleic acids research

Link Pubmed DOI

Abstract Using a variety of techniques including Topic Modeling, PCA and Bi-clustering, we explore electronic patient records in the form of unstructured clinical notes and genetic mutation test results. Our ultimate goal is to gain insight into a unique body of clinical data, specifically regarding the topics discussed within the note content and relationships between patient clinical notes and their underlying genetics.

Authors K R Chan, Xinghua Lou, Theo Karaletsos, C Crosbie, S Gardos, D Artz, Gunnar Rätsch

Submitted ICDM Workshop on Biological Data Mining and its Applications in Healthcare

Link DOI

Abstract CD45 encodes a trans-membrane protein-tyrosine phosphatase expressed in diverse cells of the immune system. By combinatorial use of three variable exons 4-6, isoforms are generated that differ in their extracellular domain, thereby modulating phosphatase activity and immune response. Alternative splicing of these CD45 exons involves two heterogeneous ribonucleoproteins, hnRNP L and its cell-type specific paralog hnRNP L-like (LL). To address the complex combinatorial splicing of exons 4-6, we investigated hnRNP L/LL protein expression in human B-cells in relation to CD45 splicing patterns, applying RNA-Seq. In addition, mutational and RNA-binding analyses were carried out in HeLa cells. We conclude that hnRNP LL functions as the major CD45 splicing repressor, with two CA elements in exon 6 as its primary target. In exon 4, one element is targeted by both hnRNP L and LL. In contrast, exon 5 was never repressed on its own and only co-regulated with exons 4 and 6. Stable L/LL interaction requires CD45 RNA, specifically exons 4 and 6. We propose a novel model of combinatorial alternative splicing: HnRNP L and LL cooperate on the CD45 pre-mRNA, bridging exons 4 and 6 and looping out exon 5, thereby achieving full repression of the three variable exons.

Authors Marco Preussner, Silke Schreiner, Lee Hsueh Hung, Martina Porstner, Hans Martin Jack, Vladimir Benes, Gunnar Rätsch, Albrecht Bindereif

Submitted Nucleic acids research

Link DOI

Abstract Deep sequencing of transcriptomes allows quantitative and qualitative analysis of many RNA species in a sample, with parallel comparison of expression levels, splicing variants, natural antisense transcripts, RNA editing and transcriptional start and stop sites the ideal goal. By computational modeling, we show how libraries of multiple insert sizes combined with strand-specific, paired-end (SS-PE) sequencing can increase the information gained on alternative splicing, especially in higher eukaryotes. Despite the benefits of gaining SS-PE data with paired ends of varying distance, the standard Illumina protocol allows only non-strand-specific, paired-end sequencing with a single insert size. Here, we modify the Illumina RNA ligation protocol to allow SS-PE sequencing by using a custom pre-adenylated 3' adaptor. We generate parallel libraries with differing insert sizes to aid deconvolution of alternative splicing events and to characterize the extent and distribution of natural antisense transcription in C. elegans. Despite stringent requirements for detection of alternative splicing, our data increases the number of intron retention and exon skipping events annotated in the Wormbase genome annotations by 127% and 121%, respectively. We show that parallel libraries with a range of insert sizes increase transcriptomic information gained by sequencing and that by current established benchmarks our protocol gives competitive results with respect to library quality.

Authors Lisa M Smith, Lisa Hartmann, Philipp Drewe, Regina Bohnert, Andre Kahles, Christa Lanz, Gunnar Rätsch

Submitted RNA biology

Link Pubmed DOI

For the rest of the publications see publications page.