Computational Genomics and Transcriptomics

Active in the field for over 10 years, the group has gained rich experience, beginning with the analysis of whole-genome microarray data, and has contributed numerous methods, including:

  • methods for microarray probe normalization,
  • a segmentation technique to identify new genes from tiling array data,
  • classification strategies to identify alternative splicing events based on tiling array data.

With high-throughput DNA and RNA sequencing techniques, we improved existing analysis approaches and developed new methodologies to address the opportunities and challenges of short read sequences. Such new methods include:

  • alignment algorithms for mapping short sequence reads over exon boundaries
  • methodology for promoter recognition and analysis of regulatory motifs
  • reconstruction and quantification of transcript isoforms by mixed integer programming
  • differential analysis of transcript isoform expression with and without annotation 

The application of our techniques has facilitated important discoveries related to the regulation of gene expression, RNA processing, and genome evolution, leading to a considerably improved understanding of when and how genes are expressed and processed. (See publications for examples.)

Cancer Genomics / Transcriptomics

The group actively contributes to international cancer research consortia, including The Cancer Genome Atlas and the International Cancer Genome Consortium. Our research focuses mostly on aberrations in transcriptional regulation, specifically alternative splicing, and their relationship to underlying alterations of the somatic genotype.

We employ techniques from statistical genetics and modern machine learning to identify loci that have a potentially causative effect and drive cancer progression. Further, the group derives models to describe tumor evolution and assess tumor heterogeneity.

We generate compendia of molecular phenotypes based on RNA-Seq data, including expression measurements and characterization of alternative splicing events. We employ statistical methods to find genes that differ significantly between tumor and normal populations and to look for functional consequences. The group is particularly interested in gaining biological insights into splicing mechanisms in cancer, as well as technical insights into the effect of heterogeneity and clonal evolution on association analysis.

Spliced Alignment for High Throughput Sequencing Data

mRNA sequence alignment, although a very well-studied problem, is still a challenging task. We developed the open-source PALMapper to address the challenges of sequencing errors, alternative splicing and micro-exons. It is based on a versatile alignment engine that can sensitively align spliced reads in almost every setting. Possible sequencing errors are handled by a learnt error model that is incorporated into the spliced alignment dynamic program. PALMapper identifies new exon-exon junctions and can align reads across several exon junctions. It further allows for variation-aware alignment, considering known variations relative to the reference genome (SNPs, insertions and deletions). Through an efficient implementation in C and C++, PALMapper can align up to 10 million reads per hour for a human-size (3 Gb) genome. [ Download ] [ Tutorial ]

Identification and quantification of alternative splicing events

Alternative splicing is a major contributing factor to the transcriptome complexity in higher eukaryotes. Different transcript isoforms that result from alternative splicing play important roles in gene regulation and development, as evidenced by their dysregulation in disease states. Despite its importance, only a fraction of the landscape of alternative splicing is known, leaving many aspects to be elucidated. SplAdder is a program that augments existing gene annotation with evidence from RNA-Sequencing to identify all alternative splicing events that are possible and supported by the data. These events are quantified and can be used for downstream analysis such as data visualization or differential expression analysis.
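As a rough illustration of what identifying an event from annotation plus read evidence can look like, the sketch below finds exon-skip events in a toy graph of exons and junctions. It is not SplAdder's code: the graph representation, the function name find_exon_skips, and the example coordinates are all assumptions made for this example.

```python
def find_exon_skips(exons, junctions):
    """Return (upstream, skipped, downstream) exon triples.

    exons: list of (start, end) tuples, sorted by start coordinate.
    junctions: set of (i, j) index pairs meaning exon i is spliced to exon j,
               whether the junction comes from annotation or from reads
               (hypothetical input format).
    """
    events = []
    for i, j in sorted(junctions):
        # An exon skip needs a direct junction i -> j plus a path
        # i -> k -> j through some middle exon k.
        for k in range(i + 1, j):
            if (i, k) in junctions and (k, j) in junctions:
                events.append((exons[i], exons[k], exons[j]))
    return events


# Toy gene with four exons; the junction (0, 2) skips the second exon.
exons = [(100, 200), (300, 350), (500, 600), (700, 800)]
junctions = {(0, 1), (1, 2), (0, 2), (2, 3)}
skips = find_exon_skips(exons, junctions)
print(skips)
```

Other event types (intron retention, alternative 3' or 5' splice sites) would need their own graph patterns; only exon skipping is shown here.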

We have used the SplAdder pipeline in a collaboration project with researchers from the ZMBP in Tübingen to study the nonsense-mediated decay (NMD) surveillance pathway, which can recognize erroneous transcripts and physiological mRNAs, such as precursor mRNA alternative splicing variants. We conducted transcriptome-wide splicing studies using Arabidopsis thaliana mutants in the NMD factor homologs UPF1 and UPF3 as well as wild-type samples treated with the translation inhibitor cycloheximide. Our analyses revealed that at least 17.4% of all multiple-exon, protein-coding genes produce splicing variants that are targeted by NMD. We provide evidence for a major function of alternative splicing-coupled NMD in shaping the Arabidopsis transcriptome, having fundamental implications in gene regulation and quality control of transcript processing.

Deconvolution of Transcript-Isoform expression levels

High-throughput sequencing technologies open exciting new approaches to transcriptome profiling. For the important task of inferring transcript abundances from RNA-Seq data, we developed a new technique called rQuant that is based on quadratic programming. Our method estimates biases introduced by experimental settings and is a powerful tool to reveal and quantify novel (alternative) transcripts. It is available as a standalone open-source implementation as well as a web service, rQuant.web. [ Download ]
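To give a feel for abundance estimation as a quadratic program, here is a minimal sketch that fits non-negative isoform weights to observed coverage by least squares. It is not the rQuant implementation: it omits the bias estimation mentioned above, it uses plain projected gradient descent in place of a QP solver, and the names estimate_abundances, profiles and coverage are made up for this example.

```python
def estimate_abundances(profiles, coverage, steps=5000, lr=0.05):
    """Minimise sum_p (sum_t profiles[t][p] * w[t] - coverage[p])**2 with w >= 0.

    profiles[t][p] is 1 if isoform t covers segment p, else 0 (assumed input).
    Projected gradient descent stands in for a quadratic programming solver;
    lr must be small enough for the given profiles.
    """
    n_t, n_p = len(profiles), len(coverage)
    w = [0.0] * n_t
    for _ in range(steps):
        # Residual between predicted and observed coverage per segment.
        resid = [sum(profiles[t][p] * w[t] for t in range(n_t)) - coverage[p]
                 for p in range(n_p)]
        for t in range(n_t):
            grad = 2.0 * sum(profiles[t][p] * resid[p] for p in range(n_p))
            # Gradient step, then projection onto the constraint w >= 0.
            w[t] = max(0.0, w[t] - lr * grad)
    return w


# Isoform A uses all three exons; isoform B skips the middle one.
profiles = [[1, 1, 1], [1, 0, 1]]
coverage = [30.0, 10.0, 30.0]
weights = estimate_abundances(profiles, coverage)
print([round(x, 3) for x in weights])
```

In this toy case the middle exon is covered only by isoform A, so the fit should settle near 10 for A and 20 for B.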

Transcript-Isoform Reconstruction and Expression Estimation

MiTie is a novel, open-source framework for simultaneous transcript reconstruction and quantification. We define a likelihood function based on the negative binomial distribution, use a regularization approach to select a small subset of transcript isoforms that collectively explain the observed read data, and show how to find the optimal solution using mixed integer programming. MiTie can

  • take advantage of known transcripts,
  • reconstruct and quantify transcripts simultaneously in multiple samples,
  • resolve the location of multi-mapping reads.

It is designed for genome- and assembly-based transcriptome reconstruction and its performance compares well with methods such as Cufflinks. [ Download ]
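The combination of a negative binomial likelihood with a sparsity penalty can be sketched on a toy problem as follows. This is not MiTie's formulation or code: an exhaustive search over a coarse abundance grid stands in for mixed integer programming, and the dispersion parameter r, the penalty value, the grid and all function names are assumptions chosen for illustration.

```python
import math
from itertools import product


def nb_log_pmf(k, mu, r=10.0):
    """Log-probability of count k under a negative binomial with mean mu, size r."""
    mu = max(mu, 1e-9)  # avoid log(0) when no selected transcript covers a segment
    return (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def select_transcripts(candidates, counts, grid, penalty=1.0):
    """Pick abundances minimising negative log-likelihood + penalty * (#used)."""
    best_score, best_w = float("inf"), None
    for w in product(grid, repeat=len(candidates)):
        # Expected count per segment is the sum of abundances covering it.
        expected = [sum(c[s] * a for c, a in zip(candidates, w))
                    for s in range(len(counts))]
        nll = -sum(nb_log_pmf(k, mu) for k, mu in zip(counts, expected))
        score = nll + penalty * sum(1 for a in w if a > 0)
        if score < best_score:
            best_score, best_w = score, w
    return best_w


# Three candidate transcripts over three segments (1 = segment included).
candidates = [[1, 1, 1], [1, 0, 1], [0, 1, 1]]
counts = [30, 10, 30]
grid = [0, 5, 10, 15, 20, 25, 30]
chosen = select_transcripts(candidates, counts, grid)
print(chosen)
```

Here the penalty discourages using the third candidate, since the first two already explain the counts; a larger penalty would eventually favour a single-transcript explanation instead.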

Differential Expression Testing of RNA Isoforms

rDiff is an open-source tool for accurate detection of differential RNA processing from RNA-Seq data. It implements two statistical tests to detect changes in RNA processing between two samples, such as alternative splicing, ribosome occupancy or RNA decay. rDiff.parametric is a powerful test that can be applied to well-annotated organisms to detect changes in the relative abundance of isoforms. rDiff.nonparametric is an alternative for cases where the annotation is incomplete or missing. [ Download ]