Computational Transcriptomics

The group has gained extensive experience, beginning with the analysis of whole-genome microarray data, and has contributed numerous methods to the field, including:

  • methods for microarray probe normalization,
  • segmentation techniques to identify new genes from tiling array data,
  • classification strategies to identify alternative splicing events based on tiling array data.

With the advent of high-throughput DNA and RNA sequencing, we improved existing analysis approaches and developed new methodologies to address the opportunities and challenges posed by short sequence reads. These new methods include:

  • alignment algorithms for mapping short sequence reads over exon boundaries
  • methodology for promoter recognition and analysis of regulatory motifs
  • reconstruction and quantification of transcript isoforms by mixed integer programming
  • differential analysis of transcript isoform expression with and without annotation

The application of our techniques has facilitated important discoveries related to the regulation of gene expression, RNA processing, and genome evolution, leading to a considerably improved understanding of when and how genes are expressed and processed (see publications for examples).

 

Spliced Alignment for High Throughput Sequencing Data

mRNA sequence alignment, although a well-studied problem, remains a challenging task. We developed the open-source tool PALMapper to address the challenges posed by sequencing errors, alternative splicing, and micro-exons. It is based on a versatile alignment engine that can sensitively align spliced reads in almost any setting. Sequencing errors are handled by a learned error model that is incorporated into the spliced-alignment dynamic program. PALMapper can identify novel exon-exon junctions and align reads that span several exon junctions. It further supports variation-aware alignment, taking into account known variations relative to the reference genome (SNPs, insertions, and deletions). Thanks to an efficient implementation in C and C++, PALMapper can align up to 10 million reads per hour against a human-sized (3 Gb) genome.

[ Download ] [ Tutorial ]
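The core idea of a spliced-alignment dynamic program can be illustrated with a small Python sketch. It aligns the whole read to any window of the genome and adds an "intron" move that skips a genomic stretch beginning with GT and ending with AG. The function name `spliced_align`, the fixed scores, and the minimum intron length are hypothetical choices; PALMapper's learned error model, junction scoring, variation awareness, and indexing are not reproduced here.

```python
# Toy spliced aligner: the whole read is aligned to any window of the genome,
# and an extra "intron" move may skip a GT...AG stretch of the genome.
# All scores and MIN_INTRON are hypothetical, not PALMapper's parameters.
MATCH, MISMATCH, GAP = 1, -1, -2
INTRON_PENALTY = -3  # flat cost per intron
MIN_INTRON = 10      # shortest genomic skip treated as an intron
NEG = float("-inf")


def spliced_align(read: str, genome: str) -> float:
    """Best spliced-alignment score of `read` against `genome`."""
    m, n = len(read), len(genome)
    # Donor candidates: positions where the genome reads "GT" (ascending order).
    donors = [k for k in range(n - 1) if genome[k:k + 2] == "GT"]
    # H[i][j]: best score for aligning read[:i] so that it ends right before genome[j].
    H = [[NEG] * (n + 1) for _ in range(m + 1)]
    H[0] = [0.0] * (n + 1)  # free start anywhere in the genome
    for i in range(1, m + 1):
        for j in range(n + 1):
            best = H[i - 1][j] + GAP  # read base aligned to a gap
            if j > 0:
                sub = MATCH if read[i - 1] == genome[j - 1] else MISMATCH
                best = max(best, H[i - 1][j - 1] + sub,  # (mis)match
                           H[i][j - 1] + GAP)            # genome base aligned to a gap
                # Intron move: skip genome[k:j] if it starts with GT and ends with AG.
                if j >= 2 and genome[j - 2:j] == "AG":
                    for k in donors:
                        if j - k < MIN_INTRON:
                            break  # later donors are even closer to j
                        best = max(best, H[i][k] + INTRON_PENALTY)
            H[i][j] = best
    return max(H[m])  # whole read aligned; genome end is free


if __name__ == "__main__":
    exon1, exon2 = "ACGTTGCA", "TTGACCGA"
    intron = "GT" + "C" * 12 + "AG"
    genome = "TTTT" + exon1 + intron + exon2 + "TTTT"
    print(spliced_align(exon1 + exon2, genome))  # 13.0 = 16 matches + one intron penalty
```

A contiguous alignment of the same read would have to pay the gap penalty for every intron base, which is why a dedicated intron move is needed; a learned error model would replace the fixed `MATCH`/`MISMATCH` scores.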

 

Identification and quantification of alternative splicing events

Alternative splicing is a major contributor to transcriptome complexity in higher eukaryotes. The different transcript isoforms that result from alternative splicing play important roles in gene regulation and development, as evidenced by their dysregulation in disease states. Despite its importance, only part of the alternative splicing landscape is known, and many aspects remain to be elucidated. SplAdder is a program that augments an existing gene annotation with evidence from RNA sequencing (RNA-Seq) data to identify all alternative splicing events that are both possible and supported by the data. These events are quantified and can be used in downstream analyses such as data visualization or differential expression analysis.
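One common event type is exon skipping. The Python sketch below detects its simplest form from a table of intron read counts (for example, annotated introns plus introns observed in RNA-Seq alignments): an intron (a, d) that jumps over a segment flanked by two introns (a, b) and (c, d). The function `find_exon_skips`, the coordinate convention, and the read-count threshold are illustrative assumptions, not SplAdder's API or its splice-graph algorithm.

```python
def find_exon_skips(intron_counts, min_reads=1):
    """Return (a, b, c, d) tuples: skipping intron (a, d) and inclusion
    introns (a, b) and (c, d) surrounding a middle segment [b, c).

    intron_counts maps (start, end) intron coordinates to supporting reads.
    """
    supported = {iv for iv, n in intron_counts.items() if n >= min_reads}
    by_start, by_end = {}, {}
    for s, e in supported:
        by_start.setdefault(s, []).append(e)
        by_end.setdefault(e, []).append(s)
    events = []
    for a, d in supported:            # candidate skipping intron
        for b in by_start.get(a, []):  # inclusion intron sharing the donor a
            for c in by_end.get(d, []):  # inclusion intron sharing the acceptor d
                if b < c:  # a middle segment lies between the two inclusion introns
                    events.append((a, b, c, d))
    return sorted(events)


if __name__ == "__main__":
    counts = {(100, 200): 5, (300, 400): 4, (100, 400): 3, (500, 600): 2}
    print(find_exon_skips(counts))  # [(100, 200, 300, 400)]
```

Raising `min_reads` filters out weakly supported introns, so an event disappears once its skipping intron falls below the threshold.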

 

Deconvolution of Transcript-Isoform expression levels

High-throughput sequencing technologies open up exciting new approaches to transcriptome profiling. For the important task of inferring transcript abundances from RNA-Seq data, we developed rQuant, a technique based on quadratic programming. Our method estimates biases introduced by the experimental setting and is a powerful tool for revealing and quantifying novel (alternative) transcripts. It is available as a standalone open-source implementation as well as a web service, rQuant.web.

[ Download ] [ Web Page ]
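In its simplest form, ignoring the bias profiles that rQuant additionally estimates, abundance estimation by quadratic programming reduces to non-negative least squares: find abundances x ≥ 0 so that the predicted coverage A x best matches the observed coverage b, where A[i][t] is 1 if position i lies in an exon of transcript t. The sketch below solves this small quadratic program by cyclic coordinate descent in plain Python; the solver, the function name `nnls_coordinate_descent`, and the uniform-coverage assumption are illustrative choices, not rQuant's implementation.

```python
def nnls_coordinate_descent(A, b, sweeps=500):
    """Minimize ||A x - b||^2 subject to x >= 0 by cyclic coordinate descent.

    A[i][t]: expected coverage at position i per unit abundance of transcript t.
    b[i]:    observed read coverage at position i.
    """
    n_pos, n_tx = len(A), len(A[0])
    x = [0.0] * n_tx
    resid = [float(v) for v in b]  # b - A x, with x = 0 initially
    col_sq = [sum(A[i][t] ** 2 for i in range(n_pos)) for t in range(n_tx)]
    for _ in range(sweeps):
        for t in range(n_tx):
            if col_sq[t] == 0:
                continue  # transcript covers no observed position
            # Exact minimizer along coordinate t, then projected onto x_t >= 0.
            step = sum(A[i][t] * resid[i] for i in range(n_pos)) / col_sq[t]
            new = max(0.0, x[t] + step)
            delta = new - x[t]
            for i in range(n_pos):
                resid[i] -= A[i][t] * delta
            x[t] = new
    return x


if __name__ == "__main__":
    # Hypothetical 3-exon gene, two positions per exon.
    # T1 uses exons 1-2-3, T2 skips exon 2.
    A = [[1, 1], [1, 1], [1, 0], [1, 0], [1, 1], [1, 1]]
    b = [5, 5, 2, 2, 5, 5]  # coverage produced by abundances T1 = 2, T2 = 3
    print([round(v, 3) for v in nnls_coordinate_descent(A, b)])  # [2.0, 3.0]
```

The non-negativity constraint matters when coverage is inconsistent with an isoform: if exon 2 were covered more deeply than exons 1 and 3, the unconstrained fit would assign T2 a negative abundance, while the projected update clamps it to zero.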


 

Transcript-Isoform Reconstruction and Expression Estimation

MiTie is a novel open-source framework for simultaneous transcript reconstruction and quantification. We define a likelihood function based on the negative binomial distribution, use a regularization approach to select a small subset of transcript isoforms that collectively explain the observed read data, and show how to find the optimal solution using mixed integer programming. MiTie can

  • take advantage of known transcripts,
  • reconstruct and quantify transcripts simultaneously in multiple samples, and
  • resolve the location of multi-mapping reads.

It is designed for both genome- and assembly-based transcriptome reconstruction, and its performance compares well with that of methods such as Cufflinks.

[ Download ] [ Web Page ]
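The trade-off in MiTie's objective, a negative binomial likelihood of the observed reads plus a penalty on the number of selected isoforms, can be illustrated for a handful of candidates by enumerating every subset. In the Python sketch below, the fixed dispersion `R`, the penalty `LAMBDA`, the per-segment read counts, the least-squares abundance fit, and the exhaustive search are all simplifying assumptions standing in for MiTie's actual model and its mixed integer program.

```python
import math
from itertools import combinations

R = 10.0      # negative binomial dispersion, fixed here (hypothetical)
LAMBDA = 1.0  # penalty per selected transcript (hypothetical)


def nb_nll(y, mu, r=R):
    """-log P(y) for a negative binomial with mean mu and variance mu + mu^2 / r."""
    mu = max(mu, 1e-9)  # a segment with no expected reads gets a tiny mean
    return -(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def fit_abundances(cols, y, sweeps=300):
    """Non-negative least-squares abundances (a stand-in for the joint fit)."""
    x = [0.0] * len(cols)
    resid = [float(v) for v in y]
    for _ in range(sweeps):
        for t, col in enumerate(cols):
            sq = sum(c * c for c in col)
            if sq == 0:
                continue
            new = max(0.0, x[t] + sum(c * e for c, e in zip(col, resid)) / sq)
            for i, c in enumerate(col):
                resid[i] -= c * (new - x[t])
            x[t] = new
    return x


def select_transcripts(candidates, y):
    """Return the candidate subset minimizing NB loss + LAMBDA * subset size.

    candidates[t][s] is 1 if transcript t contains exon segment s; y[s] is
    the read count on segment s. Exhaustive search replaces MiTie's MIP.
    """
    best = (math.inf, ())
    for size in range(1, len(candidates) + 1):
        for subset in combinations(range(len(candidates)), size):
            cols = [candidates[t] for t in subset]
            x = fit_abundances(cols, y)
            mu = [sum(x[k] * cols[k][s] for k in range(size)) for s in range(len(y))]
            loss = sum(nb_nll(ys, ms) for ys, ms in zip(y, mu))
            best = min(best, (loss + LAMBDA * size, subset))
    return best[1]


if __name__ == "__main__":
    # Hypothetical candidates over three exon segments.
    cands = [[1, 1, 1],  # T0: exons 1-2-3
             [1, 0, 1],  # T1: skips exon 2
             [1, 1, 0]]  # T2: lacks exon 3
    print(select_transcripts(cands, [30, 10, 20]))  # (1, 2)
```

Enumeration grows exponentially with the number of candidates, which is why a solver-based formulation such as mixed integer programming is needed for realistic loci; the penalty is what prevents the full candidate set, which always fits at least as well, from being selected.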


Differential Expression Testing of RNA Isoforms

rDiff is an open-source tool for the accurate detection of differential RNA processing from RNA-Seq data. It implements two statistical tests that detect changes in RNA processing between two samples, such as alternative splicing, ribosome occupancy, or RNA decay. rDiff.parametric is a powerful test that can be applied to well-annotated organisms to detect changes in the relative abundance of isoforms. rDiff.nonparametric is an alternative when the annotation is incomplete or missing.

[ Download ] [ Web Page ]
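As a rough illustration of an annotation-free comparison, and not rDiff.nonparametric's actual test statistic, the Python sketch below compares two lists of read positions along a gene with a Kolmogorov-Smirnov-type statistic and calibrates it by permuting sample labels. The function names, the statistic, and the number of permutations are illustrative assumptions; the sketch ignores replicates and library-size normalization.

```python
import random


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two samples of read positions."""
    sa, sb = sorted(a), sorted(b)
    ia = ib = 0
    d = 0.0
    for p in sorted(set(sa) | set(sb)):
        while ia < len(sa) and sa[ia] <= p:
            ia += 1
        while ib < len(sb) and sb[ib] <= p:
            ib += 1
        d = max(d, abs(ia / len(sa) - ib / len(sb)))
    return d


def permutation_test(a, b, n_perm=1000, seed=0):
    """P-value for 'a and b share one read distribution', by label permutation."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0


if __name__ == "__main__":
    # Hypothetical read positions on a 1 kb gene; sample B has no reads
    # in 300-699, as if the exon located there were skipped.
    a = list(range(0, 1000, 2))
    b = [p for p in range(0, 1000, 2) if not 300 <= p < 700]
    print(permutation_test(a, b, n_perm=200))
```

Because the test only looks at where reads fall, it needs no isoform annotation; the price is that a significant result says that the read distribution changed, not which isoform is responsible.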
