KIRMES: Kernel-based Identification of Regulatory Modules in Euchromatic Sequences

This is the companion website for the paper "Kernel-based Identification of Regulatory Modules in Euchromatic Sequences" by Sebastian J. Schultheiss, Wolfgang Busch, Jan Lohmann, Oliver Kohlbacher, and Gunnar Rätsch.


Motivation: Understanding transcriptional regulation is one of the main challenges in computational biology. An important problem is the identification of transcription factor binding sites in promoter regions of potential transcription factor target genes. It is typically approached with position weight matrix-based motif identification algorithms that use Gibbs sampling or heuristics for extending seed oligos. Such algorithms succeed in identifying single, relatively well-conserved binding sites, but tend to fail when it comes to identifying combinations of several degenerate binding sites, such as those often found in cis-regulatory modules.

Results: We propose a new algorithm that combines the benefits of existing motif finding with those of Support Vector Machines (SVMs) to find degenerate motifs and thereby improve the modeling of regulatory modules. In experiments on microarray data from Arabidopsis thaliana, we were able to show that the newly developed strategy significantly improves the recognition of transcription factor targets.

The PDF file of the published GCB 2008 proceedings paper can be found here.

Supplementary material:

  • Appendix to the GCB paper explaining the conservation information and the microarray experiments

  • Supplementary material to the Bioinformatics submission

  • Slides from the talk given at the German Conference on Bioinformatics 2008 in Dresden, Germany, on September 10, 2008.

  • Links to the microarray experiments we used as described in the paper. The experiments were uploaded to the EBI ArrayExpress repository:

    • E-MEXP-98: Transcription profiling of heat stress response in Arabidopsis wild type and hsf1x3 double knockout
    • E-MEXP-432: Transcription profiling of inducible overexpression of Arabidopsis meristem regulators by AlcR/AlcA system in continuous light
  • Yeast ChIP-on-chip FASTAs used in the comparison to the PRIORITY Gibbs sampler (positives and negatives) as obtained from the extensive study by Harbison et al., thanks to Raluca Gordân

  • FASTA files for the Arabidopsis thaliana microarray experiments from the paper

  • Complete Python source code: installation instructions are here, and the program is available for download here in a newer version that adds Galaxy compatibility and shogun-0.10.0 compatibility.

  • A web service with KIRMES' functionality, integrated into FML's Galaxy instance, is available.
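
For readers who want to inspect the FASTA files listed above before running KIRMES, the following is a minimal sketch of a FASTA reader in plain Python. It is not taken from the KIRMES source code; the function name `read_fasta` and the example records are hypothetical and serve only as an illustration of the file format.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary.

    Illustrative helper only; not part of the KIRMES code base.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; the header is everything after '>'
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped; join them and normalize case
            records[header] += line.upper()
    return records


# Hypothetical example data, made up for this sketch
example = """>gene_A hypothetical promoter
ACGTAC
gtacgt
>gene_B hypothetical promoter
TTGACC
""".splitlines()

promoters = read_fasta(example)
for name, sequence in promoters.items():
    print(name, len(sequence))
```

The same function accepts an open file handle, since iterating over a file yields its lines, so a downloaded set of positives or negatives could be passed in directly once its local path is known.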