Bootstrapping the Alternative Splicing Annotation of Newly Sequenced Genomes

Although more genomes become available every year, many of them have rather poor EST coverage. With few ESTs, gene identification has to rely mostly on computational gene finders, and almost no alternative splicing events can be identified, because most introns are covered by only a single EST sequence. The initial alternative splicing annotation is therefore a particularly challenging problem, and it is the one we consider in this work. We discuss several methods for identifying candidate regions that are highly likely to contain alternative splicing events and are thus worth further experimental analysis. We compare two approaches for discovering exon skipping events in Caenorhabditis remanei using known events in C. elegans, a closely related and well-studied organism: (a) finding close homologs in C. remanei of alternatively spliced genes in C. elegans, and (b) learning discriminative characteristics of alternatively spliced genes in C. elegans with Support Vector Machines (SVMs) [1] and applying the trained SVM to C. remanei. We evaluate both methods by experimentally analyzing the candidate regions (by RT-PCR and sequencing). Once a reasonably sized set of known alternative splicing events has been obtained, we propose to use active learning, similar to its use in drug discovery [2]. The idea is to iteratively predict alternative splicing events and then perform a few biological validation experiments. In a simulation study on C. elegans we show that this method significantly reduces the number of experiments needed to identify a reasonably large number of confirmed alternative splicing events.
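The iterative predict-then-validate loop can be sketched as follows. This is only a minimal illustration under stated assumptions, not the procedure used in this work: the feature vectors and labels are synthetic, the `oracle` callback is a hypothetical stand-in for an RT-PCR validation experiment, the SVM is a small linear one trained with Pegasos-style subgradient descent (swapped in to keep the example self-contained), and the selection rule (validate the highest-scoring unlabeled candidates) is one of several possible choices. All function names are invented for this example.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos-style subgradient descent (labels are +1/-1)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:                           # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def score(w, x):
    """Signed SVM output; larger means more likely alternatively spliced."""
    return sum(wj * xj for wj, xj in zip(w, x))

def active_learning(X, oracle, seed_idx, rounds=5, batch=5):
    """Alternate between training on validated candidates and validating
    the `batch` highest-scoring candidates that are still unlabeled."""
    labeled = {i: oracle(i) for i in seed_idx}  # initial known events
    for _ in range(rounds):
        idx = sorted(labeled)
        w = train_linear_svm([X[i] for i in idx], [labeled[i] for i in idx])
        pool = [i for i in range(len(X)) if i not in labeled]
        pool.sort(key=lambda i: score(w, X[i]), reverse=True)
        for i in pool[:batch]:
            labeled[i] = oracle(i)  # simulated validation experiment
    return labeled

def make_pool(n=200, frac_pos=0.2, seed=1):
    """Synthetic candidates: two noisy features plus a constant bias term."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < frac_pos else -1
        shift = 1.5 if label == 1 else -0.5
        X.append([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0), 1.0])
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_pool()
    result = active_learning(X, lambda i: y[i], seed_idx=range(10))
    hits = sum(1 for v in result.values() if v == 1)
    print(f"validated {len(result)} candidates, {hits} confirmed events")
```

In a simulation of this kind, the number of confirmed events per experiment can be compared against validating randomly chosen candidates, which is the sort of comparison the simulation study on C. elegans addresses.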

REFERENCES

    1. G. Rätsch, S. Sonnenburg, and B. Schölkopf, RASE: Recognition of Alternatively Spliced Exons in C. elegans. Bioinformatics, 21(Suppl. 1): i369-i377, 2005.
    2. M.K. Warmuth, J. Liao, G. Rätsch, M. Mathieson, S. Putta, and C. Lemmen, Active Learning with SVMs in the Drug Discovery Process. J Chem Inf Comput Sci, 43(2): 667-673, 2003.