Optimal Spliced Alignments of Short Sequence Reads
|Dec. 20, 2010:||training QPALMA is explained in PALMapper tutorial paper|
|Dec. 20, 2010:||
QPALMA version 0.9.3 released
This release contains only the training module of QPALMA; the alignment and filtering modules are now integrated into PALMapper (with new and complete documentation on how to train QPALMA)
New pretrained models
New options: possibility to train with unspliced reads or without splice site predictions
|May 8, 2010:||
QPALMA version 0.9.2 released (release candidate 1)
The whole process to train QPALMA from artificial reads is implemented in a bash script (see documentation)
Pretrained models have been added
|April 16, 2010:||The QPALMA alignment module is available as part of PALMapper|
|Mar. 17, 2010:||Splice site predictions for more than ten organisms available.|
|Mar. 11, 2010:||The fusion of GenomeMapper & QPALMA (PALMapper) is available on request (details)|
|Mar. 1, 2010:||QPALMA is available on the FML galaxy server|
|July 13, 2009:||QPALMA version 0.9.1 released (much easier to use and faster).|
|Oct. 15, 2008:||QPALMA version 0.9 released.|
|Sep. 21, 2008:||Paper describing QPALMA published in Bioinformatics and presented at ECCB'08|
This is the main site of the QPalma project. QPalma is a tool for aligning spliced reads produced by next-generation sequencing platforms such as Illumina Solexa or 454.
The paper can be downloaded here.
Our method uses a training set of spliced reads with quality information and known alignments. It uses a large margin approach, similar to support vector machines, to estimate the parameters that maximize alignment accuracy. Information from several sources can be combined to achieve a higher prediction accuracy. Currently QPalma can incorporate base quality values of the read data and predicted splice site scores (for example from the splice site prediction project).
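To make the idea of combining information sources concrete, here is a minimal sketch of a score that mixes base qualities with splice site scores. It is an illustration only, not QPalma's scoring function: QPalma estimates its parameters from training data, whereas the hand-set weights, the Phred interpretation of the quality values, and the function names `phred_to_error_prob`, `quality_aware_base_score` and `candidate_score` are all assumptions made for this example.

```python
def phred_to_error_prob(quality):
    """Convert a Phred-scaled quality value into an error probability.

    Assumption: qualities are Phred-scaled (Q = -10 * log10(p_error)).
    """
    return 10 ** (-quality / 10)


def quality_aware_base_score(read_base, genome_base, quality,
                             match=1.0, mismatch=-1.0):
    """Score one aligned base pair, scaled by how reliable the read base is.

    A low-quality base contributes less, whether it matches or not.
    The match/mismatch values are hand-set here, not learned.
    """
    confidence = 1.0 - phred_to_error_prob(quality)
    base = match if read_base == genome_base else mismatch
    return confidence * base


def candidate_score(read, qualities, genome_segment,
                    donor_score, acceptor_score, splice_weight=1.0):
    """Combine per-base scores with predicted splice site scores.

    `genome_segment` is the exonic sequence the read is aligned to
    (same length as the read, gaps are ignored in this sketch).
    `donor_score` and `acceptor_score` stand in for splice site predictions.
    """
    base_total = sum(
        quality_aware_base_score(r, g, q)
        for r, g, q in zip(read, genome_segment, qualities)
    )
    return base_total + splice_weight * (donor_score + acceptor_score)
```

With such a score, two candidate spliced alignments with the same number of mismatches can be ranked differently, for example when one places the intron at strongly predicted splice sites or when its mismatches fall on low-quality bases.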
- Next generation sequencing technologies open exciting new possibilities for genome and transcriptome sequencing. While reads produced by these technologies are relatively short and error prone compared to the Sanger method, their throughput is several orders of magnitude higher. To utilize such reads for transcriptome sequencing and gene structure identification, one needs to be able to accurately align the sequence reads over intron boundaries. This represents a significant challenge given their short length and inherent high error rate.
- We present a novel approach, called QPALMA, for computing accurate spliced alignments which takes advantage of the read's quality information as well as computational splice site predictions. Our method uses a training set of spliced reads with quality information and known alignments. It uses a large margin approach similar to support vector machines to estimate its parameters to maximize alignment accuracy. In computational experiments, we illustrate that the quality information as well as the splice site predictions help to improve the alignment quality. Finally, to facilitate mapping of massive amounts of sequencing data typically generated by the new technologies, we have combined our method with a fast mapping pipeline based on enhanced suffix arrays. Our algorithms were optimized and tested using reads produced with the Illumina Genome Analyzer for the model plant Arabidopsis thaliana.
QPalma aligns short reads to the genomic sequences in an optimal way according to its underlying algorithm and trained parameters. It creates an alignment using dynamic programming (written in C++) and returns the alignment in a BLAT-like format. The algorithm computes optimal local alignments, so if no alignment is found, it is because no alignment reached a sufficiently high alignment score.
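The relationship between local alignment and the score threshold can be sketched with a plain Smith-Waterman recurrence. This is a simplified stand-in, not QPalma's algorithm: it ignores introns, base qualities and splice site scores, and the scoring values and the names `local_alignment_score` and `align_or_none` are assumptions chosen for illustration.

```python
def local_alignment_score(read, genome, match=2, mismatch=-1, gap=-2):
    """Return (best_score, read_end, genome_end) of the best local alignment.

    Classic Smith-Waterman: every cell is floored at 0, so an alignment
    may start and end anywhere in either sequence. End positions are
    1-based; (0, 0, 0) means no positively scoring alignment exists.
    """
    cols = len(genome) + 1
    prev = [0] * cols
    best = (0, 0, 0)
    for i in range(1, len(read) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            substitution = match if read[i - 1] == genome[j - 1] else mismatch
            curr[j] = max(
                0,                       # start a new local alignment
                prev[j - 1] + substitution,  # align read[i-1] with genome[j-1]
                prev[j] + gap,           # gap in the genome
                curr[j - 1] + gap,       # gap in the read
            )
            if curr[j] > best[0]:
                best = (curr[j], i, j)
        prev = curr
    return best


def align_or_none(read, genome, min_score):
    """Report the best local alignment only if it reaches `min_score`."""
    result = local_alignment_score(read, genome)
    return result if result[0] >= min_score else None
```

For example, `align_or_none("ACGT", "TTACGTTT", 8)` returns a hit, while `align_or_none("ACGT", "TTTTTTTT", 4)` returns `None`: an optimal local alignment still exists, but its score falls below the threshold, which is the situation described above.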
The original read data we used for the evaluation described in the paper can be found on our FTP server.
You can access the data via: