RNA Secondary Structure Prediction using Large Margin Methods

by Fabio De Bona, Cheng Soon Ong, Alexander Zien, and Gunnar Rätsch.

Abstract

Motivation: We consider the problem of RNA secondary structure prediction. In this setting one is interested in predicting a planar graph from a given nucleotide sequence. Even though there exists already a lot of algorithms dealing with this problem one can not consider it to be solved since even the current (2008) best approaches achieve error rates in the number of predicted pairs of about 25%. The prediction problem itself is well-established and the majority of computational approaches that have been proposed so far consists of two components:

  • A dynamic programming algorithm for decoding the structure, and
  • a parameter inference method.

While the dynamic programming algorithm is quite similar for all of these methods parameters are usually obtained either by using experimentally estimated thermodynamical data or via probabilistic inference. For more than 20 years thermodynamics based models where the most prominent and most successfully applied models to the RNA secondary structure prediction problem [Zuker03]. A recently [Do06] proposed probabilistic approach used a conditional maximum likelihood scheme for parameter inference. This model could outperform existing thermodynamic models significantly. Encouraged by these recent improvements we will use a large margin method related to Support Vector Machines to attack the prediction problem. The central idea is to find a parameter vector that assigns highest score to correct and lower score to incorrect structures.

A poster presented at ISMB 2007 can be found here.

References

[Do06] Do C.B. and Woods D.A. and Batzoglou S., CONTRAfold: RNA Secondary Structure Prediction without Energy-Based Models. Bioinformatics 22(14), 2006.
[Tsochantaridis04] Ioannis Tsochantaridis and Thomas Hofmann and Thorsten Joachims and Yasemin Altun, Support Vector Machine Learning for Interdependent and Structured Output Spaces, Proceedings of the 16th International Conference on Machine Learning, 2004
[Zuker03] Michael Zuker, Lectures on RNA Secondary Structure Prediction, 2003, http://www.bioinfo.rpi.edu/~zukerm/lectures/RNAfold-html
[CVXOPT] Joachim Dahl and Lieven Vandenberghe, CVXOPT: A Python Package for Convex Optimization, http://www.ee.ucla.edu/~vandenbe/cvxopt/