EasySVM

News

April 28, 2011: New version of EasySVM released.

This page is dedicated to the EasySVM package, a set of tools based on the Shogun toolbox for training and testing SVMs in a simple way. The toolbox is integrated into our Galaxy server.

The updated release (easysvm-0.3.3.tar.bz2) can be downloaded here. Other releases can be found here.

Installation

For a global install (requires root permissions):

python setup.py install

For a local install:

python setup.py install --prefix=$HOME

See distutils-help.txt for more details.

Dependencies

Usage

A simple example

In the following very simple example we generate a two-dimensional data set with two Gaussian-distributed classes (60% positive examples, width of distribution 1.3), then run model selection and cross-validation on it:

python scripts/datagen.py cloud 1000 2 0.6 1.3 cloud.arff
python scripts/easysvm.py modelsel 5 0.1,1,10 gauss 0.1,1,10 arff cloud.arff modelsel-cloud.txt
python scripts/easysvm.py cv 5 10 gauss 1 arff cloud.arff cv-cloud.txt
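
To make the data set concrete, here is a minimal pure-Python sketch of what a two-class Gaussian cloud of this kind might look like. It is not the code in datagen.py: the function name make_cloud, the fixed seed, and the placement of the class means at +1 and -1 on every axis are assumptions made purely for illustration.

```python
import random


def make_cloud(n, dim, frac_pos, width, seed=0):
    """Hypothetical stand-in for a 'cloud' data set: two Gaussian classes."""
    rng = random.Random(seed)
    n_pos = int(round(n * frac_pos))
    examples, labels = [], []
    for i in range(n):
        label = 1 if i < n_pos else -1
        # Assumption: class means sit at +1 / -1 on every axis.
        point = [rng.gauss(label, width) for _ in range(dim)]
        examples.append(point)
        labels.append(label)
    return examples, labels


# Same parameters as the command above: 1000 points, 2 dimensions,
# 60% positive examples, width 1.3.
examples, labels = make_cloud(1000, 2, 0.6, 1.3)
print(len(examples), labels.count(1), labels.count(-1))
```

With a width of 1.3 the two clouds overlap considerably, which is what makes the choice of SVM parameters in the model selection step matter.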

Tutorial examples

Many examples of using easysvm are discussed in a tutorial paper. The results of the paper can be reproduced with the script tutorial_example.py. Execute it from the data directory:

cd data
python ../splicesites/tutorial_example.py

The output of this script can be downloaded here: tutorial-example.out

Galaxy interface

The Galaxy interface, available as a web service at http://galaxy.raetschlab.org/, runs the following command lines behind the scenes.

There are three types of data creation methods:

datagen.py motif arff gattaca 10 50 10-15 0.1 tttt 100 50 15 0.1 testmotif1.arff
datagen.py cloud 100 3 0.6 1.3 testcloud1.arff
datagen.py motif arff gattaca 100 50 10-15 0.1 tttt 1000 50 15 0.1 testmotif2.arff
datagen.py cloud 1000 3 0.6 1.3 testcloud2.arff

datagen.py motif fasta gattaca 10 50 10-15 0.1 testmotifpos.fasta
datagen.py motif fasta tttt 100 50 15 0.1 testmotifneg.fasta
datagen.py motif fasta gattaca 100 50 10-15 0.1 tm1.fasta
datagen.py motif fasta tttt 1000 50 15 0.1 tm2.fasta
cat tm1.fasta tm2.fasta > testmotiftest.fasta
rm tm1.fasta tm2.fasta
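
As a rough illustration of motif data like that produced by the commands above, the sketch below plants a motif into random DNA sequences. It is not taken from datagen.py. The reading of the arguments as motif, number of sequences, sequence length, position range, and mutation rate is an assumption, and the function name make_motif_sequences is hypothetical.

```python
import random

ALPHABET = "acgt"


def make_motif_sequences(motif, count, length, pos_range, mutation_rate, seed=0):
    """Hypothetical sketch: random DNA with a (possibly mutated) planted motif."""
    lo, hi = pos_range
    if hi + len(motif) > length:
        raise ValueError("motif would run past the end of the sequence")
    rng = random.Random(seed)
    sequences = []
    for _ in range(count):
        background = [rng.choice(ALPHABET) for _ in range(length)]
        start = rng.randint(lo, hi)  # inclusive on both ends
        for offset, base in enumerate(motif):
            # Assumption: each motif position is resampled with this probability.
            if rng.random() < mutation_rate:
                base = rng.choice(ALPHABET)
            background[start + offset] = base
        sequences.append("".join(background))
    return sequences


positives = make_motif_sequences("gattaca", 10, 50, (10, 15), 0.1)
for seq in positives[:3]:
    print(seq)
```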

Cross-validation and evaluation on an independent validation set:

easysvm.py cv 5 10 gauss 0.6 arff testcloud1.arff cv_cloud.txt
easysvm.py eval cv_cloud.txt arff testcloud1.arff cv_cloud_eval.txt roc roc_cloud_cv.png
easysvm.py cv 5 10 wd 10 2 arff testmotif1.arff cv_motif.txt dna R
easysvm.py eval cv_motif.txt arff testmotif1.arff cv_motif_eval.txt roc roc_motif_cv.png
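
For readers unfamiliar with cross-validation, the following sketch shows how a data set can be partitioned into k folds so that every example is used for testing exactly once. It is a generic illustration, not the implementation used by easysvm.py. That the first number after cv is the number of folds is an assumption, and kfold_indices is a hypothetical helper.

```python
import random


def kfold_indices(n_examples, n_folds, seed=0):
    """Return a list of (train, test) index lists, one pair per fold."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into n_folds groups.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, test))
    return splits


splits = kfold_indices(100, 5)
for train, test in splits:
    print(len(train), len(test))
```

In each round a classifier would be trained on the train indices and scored on the test indices, so the collected outputs cover the whole data set without ever predicting on an example that was used for training in that round.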

Predict on a test set:

easysvm.py pred 10 gauss 0.6 arff testcloud1.arff testcloud2.arff pred_cloud.txt
easysvm.py pred 10 linear arff testcloud1.arff testcloud2.arff pred_cloud.txt
easysvm.py pred 10 poly 3 true true arff testcloud1.arff testcloud2.arff pred_cloud.txt
easysvm.py pred 10 wd 10 2 arff testmotif1.arff testmotif2.arff pred_motif.txt dna R
easysvm.py pred 10 localalign arff testmotif1.arff testmotif2.arff pred_motif.txt dna R
easysvm.py pred 10 localimprove 10 1 1 arff testmotif1.arff testmotif2.arff pred_motif.txt dna R
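
The vector kernels named in the first three commands can be written down in a few lines. The sketch below is a plain-Python illustration rather than the Shogun implementation. It assumes that the Gaussian kernel divides the squared distance by the width, and that the arguments of poly stand for the degree followed by flags for an inhomogeneous term and for normalization. Normalization is not modeled here.

```python
import math


def linear_kernel(x, y):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def poly_kernel(x, y, degree, inhomogeneous=True):
    """(<x, y> + c)^degree, with c = 1 if inhomogeneous, else 0."""
    offset = 1.0 if inhomogeneous else 0.0
    return (linear_kernel(x, y) + offset) ** degree


def gauss_kernel(x, y, width):
    """exp(-||x - y||^2 / width); this width convention is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / width)


x, y = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, y), poly_kernel(x, y, 3), gauss_kernel(x, y, 0.6))
```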

For some kernels, you can investigate the importance of different motifs:

easysvm.py poim 10 6 wd 10 2 arff testmotif1.arff poims.png dna R
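
The wd kernel in the command above presumably refers to the weighted degree kernel for sequences. As background, here is a minimal sketch of the standard weighted degree kernel without shifts: it counts substrings of length 1 to D that match at the same position in both sequences, weighting length d by 2(D - d + 1) / (D(D + 1)). Reading wd 10 2 as degree 10 and shift 2 is an assumption, and the shift is not modeled here.

```python
def wd_kernel(x, y, degree):
    """Weighted degree kernel (no shifts) for two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    total = 0.0
    for d in range(1, degree + 1):
        # Standard weighting: shorter substrings get larger weights.
        beta = 2.0 * (degree - d + 1) / (degree * (degree + 1))
        matches = sum(
            1 for i in range(len(x) - d + 1) if x[i:i + d] == y[i:i + d]
        )
        total += beta * matches
    return total


print(wd_kernel("gattaca", "gattaca", 3))
print(wd_kernel("gattaca", "gatcaca", 3))
```

A single mismatch lowers the kernel value by removing every matching substring that overlaps the mismatched position, which is why the kernel is sensitive to where a motif sits in the sequence.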

We also support the fasta format:

easysvm.py cv 5 10 wd 10 2 fasta testmotifpos.fasta testmotifneg.fasta cv_motif.txt dna R
easysvm.py eval cv_motif.txt fasta testmotifpos.fasta testmotifneg.fasta cv_motif_eval.txt roc roc_motif_cv.png
easysvm.py pred 10 wd 10 2 fasta testmotifpos.fasta testmotifneg.fasta testmotiftest.fasta pred_motif.txt dna R
easysvm.py poim 10 6 wd 10 2 fasta testmotifpos.fasta testmotifneg.fasta poims.png dna R
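
In the fasta variants above, positive and negative examples come from two separate files. The sketch below shows one way such input could be turned into labeled sequences. It is an illustration only, not the parser used by easysvm.py, and the helper names parse_fasta and label_sequences are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequences may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def label_sequences(pos_text, neg_text):
    """Label sequences +1 or -1 depending on which input they came from."""
    data = [(seq, 1) for _, seq in parse_fasta(pos_text)]
    data += [(seq, -1) for _, seq in parse_fasta(neg_text)]
    return data


pos = ">p1\nacgt\nacgt\n>p2\nttga\n"
neg = ">n1\ncccc\n"
print(label_sequences(pos, neg))
```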

License

GPLv3

All programs in this collection are free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.