Oqtans Tools


Table 1: The software packages integrated into Oqtans, with their input and output file formats. For file format abbreviations, see Table 2. Packages marked (a) are currently being updated. Tools for which one of the authors developed the Galaxy wrapper are marked (b); tools whose wrappers were developed by others are marked (c). Methods developed by one of the authors are marked (d).
Name and Reference | Input | Output

Read Mapping
PALMapper[1] (b)(d) | Index, Reference Genome, FASTQ | BAM
Bowtie 1 & 2[2] (c) | Index, Reference Genome, FASTQ | SAM
BWA[3] (c) | Index, Reference Genome, FASTQ | SAM
TopHat 1 & 2[4] (c) | FASTA/Q, Index | SAM, WIG, BED
STAR[5] (b) | FASTA/Q, STAR Genome Index | BAM, BED

Gene and Transcript Prediction
Cufflinks[6] (c) | SAM/BAM, (GFF3) | GTF
mTIM[7] (a)(b)(d) | FASTA, BAM, SPF | GFF3
Scripture[8] (a)(b) | SAM/BAM | GTF
SplAdder (in preparation) (a)(b)(d) | FASTA, GFF3, BAM | GFF3
Trinity[9] (b) | FASTQ | FASTA

Quantitative Analysis
rQuant[10] (b)(d) | GFF3, BAM | GFF3
rDiff[11] (b)(d) | GFF3, BAM | TAB (Gene Names)
Cuffdiff[6] (c) | SAM/BAM, (GFF3) | GTF
DESeq[12] (b) | GFF3, BAM | TAB (Gene Names)
DESeq2[12] (b) | GFF3, BAM | TAB (Gene Names)
DEXSeq[13] (b) | GFF3, BAM | TAB (Gene Names)
edgeR[14] (b) | GFF3, BAM | TAB (Gene Names)
Genesetter (b)(d) | TAB (Gene Names) | PNG, TAB (Percentages)
TopGO[15] (b) | TAB (Gene Names) | PDF

Machine Learning-based Sequence Analysis
KIRMES[16] (b)(d) | FASTA | PNG, PWM, TAB, HTML
ASP[17] (b)(d) | FASTA | GTF
ARTS[18] (a)(b)(d) | FASTA | GTF
EasySVM[19] (b)(d) | FASTA, ARFF, TAB | TAB (Classifications), PNG
Shogun[20] (b)(d) | TAB, Labels | TAB (Classifications)

Pre- and Postprocessing, File Format Utilities
GFF toolkit[21] (b) | GFF, GFF3, GTF | GFF3
SAMtools[22] (c) | SAM, BAM | SAM, BAM
RNA-geeq (manuscript in preparation) (a)(b)(d) | GFF3, BAM | BAM, TAB (Score Matrix)
WebLogo[23] (b) | PWM | PNG
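
Because Table 1 lists an input and an output format for every tool, it can be read as a rough map of which tools may follow one another in a workflow. The sketch below illustrates that reading in Python for a handful of rows copied from the table. The `TOOLS` dictionary and the `downstream_tools` helper are hypothetical names introduced here for illustration and are not part of Oqtans or Galaxy. Matching is by format name only, so a shared format such as TAB does not guarantee that the content (for example gene names versus a score matrix) is what the next tool expects.

```python
# A few rows of Table 1, with "SAM/BAM" and "FASTA/Q" expanded into
# their individual formats. The optional "(GFF3)" input of Cufflinks
# is treated as an ordinary input here (an assumption of this sketch).
TOOLS = {
    "PALMapper": {"inputs": {"Index", "Reference Genome", "FASTQ"}, "outputs": {"BAM"}},
    "STAR": {"inputs": {"FASTA", "FASTQ", "STAR Genome Index"}, "outputs": {"BAM", "BED"}},
    "Cufflinks": {"inputs": {"SAM", "BAM", "GFF3"}, "outputs": {"GTF"}},
    "rQuant": {"inputs": {"GFF3", "BAM"}, "outputs": {"GFF3"}},
    "DESeq2": {"inputs": {"GFF3", "BAM"}, "outputs": {"TAB"}},
    "TopGO": {"inputs": {"TAB"}, "outputs": {"PDF"}},
}


def downstream_tools(tool_name, tools=TOOLS):
    """Return sorted names of tools that accept at least one output format of tool_name."""
    produced = tools[tool_name]["outputs"]
    return sorted(
        name
        for name, spec in tools.items()
        if name != tool_name and produced & spec["inputs"]
    )


if __name__ == "__main__":
    for name in TOOLS:
        print(name, "->", downstream_tools(name))
```

A format match of this kind is only a necessary condition for chaining two tools; most tools in the table also need further inputs, such as an annotation in GFF3 alongside the BAM alignments.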



Table 2: File formats used by the tools listed in Table 1.
Extension | Stands for | Format | Used for
ARFF | Attribute-Relation File Format | Tabular | Databases
BAM | Binary SAM | Binary | Sequence alignment
BED | Browser Extensible Data | Tabular | Sequence annotation
FASTA | FAST-All | Text | Biological sequences
FASTQ | FASTA Quality | Text | Sequence reads with a quality score per base
GFF(3) | Generic Feature Format (version 3) | Tabular | Sequence annotation
GTF | Gene Transfer Format | Tabular | Sequence annotation
HTML | Hypertext Markup Language | Text | Documents with text and graphics
PDF | Portable Document Format | Binary | Formatted text and image data
PNG | Portable Network Graphics | Binary | Image data
PWM | Position Weight Matrix | Tabular | Sequence motifs, e.g. binding sites
SAM | Sequence Alignment/Map | Tabular | Sequence alignment
SPF | Signal Predictor Format | Binary | Trained signal predictors from machine learning methods
TAB | Tabular Values | Tabular | Tabular data, columns separated by a delimiter character
WIG | Wiggle | Tabular | Dense continuous data
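
To make the difference between two of the text formats in Table 2 concrete, the following Python sketch converts FASTQ records to FASTA by keeping the identifier and sequence and dropping the per-base quality string. It assumes the common four-line FASTQ layout (header starting with `@`, sequence, separator starting with `+`, quality string) with no line wrapping. The function name `fastq_to_fasta` is introduced here for illustration only and does not correspond to any Oqtans tool.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Write each 4-line FASTQ record as a 2-line FASTA record; return the record count."""
    count = 0
    while True:
        header = fastq_handle.readline().rstrip("\n")
        if not header:
            # End of input (or a blank line, which this sketch treats as the end).
            break
        sequence = fastq_handle.readline().rstrip("\n")
        separator = fastq_handle.readline().rstrip("\n")
        quality = fastq_handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(quality) != len(sequence):
            raise ValueError(f"quality and sequence lengths differ in {header!r}")
        # FASTA keeps only the identifier and the sequence.
        fasta_handle.write(f">{header[1:]}\n{sequence}\n")
        count += 1
    return count


if __name__ == "__main__":
    reads = io.StringIO("@read1\nACGT\n+\nIIII\n")
    out = io.StringIO()
    fastq_to_fasta(reads, out)
    print(out.getvalue(), end="")
```

The conversion is lossy: the quality scores that distinguish FASTQ from FASTA cannot be recovered from the output.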


Bibliography

[1]
Jean, G., Kahles, A., Sreedharan, V. T., De Bona, F., and Rätsch, G. (2010).
RNA-seq read alignments with PALMapper.
Curr Protoc Bioinformatics, 32(11), 6.1-6.37.

[2]
Langmead, B., Trapnell, C., Pop, M., and Salzberg, S. L. (2009).
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.
Genome Biol, 10(3).

[3]
Li, H. and Durbin, R. (2010).
Fast and accurate long-read alignment with Burrows-Wheeler transform.
Bioinformatics, 26(5), 589-595.

[4]
Trapnell, C., Pachter, L., and Salzberg, S. L. (2009).
TopHat: discovering splice junctions with RNA-seq.
Bioinformatics, 25(9), 1105-1111.

[5]
Dobin, A. et al. (2012).
STAR: ultrafast universal RNA-seq aligner.
Bioinformatics, doi: 10.1093/bioinformatics/bts635

[6]
Roberts, A., Pimentel, H., Trapnell, C., and Pachter, L. (2011).
Identification of novel transcripts in annotated genomes using RNA-seq.
Bioinformatics, 27, btr355.

[7]
Görnitz, N., Zeller, G., Behr, J., Kahles, A., Mudrakarta, P., Sonnenburg, S., and Rätsch, G. (2011).
mTIM: margin-based transcript mapping from RNA-seq.
In C. Alkan, editor, RECOMB Satellite Workshop on Massively Parallel Sequencing, volume 12, London, UK. BMC Bioinformatics.

[8]
Guttman, M., Garber, M., Levin, J. Z., Donaghey, J., Robinson, J., Adiconis, X., Fan, L., Koziol, M. J., Gnirke, A., Nusbaum, C., Rinn, J. L., Lander, E. S., and Regev, A. (2010).
Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs.
Nat Biotechnol, 28(5), 503-10.

[9]
Grabherr, M. G., Haas, B. J., Yassour, M., Levin, J. Z., Thompson, D. A., Amit, I., Adiconis, X., Fan, L., Raychowdhury, R., Zeng, Q., Chen, Z., Mauceli, E., Hacohen, N., Gnirke, A., Rhind, N., di Palma, F., Birren, B. W., Nusbaum, C., Lindblad-Toh, K., Friedman, N., and Regev, A. (2011).
Full-length transcriptome assembly from RNA-seq data without a reference genome.
Nat Biotechnol, 29(7), 644-52.

[10]
Bohnert, R. and Rätsch, G. (2010).
rQuant.web: a tool for RNA-seq-based transcript quantitation.
Nucleic Acids Res, 38(Web Server issue), 348-351.

[11]
Stegle, O., Drewe, P., Bohnert, R., Borgwardt, K., and Rätsch, G. (2010).
Statistical tests for detecting differential RNA-transcript expression from read counts.
Nature Precedings, 4437, 1.

[12]
Anders, S. and Huber, W. (2010).
Differential expression analysis for sequence count data.
Genome Biol, 11(10), R106.

[13]
Anders, S., Reyes, A., and Huber, W. (2012).
Detecting differential usage of exons from RNA-seq data.
Genome Res, doi: 10.1101/gr.133744.111

[14]
Robinson, M. D., McCarthy, D. J., and Smyth, G. K. (2010).
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
Bioinformatics, 26(1), 139-140.

[15]
Alexa, A., Rahnenführer, J., and Lengauer, T. (2006).
Improved scoring of functional groups from gene expression data by decorrelating GO graph structure.
Bioinformatics, 22(13), 1600-1607.

[16]
Schultheiss, S. J., Busch, W., Lohmann, J. U., Kohlbacher, O., and Rätsch, G. (2009).
KIRMES: kernel-based identification of regulatory modules in euchromatic sequences.
Bioinformatics, 25(16), 2126-2133.

[17]
Sonnenburg, S., Schweikert, G., Philips, P., Behr, J., and Rätsch, G. (2007).
Accurate splice site prediction using support vector machines.
BMC Bioinformatics, 8(Suppl. 10), S7.

[18]
Sonnenburg, S., Zien, A., and Rätsch, G. (2006).
ARTS: accurate recognition of transcription starts in human.
Bioinformatics, 22(14), e472-80.

[19]
Ben-Hur, A., Ong, C. S., Sonnenburg, S., Schölkopf, B., and Rätsch, G. (2008).
Support vector machines and kernels for computational biology.
PLoS Comput Biol, 4(10), e1000173.

[20]
Sonnenburg, S., Rätsch, G., Henschel, S., Widmer, C., Behr, J., Zien, A., de Bona, F., Binder, A., Gehl, C., and Franc, V. (2010).
The SHOGUN machine learning toolbox.
J Mach Learn Res, 11, 1799-1802.

[21]
Sreedharan, V. T., Behr, J., Bohnert, R., Schultheiss, S. J., and Rätsch, G. (2011).
A toolkit for pre-processing genome annotations in generic feature format.
In N. Harris and P. Rice, editors, Bioinformatics Open Source Conference, volume 12, page 47, Vienna, Austria. Open Bioinformatics Foundation.

[22]
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., and Durbin, R. (2009).
The Sequence Alignment/Map format and SAMtools.
Bioinformatics, 25(16), 2078.

[23]
Crooks, G. E., Hon, G., Chandonia, J. M., and Brenner, S. E. (2004).
WebLogo: a sequence logo generator.
Genome Res, 14(6), 1188-1190.