rnaSPAdes is a tool for de novo transcriptome assembly from RNA-Seq data and is suitable for all kinds of organisms. rnaSPAdes has been part of the SPAdes package since version 3.9. Information about SPAdes download, requirements, installation and basic options can be found in the SPAdes manual. Below you will find information about the differences between SPAdes and rnaSPAdes.
To run rnaSPAdes use
rnaspades.py [options] -o <output_dir>
or
spades.py --rna [options] -o <output_dir>
Note that we assume that the SPAdes installation directory is added to the PATH variable (otherwise, provide the full path to the rnaSPAdes executable: <rnaspades installation dir>/rnaspades.py).
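The invocation above can also be assembled from a script. The following is a minimal sketch in Python, assuming a single paired-end library passed through the standard SPAdes -1 and -2 input options. The helper name build_rnaspades_command and the read file names are hypothetical, and the command is only constructed and printed, not executed.

```python
import os
import shlex


def build_rnaspades_command(left_reads, right_reads, output_dir, install_dir=None):
    """Return the rnaSPAdes command line as a list of arguments.

    install_dir is the SPAdes installation directory; when it is None,
    rnaspades.py is assumed to be reachable through PATH.
    """
    executable = "rnaspades.py"
    if install_dir is not None:
        # Use the full path when the installation directory is not in PATH.
        executable = os.path.join(install_dir, executable)
    return [executable, "-1", left_reads, "-2", right_reads, "-o", output_dir]


if __name__ == "__main__":
    # Hypothetical input files; the command is printed rather than run.
    cmd = build_rnaspades_command("reads_1.fastq", "reads_2.fastq", "rnaspades_out")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually launch the assembly, the returned list could be handed to subprocess.run(cmd, check=True).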
Here are several notes regarding options:
rnaSPAdes does not support the --careful and --cov-cutoff options.
rnaSPAdes is not compatible with other pipeline options such as --meta, --sc and --plasmid.
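A wrapper script may want to catch these options before starting a run. The following is a minimal sketch in Python, assuming that the five options named in the notes above are the ones to reject. The helper find_unsupported_options is hypothetical and not part of SPAdes.

```python
# Options named in the notes above as unusable with rnaSPAdes.
UNSUPPORTED_OPTIONS = ("--careful", "--cov-cutoff", "--meta", "--sc", "--plasmid")


def find_unsupported_options(args):
    """Return the unsupported options present in args, in order of appearance."""
    found = []
    for arg in args:
        # Also catch the "--name=value" spelling by comparing
        # only the part before the first "=".
        name = arg.split("=", 1)[0]
        if name in UNSUPPORTED_OPTIONS and name not in found:
            found.append(name)
    return found


if __name__ == "__main__":
    # Hypothetical argument list for illustration.
    bad = find_unsupported_options(["--careful", "-s", "reads.fastq", "-o", "out"])
    if bad:
        print("Options not usable with rnaSPAdes: " + ", ".join(bad))
```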
rnaSPAdes outputs only one FASTA file named transcripts.fasta. The corresponding file with paths in the assembly_graph.fastg is transcripts.paths.
Contig/scaffold names in rnaSPAdes output FASTA files have the following format:
>NODE_97_length_6237_cov_11.9819_g8_i2
Similarly to SPAdes, 97 is the number of the transcript, 6237 is its sequence length in nucleotides and 11.9819 is the k-mer coverage. Note that the k-mer coverage is always lower than the read (per-base) coverage. g8_i2 corresponds to gene number 8 and isoform number 2 within this gene. Transcripts with the same gene number presumably originate from the same or somewhat similar (e.g. paralogous) genes. Note that this prediction is based on the presence of shared sequences in the transcripts and is very approximate.
3 Assembly evaluation
rnaQUAST may be used for transcriptome assembly quality assessment for model organisms when a reference genome and gene database are available. rnaQUAST also includes BUSCO and GeneMarkS-T tools for de novo evaluation.
4 Citation
If you use rnaSPAdes in your research, please include the main SPAdes paper (Bankevich, Nurk et al., 2012) in your reference list. A paper on rnaSPAdes is to be submitted.
5 Feedback and bug reports
Your comments, bug reports, and suggestions are very welcome. They will help us to further improve rnaSPAdes.
If you have any trouble running rnaSPAdes, please send us params.txt and spades.log from the directory <output_dir>.
Address for communications: spades.support@cab.spbu.ru.
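The transcript naming scheme used in transcripts.fasta (transcript number, length, k-mer coverage, gene and isoform) lends itself to simple parsing. The following is a minimal sketch in Python, assuming every name follows the NODE_..._g<gene>_i<isoform> pattern of the example in this manual. The function names parse_transcript_name and group_by_gene are hypothetical helpers, not part of rnaSPAdes.

```python
import re
from collections import defaultdict

# Pattern for names such as NODE_97_length_6237_cov_11.9819_g8_i2,
# with an optional leading ">" as found in FASTA header lines.
HEADER_RE = re.compile(
    r"^>?NODE_(?P<node>\d+)_length_(?P<length>\d+)"
    r"_cov_(?P<cov>\d+(?:\.\d+)?)_g(?P<gene>\d+)_i(?P<isoform>\d+)$"
)


def parse_transcript_name(name):
    """Split a transcript name into its numeric fields."""
    match = HEADER_RE.match(name.strip())
    if match is None:
        raise ValueError("unexpected transcript name: %r" % name)
    return {
        "node": int(match.group("node")),
        "length": int(match.group("length")),
        "kmer_coverage": float(match.group("cov")),
        "gene": int(match.group("gene")),
        "isoform": int(match.group("isoform")),
    }


def group_by_gene(names):
    """Map each gene number to the list of transcript names assigned to it."""
    genes = defaultdict(list)
    for name in names:
        genes[parse_transcript_name(name)["gene"]].append(name)
    return dict(genes)


if __name__ == "__main__":
    print(parse_transcript_name(">NODE_97_length_6237_cov_11.9819_g8_i2"))
```

Since the gene assignment is described as very approximate, groups produced this way are best treated as a starting point for inspection rather than as confirmed gene models.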