This report was generated in /Users/kartoffel/assemblycomparator2/tests/E._faecium_plasmids/output_asscom2

Sample overview

Table 1: Overview of the samples analysed in this run.

Assembly statistics

Table 2: Assembly statistics are provided by assembly-stats. N50 is the length of the shortest contig such that this contig, together with all longer contigs, covers at least half of the total assembly length. GC is the average GC content of the whole genome, and sd(GC) is its standard deviation among contigs.
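To make the N50 and GC definitions concrete, here is a minimal sketch of how these statistics can be computed from a list of contigs. It is not the code used by assembly-stats or assemblycomparator2. Two assumptions are made: the whole-genome GC is pooled over all contigs, and sd(GC) is the sample standard deviation of per-contig GC fractions. Bases other than A, C, G and T are excluded from the denominator.

```python
import statistics


def n50(lengths):
    """Length of the shortest contig such that it, together with all longer
    contigs, covers at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downwards, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def gc_fraction(seq):
    """GC fraction among unambiguous bases (A, C, G, T) of one sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return gc / acgt if acgt else 0.0


def gc_stats(contigs):
    """Return (whole-genome GC, standard deviation of per-contig GC)."""
    # Assumption: whole-genome GC is pooled over all contigs, not averaged per contig.
    whole = gc_fraction("".join(contigs))
    per_contig = [gc_fraction(c) for c in contigs]
    # Assumption: sample standard deviation; undefined for a single contig.
    sd = statistics.stdev(per_contig) if len(per_contig) > 1 else 0.0
    return whole, sd
```

For example, `n50([100, 80, 50, 30, 20])` returns 80: the total is 280, and 100 + 80 = 180 is the first running total to reach half of it.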

Lengths of contigs:

Fig. 1: Lengths of fasta records for each sample. The colors have no intrinsic meaning; they only highlight the varying sizes and number of records in each assembly.

Genome annotation

Table 3: Number of features of each type (e.g. CDS, rRNA, tRNA) called by the Prokka genome annotator.

Species

Table 4: The three highest-ranking species hits for each sample. Species identification is provided by Kraken 2. The percentages indicate the proportion of sequences (equivalent to fasta records) assigned to the given taxon.

Using the /Users/kartoffel/databases/kraken2/k2_standard_8gb_20201202 database.
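The percentages in Table 4 can be pictured with the simplified sketch below. It assumes a hypothetical input of one species label per fasta record, with `None` for unclassified records, and reports the three most frequent species as a percentage of all records. Kraken 2's own report uses cumulative, clade-level counts across the taxonomy. This sketch ignores that hierarchy and does not reproduce assemblycomparator2's actual parsing of the Kraken 2 output.

```python
from collections import Counter


def top_species(assignments, n=3):
    """assignments: one species label per fasta record (None = unclassified).

    Returns up to n (species, percentage) pairs, most frequent first, where the
    percentage is relative to ALL records, including unclassified ones."""
    total = len(assignments)
    counts = Counter(label for label in assignments if label is not None)
    return [(species, 100.0 * count / total)
            for species, count in counts.most_common(n)]
```

Usage: `top_species(["E. faecium"] * 3 + ["E. faecalis", None])` returns `[("E. faecium", 60.0), ("E. faecalis", 20.0)]`.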

MLST

Table 5: Multi-locus sequence typing (MLST) results provided by mlst. The mlst software incorporates components of the PubMLST database.

How to customize the mlst-analysis

The mlst tool automatically detects the best typing scheme, one sample at a time. If you disagree with the automatic detection, you can enforce a single scheme across all samples by (re)running assemblycomparator2 with the trailing command-line arguments: --config mlst_scheme=hpylori -R mlst report. Replace hpylori with the mlst scheme you wish to use. A full list of available schemes is in the output directory under “mlst/mlst_schemes.txt”.

Resistance

NCBI Resistance

Table 6: NCBI resistance gene calls. Resistance genes provided by NCBI AMRFinder via abricate.

CARD resistance genes

Table 7: CARD resistance gene calls. Resistance genes provided by the Comprehensive Antibiotic Resistance Database (CARD) via abricate.

PlasmidFinder calls

Table 8: PlasmidFinder plasmid calls. Note that PlasmidFinder recognizes plasmids by short marker sequences (replicons) rather than complete plasmid sequences, so a call shows that a plasmid replicon is present, not that a complete plasmid has been assembled. Plasmid calls provided by PlasmidFinder via abricate.

VFDB calls

Table 9: VFDB virulence factor calls, provided via abricate by the Virulence Factor Database (VFDB), an integrated and comprehensive online resource for curating information about virulence factors of bacterial pathogens.

Pan and Core genome

Roary, the pan-genome pipeline, computes the number of orthologous genes in each partition of the core/pan spectrum.

The core genome denotes the genes which are conserved between all samples (intersection), whereas the pan genome is the union of all genes across all samples.

Table 10: Distribution of genes in different core/pan spectrum partitions.
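The core/pan definitions and the partitions in Table 10 can be illustrated with a small sketch. It assumes that genes have already been clustered into orthologous groups, which Roary does and which is not shown here. Each sample is represented as a set of cluster names, and the thresholds are assumed to match Roary's defaults: core in at least 99% of samples, soft core in 95% to 99%, shell in 15% to 95%, and cloud in under 15%. The function names are hypothetical.

```python
def core_and_pan(gene_sets):
    """gene_sets: dict mapping sample name -> set of gene clusters it carries.

    Core genome = intersection over all samples; pan genome = union."""
    sets = list(gene_sets.values())
    return set.intersection(*sets), set.union(*sets)


def partition_counts(gene_sets):
    """Count pan-genome clusters per partition (assumed Roary default thresholds)."""
    n_samples = len(gene_sets)
    _, pan = core_and_pan(gene_sets)
    counts = {"core": 0, "soft_core": 0, "shell": 0, "cloud": 0}
    for gene in pan:
        # Fraction of samples that carry this gene cluster.
        frac = sum(gene in s for s in gene_sets.values()) / n_samples
        if frac >= 0.99:
            counts["core"] += 1
        elif frac >= 0.95:
            counts["soft_core"] += 1
        elif frac >= 0.15:
            counts["shell"] += 1
        else:
            counts["cloud"] += 1
    return counts
```

With only a few samples, the 99% threshold means a core gene must be present in every sample. With three samples, no gene can fall below 15%, so the cloud partition stays empty.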


Fig. 2: Visual distribution of genes in the different samples. The genes are ordered with respect to the number of samples sharing them.

Core genome phylogeny

Fig. 3: Phylogenetic tree inferred from the concatenated core gene alignment of all samples. The core genome phylogeny is inferred with FastTree, which computes an approximately maximum-likelihood tree, using the generalized time-reversible (GTR) model of nucleotide evolution. The plotted tree is unrooted.

Table 11: Pairwise SNP distances between the samples, calculated from the core gene alignment with snp-dists. The table is symmetric about the diagonal.
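The following sketch shows how a SNP distance matrix like Table 11 can be derived from an alignment. It counts only alignment columns where both sequences carry an unambiguous base (A, C, G or T) and the bases differ, so gaps and N are skipped. We believe this matches snp-dists' default behaviour, but treat that as an assumption. The input format, a dict of aligned sequences, is hypothetical.

```python
from itertools import combinations

NUCLEOTIDES = set("ACGT")


def snp_distance(a, b):
    """Number of columns where both bases are in ACGT and differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(1 for x, y in zip(a.upper(), b.upper())
               if x in NUCLEOTIDES and y in NUCLEOTIDES and x != y)


def distance_matrix(alignment):
    """alignment: dict name -> aligned sequence. Returns a dict-of-dicts matrix."""
    names = list(alignment)
    dist = {n: {m: 0 for m in names} for n in names}  # diagonal stays 0
    for n, m in combinations(names, 2):
        d = snp_distance(alignment[n], alignment[m])
        dist[n][m] = dist[m][n] = d  # symmetric about the diagonal
    return dist
```

Because each pair is computed once and written to both cells, the matrix is symmetric by construction, which is why the table mirrors itself across the diagonal.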

Fig. 4: Pairwise distances between the samples in the core genome.


Mashtree phylogeny

Mashtree, developed by Lee S. Katz, builds a neighbor-joining tree from Mash distances, which are estimated with the MinHash technique. If the core genome of the samples is small, absent, or otherwise not representative of the assemblies you wish to compare, mashtree may still be informative about their overall taxonomic clustering. Please be aware that the tree is unrooted.
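The distances behind mashtree can be illustrated with a simplified MinHash sketch. Each sequence is reduced to the `size` smallest hash values of its k-mers (a bottom-s sketch). The Jaccard similarity j is estimated from the smallest hashes of the union of two sketches and converted into the Mash distance D = -1/k · ln(2j / (1 + j)). This is only an illustration: real Mash uses canonical k-mers (merging each k-mer with its reverse complement) and MurmurHash, whereas this sketch hashes k-mers as given with SHA-1 from `hashlib`. The tree-building step is not shown.

```python
import hashlib
import math


def kmers(seq, k):
    """Set of all k-mers in seq (no reverse-complement canonicalisation)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=21, size=1000):
    """Bottom-s MinHash sketch: the `size` smallest 64-bit k-mer hashes."""
    hashes = {int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
              for km in kmers(seq, k)}
    return set(sorted(hashes)[:size])


def jaccard_estimate(s1, s2, size=1000):
    """Estimate Jaccard similarity from the smallest hashes of the union."""
    union_bottom = set(sorted(s1 | s2)[:size])
    if not union_bottom:
        return 0.0
    shared = union_bottom & s1 & s2
    return len(shared) / len(union_bottom)


def mash_distance(j, k=21):
    """Mash distance D = -1/k * ln(2j / (1 + j)); infinite if nothing is shared."""
    if j == 0:
        return math.inf
    return -math.log(2 * j / (1 + j)) / k
```

`hashlib` is used instead of Python's built-in `hash()` because the latter is randomised per process for strings, which would make sketches irreproducible between runs.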

Fig. 5: Approximation of a phylogenetic tree calculated with mashtree.


The assemblycomparator2 pipeline and report are developed by Oliver Hansen & Carl M. Kobel