This report was generated using the results located at /glittertind/home/carl/asscom2/tests/strachan_campylo/results_ac2 using the installation at /glittertind/home/carl/asscom2.
Table 1: Overview of the samples analysed in this batch. Because mashtree has run, the samples are arranged by the order of the mashtree output.
Here is an overview of the number of result files that have been
found for each analysis. A report section is only rendered if relevant
result files are present for that analysis. Each section can be
triggered to run by calling assemblycomparator2 with a trailing
--until <section>
Table 2: Overview of sections that are rendered in this report. “n / expected” shows the number of analysis files found versus how many are expected to be present. Sections are only rendered if relevant files exist. Analyses that perform comparisons between samples generally output only one set of results, independent of the number of input files.
rule assembly_stats
Table 3: Assembly statistics are provided by assembly-stats. N50 indicates the length of the smallest contig that (together with the longer contigs) covers at least half of the genome.
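As an illustration of the N50 definition above, here is a minimal Python sketch. It is not the assembly-stats implementation; the function name `n50` and the example contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig downwards until at least half of the
    # total assembly length is covered; that contig's length is the N50.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Hypothetical contig lengths, for illustration only.
example_lengths = [100, 80, 50, 40, 30]
print(n50(example_lengths))
```

With these lengths the total is 300, and the two longest contigs (100 + 80) are the first to cover half of it, so the N50 is 80.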
rule sequence_lengths
Fig. 1: Visualization of the length of each fasta record for each sample. The colors show the mean GC content for each record (contig).
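The quantities plotted in Fig. 1 can be sketched as follows. This is a hedged illustration, not the pipeline's own code: the helper names are hypothetical, and excluding non-ACGT characters (such as N) from the GC denominator is an assumption.

```python
def parse_fasta(text):
    """Yield (record_id, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(sequence):
    """Fraction of G and C among the A/C/G/T bases (assumption: N is ignored)."""
    seq = sequence.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def record_summary(text):
    """Return a list of (record_id, length, gc_fraction) per FASTA record."""
    return [(name, len(seq), gc_fraction(seq)) for name, seq in parse_fasta(text)]


# Hypothetical two-contig assembly, for illustration only.
example = ">contig_1 demo\nGGCC\nAATT\n>contig_2\nGCGCGCNN\n"
for name, length, gc in record_summary(example):
    print(name, length, round(gc, 2))
```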
rule busco
Table 4: BUSCO results. “BUSCO estimates the completeness and redundancy of processed genomic data based on universal single-copy orthologs.” The following columns are printed as percentages [%]: C: Complete, S: Complete and single-copy, D: Complete and duplicated, F: Fragmented, M: Missing. n is the total number of BUSCO groups searched. For each sample, only the best lineage match (in terms of completeness) is shown.
Fig. 2: BUSCO results visualized. Legend: S: Complete and single-copy; D: Complete and duplicated; F: Fragmented; M: Missing. For each sample, only the best lineage match (in terms of completeness) is shown.
rule kraken2
Table 6: Kraken2 results. Taxonomic identification is provided by Kraken 2. For each sample, only the best hit is shown. The percentages indicate the proportion of fragments covered by the respective clade.
rule gtdbtk
GTDB uses several public
repositories with reference sequences and assigns the most likely name
by measuring the average nucleotide identity (ANI) and relative
evolutionary divergence (RED).
Table 7: Species classification provided by the GTDB-tk classify_wf workflow.
rule mlst
Table 8: MLST (Multi Locus Sequence Typing) results. Called with mlst, which incorporates components of the PubMLST database.
Mlst automatically detects the best scheme for typing, one sample at
a time. If you don’t agree with the automatic detection, you can enforce
a single scheme across all samples by (re)running assemblycomparator2
with the added command-line argument:
--config mlst_scheme=hpylori --forcerun mlst
Replace
hpylori with the mlst scheme you wish to use. You can find a
full list of available schemes in
“results_ac2/mlst/mlst_schemes.txt”.
rule abricate
Using Abricate, the
assemblies are scanned for known resistance genes, plasmid replicons and
virulence factors in the ncbi, card, plasmidfinder and vfdb databases.
Table 10: Table of VFDB virulence factor calls: “An integrated and comprehensive online resource for curating information about virulence factors of bacterial pathogens”.
rule prokka
Table 11: Overview of the number of different gene types. Called using the Prokka genome annotator.
rule kegg_pathway
For each genome, the amino-acid sequences called by prokka (prodigal)
are searched against the Uniref100-KO database.
This is the same database that CheckM2 uses. For the results produced
for this analysis, the alignment criteria are stricter (>=85%
coverage and >=50% identity). Using clusterProfiler's “enricher”
function, Benjamini-Hochberg adjusted p-values for the pathway
enrichment of the called genes are computed.
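To make the statistics concrete, the following Python sketch shows an over-representation test (hypergeometric upper tail, which is the kind of test clusterProfiler::enricher performs) followed by Benjamini-Hochberg adjustment. It is a hedged illustration and not the pipeline's R code; the function names and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K belong to a pathway."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Go from the largest p-value to the smallest so that the adjusted
    # values stay monotone and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (genes in sample hitting the pathway, pathway size).
pathways = {"pathway_a": (12, 40), "pathway_b": (3, 60), "pathway_c": (8, 25)}
sample_size, universe_size = 150, 2000  # hypothetical

raw = [hypergeom_upper_tail(k, sample_size, K, universe_size)
       for k, K in pathways.values()]
for name, p_adj in zip(pathways, benjamini_hochberg(raw)):
    print(name, p_adj)
```

Pathways whose adjusted p-value falls below the chosen significance threshold would be reported as enriched.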
Fig. 3: Summary of the KEGG-ortholog based pathway enrichment analysis results. The KEGG pathway hierarchy consists of a number of pathway classes that are listed on the vertical axis. n denotes the number of pathways from that class that are significantly enriched in each sample.
Table 12: Results from the KEGG-ortholog based pathway enrichment analysis produced with clusterProfiler::enricher. Only significant results are shown. The KOs can be entered directly into KEGG mapper search by setting mode to “Reference”.
rule roary
Roary, the pan
genome pipeline, computes the number of orthologous genes in a number of
core/pan spectrum partitions.
The core genome denotes the genes which are conserved between all samples (intersection), whereas the pan genome is the union of all genes across all samples.
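The intersection/union definition can be sketched in a few lines of Python. This is only an illustration of the set logic: the gene names stand in for ortholog clusters, the sample names are hypothetical, and Roary's own clustering and partition thresholds are not reproduced here.

```python
def core_and_pan(gene_sets):
    """Return (core, pan) for a mapping of sample name -> set of gene names."""
    sets = list(gene_sets.values())
    if not sets:
        return set(), set()
    core = set.intersection(*sets)  # genes present in every sample
    pan = set.union(*sets)          # genes present in at least one sample
    return core, pan


# Hypothetical gene content per sample, for illustration only.
samples = {
    "sample_a": {"gyrA", "recA", "cdtB"},
    "sample_b": {"gyrA", "recA", "flaA"},
    "sample_c": {"gyrA", "recA"},
}
core, pan = core_and_pan(samples)
print(sorted(core))
print(sorted(pan))
```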
Table 13: Distribution of genes in different core/pan spectrum partitions.
Fig. 4: Genes shared between samples. Each vertical line represents a gene, and all lines have the same width regardless of the size of the gene. The genes are colored by the number of samples sharing them.
rule snp_dists
Counts the number of differences between
any pair of samples on the core genome produced by roary. SNP distances
do not approximate the evolutionary distance, as they are not adjusted
for the different probabilities of transitions, transversions etc.
Rather, they give a ballpark indication of the difference between the
samples. Note that the SNP distances are highly sensitive to
the core/pan genome size ratio.
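A minimal sketch of such a pairwise count on an aligned core genome is given below. It is not the snp-dists source code; the function names and the toy alignment are hypothetical, and only positions where both samples have A, C, G or T are counted, which is an assumption about how gaps and ambiguous bases are handled.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences differ in an A/C/G/T base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a in valid and b in valid  # gaps and N are skipped
    )


def distance_matrix(alignment):
    """Return a nested dict of pairwise SNP distances for name -> sequence."""
    names = list(alignment)
    matrix = {name: {name: 0} for name in names}
    for a, b in combinations(names, 2):
        d = snp_distance(alignment[a], alignment[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return matrix


# Hypothetical core-genome alignment, for illustration only.
alignment = {
    "sample_a": "ACGTACGT",
    "sample_b": "ACGTACGA",
    "sample_c": "ACCTAC-A",
}
matrix = distance_matrix(alignment)
for name in alignment:
    print(name, [matrix[name][other] for other in alignment])
```

Because every difference counts equally, the result is a raw count rather than a model-corrected evolutionary distance, in line with the caveat above.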
Table 14: Pairwise SNP distances between all samples.
Fig. 5: Pairwise SNP distances between all samples. The color
indicates the relative distance for the pair in terms of the index
positions of the samples in a phylogenetic tree
produced with mashtree. The index positions in a phylogenetic tree can
be haphazard, but will always correlate with kinship.
rule mashtree
Mashtree computes an
approximation of ANI using the minhash distance measure. On these
distances, a phylogenetic tree is then created using the
neighbor-joining algorithm. The plotted tree is not rooted.
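The minhash idea can be sketched in Python as follows: each genome is reduced to the smallest hash values of its k-mers, the Jaccard index j is estimated from two such sketches, and the Mash distance is D = -(1/k) ln(2j / (1 + j)). This is a simplified illustration rather than Mash or mashtree code: it uses SHA-1 instead of Mash's hash function, ignores reverse complements, and the k-mer and sketch sizes are arbitrary assumptions.

```python
import hashlib
from math import log


def kmers(sequence, k):
    """Return the set of all k-mers in a sequence."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(sequence, k, size):
    """Bottom-s sketch: the `size` smallest hash values of the k-mers."""
    hashes = {
        int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")
        for kmer in kmers(sequence, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size):
    """Estimate the Jaccard index from two bottom-s sketches of equal size."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    merged = sorted(set_a | set_b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)


def mash_distance(j, k):
    """Mash distance from a Jaccard estimate j and k-mer size k."""
    if j <= 0:
        return 1.0
    return min(1.0, max(0.0, -log(2 * j / (1 + j)) / k))


# Hypothetical sequences and parameters, for illustration only.
K_SIZE, SKETCH_SIZE = 5, 20
genome_a = "ACGTTGCAAGCTTAGCCGATAGCTAGCTAGGATCCGATTACA"
genome_b = "ACGTTGCAAGCTTAGCCGATAGCTAGTTAGGATCCGATTACA"
sk_a = sketch(genome_a, K_SIZE, SKETCH_SIZE)
sk_b = sketch(genome_b, K_SIZE, SKETCH_SIZE)
j = jaccard_estimate(sk_a, sk_b, SKETCH_SIZE)
print(j, mash_distance(j, K_SIZE))
```

A distance matrix built this way is the kind of input that a neighbor-joining algorithm turns into an unrooted tree.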
Fig. 6: Approximation of a phylogenetic tree calculated with mashtree. The horizontal axis is equivalent to 1-ANI.
assemblycomparator2 v2.5.4 genomes to report pipeline. Copyright (C) 2019-2023 Carl M. Kobel GNU GPL v3