MicroHapulator Report

Report generated at {{date}},
using MicroHapulator version {{mhpl8rversion}}.

Table of Contents

    {% block table_of_contents %} {% endblock %}

The data and statistics used to populate this report are available in their entirety in the MicroHapulator working directory. Per-sample results and full-resolution graphics are available in each analysis/{samplename} subdirectory of the working directory. Because the entire working directory is large and inconvenient to share, graphics, JavaScript, and CSS assets are copied to the report/ directory, which can then be compressed into a ZIP archive better suited for transfer by e.g. email.
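As a minimal sketch of that last step, the Python standard library's `shutil.make_archive` can bundle the report/ directory into a single ZIP file. The helper name `zip_report` and the default archive name are hypothetical; only the report/ location comes from the text above.

```python
import shutil
from pathlib import Path


def zip_report(workdir, archive_base="mhpl8r-report"):
    """Compress <workdir>/report/ into <archive_base>.zip and return its path.

    Hypothetical helper; archive_base is the output path without the .zip suffix.
    """
    workdir = Path(workdir)
    if not (workdir / "report").is_dir():
        raise FileNotFoundError(f"no report/ directory in {workdir}")
    # root_dir: directory that archive entries are relative to.
    # base_dir: subtree to include, so entries are stored as "report/...".
    return shutil.make_archive(archive_base, "zip", root_dir=str(workdir), base_dir="report")
```

Calling `zip_report("path/to/workdir")` writes mhpl8r-report.zip to the current directory; because entries keep the report/ prefix, the archive unpacks into a single folder.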

Read QA/QC

QC reports for the input reads are generated using FastQC and compiled into a single report with MultiQC. A link to the MultiQC report is provided below.

NOTE: FastQC was designed for QC of whole-genome shotgun NGS reads prior to genome assembly. A warning or failure for some QC modules (such as per-base sequence content or sequence duplication levels) may or may not be a concern with MH reads. Interpret results with care!

Click here to open the MultiQC report in a new tab

{% if read_length_table is none %} {% block read_len_fig %} {% endblock %} {% else %} {% block read_len_table %} {% endblock %} {% endif %} {% block read_filter_stats %} {% endblock %} {% block read_merge_stats %} {% endblock %}

Read Mapping

{% block read_map_header %} {% endblock %}

Table 3.1: Read mapping metrics.

Sample | Filtered Reads | Mapped Reads | Mapping Rate | Chi-square
{% for sample, stats in mapping_summary.items() %}
{{ sample }} | {{ stats.total_reads }} | {{ stats.mapped_reads }} | {{ stats.mapping_rate }} | {{ stats.chi_square }}
{% endfor %}

The reported chi-square statistic measures read coverage imbalance between markers and can be compared among samples sequenced with the same panel: the minimum value of 0 represents perfectly uniform coverage across markers, while the maximum value of D occurs when all reads map to a single marker (D is the degrees of freedom, i.e., the number of markers minus 1). A visual representation of interlocus balance is shown in Figure 3.4.
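The bounds of 0 and D hold if the statistic is computed on per-marker read proportions rather than raw counts; with raw counts, the maximum would instead scale with the total number of reads (N × D). The Python sketch below assumes that normalization. The function name is hypothetical and is not MicroHapulator's API.

```python
def interlocus_chi_square(read_counts):
    """Chi-square statistic for per-marker read proportions vs. uniform coverage.

    Assumes counts are first converted to proportions (summing to 1), so the
    result ranges from 0 (uniform) to D = k - 1 (all reads on one of k markers).
    """
    k = len(read_counts)
    total = sum(read_counts)
    if k < 2 or total == 0:
        raise ValueError("need at least two markers and at least one mapped read")
    expected = 1 / k  # uniform expectation for each marker's share of reads
    return sum((count / total - expected) ** 2 / expected for count in read_counts)
```

For a four-marker panel, `interlocus_chi_square([250, 250, 250, 250])` returns 0.0, and `interlocus_chi_square([1000, 0, 0, 0])` returns 3.0, which is D for four markers.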

Using mapping information, each read is assigned to one of four categories: on-target, off-target, contaminant, or repetitive.

Figure 3.2: Bar graph showing the number of reads for each sample, broken down into four categories: on-target, off-target, contaminant, and repetitive.

Table 3.3: The total number of reads mapped to each marker, and the subset of those reads marked as repetitive, broken down by sample. Table columns are sortable, and marker names link to a marker detail page.

{% for sample in samples %} {% endfor %} {% for sample in samples %} {% endfor %} {% for marker, marker_data in repetitive_reads_by_marker.items() %} {% for sample in samples %} {% endfor %} {% endfor %}
{{sample}} {{sample}}
Marker Reads Repetitive
{{marker}} {{ "{:,d}".format(marker_data[sample].mapped) }} {{ "{:,d}".format(marker_data[sample].repetitive) }}
{% for sample in samples %} {% endfor %}

Figure 3.4: Histograms showing the interlocus balance for each sample.

Haplotype Calling

Haplotypes are called empirically using mhpl8r type as follows. MicroHapulator examines each aligned read to determine its suitability for haplotype calling; each such examination is a typing event. If the read alignment spans all SNPs of interest, the typing event succeeds and a haplotype call is made. If not, the typing event fails and no haplotype call is made. (Note that if more than one marker is defined at a given locus, MicroHapulator can attempt multiple typing events per read. In this case, the number of Attempted Typing Events will exceed the number of Mapped Reads.) Collectively, the tallies of the observed haplotypes constitute the typing result for each sample. The typing rate is the number of successful typing events divided by the total number of attempted typing events.
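The typing logic described above can be sketched in Python under simplifying assumptions: each aligned read is represented as a hypothetical dict mapping reference positions to observed bases (positions outside the alignment or covered by a deletion are simply absent), and each call types a single marker, so a read overlapping two markers at one locus would be passed to two calls and counted once per marker. This illustrates the idea; it is not the `mhpl8r type` implementation.

```python
from collections import Counter


def type_marker(reads, snp_positions):
    """Tally haplotypes for one marker and compute its typing rate.

    reads: iterable of dicts {reference position: observed base} (hypothetical model)
    snp_positions: reference positions of the marker's SNPs of interest
    Returns (haplotype tallies, attempted, successful, typing rate).
    """
    tallies = Counter()
    attempted = successful = 0
    for read in reads:
        attempted += 1  # each read passed in counts as one attempted typing event
        if all(pos in read for pos in snp_positions):
            successful += 1  # alignment spans every SNP: record the haplotype
            tallies[",".join(read[pos] for pos in snp_positions)] += 1
    rate = successful / attempted if attempted else 0.0
    return tallies, attempted, successful, rate
```

For example, with SNPs at positions 10, 25, and 40, the reads `{10: "A", 25: "C", 40: "G"}` (twice), `{10: "T", 25: "C", 40: "A"}`, and `{10: "A", 25: "C"}` give tallies of A,C,G: 2 and T,C,A: 1, with 4 attempted and 3 successful typing events, for a typing rate of 0.75. The last read fails because its alignment does not reach position 40.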

Table 4.1: Read typing metrics.

Sample | Mapped Reads | Attempted Typing Events | Successful Typing Events | Typing Success Rate
{% for sample, stats in typing_summary.items() %}
{{ sample }} | {{ mapping_summary[sample].mapped_reads }} | {{ stats.attempted }} | {{ stats.successful }} | {{ stats.typing_rate }}
{% endfor %}

Table 4.2: Typing rate of each individual marker, broken down by sample. Table columns are sortable, and marker names link to a marker detail page.

Marker{% for sample in typing_summary.keys() %} | {{sample}}{% endfor %}
{% for markername in markernames %}
{{ markername }}{% for summary in typing_summary.values() %} | {{ summary.marker_typing_rate(markername) }}%{% endfor %}
{% endfor %}

Genotype Calling

Two types of thresholds are applied to each typing result using mhpl8r filter to discriminate true MH alleles (haplotypes) from false alleles resulting from sequencing error or other artifacts. A static detection threshold, based on a fixed number of reads, filters out low-level noise. A dynamic analytical threshold, based on a percentage of the total reads at the locus (after removing alleles that fail the detection threshold), accounts for fluctuations in depth of coverage between loci, samples, and runs, and can filter out higher-level noise in most cases. The haplotype tallies remaining after all filters have been applied represent the genotype call for that sample.
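A minimal Python sketch of the two-step filter at a single marker follows. It assumes that an allele passes a threshold when its count is greater than or equal to the threshold, and that the analytical threshold is stored as a fraction (consistent with Table 5.1, which multiplies the stored value by 100 for display). Both are assumptions, and the function name is hypothetical rather than the `mhpl8r filter` interface.

```python
def apply_thresholds(tallies, detection, analytical):
    """Return the haplotypes that survive both thresholds at one marker.

    tallies: {haplotype: read count}
    detection: static minimum read count
    analytical: dynamic minimum, as a fraction of reads left after detection filtering
    """
    # Step 1: static detection threshold removes low-level noise.
    detected = {hap: n for hap, n in tallies.items() if n >= detection}
    # Step 2: analytical threshold scales with the remaining depth at this locus.
    remaining = sum(detected.values())
    return {hap: n for hap, n in detected.items() if n >= analytical * remaining}
```

With tallies of 480, 450, 40, and 3 reads, a detection threshold of 10 removes the 3-read allele. The remaining 970 reads set an analytical cutoff of 48.5 at 5%, so the 40-read allele is also removed and only the two major alleles remain.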

Table 5.1: Detection thresholds and analytical thresholds for each marker. Table columns are sortable, and marker names link to a marker detail page.

Marker | Detection | Analytical
{% for marker in markernames %}
{{ marker }} | {{ thresholds.get(marker)[0] }} | {{ "{:.1f}".format(thresholds.get(marker)[1] * 100) }}%
{% endfor %}

For single-source samples, we expect the two alleles at heterozygous loci to have roughly even abundance. The following plots show the relative abundance of the major and minor allele for each marker with a heterozygous genotype (markers are sorted by absolute combined abundance, which is printed above each pair of allele counts). For known DNA mixtures, these plots can be safely ignored. For suspected single-source samples, however, substantial imbalance between major and minor allele counts at numerous loci indicates that the sample should be examined more closely for the presence of a minor DNA contributor.
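The data behind these plots can be sketched in Python, assuming a heterozygous marker is one whose genotype call contains exactly two haplotypes. The function name and input layout are hypothetical. The sketch reports the minor/major ratio without applying an imbalance cutoff, since none is specified here.

```python
def heterozygote_balance(genotype):
    """Summarize allele balance at heterozygous markers for one sample.

    genotype: {marker: {haplotype: read count}} after all filters are applied.
    Returns (marker, major, minor, combined, minor/major) tuples,
    sorted by combined abundance, highest first.
    """
    rows = []
    for marker, alleles in genotype.items():
        if len(alleles) != 2:
            continue  # skip markers that are not heterozygous (one allele, or more than two)
        major, minor = sorted(alleles.values(), reverse=True)
        ratio = minor / major if major else 0.0  # guard against an all-zero call
        rows.append((marker, major, minor, major + minor, ratio))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

For a call of mh01 = {500, 480}, mh02 = {900}, and mh03 = {300, 60}, mh02 is skipped as homozygous, mh01 (980 reads, ratio 0.96) is listed before mh03 (360 reads, ratio 0.2), and the low mh03 ratio is the kind of imbalance that would merit a closer look in a presumed single-source sample.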

{% for sample in samples %} {% endfor %}

Figure 5.2: Bar graphs showing heterozygote balance for all samples.