Summaries
To assess how well each locus behaves in reporting microhaplotypes, turn to the “Filter Analysis” panel choice. First, choose Criteria Cutoff –> Global Scope. This view gives you histograms of the read depths and allele balance ratios (the filtering choices in effect appear as dashed red lines) and, below that, a tabular perspective on how different filtering choices affect the total number of haplotypes recorded in the data set. The current field selection choices are applied here. So, if ALL loci are selected, then the histograms include reads from all the haplotypes at all the loci, and the tabular summary counts up the total number of haplotypes typed and the total number of individuals typed at all loci. If you select just one locus, the results reflect that one locus, and the histogram results are further broken down by the different haplotypes within that locus.
With that in mind, choose the first locus in the dropdown menu, Plate_1_A11_Sat_GE820299_consensus, under “> field selection” and see how that changes the broad summary. Note that across a broad range of the two filtering options (minimum read depth and minimum allele balance) a total of eight haplotypes is discovered.
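The effect of the two filters on the tabular summary can be sketched in a few lines of Python. This is only an illustration and not the application's own code: the records, the function names `apply_filters` and `summarize`, and the thresholds are all hypothetical, and the sketch assumes that the allele balance ratio of a haplotype is its read depth divided by the depth of the deepest haplotype in the same individual at the same locus.

```python
from collections import defaultdict

# Hypothetical records: (individual, locus, haplotype, read_depth).
reads = [
    ("ind1", "locusA", "ACG", 120),
    ("ind1", "locusA", "ATG", 95),
    ("ind1", "locusA", "ACT", 4),
    ("ind2", "locusA", "ACG", 210),
    ("ind2", "locusA", "TTG", 30),
    ("ind3", "locusA", "ATG", 8),
]

def apply_filters(records, min_depth, min_balance):
    """Keep records that pass both the read depth and allele balance filters."""
    # Depth of the deepest haplotype for each individual-locus pair.
    top = defaultdict(int)
    for ind, locus, _hap, depth in records:
        top[(ind, locus)] = max(top[(ind, locus)], depth)
    kept = []
    for ind, locus, hap, depth in records:
        # Assumed definition: depth relative to the deepest haplotype.
        balance = depth / top[(ind, locus)]
        if depth >= min_depth and balance >= min_balance:
            kept.append((ind, locus, hap, depth))
    return kept

def summarize(kept):
    """Count records, distinct haplotypes, and individuals that survive."""
    return {
        "haplotype_records": len(kept),
        "distinct_haplotypes": len({(locus, hap) for _, locus, hap, _ in kept}),
        "individuals_typed": len({ind for ind, _, _, _ in kept}),
    }

strict = summarize(apply_filters(reads, min_depth=10, min_balance=0.2))
lenient = summarize(apply_filters(reads, min_depth=1, min_balance=0.0))
print(strict)
print(lenient)
```

Running the summary over a grid of thresholds, rather than the two settings used here, would mimic the table in the Global Scope view under these assumptions.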
To search for individuals or loci that have more high-read-depth alleles than one would expect, you can choose Criteria Cutoff –> Quality Profiling. This view shows individuals or loci that have more than \(n\) alleles that pass the filters. For example, if you are dealing with diploids, then you would set \(n\) to 2 (in the “+read criteria: Top n Alleles” option). Then any individuals or loci that had more than 2 alleles that satisfied the minimum read depth and allele balance criteria would be noted here. This is a good way to look out for contaminated samples or loci that amplify paralogous regions.
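The idea behind this check can be sketched as follows. This is a hypothetical illustration, not the application's implementation: the records, the function name `excess_allele_calls`, and the default thresholds are invented, and allele balance is again assumed to be a haplotype's depth relative to the deepest haplotype in the same individual at the same locus.

```python
from collections import defaultdict

# Hypothetical records: (individual, locus, haplotype, read_depth).
records = [
    ("ind1", "locusA", "ACG", 150),
    ("ind1", "locusA", "ATG", 140),
    ("ind1", "locusA", "TTG", 90),
    ("ind2", "locusA", "ACG", 200),
    ("ind2", "locusA", "ATG", 180),
    ("ind2", "locusA", "TTG", 3),
]

def excess_allele_calls(records, n=2, min_depth=20, min_balance=0.3):
    """Return individual-locus pairs with more than n alleles passing the filters."""
    # Depth of the deepest haplotype for each individual-locus pair.
    top = defaultdict(int)
    for ind, locus, _hap, depth in records:
        top[(ind, locus)] = max(top[(ind, locus)], depth)
    # Collect the haplotypes that pass both filters.
    passing = defaultdict(list)
    for ind, locus, hap, depth in records:
        if depth >= min_depth and depth / top[(ind, locus)] >= min_balance:
            passing[(ind, locus)].append(hap)
    # Keep only the pairs with more passing alleles than the ploidy allows.
    return {key: haps for key, haps in passing.items() if len(haps) > n}

flagged = excess_allele_calls(records, n=2)
print(flagged)
```

In this toy data set only ind1 is flagged: all three of its haplotypes have high depth, which is the pattern one might see with contamination or a paralogous region, whereas the third haplotype in ind2 has too few reads to pass.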
To see how the inferred haplotypes look in terms of haplotype frequencies, and also how the genotypes look in terms of Hardy-Weinberg equilibrium, you can choose Genotype Call –> Summaries. This view consists of four figures. The first, in the upper left, simply shows the frequencies (and the total read depth) of different haplotypes. The plot in the upper right shows the relationship between the observed frequencies of different genotypes and the frequencies expected under Hardy-Weinberg equilibrium. The individuals used in creating these summaries depend on which “Group” is chosen. In this case we have “ALL” chosen, and that is fine because the two groups are essentially identical, genetically. However, if we were dealing with groups that were genetically differentiated, we would not want to assess conformance to Hardy-Weinberg proportions of a mixture of those different groups! In such a case it is worthwhile to look at one group at a time.
The expected number of each genotype is shown by the outline of a circle and the observed number by the filled, colored circle. Homozygotes are green and heterozygotes are orange. There is no scale, but if you click in the center of any genotype with an observed (non-zero) count, you will be told (in the upper left of the panel) what the expected and observed numbers were for that genotype. These plots are not meant to provide a defensible test of departures from Hardy-Weinberg equilibrium, but they do allow the user to diagnose loci that are grotesquely far out of Hardy-Weinberg equilibrium.
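The observed and expected numbers being compared can be computed by hand. The sketch below uses the standard Hardy-Weinberg expectations (\(N p_i^2\) for homozygotes and \(2 N p_i p_j\) for heterozygotes, with haplotype frequencies estimated from the called genotypes); the genotype data and the function name `hw_table` are hypothetical, and this is not necessarily how the application computes its plot.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Hypothetical diploid genotype calls at one locus, one tuple per individual.
genotypes = (
    [("H1", "H1")] * 3
    + [("H1", "H2")] * 4
    + [("H2", "H2")] * 3
)

def hw_table(genotypes):
    """Map each possible genotype to (observed count, expected count)."""
    n = len(genotypes)
    # Haplotype frequencies estimated from the 2N gene copies.
    copies = Counter(hap for geno in genotypes for hap in geno)
    freqs = {hap: count / (2 * n) for hap, count in copies.items()}
    # Sort within each genotype so (H2, H1) and (H1, H2) count together.
    observed = Counter(tuple(sorted(geno)) for geno in genotypes)
    table = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        p = freqs[a] * freqs[b]
        expected = n * (p if a == b else 2 * p)
        table[(a, b)] = (observed.get((a, b), 0), expected)
    return table

table = hw_table(genotypes)
for geno, (obs, exp) in table.items():
    print(geno, "observed:", obs, "expected:", round(exp, 2))
```

With both haplotypes at frequency 0.5 among 10 individuals, the expected counts are 2.5, 5, and 2.5, against observed counts of 3, 4, and 3. A locus where the observed heterozygote count falls far below the expected value is the kind of gross departure the plot is meant to expose.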
Below the haplotype frequencies and HW conformance plots you will find a simple bubble plot expressing haplotype frequencies in the different groups. In the case of the example data there are only two different groups and they have very similar allele frequencies. This plot becomes more useful when one is comparing allele frequencies across many different groups.
Finally, you may need to scroll down to see the final figure in this display. It is a representation of the haplotype sequences, their frequencies, and the positions of the variants within them along each amplicon.
Allele Biplots
Choosing Genotype Call –> AR Refinement takes you to a very informative screen. It is described above in the vignette. Read through the section that describes it and then try playing with the sliders to move the four different lines around the plot and see the effect on whether genotypes get called or not.
Note that you can use the blue lasso tool (upper right corner of the scatter plot) to select a lot of points, whose values will then be revealed in a table below. The red lasso can be used to de-select points and you can check the box “keeps pt selection between loci” to maintain focus on those points as you move from locus to locus. This can be very useful for identifying individuals that show aberrant read depths across multiple loci.