This repository hosts the HTRX tool, which was first proposed in
Barrie, W., Yang, Y., Irving-Pease, E.K. et al. Elevated genetic risk for multiple sclerosis emerged in steppe pastoralist populations. Nature 625, 321–328 (2024).
and then described in detail in
Yang, Y., Lawson, D.J. HTRX: an R package for learning non-contiguous haplotypes associated with a phenotype. Bioinformatics Advances 3(1), vbad038 (2023).
Authors:
Yaoling Yang (yaoling.yang@bristol.ac.uk)
Daniel Lawson (dan.lawson@bristol.ac.uk)
License: GPL-3
Haplotype Trend Regression with eXtra flexibility (HTRX) searches a pre-defined set of single nucleotide polymorphisms (SNPs) for haplotype patterns, which include both single SNPs and non-contiguous haplotypes.
We search over all possible templates. A template assigns each SNP a value of ‘0’ or ‘1’, reflecting whether the reference allele of that SNP is present or absent, or an ‘X’, meaning that either value is allowed.
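The template idea can be illustrated with a short base-R sketch. It does not use the package's own functions: the helper `matches_template` and the string encoding of haplotypes are hypothetical, and the sketch assumes that the all-‘X’ template is the one excluded from the search.

```r
# Illustrative sketch only: enumerate every template over k SNPs.
k <- 3
symbols <- c("0", "1", "X")

# All combinations of 0/1/X across the k positions.
grid <- expand.grid(rep(list(symbols), k), stringsAsFactors = FALSE)
templates <- apply(grid, 1, paste0, collapse = "")

# Assumption: drop the all-"X" template, which matches every haplotype.
templates <- templates[templates != strrep("X", k)]
n_templates_k3 <- length(templates)  # 3^3 - 1 = 26

# Hypothetical helper: does a 0/1 haplotype string match a template?
matches_template <- function(haplotype, template) {
  hap <- strsplit(haplotype, "")[[1]]
  tmpl <- strsplit(template, "")[[1]]
  all(tmpl == "X" | tmpl == hap)
}

matches_template("101", "1X1")  # TRUE: the middle SNP is unconstrained
matches_template("101", "0X1")  # FALSE: the first SNP differs
```

A template such as "1X1" is non-contiguous: it constrains the first and third SNPs while leaving the second one free.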
We use a two-stage procedure (function “do_cv”) to select the best HTRX model.
Stage 1: select candidate models;
Stage 2: select the best model using 10-fold cross-validation.
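For readers unfamiliar with 10-fold cross-validation, the following is a generic base-R illustration on simulated data. It is not how “do_cv” is implemented: the linear model, the simulated variables and all object names are assumptions made for the example.

```r
# Generic 10-fold cross-validation sketch on simulated data (not do_cv itself).
set.seed(1)
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- 0.5 * dat$x + rnorm(n)

# Assign each observation to one of 10 folds at random (20 per fold here).
folds <- sample(rep(1:10, length.out = n))

# Predict each fold from a model fitted on the other nine folds.
pred <- numeric(n)
for (f in 1:10) {
  test_idx <- folds == f
  fit <- lm(y ~ x, data = dat[!test_idx, ])
  pred[test_idx] <- predict(fit, newdata = dat[test_idx, ])
}

# Out-of-sample variance explained, computed from held-out predictions only.
oos_r2 <- 1 - sum((dat$y - pred)^2) / sum((dat$y - mean(dat$y))^2)
oos_r2
```

Because every prediction comes from a model that never saw the corresponding observation, the resulting variance explained is not inflated by overfitting.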
Longer haplotypes are important for discovering interactions. However, HTRX contains \(3^k-1\) haplotypes if the region contains \(k\) SNPs, which makes an exhaustive search unrealistic for regions with many SNPs. To address this issue, we proposed “cumulative HTRX” (function “do_cumulative_htrx”), which enables HTRX to run on longer haplotypes; we recommend it for haplotypes that include at least 7 SNPs. We also provide a parameter “max_int”, which controls the maximum number of SNPs that can interact.
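The growth of the search space can be checked with a small base-R calculation. The function `n_templates` below is a hypothetical helper, not part of the package, and it assumes that “max_int” limits the number of non-‘X’ positions in a template.

```r
# Hypothetical helper: number of templates over k SNPs with at most
# max_int constrained (non-"X") positions. Each choice of j positions
# can be filled with 0/1 in 2^j ways.
n_templates <- function(k, max_int = k) {
  j <- seq_len(min(max_int, k))
  sum(choose(k, j) * 2^j)
}

n_templates(4)               # 80   = 3^4 - 1
n_templates(8)               # 6560 = 3^8 - 1
n_templates(8, max_int = 3)  # 576  = 16 + 112 + 448
```

With no cap, the sum equals \(3^k-1\) by the binomial theorem, so the count roughly triples with each added SNP. Under this reading, capping the interaction order at 3 reduces the 8-SNP search from 6560 to 576 templates.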
devtools::install_github("https://github.com/YaolingYang/HTRX")
This package is also available from CRAN. You can install it with
install.packages("HTRX")
A tutorial for the HTRX package can be found in vignettes/HTRX_vignette.pdf
library(HTRX)
## use dataset "example_hap1", "example_hap2" and "example_data_nosnp"
## "example_hap1" and "example_hap2" are both genomes of 8 SNPs for 5,000 individuals (diploid data)
## "example_data_nosnp" is an example dataset which contains the outcome (binary), sex, age and 18 PCs
## visualise the covariates data
head(HTRX::example_data_nosnp)
## visualise the genotype data for the first genome
head(HTRX::example_hap1)
## we perform HTRX on the first 4 SNPs
## we first generate all the haplotype data, as defined by HTRX
HTRX_matrix=make_htrx(HTRX::example_hap1[,1:4],HTRX::example_hap2[,1:4])
## If the data is haploid, please set
## HTRX_matrix=make_htrx(HTRX::example_hap1[,1:4],HTRX::example_hap1[,1:4])
## next compute the maximum number of independent features
featurecap=htrx_max(nsnp=4)
## then perform HTRX using 2-step cross-validation
## to compute additional variance explained by haplotypes
## If you want to compute total variance explained, please set gain=FALSE
htrx_results <- do_cv(HTRX::example_data_nosnp,
                      HTRX_matrix,train_proportion=0.5,
                      sim_times=3,featurecap=featurecap,usebinary=1,
                      method="stratified",criteria="BIC",
                      gain=TRUE,runparallel=FALSE)
## If we want to compute the total variance explained
## we can set gain=FALSE in the above example
## we perform cumulative HTRX on all the 8 SNPs using 2-step cross-validation
## to compute additional variance explained by haplotypes
## If the data is haploid, please set hap2=HTRX::example_hap1
## If you want to compute total variance explained, please set gain=FALSE
## For Linux/macOS users, we strongly encourage you to set runparallel=TRUE
cumu_htrx_results <- do_cumulative_htrx(data_nosnp=HTRX::example_data_nosnp,
                                        hap1=HTRX::example_hap1,
                                        hap2=HTRX::example_hap2,
                                        train_proportion=0.5,sim_times=1,
                                        featurecap=6,usebinary=1,
                                        randomorder=TRUE,method="stratified",
                                        criteria="BIC",runparallel=FALSE)