Returns p-values from the DTT using a linear feature importance measure.
linear_crt(H, H_parent, anc, Y, beta, group, d, adjust = NULL, test_idx = 1:nrow(H), family = "gaussian", n_reps = 100, genotypes = TRUE, verbose = FALSE, parallel = FALSE, recomb_thresh = 1e-08, f_stat = FALSE)
H | a (2n x p) matrix of the subject haplotypes. It is assumed that rows 1,2 belong to subject 1, rows 3,4 belong to subject 2, etc. |
---|---|
H_parent | an (n_2 x p) matrix of the parental haplotypes |
anc | a (2n x 2) table of ancestries. anc[i, 1] and anc[i, 2] give the rows of H_parent corresponding to the parents of row i of the haplotypes H |
Y | observed responses, a vector of length n or a matrix of dimension (n x k) |
beta | feature importance directions, fit on an independent data set. A vector of length p+1 or a matrix of dimension ((p+1) x k), e.g. the output of get_beta_glment. The first coefficient is assumed to be an intercept. |
group | a list of indices of the group to test. Should be a contiguous region, such as 10:20. |
d | a vector of length p of genetic distances |
adjust | a vector of length n or a matrix of dimension (n x k), giving the contribution of the other chromosomes to the likelihood, i.e. adjust = X times the fitted coefficients. |
test_idx | a set of indices (between 1 and 2n) used to compute the test statistics |
family | type of regression. Either "gaussian" or "binomial". If "gaussian", correlation is used as the feature importance statistic. If "binomial", the logistic log-likelihood is used as the feature importance statistic. |
n_reps | number of repetitions of the CRT to carry out |
genotypes | Defaults to TRUE. If FALSE, all rows are assumed to be independent (i.e. no two haplotypes are from the same individual). |
verbose | if TRUE, prints various diagnostic messages to console |
parallel | if TRUE, runs in parallel; requires doMC to be registered (default: FALSE) |
recomb_thresh | threshold of probability of recombination events to ignore. Lower values will have potentially higher power at increased computational cost. |
f_stat | whether or not to use the F-statistic for continuous responses. The F-statistic may be much slower for groups with many nonzero coefficients. |
A vector of length k, a p-value for each column of Y.
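
The argument shapes described above can be illustrated with a small sketch. It only builds toy inputs with the documented dimensions and checks them; it is not a realistic simulation. The pedigree layout (two parents per subject, two haplotypes per parent, no recombination), the values in `d`, and the random `beta` are all assumptions made for illustration. The final call is guarded so the block runs even when `linear_crt` is not loaded, and whether the function accepts toy data of this size is not verified here.

```r
set.seed(1)
n <- 4   # number of subjects
p <- 6   # number of variants

# Assumed toy pedigree: each subject has 2 parents, each parent has 2 haplotypes,
# so H_parent has 4n rows.
H_parent <- matrix(rbinom(4 * n * p, 1, 0.3), nrow = 4 * n, ncol = p)

# anc[i, ] holds the two H_parent rows of the parent that transmitted row i of H.
anc <- cbind(seq(1, 4 * n, by = 2), seq(2, 4 * n, by = 2))

# Toy offspring haplotypes: copy the first listed parental haplotype unchanged.
H <- H_parent[anc[, 1], , drop = FALSE]

# Rows 2i-1 and 2i of H belong to subject i, so genotypes are sums of row pairs.
X <- H[seq(1, 2 * n, by = 2), , drop = FALSE] +
     H[seq(2, 2 * n, by = 2), , drop = FALSE]

Y     <- rnorm(n)                             # one continuous response (k = 1)
beta  <- c(0, rnorm(p))                       # intercept first, then p coefficients
group <- 2:4                                  # contiguous block of variants to test
d     <- seq(0.01, by = 0.01, length.out = p) # placeholder genetic distances

# Check the documented shapes before calling the function.
stopifnot(nrow(H) == 2 * n, ncol(H) == p)
stopifnot(nrow(anc) == 2 * n, ncol(anc) == 2, max(anc) <= nrow(H_parent))
stopifnot(length(beta) == p + 1, length(d) == p, length(Y) == n)

# Only attempted if the function is available in the session.
if (exists("linear_crt")) {
  pvals <- linear_crt(H, H_parent, anc, Y, beta, group, d,
                      family = "gaussian", n_reps = 100)
}
```

With a single response vector as here, the documented return value would be a vector of length 1.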