Returns p-values from the DTT using a linear feature importance measure.
linear_crt(H, H_parent, anc, Y, beta, group, d, adjust = NULL, test_idx = 1:nrow(H), family = "gaussian", n_reps = 100, genotypes = TRUE, verbose = FALSE, parallel = FALSE, recomb_thresh = 1e-08, f_stat = FALSE)
| Argument | Description |
|---|---|
| H | a (2n x p) matrix of the subject haplotypes. It is assumed that rows 1,2 belong to subject 1, rows 3,4 belong to subject 2, etc. |
| H_parent | an (n_2 x p) matrix of the parental haplotypes |
| anc | a (2n x 2) table of ancestries. anc[i, 1] and anc[i, 2] give the rows of H_parent corresponding to the parents of row i of the haplotypes H |
| Y | observed responses, a vector of length n or a matrix of dimension (n x k) |
| beta | feature importance directions, fit on an independent data set. A vector of length p+1 or a matrix of dimension ((p+1) x k), e.g. the output of get_beta_glment. The first coefficient is assumed to be an intercept. |
| group | a list of indices of the group to test. Should be a contiguous region, such as 10:20. |
| d | a vector of length p of genetic distances |
| adjust | a vector of length n or a matrix of dimension (n x k), giving the contribution of the other chromosomes to the likelihood, i.e. adjust = X times the fitted coefficients. |
| test_idx | a set of indices (between 1 and 2n) used to compute the test statistics |
| family | type of regression. Either "gaussian" or "binomial". If "gaussian", correlation is used as a feature importance statistic. If "binomial", logistic log-likelihood is used as a feature importance statistic. |
| n_reps | number of repetitions of the CRT to carry out |
| genotypes | Defaults to TRUE. If FALSE, all rows are assumed to be independent (i.e. no two haplotypes are from the same individual). |
| verbose | if TRUE, prints various diagnostic messages to console |
| parallel | if TRUE, runs in parallel; requires doMC to be registered (default: FALSE) |
| recomb_thresh | threshold of probability of recombination events to ignore. Lower values will have potentially higher power at increased computational cost. |
| f_stat | whether or not to use the F-statistic for continuous responses. The F-statistic may be much slower for groups with many nonzero coefficients. |
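
The shapes described in the table can be easier to follow with a toy example. The sketch below builds inputs with the documented dimensions in base R. All values are arbitrary placeholders: the layout of `H_parent` (two parents per subject, two haplotypes each), the no-recombination copy used to fill `H`, the spacing in `d`, and the names `X_other` and `coef_other` are assumptions made for illustration and do not come from the package. The call to `linear_crt` is left as a comment because it needs the package to be loaded.

```r
set.seed(1)
n <- 4    # number of subjects
p <- 6    # number of variants
n2 <- 4 * n  # toy choice: 2 parents per subject, 2 haplotypes per parent

# Parental haplotypes: an (n_2 x p) 0/1 matrix
H_parent <- matrix(rbinom(n2 * p, 1, 0.5), nrow = n2, ncol = p)

# Ancestry table: row i of H points at two rows of H_parent.
# Here row i maps to parental rows (2i - 1, 2i), giving a (2n x 2) table.
anc <- cbind(seq(1, n2, by = 2), seq(2, n2, by = 2))

# Subject haplotypes: a (2n x p) matrix; rows 1,2 are subject 1, and so on.
# Toy choice: copy the first listed parental haplotype without recombination.
H <- H_parent[anc[, 1], , drop = FALSE]

Y <- rnorm(n)               # one continuous response per subject
beta <- c(0, rnorm(p))      # length p + 1, intercept first
group <- 2:4                # contiguous block of variants to test
d <- rep(0.001, p)          # placeholder genetic distances

# adjust as "X times the fitted coefficients" for other chromosomes;
# X_other and coef_other are hypothetical names.
X_other <- matrix(rnorm(n * 3), nrow = n, ncol = 3)
coef_other <- c(0.5, -0.2, 0.1)
adjust <- as.vector(X_other %*% coef_other)

# Check that the shapes agree with the argument table
stopifnot(nrow(H) == 2 * n, ncol(H) == p,
          ncol(H_parent) == p,
          all(dim(anc) == c(2 * n, 2)), max(anc) <= nrow(H_parent),
          length(beta) == p + 1, length(d) == p, length(adjust) == n)

# With the package loaded, the call would then look like:
# pvals <- linear_crt(H, H_parent, anc, Y, beta, group, d,
#                     adjust = adjust, family = "gaussian", n_reps = 100)
```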
A vector of length k, a p-value for each column of Y.
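
This page does not say how the `n_reps` resamples are turned into a p-value. A common convention for conditional randomization tests is to compare the observed statistic with the resampled ones and add one to both numerator and denominator. The sketch below assumes that convention; `crt_pvalue`, `t_obs`, and `t_null` are hypothetical names and are not part of the package.

```r
# Finite-sample p-value from one observed statistic and a vector of
# statistics computed on resampled (null) data. Larger values are
# treated as stronger evidence against the null.
crt_pvalue <- function(t_obs, t_null) {
  (1 + sum(t_null >= t_obs)) / (length(t_null) + 1)
}

# Toy example with 5 resampled statistics: only 0.9 is >= 0.5,
# so the p-value is (1 + 1) / (5 + 1).
t_null <- c(0.1, 0.4, 0.2, 0.9, 0.3)
p_val <- crt_pvalue(0.5, t_null)

# With 100 resamples, the smallest attainable value is 1 / 101.
p_min <- crt_pvalue(10, rep(0, 100))
```

Under this convention the p-value can never fall below 1 / (n_reps + 1), so a larger `n_reps` is needed if very small p-values must be resolved, for example when correcting for many tested groups.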