Returns p-values from the DTT using a linear feature importance measure.

linear_crt(H, H_parent, anc, Y, beta, group, d, adjust = NULL,
  test_idx = 1:nrow(H), family = "gaussian", n_reps = 100,
  genotypes = TRUE, verbose = FALSE, parallel = FALSE,
  recomb_thresh = 1e-08, f_stat = FALSE)

Arguments

H

a (2n x p) matrix of the subject haplotypes. Rows 1 and 2 are assumed to belong to subject 1, rows 3 and 4 to subject 2, and so on.
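The row layout of H can be illustrated with a small sketch. The data are simulated and the names `subject` and `G` are hypothetical. Summing each subject's two haplotype rows into a genotype is the usual convention and is assumed here, not taken from the package source.

```r
set.seed(1)
n <- 3  # number of subjects
p <- 5  # number of variants

# Simulated 0/1 haplotypes: rows 1,2 -> subject 1, rows 3,4 -> subject 2, ...
H <- matrix(rbinom(2 * n * p, 1, 0.5), nrow = 2 * n, ncol = p)

# Subject label for each haplotype row: 1 1 2 2 3 3
subject <- rep(seq_len(n), each = 2)

# Assumed convention: a genotype is the sum of a subject's two haplotypes
G <- rowsum(H, group = subject)  # (n x p) matrix with entries in {0, 1, 2}
```

Any input that does not follow this pairing of consecutive rows would need to be reordered before it is passed as H.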

H_parent

an (n_2 x p) matrix of the parental haplotypes

anc

a (2n x 2) table of ancestries. anc[i, 1] and anc[i, 2] give the rows of H_parent corresponding to the parents of row i of the haplotypes H
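As a rough illustration of how H, H_parent, and anc might fit together, the sketch below builds a single parent-offspring trio by hand. It assumes that each offspring haplotype is a mosaic of the two parental haplotypes indexed by its row of anc. That is one reading of the description above, not a statement about the package internals, and the helper vector `consistent` is hypothetical.

```r
# Parental haplotypes: n_2 = 4 rows, p = 4 variants
H_parent <- rbind(
  c(0, 1, 0, 1),  # row 1: father, haplotype 1
  c(1, 1, 0, 0),  # row 2: father, haplotype 2
  c(0, 0, 1, 1),  # row 3: mother, haplotype 1
  c(1, 0, 1, 0)   # row 4: mother, haplotype 2
)

# Offspring haplotypes for one subject (2n = 2 rows)
H <- rbind(
  c(0, 1, 0, 0),  # paternal: father's row 1 at variants 1-2, row 2 at variants 3-4
  c(0, 0, 1, 1)   # maternal: copy of mother's row 3
)

# anc[i, ] lists the two rows of H_parent that offspring row i descends from
anc <- rbind(c(1, 2), c(3, 4))

# Sanity check: every offspring allele matches one of its two parental rows
consistent <- sapply(seq_len(nrow(H)), function(i) {
  all(H[i, ] == H_parent[anc[i, 1], ] | H[i, ] == H_parent[anc[i, 2], ])
})
```

A check of this kind can catch misaligned ancestry indices before running the test.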

Y

observed responses, a vector of length n or a matrix of dimension (n x k)

beta

feature importance directions, fit on an independent data set. A vector of length p+1 or a matrix of dimension ((p+1) x k), e.g., the output of get_beta_glment. The first coefficient is assumed to be an intercept.

group

a list of indices of the group to test. Should be a contiguous region, such as 10:20.

d

a vector of length p of genetic distances

adjust

a vector of length n or a matrix of dimension (n x k), giving the contribution of the other chromosomes to the likelihood, i.e., adjust is X times the fitted coefficients.
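One way to read the adjust argument is as a matrix product of the genotypes on the other chromosomes with their fitted coefficients. The sketch below follows that reading. `X_other` and `B_other` are hypothetical names filled with simulated values, and the product form itself is an assumption.

```r
set.seed(2)
n <- 4  # subjects
q <- 3  # variants on the other chromosomes
k <- 2  # number of responses (columns of Y)

# Hypothetical genotypes on the other chromosomes (n x q)
X_other <- matrix(rbinom(n * q, 2, 0.3), nrow = n)

# Hypothetical fitted coefficients for those variants (q x k)
B_other <- matrix(rnorm(q * k), nrow = q)

# Assumed form of the offset: one column per response
adjust <- X_other %*% B_other  # (n x k)
```

With a single response (k = 1), `drop(adjust)` gives the vector of length n.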

test_idx

a set of indices (between 1 and 2n) used to compute the test statistics

family

type of regression. Either "gaussian" or "binomial". If "gaussian", correlation is used as a feature importance statistic. If "binomial", logistic log-likelihood is used as a feature importance statistic.
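The two statistics can be sketched in general form as follows. This is a minimal illustration on simulated data, not the package's implementation. It assumes the statistic is computed against the linear predictor formed from beta with the intercept first, and all variable names are hypothetical.

```r
set.seed(4)
n <- 50
p <- 3

X <- matrix(rbinom(n * p, 2, 0.4), nrow = n)  # hypothetical genotype matrix
beta <- c(0.2, 1, -0.5, 0)                    # length p + 1, intercept first
eta <- drop(beta[1] + X %*% beta[-1])         # linear predictor, length n

# "gaussian": correlation between the response and the linear predictor
y_cont <- eta + rnorm(n)
stat_gaussian <- cor(y_cont, eta)

# "binomial": logistic log-likelihood of the response at the linear predictor
y_bin <- rbinom(n, 1, plogis(eta))
stat_binomial <- sum(y_bin * eta - log1p(exp(eta)))
```

Under either choice, larger values indicate a better fit between the response and the direction given by beta.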

n_reps

number of repetitions of the CRT to carry out

genotypes

Defaults to TRUE. If FALSE, all rows are assumed to be independent (i.e. no two haplotypes are from the same individual).

verbose

if TRUE, prints various diagnostic messages to console

parallel

if TRUE, computations run in parallel; requires doMC to be registered (default: FALSE)

recomb_thresh

threshold of probability of recombination events to ignore. Lower values will have potentially higher power at increased computational cost.

f_stat

whether or not to use the f-statistic for continuous responses. The f-statistic may be much slower for groups with many nonzero coefficients.

Value

A vector of length k, a p-value for each column of Y.
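For intuition, a conditional randomization test typically converts resampled statistics into a p-value as sketched below, once per column of Y. The finite-sample formula (1 + count) / (1 + n_reps) is the standard one for randomization tests, and whether this function uses exactly that form is an assumption. The statistics here are simulated placeholders, not output of the package.

```r
set.seed(3)
n_reps <- 100
k <- 2

# Hypothetical observed statistics, one per column of Y
t_obs <- c(2.5, 0.1)

# Hypothetical statistics from n_reps resampled data sets (n_reps x k)
t_null <- matrix(rnorm(n_reps * k), nrow = n_reps)

# One p-value per response: share of resampled statistics at least as large
p_values <- sapply(seq_len(k), function(j) {
  (1 + sum(t_null[, j] >= t_obs[j])) / (1 + n_reps)
})
```

Under this formula the smallest attainable p-value is 1 / (1 + n_reps), so n_reps bounds the resolution of the returned values.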