Returns p-values from the modified DTT using a linear feature importance measure. The p-values are guaranteed to be independent.

linear_crt_indep(H, H_parent, anc, Y, beta, groups, d, adjust = NULL,
  test_idx = 1:nrow(H), family = "gaussian", n_reps = 100,
  genotypes = TRUE, verbose = FALSE, parallel = FALSE,
  recomb_thresh = 1e-08, f_stat = TRUE)

Arguments

H

a (2n x p) matrix of the subject haplotypes. Rows 1 and 2 are assumed to belong to subject 1, rows 3 and 4 to subject 2, etc.
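The row layout of H can be illustrated with a small sketch in base R. The data are made up, and the names subject and G are hypothetical helpers that are not part of this function; forming genotypes as the sum of each subject's two haplotype rows is the usual convention, assumed here.

```r
# Toy data: n = 3 subjects, p = 4 variants (values are invented).
n <- 3
p <- 4
H <- matrix(c(0, 1, 0, 0,
              1, 1, 0, 1,
              0, 0, 1, 0,
              0, 1, 1, 0,
              1, 0, 0, 0,
              1, 0, 1, 1),
            nrow = 2 * n, ncol = p, byrow = TRUE)

# Rows 1,2 -> subject 1; rows 3,4 -> subject 2; rows 5,6 -> subject 3.
subject <- rep(seq_len(n), each = 2)

# Assumed convention: genotype = sum of a subject's two haplotype rows.
# rowsum() adds up the rows of H within each subject, giving an (n x p) matrix.
G <- rowsum(H, group = subject)
```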

H_parent

an (n_2 x p) matrix of the parental haplotypes

anc

a (2n x 2) table of ancestries. anc[i, 1] and anc[i, 2] give the rows of H_parent corresponding to the parents of row i of the haplotypes H
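As a minimal sketch of how the ancestry table indexes into H_parent, consider a single subject (n = 1) with invented data. The variable parents_of_i is a hypothetical name used only for illustration.

```r
# Toy parental haplotypes: n_2 = 4 rows, p = 3 variants (values are invented).
H_parent <- matrix(c(0, 1, 0,
                     1, 1, 0,
                     0, 0, 1,
                     1, 0, 1),
                   nrow = 4, byrow = TRUE)

# One subject, so anc has 2n = 2 rows. Row i of anc lists the two rows of
# H_parent that are the parental haplotypes of row i of H.
anc <- rbind(c(1, 2),
             c(3, 4))

# Parental haplotypes of offspring haplotype i = 2: rows 3 and 4 of H_parent.
i <- 2
parents_of_i <- H_parent[anc[i, ], , drop = FALSE]
```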

Y

observed responses, a vector of length n or a matrix of dimension (n x k)

beta

feature importance directions, fit on an independent data set. A vector of length p+1 or a matrix of dimension ((p+1) x k), e.g. the output of get_beta_glment. The first coefficient is assumed to be an intercept.

groups

a list of index vectors, one per group to test. Each element should be a contiguous region, e.g. list(10:20, 21:30).
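One way to build such a list is to cut the p positions into fixed-width windows. This is only a sketch: the window width and the helper is_contiguous are hypothetical choices and are not part of this function.

```r
# Split p = 30 positions into windows of 10 consecutive indices.
p <- 30
width <- 10
starts <- seq(1, p, by = width)
groups <- lapply(starts, function(s) s:min(s + width - 1, p))

# Sanity check: every group is a run of consecutive indices.
is_contiguous <- function(g) all(diff(g) == 1)
all_ok <- all(vapply(groups, is_contiguous, logical(1)))
```

Here groups equals list(1:10, 11:20, 21:30).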

d

a vector of length p of genetic distances

adjust

a vector of length n or a matrix of dimension (n x k), giving the contribution of the other chromosomes to the likelihood, i.e. adjust = X multiplied by the fitted coefficients.
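One reading of the adjust description is a matrix product of the design matrix for the other chromosomes and their fitted coefficients. The sketch below follows that reading; X_other and coef_other are hypothetical names, and all values are invented.

```r
set.seed(1)
n <- 5  # subjects
q <- 3  # variants on the other chromosomes
k <- 2  # response columns

# Hypothetical genotype matrix (n x q) for the other chromosomes.
X_other <- matrix(rbinom(n * q, size = 2, prob = 0.3), nrow = n)

# Hypothetical fitted coefficients (q x k), one column per response.
# matrix() fills by column: column 1 is (0.5, 0, -1), column 2 is (0.2, 0.1, 0).
coef_other <- matrix(c(0.5, 0, -1,
                       0.2, 0.1, 0),
                     nrow = q, ncol = k)

# Assumed form of the adjustment: an (n x k) matrix of fitted contributions.
adjust <- X_other %*% coef_other
```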

test_idx

a set of indices (between 1 and 2n) used to compute the test statistics

family

type of regression. Either "gaussian" or "binomial". If "gaussian", correlation is used as a feature importance statistic. If "binomial", logistic log-likelihood is used as a feature importance statistic.

n_reps

number of repetitions of the CRT to carry out

genotypes

Defaults to TRUE. If FALSE, all rows are assumed to be independent (i.e. no two haplotypes are from the same individual).

verbose

if TRUE, prints various diagnostic messages to console

parallel

if TRUE, computations run in parallel; requires doMC to be registered (default: FALSE)

recomb_thresh

probability threshold below which recombination events are ignored. Lower values potentially give higher power at increased computational cost.

f_stat

whether to use the F-statistic for continuous responses. The F-statistic may be much slower for groups with many nonzero coefficients.

Value

A matrix with k rows and length(groups) columns. Entry (i, j) is a p-value for group j using column i of Y as a response.
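To show how the returned matrix is indexed, the sketch below uses an invented p-value matrix with k = 2 responses and 3 groups. It does not call the function; pvals, hits and the 0.05 cutoff are hypothetical and chosen only for illustration.

```r
# Invented p-values: rows are response columns of Y, columns are groups.
pvals <- matrix(c(0.01, 0.40,  0.03,
                  0.20, 0.004, 0.90),
                nrow = 2, byrow = TRUE)
groups <- list(10:20, 21:30, 31:40)

# Entry (i, j): p-value for group j using column i of Y as the response.
p_13 <- pvals[1, 3]

# (response, group) pairs with a p-value below an illustrative cutoff.
hits <- which(pvals < 0.05, arr.ind = TRUE)

# Variant indices covered by each flagged group.
hit_regions <- groups[hits[, "col"]]
```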