DTT with independence, using a fixed linear statistic

Returns p-values from the modified DTT using a linear feature importance measure. The p-values are guaranteed to be independent.

linear_crt_indep(H, H_parent, anc, Y, beta, groups, d, adjust = NULL,
  test_idx = 1:nrow(H), family = "gaussian", n_reps = 100,
  genotypes = TRUE, verbose = FALSE, parallel = FALSE,
  recomb_thresh = 1e-08, f_stat = TRUE)

Arguments

H	a (2n x p) matrix of the subject haplotypes. It is assumed rows 1,2 belong to subject 1, rows 3,4 belong to subject 2, etc.
H_parent	an (n_2 x p) matrix of the parental haplotypes
anc	a (2n x 2) table of ancestries. anc[i, 1] and anc[i, 2] give the rows of H_parent corresponding to the parents of row i of the haplotypes H
Y	observed responses, a vector of length n or a matrix of dimension (n x k)
beta	feature importance directions, fit on an independent data set. A vector of length p+1 or a matrix of dimension ((p+1) x k), e.g. the output of get_beta_glment. The first coefficient is assumed to be an intercept.
groups	A list of index vectors, one per group to test. Each element should be a contiguous region, e.g. list(10:20, 21:30).
d	a vector of length p of genetic distances
adjust	a vector of length n or a matrix of dimension (n x k), giving the contribution of the other chromosomes to the likelihood, i.e. adjust = X multiplied by the fitted coefficients.
test_idx	a set of indices (between 1 and 2n) used to compute the test statistics
family	type of regression. Either "gaussian" or "binomial". If "gaussian", correlation is used as a feature importance statistic. If "binomial", logistic log-likelihood is used as a feature importance statistic.
n_reps	number of repetitions of the CRT to carry out
genotypes	Defaults to TRUE. If FALSE, all rows are assumed to be independent (i.e. no two haplotypes are from the same individual).
verbose	if TRUE, prints various diagnostic messages to console
parallel	if TRUE, computations are run in parallel; this requires doMC to be registered (default: FALSE)
recomb_thresh	probability threshold below which recombination events are ignored. Lower values potentially give higher power at increased computational cost.
f_stat	whether or not to use the f-statistic for continuous responses. The f-statistic may be much slower for groups with many nonzero coefficients.
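
The shapes described above can be sketched with simulated inputs. This is only an illustration in base R: the toy sizes, the assumption that each pair of parental haplotypes contributes one offspring haplotype (so n_2 = 4n), the single-crossover model, and the placeholder values for beta and d are all assumptions made here, not part of the package. The call itself is left as a comment because it needs the package to be loaded.

```r
set.seed(1)
n <- 4    # number of subjects (hypothetical toy size)
p <- 30   # number of sites (hypothetical)

# Parental haplotypes. Assumption: two parents with two haplotypes each
# per subject, so n_2 = 4n rows.
H_parent <- matrix(rbinom(4 * n * p, 1, 0.3), nrow = 4 * n, ncol = p)

# anc[i, ] gives the two rows of H_parent from which row i of H descends.
anc <- cbind(seq(1, 4 * n, by = 2), seq(2, 4 * n, by = 2))

# Offspring haplotypes: copy the first parental haplotype up to a random
# site, then switch to the second (a single crossover, for illustration).
H <- t(sapply(seq_len(2 * n), function(i) {
  cut <- sample.int(p - 1, 1)
  c(H_parent[anc[i, 1], 1:cut], H_parent[anc[i, 2], (cut + 1):p])
}))

# Rows 2j-1 and 2j of H belong to subject j; their sum is the genotype.
X <- H[seq(1, 2 * n, by = 2), , drop = FALSE] +
  H[seq(2, 2 * n, by = 2), , drop = FALSE]

Y <- rnorm(n)                    # one response per subject (k = 1)
beta <- c(0, rnorm(p))           # placeholder: intercept, then p coefficients
groups <- list(1:10, 11:20, 21:30)
d <- rep(0.01, p)                # placeholder genetic distances

# With the package loaded, the call would follow the usage signature:
# pvals <- linear_crt_indep(H, H_parent, anc, Y, beta, groups, d)
```

In practice beta would be fit on an independent data set rather than drawn at random as it is here.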

Value

A matrix with k rows and length(groups) columns. Entry (i, j) is a p-value for group j using column i of Y as a response.
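
As a rough sketch of how the returned matrix might be used, the snippet below indexes a k x length(groups) matrix and applies a multiple-testing adjustment across groups for each response. The matrix pvals is a simulated placeholder standing in for the function's return value, not real output, and the choice of the Benjamini-Hochberg procedure and the 0.1 level are assumptions made for illustration.

```r
set.seed(2)
k <- 2
groups <- list(10:20, 21:30, 31:45)

# Placeholder for the return value: k rows, one column per group.
# These values are simulated, not produced by linear_crt_indep.
pvals <- matrix(runif(k * length(groups)), nrow = k, ncol = length(groups))

# p-value for group 3 using column 2 of Y as the response.
p_23 <- pvals[2, 3]

# Adjust across groups separately for each response (row).
# apply() returns one column per row of pvals, so transpose back.
adj <- t(apply(pvals, 1, p.adjust, method = "BH"))

# Groups selected for the first response at the (assumed) 0.1 level.
selected <- which(adj[1, ] <= 0.1)
```

Since the p-values are stated to be independent, procedures that assume independence, such as Benjamini-Hochberg, are one natural option here.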
