# Using the Knockoff Filter with a Fixed Design Matrix

#### 2017-09-28

The MFKnockoffs package can also be used to perform controlled variable selection with a fixed design matrix, assuming a linear regression model for the response. In this sense, MFKnockoffs is a superset of the original knockoffs package.

set.seed(1234)
# Problem parameters
n = 1000          # number of observations
p = 300           # number of variables
k = 30            # number of variables with nonzero coefficients
amplitude = 4.5   # signal amplitude (for noise level = 1)

# Generate the variables from a multivariate normal distribution
mu = rep(0,p); Sigma = diag(p)
X = matrix(rnorm(n*p),n)

# Generate the response from a linear model
nonzero = sample(p, k)
beta = amplitude * (1:p %in% nonzero) / sqrt(n)
y.sample = function(X) X %*% beta + rnorm(n)
y = y.sample(X)

## First examples

In order to create fixed-design knockoffs, we call MFKnockoffs.filter with the parameter statistic equal to MFKnockoffs.stat.glmnet_lambda_difference. Moreover, since not all statistics are valid with fixed-design knockoffs, we use MFKnockoffs.stat.glmnet_lambda_difference instead of the default one (which is based on cross-validation).

library(MFKnockoffs)
result = MFKnockoffs.filter(X, y, knockoffs = MFKnockoffs.create.fixed, statistic = MFKnockoffs.stat.glmnet_lambda_difference)

We can display the results with

print(result)
## Call:
## MFKnockoffs.filter(X = X, y = y, knockoffs = MFKnockoffs.create.fixed,
##     statistic = MFKnockoffs.stat.glmnet_lambda_difference)
##
## Selected variables:
##  [1]  39  54  55  60  61  66  69  82  98 103 105 119 124 131 136 146 154
## [18] 155 160 165 197 230 231 255 269 273 274 277 284 285 295

The default value for the target false discovery rate is 0.1. In this experiment the false discovery proportion is

fdp = function(selected) sum(beta[selected] == 0) / max(1, length(selected))
fdp(result\$selected)
## [1] 0.1290323