Multivariate Normal Variance Mixtures

Erik Hintz, Marius Hofert and Christiane Lemieux

2018-10-07

library(nvmix)
library(RColorBrewer)
library(lattice)
doPDF <- FALSE

1 Introduction

The R package nvmix provides functionality for (multivariate) normal variance mixture distributions, including normal and Student’s t distributions. A random vector \(\mathbf{X}=(X_1,\dots,X_d)\) follows a normal variance mixture, in notation \(\mathbf{X}\sim NVM_d(\mathbf{\mu},\Sigma,F_W)\), if, in distribution, \[\begin{align*} \mathbf{X}=\mathbf{\mu}+\sqrt{W}A\mathbf{Z}, \end{align*}\] where \(\mathbf{\mu}\in\mathbb{R}^d\) denotes the location (vector), \(\Sigma=AA^\top\) for \(A\in\mathbb{R}^{d\times k}\) denotes the scale (matrix) (a covariance matrix), and the mixture variable \(W\sim F_W\) is a non-negative random variable independent of \(\mathbf{Z}\sim N_k(\mathbf{0},I_k)\) (where \(I_k\in\mathbb{R}^{k\times k}\) denotes the identity matrix). Note that both the Student’s \(t\) distribution with degrees of freedom parameter \(\nu>0\) and the normal distribution are normal variance mixtures; in the former case, \(W\sim IG(\nu/2, \nu/2)\) (inverse gamma) and in the latter case \(W\) is almost surely constant (taken as \(1\) so that \(\Sigma\) is the covariance matrix of \(\mathbf{X}\) in this case).

Note that both dnvmix() and pnvmix() currently require \(\Sigma\) to be positive definite. In this case one can take \(A\in\mathbb{R}^{d\times d}\) to be the (lower triangular) Cholesky factor \(L\) of \(\Sigma\), so that \(LL^\top=\Sigma\); this corresponds to the argument factor in those functions. In rnvmix(), factor can be any matrix \(A\in\mathbb{R}^{d\times k}\) with \(AA^\top=\Sigma\), as in the definition above.
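
As a small illustration of the stochastic representation above, one can draw a single sample by hand; the following is a minimal sketch (location, scale and mixture distribution are chosen arbitrarily):

## A by-hand sample from NVM_d(mu, Sigma, F_W); all values illustrative
d <- 2 # dimension
mu <- c(1, 2) # location vector
Sigma <- matrix(c(2, 1, 1, 3), ncol = d) # positive definite scale matrix
A <- t(chol(Sigma)) # lower triangular Cholesky factor; A %*% t(A) equals Sigma
W <- rexp(1, rate = 1.9) # one draw of the mixture variable W ~ Exp(1.9)
Z <- rnorm(d) # one draw of Z ~ N_d(0, I_d)
X <- mu + sqrt(W) * drop(A %*% Z) # one sample of X ~ NVM_d(mu, Sigma, F_W)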

For most functions in the package, the quantile function of \(W\) needs to be provided which is (here) defined as \[\begin{align*} F_W^\leftarrow(u)=\inf\{w\in[0,\infty):F_W(w)\ge u\},\quad u\in[0,1]. \end{align*}\]

2 Evaluating the distribution function

An important but difficult task is to evaluate the (cumulative) distribution function of a normal variance mixture distribution, so \[\begin{align*} F(\mathbf{x})=P(\mathbf{X}\le\mathbf{x})=P(X_1\le x_1,\dots,X_d\le x_d),\quad \mathbf{x}\in\mathbb{R}^d; \end{align*}\] note that pnvmix() currently accepts only positive definite scale matrices. To this end, the function pnvmix() internally approximates the \(d\)-dimensional integral using a randomized Quasi Monte Carlo (RQMC) method. Due to the random nature, the result depends (slightly) on the seed .Random.seed.

2.1 Exponential mixture distribution

As a first example, consider a normal variance mixture with exponential mixture variable \(W\). We illustrate two approaches to using pnvmix() for approximating \(P(\mathbf{a} < \mathbf{X} \le \mathbf{b})\) for randomly chosen \(\mathbf{a}\le\mathbf{b}\).
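
A sketch of the setup and the two calls follows; the particular construction of a, b and P below is illustrative. The two approaches differ in how the quantile function of \(W\sim\mathrm{Exp}(1.9)\) is passed to pnvmix(): as a function with the rate hard-coded, or as qexp() with rate supplied via the ... argument.

## Setup (illustrative)
d <- 5 # dimension
set.seed(42)
A <- matrix(runif(d * d), ncol = d)
P <- cov2cor(A %*% t(A)) # a random correlation matrix used as 'scale'
a <- -3 * runif(d) * sqrt(d) # random lower limit
b <-  3 * runif(d) * sqrt(d) # random upper limit; a <= b componentwise
rate <- 1.9 # rate of W ~ Exp(rate)

## Approach 1: provide the quantile function of W directly
set.seed(42)
(p1 <- pnvmix(b, lower = a, qmix = function(u) qexp(u, rate = rate), scale = P))
## Approach 2: provide qexp() and pass 'rate' via '...'
set.seed(42)
(p2 <- pnvmix(b, lower = a, qmix = qexp, scale = P, rate = rate))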

## [1] 0.5213515
## attr(,"error")
## [1] 0.0008107188
## attr(,"numiter")
## [1] 3
## [1] 0.5213515
## attr(,"error")
## [1] 0.0008107188
## attr(,"numiter")
## [1] 3

We see that the results coincide.

If higher precision of the computed probabilities is desired, this can be accomplished by changing the argument abstol (which defaults to 1e-3) at the expense of a higher run time.
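
For example, with a smaller tolerance (the particular value abstol = 1e-5 is an assumption consistent with the error reported below):

set.seed(42)
pnvmix(b, lower = a, qmix = qexp, scale = P, rate = rate, abstol = 1e-5)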

## [1] 0.5211875
## attr(,"error")
## [1] 6.095715e-06
## attr(,"numiter")
## [1] 12

2.2 Three-point mixture distribution

As a next example, consider a normal variance mixture where \(W\) is discrete. This time, we are interested in computing the one-sided probability \(P(\mathbf{X}\le\mathbf{b})=F(\mathbf{b})\) for \(\mathbf{b}\) as constructed before.
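
A sketch of the specification of \(W\) and the call; the support \(\{1,3,5\}\) matches the plots in Section 4, while the probabilities p are illustrative:

## Quantile function of the three-point distribution of W
x <- c(1, 3, 5) # support of W (reused in Section 4)
p <- c(0.2, 0.3, 0.5) # P(W = x_k); illustrative values summing to 1
qW <- function(u) # quantile function of W
    x[1] * (u <= p[1]) + x[2] * (p[1] < u & u <= p[1] + p[2]) +
    x[3] * (u > p[1] + p[2])
set.seed(42)
pnvmix(b, qmix = qW, scale = P)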

## [1] 0.8966543
## attr(,"error")
## [1] 0.0006650097
## attr(,"numiter")
## [1] 1

The same probability could also have been obtained as follows, at the expense of calling pNorm() (and thus pnvmix()) three times: since \(\mathbf{X}\mid W=x_k\sim N_d(\mathbf{0},x_k\Sigma)\), we have \(F(\mathbf{b})=\sum_{k=1}^{3}p_k\,P(N_d(\mathbf{0},x_k P)\le\mathbf{b})\).
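
A minimal sketch of this mixture computation (variables as above):

set.seed(42)
sum(sapply(1:3, function(k) p[k] * pNorm(b, scale = x[k] * P)))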

2.3 The wrappers pNorm() and pStudent()

For the two special cases of the Student’s \(t\) distribution and the normal distribution, pNorm() and pStudent() are user-friendly wrappers of pnvmix(). Note that pStudent() works for any degrees of freedom parameter \(\nu>0\) (not necessarily integer); to the best of our knowledge, this functionality was not available in R at the time of development of this package.
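
For example (with a, b and P as before; the degrees of freedom value is illustrative):

set.seed(42)
pStudent(b, lower = a, scale = P, df = 3.5) # Student's t with non-integer df
set.seed(42)
pNorm(b, lower = a, scale = P) # multivariate normal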

2.4 The effect of algorithm-specific parameters

The function pnvmix() (and thus the wrappers pStudent() and pNorm()) allows the user to change algorithm-specific parameters. A few of them are (for others see ?pnvmix):

- method: the method used to approximate the integral, for example "sobol" (randomized Sobol' points) or "PRNG" (pseudo-random numbers).
- precond: logical indicating whether preconditioning, that is, a reordering of the integration limits, is applied.
- abstol: the absolute error tolerance (defaults to 1e-3).
- fun.eval: numeric vector of length two giving the size of the first point set used and the maximal total number of function evaluations.

Let us now illustrate the effect of method and precond on the performance of pnvmix() with qmix = 'inverse.gamma'. To this end we use the wrapper pStudent(). We set abstol = NULL so that the algorithm runs until the number of function evaluations exceeds fun.eval[2]. We do this for different values of fun.eval[2] in order to get an idea of the speed of convergence. We also regress the log-errors on the log-number of function evaluations; the resulting slopes act as a summary measure of the speed of convergence.

## Setup
df <- 0.1 # degrees of freedom (very heavy-tailed here)
maxiter <- 9 # note: i iterations require 3 * 2^8 * 2^i function evaluations
max.fun.evals <- 3 * 2^8 * 2^seq(from = 2, to = maxiter, by = 1)
errors <- matrix(NA_real_, ncol = length(max.fun.evals), nrow = 4) # error estimates
nms <- c("Sobol  with preconditioning", "Sobol  w/o  preconditioning",
         "PRNG with preconditioning", "PRNG w/o  preconditioning")
rownames(errors) <- nms

## Computing the errors
## Note:
## - resetting the seed leads to a fairer comparison here
## - set 'verbose' to 0 or FALSE to avoid warnings which inevitably occur
##   due to 'abstol = NULL'
for(i in seq_along(max.fun.evals)) {
    N.max <- max.fun.evals[i]
    ## Sobol with preconditioning
    set.seed(42)
    errors[nms[1],i] <-
        attr(pStudent(b, lower = a, scale = P,  df = df,
                      method = "sobol", precond = TRUE, abstol = NULL,
                      fun.eval = c(2^6, N.max), verbose = FALSE), "error")
    ## Sobol without preconditioning
    set.seed(42)
    errors[nms[2],i] <-
        attr(pStudent(b, lower = a, scale = P,  df = df,
                      method = "sobol", precond = FALSE, abstol = NULL,
                      fun.eval = c(2^6, N.max), verbose = FALSE), "error")
    ## PRNG with preconditioning
    set.seed(42)
    errors[nms[3],i] <-
        attr(pStudent(b, lower = a, scale = P,  df = df,
                      method = "PRNG", precond = TRUE, abstol = NULL,
                      fun.eval = c(2^6, N.max), verbose = FALSE), "error")
    ## PRNG without preconditioning
    set.seed(42)
    errors[nms[4],i] <-
        attr(pStudent(b, lower = a, scale = P,  df = df,
                      method = "PRNG", precond = FALSE, abstol = NULL,
                      fun.eval = c(2^6, N.max), verbose = FALSE), "error")
}

## Computing the regression coefficients
coeff <- apply(errors, 1, function(y) lm(log(y) ~ log(max.fun.evals))$coeff[2])
names(coeff) <- nms

## Plot
if(doPDF) pdf(file = (file <- "fig_pnvmix_error_comparison.pdf"),
              width = 7, height = 7)
pal <- colorRampPalette(c("#000000", brewer.pal(8, name = "Dark2")[c(7, 3, 5)]))
cols <- pal(4) # colors
plot(NA, log = "xy", xlim = range(max.fun.evals), ylim = range(errors),
     xlab = "Number of function evaluations", ylab = "Estimated error")
lgnd <- character(4)
for(k in 1:4) {
    lines(max.fun.evals, errors[k,], col = cols[k])
    lgnd[k] <- paste0(nms[k]," (",round(coeff[k], 2),")")
}
legend("topright", bty = "n", lty = rep(1, 4), col = rev(cols), legend = rev(lgnd))
if(doPDF) dev.off()

We can see that in this example Sobol’ outperforms PRNG and that the preconditioning helps significantly in reducing the error.

3 Evaluating the density function

Another important task is to evaluate the density function of a normal variance mixture. This is particularly important for likelihood-based methods. In general, the density is given in terms of a univariate integral which the function dnvmix() internally approximates using a randomized Quasi Monte Carlo (RQMC) method. Due to the random nature, the result slightly varies with .Random.seed. Note that if \(\Sigma\) is singular, the density does not exist.
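
Concretely, conditioning on \(W\) shows that for positive definite \(\Sigma\), \[\begin{align*} f(\mathbf{x})=\int_0^\infty \frac{1}{(2\pi w)^{d/2}\sqrt{\det\Sigma}}\exp\left(-\frac{(\mathbf{x}-\mathbf{\mu})^\top\Sigma^{-1}(\mathbf{x}-\mathbf{\mu})}{2w}\right)\,\mathrm{d}F_W(w),\quad \mathbf{x}\in\mathbb{R}^d, \end{align*}\] which dnvmix() approximates via RQMC after the substitution \(w=F_W^\leftarrow(u)\), \(u\in(0,1)\).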

Note the argument log of dnvmix(): internally, the logarithm of the density is estimated, and only if log = FALSE (the default) is the actual density returned. Estimating the log-density directly is usually numerically more stable than estimating the density and then taking the logarithm of the computed value. Also note that for many applications, for example when computing the log-likelihood, the log-density is the actual quantity of interest.

We give a small example where \(W\) is the three-point distribution from before.

x. <- matrix(1:15/15, ncol = d) # evaluation points; named 'x.' to avoid overwriting the support 'x' of W
set.seed(1)
(d1 <- dnvmix(x., qmix = qW, scale = P)) # computed density values
## [1] 7.301123e-15 6.122256e-15 4.975161e-15
## attr(,"error")
## [1] 4.63071e-10
## attr(,"numiter")
## [1] 1
set.seed(1)
(d2 <- dnvmix(x., qmix = qW, scale = P, log = TRUE)) # log-density
## [1] -32.55075 -32.72685 -32.93432
## attr(,"error")
## [1] 4.63071e-10
## attr(,"numiter")
## [1] 1
stopifnot(all.equal(d1, exp(d2), check.attributes = FALSE)) # check

The error estimate computed by dnvmix() is the worst-case error over all inputs, that is, the error of the row with the largest error. Note also that the reported error is much smaller than the default tolerance \(10^{-3}\). This is because dnvmix() iterates until the desired precision is reached for both the log-density and the density itself. Furthermore, the error estimate for log = FALSE is conservative: it only differs from the error for log = TRUE if at least one density estimate is larger than 1.

4 (Quasi-)random number generation

The function rnvmix() provides a flexible tool to sample from (multivariate) normal variance mixtures. Its structure is similar to that of dnvmix() and pnvmix() (but also different in some aspects; see ?rnvmix). The user can specify the argument qmix which, as usual, corresponds to the quantile function \(F_W^\leftarrow\) of \(W\) or, alternatively, the argument rmix, which corresponds to a random number generator for \(W\). The latter is useful because there are distributions for which the quantile function is hard to compute but for which sampling procedures exist (for example, stable distributions). As an example call of rnvmix(), let us revisit Section 2.1 where \(W\sim\mathrm{Exp}(1.9)\).

## Sampling
n <- 500 # sample size
set.seed(42)
r1 <- rnvmix(n, rmix = list("exp", rate = rate)) # uses the default scale = diag(2)

## Plot
if(doPDF) pdf(file = (file <- paste0("fig_rnvmix_W_exp.pdf")),
              width = 6, height = 6)
plot(r1, xlab = expression(X[1]), ylab = expression(X[2]))
if(doPDF) dev.off()

An important argument of rnvmix() is method. This can be either "PRNG" (classical pseudo-random sampling) or "sobol" or "ghalton" (for the inversion method based on the corresponding low-discrepancy point set). If method is not "PRNG", qmix must be provided. As an example, let us revisit Section 2.2 where \(W\) was following a three-point distribution.

## Sampling
set.seed(42)
r1 <- rnvmix(n, qmix = qW)
r2 <- rnvmix(n, qmix = qW, method = "ghalton")

## Plot
if(doPDF) pdf(file = (file <- paste0("fig_rnvmix_W_three-point.pdf")),
              width = 9, height = 6)
ran <- range(r1, r2)
opar <- par(pty = "s")
layout(t(1:2))
plot(r1, xlab = expression(X[1]), ylab = expression(X[2]),
     main = "Pseudo-random sample", xlim = ran, ylim = ran)
plot(r2, xlab = expression(X[1]), ylab = expression(X[2]),
     main = "Quasi-random sample", xlim = ran, ylim = ran)
layout(1)
par(opar)
if(doPDF) dev.off()

When \(W\) is discrete and has finite support, one can also easily sample from the corresponding normal variance mixture using rNorm(): for each atom \(x_k\) of \(W\), draw (here, deterministically) \(p_k n\) samples from \(N_2(\mathbf{0}, x_k I_2)\).

## Sampling
set.seed(42)
r <- lapply(1:3, function(k) rNorm(p[k] * n, scale = diag(x[k], 2)))

## Plot
if(doPDF) pdf(file = (file <- paste0("fig_rnvmix_W_three-point_via_rNorm.pdf")),
              width = 6, height = 6)
ran <- range(r)
opar <- par(pty = "s")
plot(NA, xlim = ran, ylim = ran, xlab = expression(X[1]), ylab = expression(X[2]))
for(k in 1:3) points(r[[k]], col = cols[k+1])
par(opar)
if(doPDF) dev.off()

This example helps in understanding normal variance mixtures. Note that the brown points come from \(N(\mathbf{0}, I_2)\), the blue ones from \(N(\mathbf{0}, 3I_2)\) and the green ones from \(N(\mathbf{0}, 5I_2)\), and that their frequencies correspond to the probabilities \(P(W=1)\), \(P(W=3)\) and \(P(W=5)\).

Unlike pnvmix() and dnvmix(), rnvmix() can handle singular normal variance mixtures. In this case, the matrix factor (which is a matrix \(A\in\mathbb{R}^{d\times k}\) such that \(AA^\top=\Sigma\)) has to be provided. In the following example, we consider a Student’s \(t\) distribution via the wrapper rStudent(). As expected in the singular case, all points lie on a plane which is visible after a suitable rotation of the cloud plot.

## Sampling
df <- 3.9 # degrees of freedom
factor <- matrix(c(1,0, 0,1, 0,1), ncol = 2, byrow = TRUE) # (3,2)-matrix 'factor'
Sigma <- tcrossprod(factor) # the 'scale' corresponding to factor
stopifnot(Sigma == factor %*% t(factor))
set.seed(42)
r <- rStudent(n, df = df, factor = factor) # sample

## Plot
if(doPDF) pdf(file = (file <- paste0("fig_rnvmix_singular.pdf")),
              width = 6, height = 6)
cloud(r[,3] ~ r[,1] * r[,2], screen = list(z = 115, x = -68),
      xlab = expression(X[1]), ylab = expression(X[2]), zlab = expression(X[3]),
      scales = list(arrows = FALSE, col = "black"),
      par.settings = modifyList(standard.theme(color = FALSE),
                                list(axis.line = list(col = "transparent"),
                                     clip = list(panel = "off"))))
if(doPDF) dev.off()