Predictability: Part One

Donald R. Williams

Predictability

This tutorial walks through the workflow in BGGM for computing predictability in Gaussian graphical models. I will use data from a resilience questionnaire.

There are two options for computing predictability. The first assesses the error of the predicted values for each posterior sample, which results in a distribution of predictive error. I refer to this as fitted predictability because it is computed from the fitted values. The second method, described below, is more Bayesian in spirit: it computes the error from the posterior predictive distribution (replicated data sets drawn from the model).
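As a rough sketch of the distinction, assume each node \(y\) is predicted from the remaining nodes \(\mathbf{x}\) through regression coefficients \(\boldsymbol{\beta}^{(s)}\) and a residual variance \(\sigma^{2(s)}\) at posterior sample \(s\). These symbols are introduced here for illustration only and are not BGGM notation. With mean squared error as the metric, fitted predictability uses the fitted values,

\[
\hat{y}_i^{(s)} = \mathbf{x}_i^\top \boldsymbol{\beta}^{(s)}, \qquad
\text{MSE}_{\text{fitted}}^{(s)} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i^{(s)} \right)^2 ,
\]

whereas posterior predictive predictability uses replicated data,

\[
\tilde{y}_i^{(s)} \sim \mathcal{N}\!\left( \mathbf{x}_i^\top \boldsymbol{\beta}^{(s)}, \, \sigma^{2(s)} \right), \qquad
\text{MSE}_{\text{pp}}^{(s)} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \tilde{y}_i^{(s)} \right)^2 .
\]

In both cases the collection of values over \(s = 1, \dots, S\) forms the distribution of predictive error.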

# packages
library(BGGM)
library(ggplot2)

# resilience data
# remove gender variable
dat <- subset(rsa, select = -gender)

Note that most of the data sets in BGGM include categorical variables such as gender. Thus it is important to check the documentation to ensure that only the relevant variables are included in the analysis.

Fitted Predictability

Estimate the Network

The first step is to estimate the network. This provides the necessary ingredients for computing network predictability.

Compute Predictions

Next, the object fit is used to compute the predicted values.

Note that summary = FALSE returns the predicted samples for each posterior iteration, which is necessary for computing predictability.
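To illustrate why the per-iteration samples matter, here is a minimal base R sketch. It does not call BGGM: the data are simulated, and the posterior draws come from a simple stand-in (a regression with a flat prior and known unit residual variance). All object names are hypothetical.

```r
set.seed(1)

# Toy data: one node (y) predicted from two other nodes
n <- 100
X <- cbind(rnorm(n), rnorm(n))
y <- as.vector(X %*% c(0.5, -0.3) + rnorm(n))

# Stand-in posterior draws for the coefficients (NOT the BGGM sampler):
# beta | y ~ N(beta_hat, (X'X)^-1) under a flat prior and unit residual variance
S <- 500
XtX_inv  <- solve(crossprod(X))
beta_hat <- as.vector(XtX_inv %*% crossprod(X, y))
L <- t(chol(XtX_inv))  # lower Cholesky factor: L %*% t(L) equals XtX_inv
beta_draws <- t(beta_hat + L %*% matrix(rnorm(2 * S), nrow = 2))  # S x 2

# One row of predictions per posterior iteration
pred_draws <- beta_draws %*% t(X)  # S x n

# Collapsing over iterations gives a summary but discards the per-iteration values
pred_summary <- cbind(
  mean  = colMeans(pred_draws),
  lower = apply(pred_draws, 2, quantile, probs = 0.025),
  upper = apply(pred_draws, 2, quantile, probs = 0.975)
)

dim(pred_draws)    # 500 100
dim(pred_summary)  # 100   3
```

Only the first object, with one row per iteration, allows an error metric to be computed separately for each posterior sample.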

Plotting Predictability

Error Bar Plot

Most functions in BGGM have plots associated with them. In most cases, plotting is as simple as calling plot.

This is not the most attractive plot, which is by design. The returned object is a ggplot object, which can then be further customized.
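The following sketch shows the kind of customization that is possible once a ggplot object is in hand. It builds a stand-in error bar plot from a hypothetical summary table rather than from BGGM output: the node names and numbers are placeholders, not results from the resilience data.

```r
library(ggplot2)

# Hypothetical summary of predictive error per node (placeholder values only)
pred_summary <- data.frame(
  node  = c("V1", "V2", "V3"),
  mean  = c(0.62, 0.48, 0.75),
  lower = c(0.55, 0.41, 0.68),
  upper = c(0.70, 0.56, 0.83)
)

# Stand-in for the plot returned by a plotting method
p <- ggplot(pred_summary, aes(x = reorder(node, mean), y = mean)) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0) +
  geom_point(size = 2)

# Further customization: layers, labels, and themes are added with `+`
p_custom <- p +
  coord_flip() +
  labs(x = "Node", y = "Mean squared error") +
  theme_bw()
```

Because the customizations are added to an existing object, the same pattern should apply to any plot that is returned as a ggplot.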

Posterior Predictive Predictability

This is implemented with the same functions as fitted predictability. The only difference is how the predictions are computed.

Compute Predictability

The next step is to compute predictability. There are several options available. In this case, I compute mean squared error.
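As a minimal sketch of what computing mean squared error per posterior sample involves, the base R code below works on a simulated matrix of predictions with one row per iteration. The inputs are hypothetical and are not produced by BGGM.

```r
set.seed(2)

# Hypothetical inputs: observed values for one node and an S x n matrix of
# predictions, one row per posterior iteration (simulated, not from BGGM)
n <- 50
S <- 1000
y <- rnorm(n)
pred_draws <- matrix(0.6 * y, nrow = S, ncol = n, byrow = TRUE) +
  matrix(rnorm(S * n, sd = 0.1), nrow = S)

# Subtract the observed values from every row, square, and average within each row
mse_draws <- rowMeans(sweep(pred_draws, 2, y)^2)

# Summarize the resulting distribution of predictive error
c(mean = mean(mse_draws), quantile(mse_draws, probs = c(0.025, 0.975)))
```

The result is a vector with one error value per iteration, which is then summarized by its mean and a 95% interval.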

Plotting Predictability

Error Bar Plot

Most functions in BGGM have plots associated with them. In most cases, plotting is as simple as calling plot.

This is not the most attractive plot, which is by design. The returned object is a ggplot object, which can then be further customized.

Fitted vs. Posterior Predictive Predictability

Note that the posterior predictive method accounts for uncertainty in the distribution of future data. Hence, the error will not only be larger, but there will also be more uncertainty.
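The difference can be reproduced in a small simulation. The sketch below is written in base R and does not use BGGM: the posterior draws come from a stand-in regression with a flat prior, and all names are hypothetical.

```r
set.seed(3)

# Toy data: one node (y) predicted from two other nodes
n <- 200
X <- cbind(rnorm(n), rnorm(n))
y <- as.vector(X %*% c(0.5, -0.3) + rnorm(n))

# Stand-in posterior draws under a flat prior (NOT the BGGM sampler)
S <- 2000
k <- ncol(X)
XtX_inv  <- solve(crossprod(X))
beta_hat <- as.vector(XtX_inv %*% crossprod(X, y))
rss      <- sum((y - X %*% beta_hat)^2)
sigma2   <- rss / rchisq(S, df = n - k)        # residual variance draws
L        <- t(chol(XtX_inv))                   # L %*% t(L) equals XtX_inv
z        <- matrix(rnorm(k * S), nrow = k)
beta     <- t(beta_hat + sweep(L %*% z, 2, sqrt(sigma2), "*"))  # S x k

# Fitted values for each iteration
fitted_draws <- beta %*% t(X)                  # S x n

# Replicated data: fitted values plus noise using each iteration's residual variance
yrep_draws <- fitted_draws + matrix(rnorm(S * n), nrow = S) * sqrt(sigma2)

# Mean squared error per iteration
mse <- function(draws) rowMeans(sweep(draws, 2, y)^2)
mse_fitted <- mse(fitted_draws)
mse_ppc    <- mse(yrep_draws)

rbind(
  fitted               = c(mean = mean(mse_fitted), sd = sd(mse_fitted)),
  posterior_predictive = c(mean = mean(mse_ppc),    sd = sd(mse_ppc))
)
```

In this setup the posterior predictive error exceeds the fitted error by roughly the residual variance on average, and its spread across iterations is wider because each replicated data set carries its own sampling noise.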

Alternative Metrics

Currently, there are five metrics implemented in BGGM. The first four are: (1) mean squared error (mse); (2) root mean squared error (rmse); (3) mean absolute error (mae); and (4) mean absolute percentage error (mape).
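For reference, the textbook definitions of these four metrics can be written in a few lines of base R. The functions below are defined locally for illustration and are an assumption about the general form of each metric. The BGGM implementations may differ in detail, for example in whether mape is reported as a proportion or a percentage.

```r
# Textbook definitions (local helpers, not the BGGM functions)
mse  <- function(y, yhat) mean((y - yhat)^2)
rmse <- function(y, yhat) sqrt(mse(y, yhat))
mae  <- function(y, yhat) mean(abs(y - yhat))
mape <- function(y, yhat) mean(abs((y - yhat) / y))  # proportion; multiply by 100 for percent

# Small worked example: the errors are 1, -2, and 0
y    <- c(2, 4, 10)
yhat <- c(1, 6, 10)

c(mse = mse(y, yhat), rmse = rmse(y, yhat), mae = mae(y, yhat), mape = mape(y, yhat))
# mse = 5/3, rmse = sqrt(5/3), mae = 1, mape = 1/3
```

Note that mape divides by the observed values, so it is undefined when any observed value is exactly zero.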

The fifth metric is Bayesian variance explained, which is the focus of another vignette. A forthcoming paper will introduce methodology that allows for comparing Bayesian \(R^2\) within and between networks.