# Introduction to gmvarkit

## Introduction

The package gmvarkit contains tools to estimate and analyze both reduced form and structural Gaussian mixture vector autoregressive (GMVAR) models (Kalliovirta, Meitz, and Saikkonen, 2016, and Virolainen, 2020). In this vignette, we first define the reduced form GMVAR model, briefly discuss some of its properties, and explain how to use the functions in gmvarkit to estimate the model, examine the estimates, check its adequacy with quantile residual diagnostics, simulate from the process, and forecast future observations of the process. Then, we define the structural GMVAR model, discuss its identification, and explain how to use the functions in gmvarkit to estimate generalized impulse response function and to test hypotheses regarding the paremeters. Most of the functions work with both, reduced form and structural models, but particularly the one for estimating generalized impulse response function accomodates structural models only.

For more comprehensive discussion on the reduced form model, see the cited article by Kalliovirta, Meitz, and Saikkonen (2016), and for the structural model, see the cited unpublished manuscript by Virolainen (2020). For a shorter introduction and demonstration on how to use gmvarkit, see the readme file.

### Some general notes on gmvarkit

The GMVAR models in gmvarkit are defined as class gmvar S3 objects which can be created with the estimation function fitGMVAR or with the constructor function GMVAR. The created class gmvar objects can then be conveniently used as main arguments in many other functions that allow, for example, model diagnostics, simulations, and forecasting - similarly to many of the popular R packages. Therefore, after estimating a GMVAR model, it’s easy to use the other functions for further analysis. Some tasks, however, such as creating GMVAR models without estimation, or setting up an initial population for the genetic algorithm employed by the estimation function, require accurate knowledge on how the parameter vectors are constructed in this package.

### Structure of this vignette

The rest of this vignette is structured as follows. We have two main sections: one for the reduced form model and one for the structural model. First comes the reduced form model and we start by giving its definition and briefly discussing some of its properties. The form of the reduced form model parameter vector used in gmvarkit is then described. The notation for unconstrained parameter vector is in line with the cited article by Kalliovirta et. al. (2016), however. After that, it is explained and demonstrated by examples how to impose linear constraints on the autoregressive parameters of the model. Finally, we have listed some of the useful functions in gmvarkit and explained how one use them to estimate the model, examine the estimates, evaluate model adequacy, construct a GMVAR model from pre-specified parameters, simulate from a GMVAR process, and forecast a GMVAR process.

The section for the structural GMVAR model starts by defining the model and then proceeds to describe how the structural shocks can be identified - often with less restrictive constraints than in conventional SVAR models, while some of the constraints are also testable. Then, it is discussed how the SGMVAR model can be estimated in gmvarkit and how to impose overidentifying restrictions on the “B-matrix”. It is also explained how one may build a structural model based on a reduced form model when there are exactly two regimes in the model. Finally, impulse response analysis based on a generalized impulse response function is covered.

# Reduced form GMVAR model

## Definition of the GMVAR model

The Gaussian mixture vector autoregressive (GMVAR) model with $$M$$ mixture components and autoregressive order $$p$$, which we refer to as the GMVAR($$p,M$$) model, is a mixture of $$M$$ stationary Gaussian VAR($$p$$) models with time varying mixing weights. At each point of time $$t$$, a GMVAR process generates an observation from one of its mixture components (also called regimes) according to the probabilities pointed by the mixing weights.

Let $$y_t=(y_{1t},...,y_{dt})$$, $$t=1,2,...$$ be the $$d$$-dimensional vector valued time series of interest, and let $$\mathcal{F}_{t-1}$$ be the $$\sigma$$-algebra generated by the random variables $$\lbrace y_{t-j},j>0 \rbrace$$ (containing the information on the past of $$y_t$$). Let $$M$$ be the number of mixture components, and $$\boldsymbol{s}_t=(s_{t,1},...,s_{t,M})$$ a sequence of (non-observable) random vectors such that at each $$t$$ exactly one of its components takes the value one and others take the value zero. The component that takes the value one is selected according to the probabilities $$Pr(s_{t,m}=1|\mathcal{F}_{t-1})\equiv\alpha_{m,t}$$, $$m=1,...,M$$ with $$\sum_{m=1}^M\alpha_{m,t}=1$$ for all $$t$$. Then, for a GMVAR($$p,M$$) model, we have $y_t = \sum_{m=1}^M s_{t,m}(\mu_{m,t}+\Omega_{m}^{1/2}\varepsilon_t), \quad \varepsilon_t\sim\text{NID}(0,I_d),$ where $$\Omega_{m}$$ is a conditional positive definite covariance matrix, NID stands for “normally and independently distributed”, and $\mu_{m,t} = \varphi_{m,0} + \sum_{i=1}^p A_{m,i}y_{t-i}$ is interpreted as the conditional mean of the $$m$$th mixture components. The terms $$\varphi_{m,0}$$ $$(d\times 1)$$ are intercept parameters and $$A_{m,i}$$ $$(d\times d)$$ are the autoregressive (AR) matrices. The random vectors $$\varepsilon_t$$ are assumed to be independent from $$\mathcal{F}_{t-1}$$ and conditionally independent from $$\boldsymbol{s}_{t}$$ given $$\mathcal{F}_{t-1}$$. The probabilities $$\alpha_{m,t}$$ are referred to as the mixing weights.

For each $$m=1,...,M$$, the AR matrices $$A_{m,i}$$, $$i=1,...,p$$ are assumed to satisfy the usual stability condition of VAR models. Namely, denoting $\boldsymbol{A}_m= \begin{bmatrix} A_{m,1} & A_{m,2} & \cdots & A_{m,p-1} & A_{m,p} \\ I_d & 0 & \cdots & 0 & 0 \\ 0 & I_d & \cdots & 0 & 0 \\ \vdots & & \ddots & 0 & 0 \\ 0 & 0 & \cdots & I_d & 0 \end{bmatrix} \enspace (dp\times dp)$ we assume that modulus of all eigenvalues of the matrix $$\boldsymbol{A}_m$$ are smaller than one, for each $$m=1,...,M$$.

Based on the above definition, the conditional density function of $$y_t$$ given $$\mathcal{F}_{t-1}$$, $$f(\cdot|\mathcal{F}_{t-1})$$, is given as $f(y_t|\mathcal{F}_{t-1}) = \sum_{m=1}^M\alpha_{m,t}(2\pi)^{-d/2}\det(\Omega_m)^{-1/2}\exp\left\lbrace -\frac{1}{2}(y_t-\mu_{m,t})´\Omega_m^{-1}(y_t-\mu_{m,t}) \right\rbrace .$ The distribution of $$y_t$$ given its past is thus a mixture of multivariate normal distributions with time varying mixing weights $$\alpha_{m,t}$$. The conditional mean and covariance matrix of $$y_t$$ given $$\mathcal{F}_{t-1}$$ are obtained as $\text{E}[y_t|\mathcal{F}_{t-1}]=\sum_{m=1}^M\alpha_{m,t}\mu_{m,t}, \quad \text{Cov}[y_t|\mathcal{F}_{t-1}]=\sum_{m=1}^M\alpha_{m,t}\Omega_m + \sum_{m=1}^M\alpha_{m,t}\left(\mu_{m,t} - \sum_{n=1}^M\alpha_{n,t}\mu_{n,t} \right)\left(\mu_{m,t} - \sum_{n=1}^M\alpha_{n,t}\mu_{n,t} \right)'.$

In order to define the mixing weights $$\alpha_{m,t}$$, consider first auxiliary stationary Gaussian VAR-processes $$z_{m,t}$$, and the related $$dp$$-dimensional random vectors $$\boldsymbol{z}_{t,m}=(z_{t,m},...,z_{m,t-p+1})$$. The density of $$\boldsymbol{z}_{t,m}$$ is $$$\label{dpdensities} n_{dp}(\boldsymbol{z}_{t,m};\mu_m,\boldsymbol{\Sigma}_{m,p})=(2\pi)^{-dp/2}\det(\boldsymbol{\Sigma}_{m,p})^{-1/2}\exp\left\lbrace -\frac{1}{2}(\boldsymbol{z}_{t,m} - \boldsymbol{1}_p\otimes\mu_m)´\boldsymbol{\Sigma}_{m,p}^{-1}(\boldsymbol{z}_{t,m} - \boldsymbol{1}_p\otimes\mu_m)\right\rbrace ,$$$ where $$\boldsymbol{1}_p = (1,...,1)$$ $$(p\times 1)$$, $$\mu_m=(I_d - \sum_{i=1}^pA_{m,i})^{-1}\varphi_{m,0}$$, and $$\boldsymbol{\Sigma}_{m,p}$$ is obtained from $$vec(\boldsymbol{\Sigma}_{m,p})=(I_{dp^2} - \boldsymbol{A}_m\otimes\boldsymbol{A}_m)^{-1}vec(\Sigma_{m,\varepsilon})$$ where $\Sigma_{m,\varepsilon} = \begin{bmatrix} \Omega_m & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \\ \end{bmatrix} \enspace (dp\times dp).$ Denoting $$\boldsymbol{y}_{t-1}=(y_{t-1},...,y_{t-p})$$ $$(dp\times 1)$$, the mixing weights of the GMVAR model are defined as $\alpha_{m,t} = \frac{\alpha_mn_{dp}(\boldsymbol{y}_{t-1};\mu_m,\boldsymbol{\Sigma}_{m,p})}{\sum_{n=1}^M\alpha_nn_{dp}(\boldsymbol{y}_{t-1};\mu_n,\boldsymbol{\Sigma}_{n,p})},$ where the mixing weights parameters $$\alpha_m$$ are assumed to satisfy $$\sum_{m=1}^M\alpha_{m}=1$$ and the $$dp$$-dimensional normal densities are as the ones defined for $$\boldsymbol{z}_{t,m}$$ above but evaluated at $$\boldsymbol{y}_{t-1}$$. The probabilities for each regime occuring therefore depend on the level, variability, and temporal depedence of the past $$p$$ observations. This is a convenient feature for forecasting but it also enables the researcher to associate certain characteristics to different regimes.

## Some theoretical properties of GMVAR model

### Stationary distribution

#### Theorem 1

Consider the GMVAR process $$y_t$$ that satisfies the model assumptions. Then $$\boldsymbol{y}_t=(y_t,...,y_{t-p+1})$$ is a Markov Chain on $$\mathbb{R}^{dp}$$ with stationary distribution characterized by the density $f(\boldsymbol{y};\boldsymbol{\theta})=\sum_{m=1}^M\alpha_m n_{dp}(\boldsymbol{y};\upsilon_m).$ Moreover, $$\boldsymbol{y}_t$$ is ergodic.

Theorem 1 shows that the stationary distribution of $$\boldsymbol{y}_t$$ is a mixture of $$M$$ multinormal distributions with constant mixing weights $$\alpha_m$$. Consequently, all moments of the stationary distribution exist and are finite.

### Interpretation of the mixing weights $$\alpha_m$$ and $$\alpha_{m,t}$$

Based on the above Theorem 1, $$\alpha_m$$ has the interpretation as the unconditional probability of the random vector $$\boldsymbol{y}_t$$ being generated from the $$m$$th mixture component. Consequently, $$\alpha_m$$ also represents the unconditional probability of the observation $$y_t$$ being generated from the $$m$$th mixture component (see Kalliovirta et al. 2016, Section 3.3 for more details).

As noted, the mixing weights $$\alpha_{m,t}$$ are the conditional probabilities for the random vector $$y_t$$ being generated from the $$m$$th mixture component. According the definition of the mixing weights, regimes with higher weighted likelihood of the past $$p$$ observations are more likely to generate the new observations, with weights being the mixing weight parameters $$\alpha_m$$.

### Properties of the maximum likelihood estimator

1. Under general conditions (see Kalliovirta et al. 2016, Assumption 1), the maximum likelihood estimator for the parameters of the GMVAR model is strongly consistent.

2. Under general conditions (namely, Assumptions 1 and 2 in Kalliovirta et al. 2016), the maximum likelihood estimator has the conventional limiting distribution (normal).

## Parameter vectors and unconstrained models

In this section, we cover the notation for parameter vectors (the same as in Kalliovirta et al. 2016 for unconstrained models) and explain how to impose linear constraints in gmvarkit. Knowledge of the form of the parameter vector is required for creating GMVAR models without estimation with prespecified parameter values or for setting up initial population for the genetic algorithm, but it is not required for estimation and basic analysis of the models. The form of the parameter vector depends on whether an unconstrained or constrained model is considered.

### Order of the model

Specifying the order of a GMVAR model requires specifying the autoregressive order p and the number of mixture components M. The number of time series in the system is denoted by d, and it’s assumed that d is larger than one.1

### Parameter vector of unconstrained model

The parameter vector for unconstrained model is of size ((M(pd^2+d(d+1)/2+1)-1)x1) and has form $\boldsymbol{\theta} = (\boldsymbol{\upsilon_1},...,\boldsymbol{\upsilon_M},\alpha_1,...,\alpha_{M-1}),\enspace \text{where}\quad\quad\enspace\quad$ $\boldsymbol{\upsilon_m}=(\phi_{m,0},\boldsymbol{\phi_m},\sigma_m)\enspace ((pd^2+d(d+1)/2)x1),\quad\quad\enspace$ $\boldsymbol{\phi_m}=(vec(A_{m,1}),...,vec(A_{m,p}))\enspace (pd^2x1), \enspace \text{and}\quad\enspace\;$ $\sigma_m=vech(\Omega_m)\enspace ((d(d+1)/2)x1),\enspace m=1,...,M.$

Above $$\phi_{m,0}$$ denotes the intercept parameter of m:th mixture component, $$A_{m,i}$$ denotes the coefficient matrix of m:th mixture component and i:th lag, $$\Omega_m$$ denotes the (positive definite) error term covariance matrix, and $$\alpha_m$$ denotes the mixing weight parameter of m:th mixture component. $$vec()$$ is a vectorization operator that stacks columns of a matrix into a vector and $$vech()$$ stacks columns of a matrix from the main diagonal downwards (including the main diagonal) into a vector.

The parameter vector above has “intercept” parametrization, referring to the intercept terms $$\phi_{m,0}$$. However, it’s also possible to use “mean” parametrization, where the intercept terms are simply replaced by the regimewise means $$\mu_m=(I_d-\sum_{i=1}^pA_{m,i})^{-1}\phi_{m,0}$$.

## Models imposing linear constraints

### Imposing linear constraints and parameter vector

Imposing linear constraints on the autoregressive parameters of GMVAR model is straightforward in gmvarkit. The constraints are expressed in a rather general form which allows to impose a wide class of constraints but one needs to take the time to construct the constraint matrix carefully for each particular case.

We consider constraints of form $(\boldsymbol{\phi_1},...,\boldsymbol{\phi_M}) = \boldsymbol{C}\boldsymbol{\psi},\enspace \text{where}\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\quad\enspace$ $\boldsymbol{\phi_m}=(vec(A_{m,1}),...,vec(A_{m,p}))\enspace (pd^2x1), \enspace m=1,...,M,$ $$\boldsymbol{C}$$ is known $$(Mpd^2xq)$$ constraint matrix (of full column rank) and $$\boldsymbol{\psi}$$ is unknown $$(qx1)$$ parameter vector.

The parameter vector for constrained model is size ((M(d+d(d+1)/2+1)+q-1)x1) and has form $\boldsymbol{\theta} = (\phi_{1,0},...,\phi_{M,0},\boldsymbol{\psi},\alpha_1,...,\alpha_{M-1}),$ where $$\boldsymbol{\psi}$$ is the $$(qx1)$$ parameter vector containing constrained autoregressive parameters. As in the case of regular models, instead of the intercept parametrization that takes use of intercept terms $$\phi_{m,0}$$, one may use the mean parametrization with regimewise means $$\mu_m$$ instead (m=1,…,M).

### Examples of linear constraints

Consider the following two common uses of linear constraints: restricting the autoregressive parameters to be the same for all regimes and constraining some parameters to zero. Of course also some other constraints may be useful, but we chose to show illustrative examples of these two as they occur in the cited article by Kalliovirta et al. (2016).

#### Restricting AR parameters to be the same for all regimes

To restrict the AR parameters to be the same for all regimes, we want $$\boldsymbol{\phi_m}$$ to be the same for all m=1,…,M. The parameter vector $$\boldsymbol{\psi}$$ $$(qx1)$$ then corresponds to any $$\boldsymbol{\phi_m}=\boldsymbol{\phi}$$, and therefore $$q=pd^2$$. For the constraint matrix we choose $\boldsymbol{C} = [I_{pd^2}:\cdots:I_{pd^2}]' \enspace (Mpd^2xpd^2),$ that is, M pieces of $$(pd^2xpd^2)$$ diagonal matrices stacked on top of each other, because then $\boldsymbol{C}\boldsymbol{\psi}=(\boldsymbol{\psi},...,\boldsymbol{\psi})=(\boldsymbol{\phi},...,\boldsymbol{\phi}).$

#### Restricting AR parameters to be the same for all regimes and constraining non-diagonal elements of coefficient matrices to be zero

The previous example shows how to restrict the AR parameters to be the same for all regimes, but say we also want to constrain the non-diagonal elements of coefficient matrices $$A_{m,i}$$ (m=1,…,M, i=1,…,p) to be zero. We have the constrained parameter $$\boldsymbol{\psi}$$ $$(qx1)$$ representing the unconstrained parameters $$(\boldsymbol{\phi_1},...,\boldsymbol{\phi_M})$$, where by assumption $$\boldsymbol{\phi_m}=\boldsymbol{\phi}=(vec(A_1),...,vec(A_p))$$ $$(pd^2x1)$$ and the elements of $$vec(A_i)$$ (i=1,…,p) corresponding to the diagonal are zero.

For illustrative purposes, let’s consider a GMVAR model with autoregressive degree p=2, number of mixture components M=2 and number of time series in the system d=2. Then we have $\boldsymbol{\phi}=(A_{1,(1,1)},0,0,A_{1,(2,2)},A_{2,(1,1)},0,0,A_{2,(2,2)}) \enspace (8x1) \enspace \text{and}$ $\boldsymbol{\psi}=(A_{1,(1,1)},A_{1,(2,2)},A_{2,(1,1)},A_{2,(2,2)}) \enspace (4x1).\quad\quad\quad\quad\quad\enspace$ By a direct calculation, we can see that choosing the constraint matrix $\boldsymbol{C}=\left[{\begin{array}{c} \boldsymbol{\tilde{c}} \\ \boldsymbol{\tilde{c}} \\ \end{array}}\right] \enspace (Mpd^2x4), \enspace \text{where}$

$\boldsymbol{\tilde{c}}=\left[{\begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ \end{array}}\right] \enspace (pd^2x4)$ satisfies $$\boldsymbol{C}\boldsymbol{\psi}=(\boldsymbol{\phi},...,\boldsymbol{\phi}).$$

## Constraining the unconditional means of some regimes to be the same

In addition to constraining the autoregressive parameters, gmvarkit allows to constrain the unconditional means of some regimes to be the same. This feature is, however, only available for models that are parametrized with the unconditional means instead of intercepts (because some of the estimation is always done with mean-parametrization and one cannot generally swap the parametrization when constraints are imposed on means/intercepts). With the mean-parametrization employed (by setting parametrization="mean"), one may define groups of regimes that have the same mean parameters using the argument same_means. For instance, with three regime model (M=3) the argument same_means=list(c(1, 3), 2) sets the unconditional means of the first and third regimes to be the same while allows the second regime to have different mean.

One can also combine linear constraints on the AR parameters with constraining some of the means to be the same. This allows, for instance, to estimate a model in which only the covariance matrix varies in time. Validity of different constraints can be ecaluated, for example, with likelihood ratio and Wald tests using the functions LR_test and Wald_test.

## Some useful functions in gmvarkit

### Estimating a GMVAR model

gmvarkit provides the function fitGMVAR for estimating a GMVAR model. The maximum likelihood estimation is performed in two phases: in the first phase fitGMVAR uses a genetic algorithm to find starting values for gradient based variable metric algorithm, which it then uses to finalize the estimation in the second phase. It’s important to keep in mind that it’s not guaranteed that the numerical estimation algorithms end up in the global maximum point rather than a local one. Because of high multimodality and challenging surface of the log-likelihood function, it’s actually expected that most of the estimation rounds won’t find the global maximum point. For this reason one should always perform multiple estimation rounds and parallel computing is used to obtain the results faster. The number of CPU cores used can be set with the argument ncores, and the number of estimation rounds can be chosen with the argument ncalls.

If the model estimates poorly, it is often because the number of mixture components M is chosen too large. One reason for is that the genetic algorithm is (with the default settings) designed to find solutions with non-zero mixing weights for all regimes. One may also adjust the settings of the genetic algorithm or set up an initial population with approximate guesses for the estimates. This can by done by passing arguments in fitGMVAR to the function GAfit which implements the genetic algorithm. To check the available settings, read the documentation ?GAfit. If the iteration limit is reached when estimating the model, the function iterate_more can be used to finish the estimation.

If the estimation algorithm fails to create an initial population for the genetic algorithm, it usually helps to scale the individual series so that the AR coefficients (of a VAR model) will be relative small, preferably less than one. Even if one is able to create an initial population, it should be preferred to scale the series so that most of the the AR coefficients will not be very large, as the estimation algorithm works better with small AR coefficients. If needed, another package can be used to fit linear VARs to the series to see which scaling of the series results in relatively small AR coefficients.

### Examining the estimates

Parameters of the estimated model are printed in an illustrative and easy to read form. In order to easily compare approximate standard errors to certain estimates, one can print the approximate standard errors of the estimates in the same form with the function print_std_errors. Numerical approximation of the gradient and Hessian matrix of the log-likelihood at the estimates can be obtained conveniently with the functions get_gradient or get_foc and get_hessian, and eigenvalues of the Hessian can be obtained with the function get_soc.

Graphs of the profile log-likelihood functions of the parameters around the estimates can be plotted with the function profile_logliks.

The estimated objects have their own print, plot, and summary methods. In-sample conditional moments can be examined with the function cond_moment_plot.

The function alt_gmvar can be used to build GMVAR model based on an arbitrary estimation round. This can be used to consider estimates pointed by local maximums instead of the ML estimate. In particular, if the MLE turns out to be a near border solution, it might be appropriate to consider an estimate that is well inside the parameter space.

### Model diagnostics

gmvarkit considers model diagnostics based on multivariate extension of quantile residuals (see Kalliovirta and Saikkonen 2010) which are under the correct model specification asymptotically multivariate standard normal distributed. The quantile residual tests introduced by Kalliovirta and Saikkonen (2010) can be performed with the function quantile_residual_tests by providing the estimated model (that is class gmvar object) as an argument. The argument nsimu can be used to employ a simulation procedure that may enhance the tests size properties.

For graphical diagnostics one may use the function diagostic_plot, which enables one to plot the quantile residual time series, auto- and cross-correlation functions of quantile residuals or their squares, or quantile residual histograms and normal QQ-plots.

### Constructing a GMVAR model without estimation

One may wish to construct an arbitrary GMVAR model without any estimation process, for example in order to simulate from a particular process of interest. An arbitrary model can be created with the function GMVAR. If one wants to add or update data to the model afterwards, it’s advisable to use the function add_data.

### Simulating from a GMVAR process

The function simulateGMVAR is the one for the job. As the main argument it uses a class gmvar object created with fitGMVAR or GMVAR.

### Forecasting GMVAR process

The package gmvarkit contains predict method predict.gmvar for forecasting GMVAR processes. For one step predictions using the exact formula for conditional mean is supported, but forecasts further than that are based on independent simulations. The predictions are either sample means or medians and the confidence intervals are based on sample quantiles. The objects generated by predict.gmvar have their own plot method.

### Hypothesis testing

Test linear hypotheses using Wald test with the function Wald_test and using likelihood ratio test with the function LR_test.

### Univariate analysis

Use the package uGMAR for analysing univariate time series.

# Structural GMVAR model

## Defition of the SGMVAR model

Consider the GMVAR model defined above under the headline “Definition of the GMVAR model”. We focus of the so-called “B-model” setup and write the structural GMVAR model as $$$\label{eq:sgmvar} y_t = \sum_{m=1}^Ms_{t,m}\left(\phi_{m,0} + \sum_{i=i}^{p}A_{m,i}y_{i-1} \right) + B_te_t$$$ and $$$\label{eq:sgmvarerr} u_t\equiv B_te_t= \left\lbrace\begin{matrix} u_{1,t} \sim N(0, \Omega_1) & \text{if} & s_{t,1} = 1 & (\text{with probability } \alpha_{1,t}) \\ u_{2,t} \sim N(0, \Omega_2) & \text{if} & s_{t,2} = 1 & (\text{with probability } \alpha_{2,t}) \\ \vdots & & & \\ u_{M,t} \sim N(0, \Omega_M) & \text{if} & s_{t,M} = 1 & (\text{with probability } \alpha_{M,t}) \\ \end{matrix}\right.$$$ where the probabilities are expressed conditionally on $$\mathcal{F}_{t-1}=\sigma\lbrace y_{t-j}, j>0\rbrace$$ and $$e_t$$ is an orthogonal structural error. Unlike in conventional SVAR analysis, the $$(d\times d)$$ “B-matrix” $$B_t$$, which governs the contemporaneous relations of the shocks, is time-varying. This enables to amplify a constant sized structural shock according to the conditional variance of the reduced form error which varies according to the mixing weights.

We have $$\Omega_{u,t}\equiv\text{Cov}(u_t|\mathcal{F}_{t-1})=\sum_{m=1}^M\alpha_{m,t}\Omega_m$$, while the (conditional) covariance matrix of the structural errors $$e_t=B_t^{-1}u_t$$ (which have a mixture normal distribution and are not IID) is obtained as $$$\text{Cov}(e_t|\mathcal{F}_{t-1})=\sum_{m=1}^M\alpha_{m,t}B_t^{-1}\Omega_mB_t'^{-1}.$$$ The B-matrix $$B_t$$ should thus be chosen so that the structural shocks are orthogonal regardless of which regime they come from.

Following Lanne & Lütkepohl (2010) and Lanne et al. (2010), we decompose the error term covariance matrices as $$$\label{eq:decomp} \Omega_1 = WW' \quad \text{and} \quad \Omega_m=W\Lambda_mW', \quad m=2,...,M,$$$ where the diagonal of $$\Lambda_m=\text{diag}(\lambda_{m1},...,\lambda_{md})$$, $$\lambda_{mi}>0$$ ($$i=1,...,d$$), contains the eigenvalues of the matrix $$\Omega_m\Omega_1^{-1}$$ and columns of the nonsingular $$W$$ are the related eigenvectors. When $$M=2$$, the above decomposition always exists (e.g. Lanne & Lütkepohl, 2010, Proposition 1), but for $$M\geq 3$$ its existence requires that the matrices $$\Omega_m\Omega_1^{-1}$$ share the common eigenvectors in $$W$$. Whether imposing the above structure on the covariance matrices is appropriate with $$M\geq 3$$, can be evaluated by estimating the restricted and unrestricted SGMVAR models and conducting a likelihood ratio test, comparing values of information criteria, or examining quantile residual diagnostics, for example. These three methods are accomodated in gmvarkit: values of the information criteria are calculated for all estimated models are available, for instance, in the summary print, likelihood ratio test is available as the function LR_test and quantile residual diagnostics can be examined with the functions diagnostic_plot and quantile_residual_tests.

Lanne & Lütkepohl (2010) show that for a given ordering of the eigenvalues $$\lambda_{mi}$$, $$W$$ is unique apart from changing all signs in a column, as long as for all $$i\neq j\in \lbrace 1,...,d \rbrace$$ there exists an $$m\in \lbrace 2,...,M \rbrace$$ such that $$\lambda_{mi} \neq \lambda_{mj}$$. A locally unique B-matrix that amplifies a constant sized structural shock according to the conditional variance of the reduced form error is therefore obtained as $$$\label{eq:bt} B_t=W(\alpha_{1,t}I_d + \sum_{m=2}^M\alpha_{m,t}\Lambda_m)^{1/2}$$$ which simultaneously diagonalizes $$\Omega_1,...,\Omega_M$$, and $$\Omega_{u,t}$$ for each $$t$$ so that $$\text{Cov}(e_t|\mathcal{F}_{t-1})=I_d$$. In gmvarkit, the B-matrix always has the above structure.

## Identification of the structural shocks

Even if all pairs of $$\lambda_{mi}$$ are distinct for some $$m$$, unique identification of $$B_t$$ requires deciding upon the ordering of the eigenvalues $$\lambda_{mi}$$ in the diagonals of $$\Lambda_m$$ which fixes also the ordering of the columns of $$W$$. While statistical identification of the model is achieved with any arbitrary ordering the eigenvalues, it does not guarantee that the structural shocks have economic interpretations. In particular, a structural shock relates to an economic shock through the specific constraints in the corresponding column of $$W$$ (or equally of the B-matrix) that only the shock of interest satisfies. If such constraints are readily satisfied in the (unrestricted) estimate of $$W$$, the identification amounts to labelling the structural shocks by the appropriate economic shocks, as long as the constraints are strong enough to pin down a unique ordering for the columns of $$W$$ (this argument will be formalized in Proposition 1 below).

If the unrestricted estimate of $$W$$ is such that the researcher cannot associate the economic shocks to it, the appropriate constraints can placed for their identification. Besides a single sign constraint in each column of $$W$$, any constraints on $$W$$ are overidentifying and therefore testable. Moreover, globally unique identification of the structural shocks can often be achieved with more flexible constraints than in conventional SVAR models. Also, as in practice one is often interested in identifying only some specific shock or shocks (such as a monetary policy shock, for instance), it makes sense to consider only partial identification of the B-matrix as well. Specifically, we say that the $$j$$th structural shock is uniquely identified if the $$j$$th column of the B-matrix is unique for given mixing weights $$\alpha_{1,t},...,\alpha_{M,t}$$. This requires that the $$j$$th columns of $$W$$ and $$\Lambda_m$$, $$m=2,..,M$$, are unique. The following proposition gives sufficient and necessary conditions for global identification of the last $$d_1$$ shocks when the related pairs of $$\lambda_{mi}$$ are distinct for some $$m$$.

#### Proposition 1

Suppose $$\Omega_1=WW'$$ and $$\Omega_m=W\Lambda_mW', \ \ m=2,...,M,$$ where $$\Lambda_m=\text{diag}(\lambda_{m1},...,\lambda_{md})$$, $$\lambda_{mi}>0$$ ($$i=1,...,d$$), contains the eigenvalues of $$\Omega_m\Omega_1^{-1}$$ in the diagonal and the columns of the nonsingular $$W$$ are the related eigenvectors. Then, the last $$d_1$$ structural shocks are uniquely identified if

1. for all $$j > d-d_1$$ and $$i\neq j$$ there exists an $$m\in \lbrace 2,...,M \rbrace$$ such that $$\lambda_{mi} \neq \lambda_{mj}$$,

2. the columns of $$W$$ are constrained in a way that for all $$i\neq j > d - d_1$$, the $$i$$th column cannot satisfy the constraints of the $$j$$th column as is nor after changing all signs in the $$i$$th column, and

3. there is at least one (strict) sign constraint in each of the last $$d_1$$ columns of $$W$$.

Notice that condition (3) of Proposition 1 fixes the signs in the last $$d_1$$ columns of $$W$$ and therefore the signs of the instantaneous effects of the corresponding shocks. Changing the signs would be effectively the same as changing the signs of the corresponding shocks so condition (3) is not restrictive, however. The assumption that the identified shocks are the last $$d_1$$ shocks is neither restrictive as one may always reorder the structural shocks (and possibly the variables) accordingly.

For example, if $$d=3$$, $$\lambda_{m1}\neq\lambda_{m3}$$ for some $$m$$, and $$\lambda_{m2}\neq\lambda_{m3}$$ for some $$m$$, the third structural shock can be identified with the following constraints: $B_t=\begin{bmatrix} * & * & * \\ + & + & - \\ + & + & + \\ \end{bmatrix} \ \text{or} \ \begin{bmatrix} - & * & + \\ - & + & - \\ * & + & + \\ \end{bmatrix} \ \text{or} \ \begin{bmatrix} + & 0 & - \\ * & * & * \\ + & * & + \\ \end{bmatrix}$ and so on, where $$"*"$$ signifies that the element is not constrained, $$"+"$$ denotes strict positive and $$"-"$$ a strict negative sign constraint, and $$"0"$$ means that the element is constrained to zero. Because imposing zero or sign constraints on $$W$$ equals to placing them on $$B_t$$, they may be justified economically. Furthermore, besides a single sign constraint in each column, the constraints are over-identifying and can thus be also justified statistically. Sign constraints, however, don’t reduce the dimension of the parameter space, making some of the measures such as the conventional likelihood ratio test and information criteria unsuitable for testing them. Quantile residual diagnostics, on the other hand, can be used to evaluate how well the restricted model is able to encapsulate the statistical properties of the data compared to the unrestricted alternative.

Shocks that don’t satisfy the condition (1) of Proposition 1 can still be identified but it requires stronger conditions. The following proposition provides sufficient criteria for global identification of the last $$d_1$$ shocks when exactly one of the eigenvalues $$\lambda_{mi}$$ with $$i\neq j > d - d_1$$ is identical to $$\lambda_{mj}$$ in all $$m$$. For simplicity, we assume that only another one the shocks with identical eigenvalues is to be identified.

#### Proposition 2

Let $$d_1<d$$. Consider the matrix decomposition of Proposition 1 and further suppose that for $$j=d-d_1+1$$ and some $$i\leq d-d_1$$ we have $$\lambda_{mi}=\lambda_{mj}$$ for all $$m$$, but for all $$l$$ that are not in $$\lbrace i,j \rbrace$$, $$\lambda_{ml}\neq\lambda_{mj}$$ for some $$m$$. Then, the last $$d_1$$ structural shocks are uniquely identified if conditions (1)-(3) of Proposition 1 are otherwise satisfied, and in addition

1. the column $$i\leq d-d_1$$ of $$W$$ such that $$\lambda_{mi}=\lambda_{mj}$$ for all $$m$$ has at least one (strict) sign constraint and the $$j$$th column has a zero constraint where the $$i$$th column has the (strict) sign constraint.

Note that the assumption $$j=d-d_1+1$$ is made without loss of generality as the structural shocks can always be reordered accordingly by also reordering the columns of $$W$$ (including the constraints) and the eigenvalues $$\lambda_{mi}$$ correspondingly.2

To exemplify, if $$d=4$$, $$\lambda_{m1}\neq\lambda_{m4}$$ for some $$m$$, $$\lambda_{m2}\neq\lambda_{m4}$$ for some $$m$$, and $$\lambda_{m3}=\lambda_{m4}$$ for all $$m$$, the following constraints lead to global identification of last shock: $B_t=\begin{bmatrix} * & * & * & *\\ * & * & + & 0\\ + & + & * & -\\ + & + & * & +\\ \end{bmatrix} \ \text{or} \ \begin{bmatrix} * & * & - & 0\\ * & * & * & *\\ + & - & * & +\\ - & + & * & +\\ \end{bmatrix} \ \text{or} \ \begin{bmatrix} + & - & - & 0\\ * & * & * & *\\ * & * & * & *\\ * & * & * & +\\ \end{bmatrix}$ and so on. As is demonstrated above, the structural shocks can often be identified with less restrictive or more suitable constraints than in a conventional SVAR model even when some of the eigenvalues are identical for all covariance matrices.

If more than two eigenvalues are identical for all $$m=2,...,M$$ but they are not all identical, it may still be possible to find less restrictive (or more suitable) conditions for identification of the shocks. Specifically, the idea utilized in the proof of Proposition 2 (presented in an Appendix of Virolainen (2020)) can be applied to larger numbers of identical eigenvalues. If all the eigenvalues are identical for all covariance matrices, then $$\Omega_m=\lambda_{m1}\Omega_1$$ and the identification condition is the same as for the conventional SVAR model. As a general remark, observe that constraining an element of $$B_t$$ to be any constant other than zero is infeasible because all elements in the B-matrix are either zero or time varying due to the time-varying mixing weights.

## Estimation of the structural GMVAR model

The SGMVAR model is estimated similarly to the reduced form version, expect that the model is parametrized with $$W$$ and $$\lambda_{mi}$$, $$m=2,...,M$$, $$i=1,...,d$$ instead of the covariance matrices $$\Omega_{m}$$, $$m=1,...,M$$. The estimation is can be done with the function fitGMVAR but now the argument structural_pars needs to be supplied with a list providing the constraints on $$W$$ (which equally imposes the constraints on the B-matrix) and optionally linear constraints on the $$\lambda_{mi}$$ parameters.

The list structural_pars should contain at least the element W which is a $$(dxd)$$ matrix matrix with its entries imposing constraints on W: NA indicating that the element is unconstrained, a positive value indicating strict positive sign constraint, a negative value indicating strict negative sign constraint, and zero indicating that the element is constrained to zero. The optional element named C_lambda which is a $$(d(M-1) \times r)$$ constraint matrix that satisfies ($$\lambda_{2},...,\lambda_{M}) =C_{\lambda} \gamma$$ where $$\lambda_{m}=(\lambda_{m1},...,\lambda_{md})$$ and $$\gamma$$ is the new $$(r x 1)$$ parameter subject to which the model is estimated (similarly to AR parameter constraints). The entries of C_lambda must be either positive or zero. Ignore (or set to NULL) if the eigenvalues $$\lambda_{mi}$$ should not be constrained. Note that other constraints than constraining some of the $$\lambda_{mi}$$ to be identical are not recommended but if such constraints are imposed, the argument lambda_scale in the genetic algorithm (see ?GAfit) should be adjusted accordingly.

## Building structural model based on a reduced form model

If the number of regimes is two ($$M=2$$), a structural model can be build based on a reduced form model because the matrix decomposition used in the simultaneous diagonalization of the error term covariance matrices always exists. This can be done with function gmvar_to_sgmvar which should be supplied with the reduced form model and it returns a corresponding structural model. After creating the structural model, the columns of $$W$$ can be reordered ex-post with the function reorder_W_columns which also reorders all $$\lambda_{mi}$$ accordingly and hence the resulting model will conincide with the original reduced form model. Also, all signs any column of $$W$$ can be swapped ex-post with the function swap_W_signs.

## Impulse response analysis

Following Koop et al. (1996) and Kilian & Lütkepohl (2017), we consider the generalized impulse response function (GIRF) defined as $$$\label{eq:girf} \text{GIRF}(n,\delta_j,\mathcal{F}_{t-1}) = \text{E}[y_{t+n}|\delta_j,\mathcal{F}_{t-1}] - \text{E}[y_{t+n}|\mathcal{F}_{t-1}],$$$ where $$n$$ is the chosen horizon, $$\mathcal{F}_{t-1}=\sigma\lbrace y_{t-j},j>0\rbrace$$ as before, the first term in the RHS is the expected realization of the process at time $$t+n$$ conditionally on a structural shock of magnitude $$\delta_j \in\mathbb{R}$$ in the $$j$$th element of $$e_t$$ at time $$t$$ and the previous observations, and the second term in the RHS is the expected realization of the process conditionally on the previous observations only. GIRF thus expresses the expected difference in the future outcomes when the specific structural shock hits the system at time $$t$$ as opposed to all shocks being random.

Due to the $$p$$-step Markov property of the SGMVAR model, conditioning on (the $$\sigma$$-algebra generated by) the $$p$$ previous observations $$\boldsymbol{y}_{t-1}\equiv(y_{t-1},...,y_{t-p})$$ is effectively the same as conditioning on $$\mathcal{F}_{t-1}$$ at the time $$t$$ and later. The history $$\boldsymbol{y}_{t-1}$$ can be either fixed or random, but with random history the GIRF becomes a random vector, however. Using fixed $$\boldsymbol{y}_{t-1}$$ makes sense when one is interested in the effects of the shock in a particular point of time, whereas more general results are obtained by assuming that $$\boldsymbol{y}_{t-1}$$ follows the stationary distribution of the process. If one is, on the other hand, concerned about a specific regime, $$\boldsymbol{y}_{t-1}$$ can be assumed to follow the stationary distribution of the corresponding component model.

In practice, the GIRF and its distributional properties can be approximated with a Monte Carlo algorithm that generates independent realizations of the process and then takes the sample mean for point estimate. If $$\boldsymbol{y}_{t-1}$$ is random and follows the distribution $$G$$, the GIRF should be estimated for different values of $$\boldsymbol{y}_{t-1}$$ generated from $$G$$, and then the sample mean and sample quantiles can be taken to obtain the point estimate and confidence intervals. The algorithm implemented in gmvarkit is presented in an Appendix of Virolainen (2020).

Because the SGMVAR model allows to associate specific features or economic interpretations for different regimes, it might be interesting to also examine the effects of a structural shock to the mixing weights $$\alpha_{m,t}$$, $$m=1,...,M$$. We then consider the related GIRFs $$E[\alpha_{m,t+n}|\delta_j,\boldsymbol{y}_{t-1}] - E[\alpha_{m,t+n}|\boldsymbol{y}_{t-1}]$$ for which point estimates and confidence intervals can be constructed similarly to ().

In gmvarkit, the GIRF can be estimated with the function GIRF which should be supplied with the estimated SGMVAR model or a SGMVAR model build with hand specified parameter values using the function GMVAR. The size of the structural shock can be set with the argument shock_size. If not specified, the size of one standard deviation is used; that is, the size one. Among other arguments, the function may also be supplied with the argument init_regimes which specifies from which regimes’ stationary distributions the initial values are generated from. If more than one regime is specified, a mixture distribution with weights given by the mixing weight parameters is used. Alternatively, one may specify fixed initial values with the argument init_values. Note that the confidence intervals (whose confidence level can be specified with the argument ci) reflect uncertainty about the initial value only and not about the parameter estimates.

Because estimating the GIRF and confidence intervals for it is computationally demanding, parallel computing is taken use of to shorten the estimation time. The number of CPU cores used can be set with the argument ncores.

As final remarks, note that the objects returned by the GIRF function have their own plot and print methods. Also, cumulative impulse responses of the specified variables can be obtained directly by specifying the argument which_cumulative in the function GIRF.

## References

• Lanne, M. and Lütkepohl, H. (2010). Structural vector autoregressions with nonnormal residuals. Journal of Business & Economic Statistics, 28(1):159–168.
• Lanne, M., Lütkepohl, H., and Maciejowsla, K. (2010). Structural vector autoregressions with markov switching. Journal of Economic Dynamics and Control, 34(2):121–131.
• Kalliovirta L., Meitz M. and Saikkonen P. (2016) Gaussian mixture vector autoregression. Journal of Econometrics, 192, 485-498.
• Kalliovirta L. and Saikkonen P. (2010) Reliable Residuals for Multivariate Nonlinear Time Series Models. Unpublished Revision of HECER Discussion Paper No. 247.
• Kilian, L. and Lütkepohl, H. (2017). Structural Vector Autoregressive Analysis. Cambridge University Press, Cambridge, 1st edition.
• Koop, G., Pesaran, M., and Potter, S. (1996). Impulse response analysis in nonlinear multivariate models. Journal of Econometrics, 74(1):119–147.
• Virolainen S. 2020. Structural Gaussian mixture vector autoregressive model. Unpublished working paper, available as arXiv:2007.04713.

1. For univariate analysis one may use the package uGMAR.↩︎

2. While the assumptions in Propositions 1 and 2 regarding the ordering of the shocks are not restrictive, they can also be relaxed to allow for any ordering and are made here for simplicity only.↩︎