## Vector autoregressions in Stata

**Introduction**

In a univariate autoregression, a stationary time-series variable \(y_t\) can often be modeled as depending on its own lagged values:

\begin{align}
y_t = \alpha_0 + \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \dots + \alpha_k y_{t-k} + \varepsilon_t
\end{align}

When one analyzes multiple time series, the natural extension to the autoregressive model is the vector autoregression, or VAR, in which a vector of variables is modeled as depending on their own lags and on the lags of every other variable in the vector. A two-variable VAR with one lag looks like

\begin{align}
y_t &= \alpha_{0} + \alpha_{1} y_{t-1} + \alpha_{2} x_{t-1} + \varepsilon_{1t} \\
x_t &= \beta_0 + \beta_{1} y_{t-1} + \beta_{2} x_{t-1} + \varepsilon_{2t}
\end{align}

Applied macroeconomists use models of this form both to describe macroeconomic data and to perform causal inference and provide policy advice.

In this post, I will estimate a three-variable VAR using the U.S. unemployment rate, the inflation rate, and the nominal interest rate. This VAR is similar to those used in macroeconomics for monetary policy analysis. I focus on basic issues in estimation and postestimation. Data and do-files are provided at the end. Additional background and theoretical details can be found in Ashish Rajbhandari’s [earlier post], which explored VAR estimation using simulated data.

**Data and estimation**

When writing down a VAR, one makes two basic model-selection choices. First, one chooses which variables to include in the VAR. This decision is typically motivated by the research question and guided by theory. Second, one chooses the lag length. One may use a heuristic, such as “include one year’s worth of lags”, or a formal lag-length selection criterion. Once the lag length has been determined, one may proceed to estimation; once the parameters of the VAR have been estimated, one can perform postestimation procedures to assess model fit.

I use quarterly observations on the U.S. unemployment rate, rate of consumer price inflation, and short-term nominal interest rate from 1955 to 2005. The three series were downloaded from the Federal Reserve Economic Data (FRED) database at https://fred.stlouisfed.org. In the Stata output that follows, the inflation rate is referred to as **inflation**, the unemployment rate as **unrate**, and the interest rate as **ffr** (federal funds rate). Hence, the VAR I will estimate is

\begin{align}
\begin{bmatrix}
{\bf inflation}_t \\ {\bf unrate}_t \\ {\bf ffr}_t
\end{bmatrix}
=
{\bf a_0}
+
{\bf A_1}
\begin{bmatrix}
{\bf inflation}_{t-1} \\ {\bf unrate}_{t-1} \\ {\bf ffr}_{t-1}
\end{bmatrix}
+ \dots +
{\bf A_k}
\begin{bmatrix}
{\bf inflation}_{t-k} \\ {\bf unrate}_{t-k} \\ {\bf ffr}_{t-k}
\end{bmatrix}
+
\begin{bmatrix}
\varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t}
\end{bmatrix}
\end{align}

Here \({\bf a_0}\) is a \(3 \times 1\) vector of intercept terms, and each of \({\bf A_1}\) to \({\bf A_k}\) is a \(3 \times 3\) matrix of coefficients. VARs with these variables, or close analogues of them, are common in monetary policy analysis.

The next step is to decide on a sensible lag length. I use the **varsoc** command to run lag-order selection diagnostics.

    . varsoc inflation unrate ffr, maxlag(8)

    Selection-order criteria
    Sample:  41 - 236                             Number of obs      =        196
    +----------------------------------------------------------------------------+
    |lag |    LL       LR      df    p       FPE       AIC      HQIC      SBIC   |
    |----+-----------------------------------------------------------------------|
    |  0 | -1242.78                        66.5778    12.712   12.7323   12.7622 |
    |  1 | -433.701   1618.2    9  0.000   .018956   4.54796   4.62922   4.74867 |
    |  2 | -366.662   134.08    9  0.000   .010485   3.95574   4.09793   4.30696*|
    |  3 | -351.034   31.257    9  0.000   .009801    3.8881   4.09123   4.38985 |
    |  4 | -337.734     26.6    9  0.002   .009383   3.84422    4.1083    4.4965 |
    |  5 | -319.353   36.763    9  0.000   .008531    3.7485   4.07351    4.5513 |
    |  6 | -296.967    44.77*   9  0.000   .007447*  3.61191*  3.99787*  4.56524 |
    |  7 | -292.066   9.8034    9  0.367   .007773   3.65373   4.10063   4.75759 |
    |  8 |  -286.45   11.232    9  0.260   .008057   3.68826    4.1961   4.94265 |
    +----------------------------------------------------------------------------+
    Endogenous:  inflation unrate ffr
     Exogenous:  _cons

**varsoc** displays the results of a battery of lag-order selection tests. The details of these tests may be found in **help varsoc**. The likelihood-ratio test, the FPE, Akaike’s information criterion, and the HQIC all select six lags (only the SBIC prefers two), so I use six lags through the rest of this post.

With variables and lag length in hand, there are two objects to estimate: the coefficient matrices and the covariance matrix of the error terms. Coefficients can be estimated by least squares, equation by equation. The covariance matrix of the errors may be estimated from the sample covariance matrix of the residuals. **var** performs both tasks.

The table of coefficients is displayed by default, and the covariance estimate of the error terms can be found in the stored result **e(Sigma)**:

    . var inflation unrate ffr, lags(1/6) dfk small

    Vector autoregression

    Sample:  39 - 236                               Number of obs     =        198
    Log likelihood =  -298.8751                     AIC               =   3.594698
    FPE            =   .0073199                     HQIC              =    3.97786
    Det(Sigma_ml)  =   .0041085                     SBIC              =   4.541321

    Equation           Parms      RMSE     R-sq        F       P > F
    ----------------------------------------------------------------
    inflation            19     .430015   0.9773   427.7745   0.0000
    unrate               19     .252309   0.9719    343.796   0.0000
    ffr                  19     .795236   0.9481   181.8093   0.0000
    ----------------------------------------------------------------

    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    inflation    |
       inflation |
             L1. |    1.37357   .0741615    18.52   0.000     1.227227    1.519913
             L2. |   -.383699   .1172164    -3.27   0.001    -.6150029   -.1523952
             L3. |   .2219455   .1107262     2.00   0.047     .0034489     .440442
             L4. |  -.6102823   .1105383    -5.52   0.000    -.8284081   -.3921565
             L5. |   .6247347   .1158098     5.39   0.000     .3962065    .8532629
             L6. |  -.2352624   .0719141    -3.27   0.001    -.3771708    -.093354
                 |
          unrate |
             L1. |  -.4638928   .1386526    -3.35   0.001    -.7374967   -.1902889
             L2. |   .6567903   .2370568     2.77   0.006     .1890049    1.124576
             L3. |   -.271786   .2472491    -1.10   0.273     -.759684    .2161119
             L4. |  -.4545188   .2473079    -1.84   0.068    -.9425328    .0334952
             L5. |   .6755548   .2387697     2.83   0.005     .2043893     1.14672
             L6. |  -.1905395    .136066    -1.40   0.163    -.4590393    .0779602
                 |
             ffr |
             L1. |   .1135627   .0439648     2.58   0.011     .0268066    .2003187
             L2. |  -.1155366   .0607816    -1.90   0.059    -.2354774    .0044041
             L3. |   .0356931   .0628766     0.57   0.571    -.0883817    .1597678
             L4. |  -.0928074   .0620882    -1.49   0.137    -.2153263    .0297116
             L5. |   .0285487   .0605736     0.47   0.638    -.0909816    .1480789
             L6. |   .0309895   .0436299     0.71   0.478    -.0551055    .1170846
                 |
           _cons |   .3255765   .1730832     1.88   0.062    -.0159696    .6671226
    -------------+----------------------------------------------------------------
    unrate       |
       inflation |
             L1. |   .0903987   .0435139     2.08   0.039     .0045326    .1762649
             L2. |  -.1647856   .0687761    -2.40   0.018    -.3005019   -.0290693
             L3. |   .0502256    .064968     0.77   0.440    -.0779761    .1784273
             L4. |   .0919702   .0648577     1.42   0.158     -.036014    .2199543
             L5. |  -.0091229   .0679508    -0.13   0.893    -.1432106    .1249648
             L6. |  -.0475726   .0421952    -1.13   0.261    -.1308366    .0356914
                 |
          unrate |
             L1. |   1.511349   .0813537    18.58   0.000     1.350814    1.671885
             L2. |  -.5591657   .1390918    -4.02   0.000    -.8336363   -.2846951
             L3. |  -.0744788   .1450721    -0.51   0.608    -.3607503    .2117927
             L4. |  -.1116169   .1451066    -0.77   0.443    -.3979565    .1747227
             L5. |   .3628351   .1400968     2.59   0.010     .0863813     .639289
             L6. |  -.1895388    .079836    -2.37   0.019    -.3470796    -.031998
                 |
             ffr |
             L1. |   -.022236   .0257961    -0.86   0.390    -.0731396    .0286677
             L2. |   .0623818   .0356633     1.75   0.082    -.0079928    .1327564
             L3. |  -.0355659   .0368925    -0.96   0.336    -.1083661    .0372343
             L4. |   .0184223   .0364299     0.51   0.614    -.0534651    .0903096
             L5. |   .0077111   .0355412     0.22   0.828    -.0624226    .0778449
             L6. |  -.0097089   .0255996    -0.38   0.705    -.0602247     .040807
                 |
           _cons |    .187617   .1015557     1.85   0.066    -.0127834    .3880173
    -------------+----------------------------------------------------------------
    ffr          |
       inflation |
             L1. |   .1425755   .1371485     1.04   0.300    -.1280603    .4132114
             L2. |   .1461452   .2167708     0.67   0.501    -.2816098    .5739003
             L3. |  -.0988776   .2047683    -0.48   0.630     -.502948    .3051928
             L4. |  -.4035444   .2044208    -1.97   0.050    -.8069291   -.0001598
             L5. |   .5118482   .2141696     2.39   0.018     .0892262    .9344702
             L6. |  -.1468158   .1329922    -1.10   0.271      -.40925    .1156184
                 |
          unrate |
             L1. |  -1.411603   .2564132    -5.51   0.000    -1.917585   -.9056216
             L2. |   1.525265   .4383941     3.48   0.001      .660179     2.39035
             L3. |  -.6439154   .4572429    -1.41   0.161    -1.546195    .2583646
             L4. |   .8175053   .4573517     1.79   0.076    -.0849893        1.72
             L5. |   -.344484   .4415619    -0.78   0.436     -1.21582    .5268524
             L6. |   .0366413   .2516297     0.15   0.884     -.459901    .5331835
                 |
             ffr |
             L1. |   1.003236   .0813051    12.34   0.000     .8427961    1.163676
             L2. |  -.4497879   .1124048    -4.00   0.000    -.6715968   -.2279789
             L3. |   .4273715   .1162791     3.68   0.000     .1979173    .6568256
             L4. |  -.0775962    .114821    -0.68   0.500    -.3041731    .1489807
             L5. |    .259904   .1120201     2.32   0.021     .0388542    .4809538
             L6. |  -.2866806   .0806857    -3.55   0.000     -.445898   -.1274631
                 |
           _cons |   .2580589   .3200865     0.81   0.421    -.3735695    .8896873
    ------------------------------------------------------------------------------

    . matlist e(Sigma)

                 | inflation     unrate        ffr
    -------------+--------------------------------
       inflation |  .1849129
          unrate | -.0064425   .0636598
             ffr |  .0788766    -.09169      .6324

The output of **var** organizes its results by equation, where an “equation” is identified with its dependent variable: hence, there is an inflation equation, an unemployment equation, and an interest rate equation. **e(Sigma)** holds the covariance matrix of the estimated residuals from the VAR. Note that the residuals are correlated across equations.
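The statement that the coefficients can be estimated by least squares, equation by equation, is easy to check. The sketch below assumes the post's dataset is still in memory and has been **tsset**; the residual variable name **resid_infl** is chosen here for illustration. It re-estimates the inflation equation by OLS, and the coefficients should line up with the **inflation** block of the **var** table above.

```stata
* Assumes the post's data (inflation, unrate, ffr) are in memory and tsset.
* Re-estimate the first VAR equation by ordinary least squares.
regress inflation L(1/6).inflation L(1/6).unrate L(1/6).ffr

* Keep the OLS residuals for inspection; resid_infl is a hypothetical name.
predict double resid_infl, residuals
summarize resid_infl
```

Repeating the regression with **unrate** and **ffr** as the dependent variable should recover the other two blocks of the table in the same way.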

As you might expect, the table of coefficients is rather long. Not including the constant terms, a VAR with \(n\) variables and \(k\) lags will have \(kn^2\) coefficients; our 3-variable, 6-lag VAR has \(6 \times 3^2 = 54\) lag coefficients plus 3 constants, 57 parameters in all (19 per equation), estimated with only 198 observations. The options **dfk** and **small** apply small-sample corrections to the large-sample statistics that are reported by default. We can glance down the table of coefficients, standard errors, *t* statistics, and *p*-values, but it is not immediately informative to look at the coefficients on individual covariates in isolation. Because of this, many applied papers do not even report the table of coefficients; instead, they report some postestimation statistics that are (hopefully) more informative. The next two sections will explore two common postestimation statistics that are used to assess VAR output: Granger causality tests and impulse–response functions.

**Evaluating the output of a VAR: Granger causality tests**

A variable \(x_t\) is said to “Granger-cause” another variable \(y_t\) if, given the lags of \(y_t\), the lags of \(x_t\) are jointly statistically significant in the \(y_t\) equation. For example, the interest rate Granger-causes unemployment if lags of the interest rate are jointly statistically significant in the unemployment equation. The **vargranger** postestimation command performs a battery of Granger causality tests.

    . quietly var inflation unrate ffr, lags(1/6) dfk small

    . vargranger

    Granger causality Wald tests
    +------------------------------------------------------------------------+
    |          Equation           Excluded |       F    df    df_r Prob > F  |
    |--------------------------------------+---------------------------------|
    |         inflation             unrate |  3.5594     6     179   0.0024  |
    |         inflation                ffr |  1.6612     6     179   0.1330  |
    |         inflation                ALL |  4.6433    12     179   0.0000  |
    |--------------------------------------+---------------------------------|
    |            unrate          inflation |  2.0466     6     179   0.0618  |
    |            unrate                ffr |  1.2751     6     179   0.2709  |
    |            unrate                ALL |  3.3316    12     179   0.0002  |
    |--------------------------------------+---------------------------------|
    |               ffr          inflation |  3.6745     6     179   0.0018  |
    |               ffr             unrate |  7.7692     6     179   0.0000  |
    |               ffr                ALL |  5.1996    12     179   0.0000  |
    +------------------------------------------------------------------------+

As before, equations are distinguished by their dependent variable. For each equation, **vargranger** tests for the Granger causality of each variable in the VAR individually, then tests for the Granger causality of all added variables jointly. Consider the Granger causality tests for the unemployment equation. The row with “**ffr** excluded” tests the null hypothesis that all coefficients on lags of the interest rate in the unemployment equation are equal to zero, against the alternative that at least one is not equal to zero. The *p*-value of 0.27 does not fall below the typical statistical significance threshold of 0.05; hence, we cannot reject the null hypothesis that lags of the interest rate do not affect the unemployment rate. With this model and these data, the interest rate does not Granger-cause unemployment. By contrast, in the interest rate equation, lags of both inflation and unemployment are statistically significant and can be said to Granger-cause the interest rate.
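Written out for the “**ffr** excluded” row of the unemployment equation, the hypothesis looks as follows. The notation \(a^{(\ell)}_{ij}\) for the row-\(i\), column-\(j\) element of the coefficient matrix \({\bf A_\ell}\) is introduced here purely for illustration; with the variables ordered (inflation, unrate, ffr), unemployment is row 2 and the interest rate is column 3.

\begin{align*}
H_0 &: a^{(1)}_{23} = a^{(2)}_{23} = \dots = a^{(6)}_{23} = 0 \\
H_1 &: a^{(\ell)}_{23} \neq 0 \text{ for at least one } \ell \in \{1, \dots, 6\}
\end{align*}

The test has 6 numerator degrees of freedom, one per excluded lag, and \(198 - 19 = 179\) denominator degrees of freedom, because each equation uses 198 observations to estimate 19 parameters (18 lag coefficients and a constant). These are the **df** and **df_r** columns of the **vargranger** table.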

The “**ALL** excluded” row for each equation excludes every lag except the lags of that equation's own dependent variable; it is a joint test for the significance of all lags of all other variables in that equation. It may be considered a test of a purely autoregressive specification (null) against the VAR specification for that equation (alternative).

You can replicate the results of the Granger causality tests by running ordinary least squares on each equation and using **test** with the appropriate null hypothesis:

    . quietly regress unrate l(1/6).unrate l(1/6).ffr l(1/6).inflation

    . test l1.inflation=l2.inflation=l3.inflation
    >      =l4.inflation=l5.inflation=l6.inflation=0

     ( 1)  L.inflation - L2.inflation = 0
     ( 2)  L.inflation - L3.inflation = 0
     ( 3)  L.inflation - L4.inflation = 0
     ( 4)  L.inflation - L5.inflation = 0
     ( 5)  L.inflation - L6.inflation = 0
     ( 6)  L.inflation = 0

           F(  6,   179) =    2.05
                Prob > F =    0.0618

    . test l1.ffr=l2.ffr=l3.ffr=l4.ffr=l5.ffr=l6.ffr=0

     ( 1)  L.ffr - L2.ffr = 0
     ( 2)  L.ffr - L3.ffr = 0
     ( 3)  L.ffr - L4.ffr = 0
     ( 4)  L.ffr - L5.ffr = 0
     ( 5)  L.ffr - L6.ffr = 0
     ( 6)  L.ffr = 0

           F(  6,   179) =    1.28
                Prob > F =    0.2709

The results of a “manual” Granger causality test match the results from **vargranger**.

**Evaluating the output of a VAR: Impulse responses**

The second set of statistics often used to evaluate a VAR comes from simulating shocks to the system and tracing out the effects of those shocks on the endogenous variables. But remember that the shocks were correlated across equations,

    . matlist e(Sigma)

                 | inflation     unrate        ffr
    -------------+--------------------------------
       inflation |  .1849129
          unrate | -.0064425   .0636598
             ffr |  .0788766    -.09169      .6324

and it is ambiguous to talk about a “shock” to, say, the inflation equation when the error terms are correlated across equations.

One approach to this problem is to suppose that there are underlying structural shocks \(\bf{u}_t\), which are (by definition) uncorrelated, and that these shocks are related to the reduced-form shocks via the following relationship:

\begin{align*}
\boldsymbol{\varepsilon}_t &= {\bf A} {\bf u}_t \\
E({\bf u}_t {\bf u}_t') &= {\bf I}
\end{align*}

If we denote the covariance matrix of the error terms by \(\boldsymbol{\Sigma}\), then the \(\bf{A}\) matrix is linked to \(\boldsymbol{\Sigma}\) via

\begin{align*}
\boldsymbol{\Sigma} &= E(\boldsymbol{\varepsilon}_t \boldsymbol{\varepsilon}_t') \\
&= E({\bf A} {\bf u}_t {\bf u}_t' {\bf A}') \\
&= {\bf A} E({\bf u}_t {\bf u}_t') {\bf A}' \\
&= {\bf A} {\bf A}'
\end{align*}

Because we have estimated \(\boldsymbol{\hat \Sigma}\), the problem is to construct \(\bf{\hat A}\) from

\begin{align}
\boldsymbol{\hat \Sigma} = {\bf \hat A} {\bf \hat A}' \label{cov} \tag{1}
\end{align}

Many \(\bf{A}\) matrices satisfy (1). One way to narrow down the possible candidates is to assume that \(\bf{A}\) is lower-triangular with positive diagonal elements; then \(\bf{A}\) can be found uniquely via a Cholesky decomposition of \(\boldsymbol{\hat \Sigma}\). This approach is so common that it is built into the **var** postestimation results and can be accessed directly.
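As a rough illustration of that step, the sketch below types in the rounded covariance matrix reported by **matlist e(Sigma)**, takes its Cholesky factor with Stata's **cholesky()** matrix function, and checks that the factor reproduces the original matrix. The matrix names **Sigma**, **A**, and **check** are chosen here for illustration; after **var**, one could start from **matrix Sigma = e(Sigma)** instead of typing the numbers. The line continuations assume the code is run from a do-file.

```stata
* Rounded covariance matrix copied from the matlist e(Sigma) output
matrix Sigma = (  .1849129, -.0064425,  .0788766 \ ///
                 -.0064425,  .0636598, -.09169   \ ///
                  .0788766, -.09169,    .6324    )
matrix rownames Sigma = inflation unrate ffr
matrix colnames Sigma = inflation unrate ffr

* Lower-triangular Cholesky factor A, so that Sigma = A*A'
matrix A = cholesky(Sigma)
matlist A

* Rebuild Sigma from A and report the largest relative difference
matrix check = A*A'
display mreldif(check, Sigma)
```

The first column of **A** gives the contemporaneous effect of the first shock on each of the three variables, and the entries above the diagonal are zero by construction.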

The assumption that \(\bf{A}\) is lower-triangular imposes an ordering on the variables in the VAR, and different orderings will produce different \(\bf{A}\). The economic content of this ordering is that the shock to any one equation affects that variable and the variables later in the ordering contemporaneously, but that each variable in the VAR is contemporaneously unaffected by the shocks to the equations that come after it in the ordering. For this post, I will impose the ordering we have used so far: the equations are ordered inflation first, then unemployment, then the interest rate. The inflation shock is allowed to affect all three variables contemporaneously; the unemployment shock is allowed to affect the interest rate contemporaneously, but not inflation; and the interest rate shock comes “last” and does not affect either inflation or unemployment contemporaneously.
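With the ordering used in this post, the lower-triangular restriction can be written out explicitly. The element names \(a_{ij}\) (entries of the impact matrix \({\bf A}\), not of the lag matrices) and the shock labels \(u_{1,t}, u_{2,t}, u_{3,t}\) are notation introduced here for illustration:

\begin{align*}
\begin{bmatrix}
\varepsilon_{1,t} \\ \varepsilon_{2,t} \\ \varepsilon_{3,t}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & 0 & 0 \\
a_{21} & a_{22} & 0 \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}
\begin{bmatrix}
u_{1,t} \\ u_{2,t} \\ u_{3,t}
\end{bmatrix}
\end{align*}

The zeros in the first row mean that the inflation error loads only on the first structural shock, while the interest rate error in the last row loads on all three.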

With \(\bf{A}\) in hand, we can produce shocks that are uncorrelated across equations and trace out the effects of those shocks on the variables in the VAR. We can build the impulse–response functions with **irf create**, then graph the output with **irf graph**.

    . quietly var inflation unrate ffr, lags(1/6) dfk small

    . irf create var1, step(20) set(myirf) replace
    (file myirf.irf now active)
    (file myirf.irf updated)

    . irf graph oirf, impulse(inflation unrate ffr) response(inflation unrate ffr)
    >         yline(0,lcolor(black)) xlabel(0(4)20) byopts(yrescale)

After running the VAR, **irf create** creates an **.irf** file that stores numerous results from the VAR that may be of interest in postestimation. The results of more than one VAR may be stored in a single **.irf** file, so we give the VAR a name, in this case **var1**. The **set()** option names the **.irf** file—in this case **myirf.irf**—and sets it as the “active” **.irf** file for the purposes of later postestimation commands. The **step(20)** option instructs **irf create** to generate certain statistics, such as forecasts, out to a horizon of 20 periods.

The **irf graph** command graphs some of the statistics stored in the **.irf** file. Of the many statistics in that file, we will be interested in the orthogonalized impulse–response function, so we specify **oirf**; hence the command **irf graph oirf**. The **impulse()** and **response()** options specify which equations to shock and which variables to graph; we will shock all equations and graph all variables.

The impulse–response graphs are the following:

The impulse–response graph places one impulse in each row and one response variable in each column. The horizontal axis for each graph is in the units of time that your VAR is estimated in, in this case quarters; hence, the impulse–response graph shows the effect of a shock over a 20-quarter period. The vertical axis is in units of the variables in the VAR; in this case, everything is measured in percentage points, so the vertical units in all panels are percentage point changes.

The first row shows the effect of a one-standard-deviation impulse to the interest rate equation. The interest rate is persistent and remains elevated for about 12 periods (3 years) after the initial impulse. Inflation declines slightly after eight quarters, but the response is not statistically significant at any horizon. The unemployment rate rises slowly for about 12 periods, peaking at a 0.2 percentage point increase, before declining.

The second row shows the impact of a shock to the inflation equation. An unexpected increase in inflation is associated with a highly persistent increase in the unemployment rate and the interest rate. Both the interest rate and unemployment rate remain elevated even five years after the impulse to inflation.

Finally, the third row shows the impact of a shock to the unemployment equation. An impulse to the unemployment rate causes inflation to decline by about one half of one percentage point over the following year. The interest rate responds strongly to the unemployment shock, falling by nearly one percentage point over the year following the shock.
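To read off values such as the peak unemployment response rather than judging them by eye from a graph, the same stored results can be tabulated. The sketch below assumes the **myirf.irf** file and the **var1** results created earlier still exist in the working directory; it lists the orthogonalized response of **unrate** to an **ffr** impulse at each step.

```stata
* Assumes "irf create var1, step(20) set(myirf) replace" was run earlier
irf set myirf

* Tabulate one impulse-response pair instead of graphing all nine
irf table oirf, irf(var1) impulse(ffr) response(unrate)
```

Changing the variable names in **impulse()** and **response()** tabulates any of the other eight panels.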

Both the VAR and the ordering used here are illustrative. All the inferences are conditional on the \(\bf{A}\) matrix, that is, the ordering of the variables in the VAR. Different orderings will produce different \(\bf{A}\) matrices, which in turn will produce different impulse responses. In addition, there are identification strategies that go beyond simply ordering the equations; I will discuss those methods in a later post.
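One way to see how much the ordering matters is to store a second set of impulse responses under a different Cholesky ordering and compare the two. The sketch below assumes the data and the **myirf.irf** file from earlier; the name **var2** and the reversed ordering are arbitrary choices made here for illustration, not a recommended identification.

```stata
* Refit quietly so that the VAR is the active estimation result
quietly var inflation unrate ffr, lags(1/6) dfk small

* Make the earlier file active, then add IRFs with ffr ordered first
irf set myirf
irf create var2, step(20) order(ffr unrate inflation)

* Unemployment response to an interest rate shock under both orderings
irf table oirf, irf(var1 var2) impulse(ffr) response(unrate)
```

If the two columns of responses differ noticeably, conclusions drawn from the graphs depend on the ordering assumption and deserve a caveat.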

**Conclusion**

In this post, I estimated a VAR model and discussed two common postestimation statistics: Granger causality tests and impulse–response functions. In my next post, I will go deeper into impulse–response functions and describe alternative identification strategies for performing structural inference in a VAR.