Home > Statistics > Tests of forecast accuracy and forecast encompassing

Tests of forecast accuracy and forecast encompassing

\(\newcommand{\mub}{{\boldsymbol{\mu}}}
\newcommand{\eb}{{\boldsymbol{e}}}
\newcommand{\betab}{\boldsymbol{\beta}}\)Applied time-series researchers often want to compare the accuracy of a pair of competing forecasts. A popular statistic for forecast comparison is the mean squared forecast error (MSFE), a smaller value of which implies a better forecast. However, a formal test, such as Diebold and Mariano (1995), distinguishes whether the superiority of one forecast is statistically significant or is simply due to sampling variability.

A related test is the forecast encompassing test. This test is used to determine whether one of the forecasts encompasses all the relevant information from the other. The resulting test statistic may lead a researcher to either combine the two forecasts or drop the forecast that contains no additional information.

In this post, I construct a pair of one-step-ahead recursive forecasts for the change in inflation rate from an autoregression model (AR) and a vector autoregression model (VAR). Based on an exercise in Clark and McCracken (2001), I consider a bivariate VAR with changes in inflation and unemployment rate as dependent variables. I compare the forecasts using MSFE, perform tests of predictive accuracy and forecast encompassing to determine whether unemployment rate is useful in predicting inflation rate. Because the VAR model nests the AR model, I assess the significance of the test statistic using appropriate critical values provided in McCracken (2007) and Clark and McCracken (2001).

Forecasting methods

Let \(y_t\) denote a series we want to forecast and \(\hat{y}_{t+h|t}\) denote the \(h\)-step-ahead forecast of \(y_t\) at time \(t\). Out-of-sample forecasts are usually computed with a fixed, rolling, or recursive window method. In all the methods, an initial sample of \(T\) observations is used to estimate the parameters of the model. In a fixed window method, estimation is performed once in a sample of \(T\) observations and \(h\)-step-ahead forecasts are made based on those estimates. In a rolling window method, the size of the estimation sample is kept fixed. However, estimation is performed multiple times by incrementing the beginning and end of the initial sample by the same number. Out-of-sample forecasts are computed after each estimation. A recursive window method is similar to a rolling window method except that the beginning time period is held fixed while the the ending period increases.

I use the rolling prefix command with the recursive option to generate recursive forecasts. rolling executes the command it prefixes on a window of observations and stores results such as parameter estimates or other summary statistics. In the following section, I write commands that fit AR(2) and VAR(2) models, compute one-step-ahead forecasts, and return relevant statistics. I use these commands with the rolling prefix and store the recursive forecasts in a new dataset.

Programming a command for the rolling prefix

In usinfl.dta, I have quarterly data from 1948q1 to 2016q1 on the changes in inflation rate and the unemployment rate obtained from the St. Louis FRED database. The first model is an AR(2) model of change in inflation, which is given by
\[
{\tt dinflation}_{\,\,t} = \beta_0 + \beta_1 {\tt dinflation}_{\,\,t-1} +
\beta_2 {\tt dinflation}_{\,\,t-2} + \epsilon_t
\]
where \(\beta_0\) is the intercept and \(\beta_1\) and \(\beta_2\) are the AR parameters. I generate recursive forecasts for these models beginning in 2002q2. This implies an initial window size of 218 observations from 1948q1 to 2002q1. The choice of the window size for computing forecasts in applied research is arbitrary. See Rossi and Inoue (2012) for tests of predictive accuracy that is robust to the choice of window size.

In the following code block, I have a command labeled fcst_ar2.ado that estimates the parameters of an AR(2) model using regress and computes one-step-ahead forecasts.

Code block 1: fcst_ar2.ado

program fcst_ar2, rclass
        syntax [if]

        qui tsset
        local timevar `r(timevar)'
        local first `r(tmin)'

        regress l(0/2).dinflation `if'
        summarize `timevar' if e(sample)

        local last = `r(max)'-`first'+1
        local fcast = _b[_cons] + _b[l.dinflation]*     ///
                dinflation[`last'] + _b[l2.dinflation]* ///
                dinflation[`last'-1]

        return scalar fcast = `fcast'
        return scalar actual = dinflation[`last'+1]
        return scalar sqerror = (dinflation[`last'+1]-  ///
                `fcast')^2
end

Line 1 defines the name of the command and declares it to be rclass so that I can return my results in r() after executing the command. Line 2 defines the syntax for the command. I specify an if qualifier in the syntax so that my command can identify the correct observations when used with the rolling prefix. Lines 4–6 store the time variable and the beginning time index in my dataset as local macros timevar and first, respectively. Lines 8–9 use regress to fit an AR(2) model and summarize the time variable in the sample used for estimation. Line 11 stores the last time index in the local macro last. The one-step-ahead forecast for the AR(2) model is
\[
\hat{y}_{t+1|t} = \hat{\beta}_0 + \hat{\beta}_1 {\tt dinflation}_{\,\,t-1} + \hat{\beta}_2 {\tt dinflation}_{\,\,t-2}
\]
where \(\hat{\beta}\)’s are the estimated parameters. Lines 12–14 store the forecast in a local macro fcast. Lines 16–19 return the forecasted value, actual value, and the squared error in local macros fcast, actual, and sqerror, respectively. Line 20 specifies the end of the command.

The second model is a VAR(2) model of change in inflation and unemployment rate and is given by
\[
\begin{bmatrix} {\tt dinflation}_{\,\,t} \\
{\tt dunrate}_{\,\,t}
\end{bmatrix}
= \mub + {\bf B}_1 \begin{bmatrix} {\tt
dinflation}_{\,\,t-1}\\ {\tt dunrate}_{\,\,t-1}
\end{bmatrix}
+ {\bf B}_2 \begin{bmatrix}
{\tt dinflation}_{\,\,t-2}\\
{\tt dunrate}_{\,\,t-2}
\end{bmatrix} + \eb_t
\]
where \(\mub\) is a \(2\times 1\) vector of intercepts, \({\bf B}_1\) and \({\bf B}_2\) are \(2\times 2\) matrices of parameters, and \(\eb_t\) is a \(2\times 1\) vector of IID errors with mean \({\bf 0}\) and covariance \(\boldsymbol{\Sigma}\).

In the following code block, I have a command called fcst_var2.ado that estimates the parameters of a VAR(2) model using var and computes one-step-ahead forecasts.

Code block 2: fcst_var2.ado

program fcst_var2, rclass

        syntax [if]

        quietly tsset
        local timevar `r(timevar)'
        local first `r(tmin)'

        var dinflation dunrate `if'
        summarize `timevar' if e(sample)

        local last = `r(max)'-`first'+1
        local fcast = _b[dinflation:_cons] +             ///
                _b[dinflation:l.dinflation]*             ///
                dinflation[`last'] +                     ///
                _b[dinflation:l2.dinflation]*            ///
                dinflation[`last'-1] +                   ///
                _b[dinflation:l.dunrate]*dunrate[`last'] ///
                + _b[dinflation:l2.dunrate]*dunrate[`last'-1]

        return scalar fcast = `fcast'
        return scalar actual = dinflation[`last'+1]
        return scalar sqerror = (dinflation[`last'+1]-   ///
                `fcast')^2
end

The structure of this command is similar to the AR(2) I described earlier. The one-step-ahead forecast of dinflation equation for the VAR(2) model is
\[
\hat{y}_{t+1|t} = \hat{\mu}_1 + \hat{\beta}_{1,11} {\tt
dinflation}_{\,\,t-1} + \hat{\beta}_{1,12} {\tt dunrate}_{\,\,t-1} +
\hat{\beta}_{2,11} {\tt dinflation}_{\,\,t-2} \\
+ \hat{\beta}_{2,12}
{\tt dunrate}_{\,\,t-2}
\]
where \(\hat{\mu}_1\) and \(\hat{\beta}\)’s are the estimated parameters corresponding to the dinflation equation. Lines 13–19 store the one-step-ahead forecast in local macro fcast.

Recursive forecasts

I use the rolling prefix command with the command fcst_ar2. The squared errors, one-step-ahead forecast, and the actual values returned by fcst_ar2 are stored in variables named ar2_sqerror, ar2_fcst, and actual, respectively. I specify a window size of 218 and store the estimates in the dataset ar2.

. use usinfl, clear

. rolling ar2_sqerr=r(sqerror) ar2_fcast=r(fcast) actual=r(actual), 
> window(218) recursive saving(ar2, replace): fcst_ar2
(running fcst_ar2 on estimation sample)

Rolling replications (56)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
.....e
file ar2.dta saved

With a window size of 218, I have 55 usable observations that are one-step-ahead forecasts. I do the same with fcst_var2 below and store the estimates in the dataset var2.

. rolling var2_sqerr=r(sqerror) var2_fcast=r(fcast), window(218) 
> recursive saving(var2, replace): fcst_var2
(running fcst_var2 on estimation sample)

Rolling replications (56)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
.....e
file var2.dta saved

I merge the two datasets containing the forecasts and label the actual and forecast variables. I plot the actual versus the forecasts of dinflation obtained from an AR(2) and a VAR(2) model respectively.

. use ar2, clear
(rolling: fcst_ar2)

. quietly merge 1:1 end using var2

. tsset end
        time variable:  end, 2002q2 to 2016q1
                delta:  1 quarter

. label var end "Quarterly date"

. label var actual "Change in inflation"

. label var ar2_fcast "Forecasts from AR(2)"

. label var var2_fcast "Forecasts from VAR(2)"

. tsline actual ar2_fcast var2_fcast, title("Actual vs Forecasts")

graph1

Both models produce forecasts that track the actual change in inflation. However, the performance of one forecast from the other is indistinguishable from the figure above.

Comparing MSFE

A popular statistic used to compare out-of-sample forecasts is the MSFE. I use the mean command to compute this.

. mean ar2_sqerr var2_sqerr

Mean estimation                   Number of obs   =         55

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
   ar2_sqerr |   .9210348   .3674181      .1844059    1.657664
  var2_sqerr |   .9040102   .3377726      .2268169    1.581203
--------------------------------------------------------------

The MSFE of the forecasts produced by the VAR(2) model is slightly smaller than that of the AR(2) model. This comparison, however, is based on a single sample and does not reflect the predictive performance in the population.

Test for equal predictive accuracy

McCracken (2007) provides test statistics to test the predictive accuracy of forecasts generated by a pair of nested parametric models. Under the null hypothesis, the expected loss of the pair of forecasts is the same. Under the alternative, the expected loss of the bigger model is less than that of the one it nests. In the current application, the null and the alternative hypothesis are as follows
\[
H_o: E[L_{t+1}(\betab_1)] = E[L_{t+1}(\betab_2)] \quad \mbox{vs} \quad
H_a: E[L_{t+1}(\betab_1)] > E[L_{t+1}(\betab_2)]
\]
where \(L_{t+1}(\cdot)\) denotes a squared error loss. \(\betab_1\) and \(\betab_2\) are the parameter vectors of the AR(2) and VAR(2) models, respectively.

I construct two test statistics, which McCracken labels as OOS-F and OOS-T for out-of-sample F and t tests. The OOS-T statistic is based on Diebold and Mariano (1995). The limiting distribution of both test statistics is nonstandard. The OOS-F and OOS-T test statistics are as follows
\[
\mathrm{OOS-F} = P \left[ \frac{\mathrm{MSFE}_1(\hat{\beta}_1) –
\mathrm{MSFE}_2(\hat{\beta}_2)}{\mathrm{MSFE}_2(\hat{\beta}_2)}\right]\\
\mathrm{OOS-T} = \hat{\omega}^{-0.5} \left[ \mathrm{MSFE}_1(\hat{\beta}_1) –
\mathrm{MSFE}_2(\hat{\beta}_2)\right]
\]
where \(P\) is the number of out-of-sample observations and \(\mathrm{MSFE}_1(\cdot)\) and \(\mathrm{MSFE}_2(\cdot)\) are the mean squared forecast error of the AR(2) and VAR(2) models, respectively. \(\hat{\omega}\) denotes a consistent estimate of the asymptotic variance of the mean of the squared-error loss differential. I compute the OOS-F statistic below:

. quietly mean ar2_sqerr var2_sqerr

. local msfe_ar2 = _b[ar2_sqerr]

. local msfe_var2 = _b[var2_sqerr]

. local P = 55

. display "OOS-F: " `P'*(`msfe_ar2'-`msfe_var2')/`msfe_var2'
OOS-F: 1.0357811

The corresponding 95% critical value from table 4 (using \(\pi=0.2\) and \(k_2=2\)) in McCracken (2007) is 1.453. The OOS-F fails to reject the null hypothesis, implying that the forecasts from the AR(2) and VAR(2) have similar predictive accuracy.

To compute the OOS-T statistic, I first estimate \(\hat{\omega}\) by using the heteroskedasticity and autocorrelation consistent (HAC) standard error from a regression of squared error loss differential on a constant.

. generate ld = ar2_sqerr-var2_sqerr
(1 missing value generated)

. gmm (ld-{_cons}), vce(hac nwest 4) nolog
note: 1 missing value returned for equation 1 at initial values

Final GMM criterion Q(b) =  4.41e-35

note: model is exactly identified

GMM estimation

Number of parameters =   1
Number of moments    =   1
Initial weight matrix: Unadjusted              Number of obs   =         55
GMM weight matrix:     HAC Bartlett 4

---------------------------------------------------------------------------
          |                 HAC
          |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------+----------------------------------------------------------------
   /_cons |   .0170247   .0512158     0.33   0.740    -.0833564    .1174057
---------------------------------------------------------------------------
HAC standard errors based on Bartlett kernel with 4 lags.
Instruments for equation 1: _cons

. display "OOS-T: " (`msfe_ar2'-`msfe_var2')/_se[_cons]
OOS-T: .33241066

The corresponding 95% critical value from table 1 in McCracken (2007) is 1.140. The OOS-T also fails to reject the null hypothesis, implying that the forecasts from the AR(2) and VAR(2) have similar predictive accuracy.

Test for forecast encompassing

Under the null hypothesis, forecasts from model 1 encompass that from model 2; thus forecasts from the latter model contain no additional information. Under the alternative hypothesis, forecast from model 2 contain more information than that from model 1. The new encompassing test statistic labeled as ENC-New in Clark and McCracken (2001) is given by
\[
\mathrm{ENC-New} = P \left[ \frac{\mathrm{MSFE}_1(\hat{\beta}_1) –
\sum_{t=1}^P\hat{e}_{1,t+1}\hat{e}_{2,t+1}/P}{\mathrm{MSFE}_2(\hat{\beta}_2)}\right]
\]
where \(\hat{e}_{1,t+1}\) and \(\hat{e}_{2,t+1}\) are one-step-ahead forecast errors. I compute the ENC-New statistic below:

. generate err_p = sqrt(ar2_sqerr*var2_sqerr)
(1 missing value generated)

. generate numer = ar2_sqerr-err_p
(1 missing value generated)

. quietly summarize numer

. display "ENC-NEW: " `P'*(r(mean)/`msfe_var2')
ENC-NEW: 1.3818277

The corresponding 95% critical value from table 1 in Clark and McCracken (2001) is 1.028. The ENC-New test rejects the null hypothesis, implying that the forecasts from the AR(2) do not encompass the VAR(2) forecasts. In other words, unemployment rate is useful for forecasting inflation rate.

The failure to reject the null hypothesis by the OOS-F and OOS-T test may be due to the lack of power of these tests compared with that of the ENC-New test.

Conclusion

In this post, I used the rolling prefix command to generate out-of-sample recursive forecasts from an AR(2) of changes in inflation and a VAR(2) model of changes in inflation and unemployment rate. I then constructed test statistics for forecast accuracy and forecast encompassing to determine whether unemployment rate is useful for forecasting inflation rate.

References

Clark, T. E., and M. W. McCracken. 2001. Tests of equal forecast accuracy and encompassing for nested models. Journal of Econometrics 105: 85–110.

Diebold, F. X., and R. S. Mariano. 1995. Comparing predictive accuracy. Journal of Business and Economic Statistics 13: 253–263.

McCracken, M. W. 2007. Aymptotics for out of sample tests of Granger causality. Journal of Econometrics 140: 719–752.

Rossi, B., and A. Inoue. 2012. Out-of-sample forecast tests robust to the choice of window size. Journal of Business and Economic Statistics 30: 432–453.

  • Christopher F Baum

    The Clark and McCracken statistic tests competing nested parametric models. Diebold and Mariano’s test is more general, in that the models need not be related. You could compare, for instance, an AR(1) forecast to a regression model’s forecast (or a nonlinear model’s forecast) of a given series. In Stata, ssc describe dmariano.

  • Syed Basher

    Very helpful discussion, thanks. Two quick inquiries: a) what is k2? b) what if the test statistics (F, t and ENC-NEW) are negative? Does that indicate non rejection of the null hypothesis?

  • Ashish Rajbhandari

    Dear Syed,
    k2 is the number of extra predictors in the nesting model, VAR(2) in this case. Yes a negative test statistic indicates failure to reject the null hypothesis.

    Ashish

  • Spock GP

    Very interesting post Ashish, thank you. Would this also be applicable
    in a panel data context? If so, then P would again be the number of
    out-of-sample observations or it would be N*(number of OOS
    observations), where N the number of individuals in the panel?

  • Jabba

    Very interessting post, thank you. How do you determine k_2 and pi to find McCracken’s (2007) critical values?