Cointegration or spurious regression?
\(\newcommand{\betab}{\boldsymbol{\beta}}\)Time-series data often appear nonstationary and also tend to comove. A set of nonstationary series that are cointegrated implies the existence of a long-run equilibrium relationship. If such an equilibrium does not exist, then the apparent comovement is spurious and no meaningful interpretation ensues.
Analyzing multiple nonstationary time series that are cointegrated provides useful insights about their long-run behavior. Consider long- and short-term interest rates such as the yields on a 30-year and a 3-month U.S. Treasury bond. According to the expectations hypothesis, long-term interest rates are determined by the average of expected future short-term rates. This implies that the yields on the two bonds cannot drift arbitrarily far apart over time. Thus, if the two yields are cointegrated, any shock to the short-term rate leads to adjustments in the long-term rate. This has important implications for policy and investment decisions.
In a cointegration analysis, we begin by regressing a nonstationary variable on a set of other nonstationary variables. Surprisingly, in finite samples, regressing one nonstationary series on another arbitrary nonstationary series usually results in significant coefficients with a high \(R^2\). This gives a false impression that the series may be cointegrated, a phenomenon commonly known as spurious regression.
In this post, I use simulated data to show the asymptotic properties of an ordinary least-squares (OLS) estimator under cointegration and spurious regression. I then perform a test for cointegration using the Engle and Granger (1987) method. These exercises provide a good first step toward understanding cointegrated processes.
Cointegration
For ease of exposition, I consider two variables, \(y_t\) and \(x_t\), that are integrated of order 1 or I(1). They can be transformed to a stationary or I(0) series by taking first differences.
The variables \(y_t\) and \(x_t\) are cointegrated if some linear combination of them is I(0). This implies \(\betab [y_t,x_t]' = e_t\), where \(\betab\) is the cointegrating vector and \(e_t\) is a stationary equilibrium error. With more variables, more than one cointegrating vector may exist; for example, among \(n\) I(1) variables there can be up to \(n-1\) linearly independent cointegrating vectors. Nonetheless, the Engle–Granger method assumes a single cointegrating vector regardless of the number of variables being modeled.
A normalizing assumption such as \(\betab = (1,-\beta)\) is made to uniquely identify the cointegrating vector using OLS. This type of normalization specifies which variables appear on the left- and right-hand sides of the equation. When the variables are cointegrated, the choice of normalization is asymptotically irrelevant. In practice, however, the normalization is typically guided by a theory that dictates the relationship among the variables. The normalization in the current application leads to the following regression specification.
\begin{equation}
\label{eq1}
y_t = \beta x_t + e_t
\tag{1}
\end{equation}
The equation above describes the long-run relation between \(y_t\) and \(x_t\). It is also referred to as a “static” regression because it specifies no short-run dynamics in the variables and makes no assumptions about serial correlation in the error term.
The OLS estimator is given by \(\hat{\beta}= \frac{\sum_{t=1}^T y_t x_t}{\sum_{t=1}^T x^2_t}\). Because \(y_t\) and \(x_t\) are both I(1) processes, the suitably scaled numerator and denominator converge weakly to functionals of Brownian motion as \(T\to \infty\). However, \(\hat{\beta}\) still consistently estimates the true \(\beta\) regardless of whether \(x_t\) is correlated with \(e_t\). In fact, the estimator is superconsistent: it converges to the true value at a faster rate than the OLS estimator in a regression of stationary series. Inference on \(\hat{\beta}\) is not straightforward, however, because its asymptotic distribution is nonstandard and also depends on whether deterministic terms such as a constant and a time trend are specified.
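To see what superconsistency means in terms of rates, compare the following sketch of the standard asymptotic results (see, for example, Hamilton 1994):
\[
\hat{\beta}-\beta = O_p\!\left(T^{-1}\right) \ \text{under cointegration}, \qquad
\hat{\beta}-\beta = O_p\!\left(T^{-1/2}\right) \ \text{with stationary regressors}
\]
The estimation error thus shrinks at rate \(1/T\) rather than the usual \(1/\sqrt{T}\). Even so, \(T(\hat{\beta}-\beta)\) converges to a nonstandard limit that is a functional of Brownian motion, which is why the faster rate does not translate into routine \(t\)-based inference.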
Monte Carlo simulations
I perform Monte Carlo simulations (MCS) with 1,000 replications to plot the empirical densities of the OLS estimator \(\hat{\beta}\) under spurious regression and cointegration. In the case of spurious regression, the empirical density of \(\hat{\beta}\) does not shrink toward the true value even after increasing the sample size. This implies that the OLS estimator is inconsistent. On the other hand, if the series are cointegrated, we expect the empirical density of \(\hat{\beta}\) to collapse toward the true value as the sample size grows.
Data-generating process for spurious regression
I generate \(y_t\) and \(x_t\) according to the following specification,
\begin{align}
\label{eq2}
y_t &= 0.7 x_t + e_t \\
\nonumber
x_t &= x_{t-1} + \nu_{xt}\\
\nonumber
e_t &= e_{t-1} + \nu_{yt} \tag{2}
\end{align}
where \(\nu_{xt}\) and \(\nu_{yt}\) are independent and identically distributed normal innovations (drawn with standard deviations 0.7 and 1.5 in the appendix code). The series \(x_t\) and \(e_t\) are independent random walks. This is a spurious regression because the linear combination \(y_t - 0.7x_t = e_t\) is an I(1) process.
Data-generating process for cointegration
I generate \(y_t\) and \(x_t\) according to the following specification,
\begin{align}
\label{eq3}
y_t &= 0.7 x_t + e_{yt} \\
\nonumber
x_t &= x_{t-1} + e_{xt} \tag{3}
\end{align}
where \(x_t\) is the only I(1) process. I allow for contemporaneous correlation as well as serial correlation by generating the error terms \(e_{yt}\) and \(e_{xt}\) as a vector moving average (VMA) process with 1 lag. The VMA model is given by
\begin{align*}
e_{yt} &= 0.3 \nu_{y,t-1} + 0.4 \nu_{x,t-1} + \nu_{yt} \\
e_{xt} &= 0.7 \nu_{y,t-1} + 0.1 \nu_{x,t-1} + \nu_{xt}
\end{align*}
where \(\nu_{yt}\) and \(\nu_{xt}\) are drawn from a normal distribution with mean 0 and variance matrix \({\bf \Sigma} = \left[\begin{matrix} 1 & 0.7 \\ 0.7 & 1.5 \end{matrix}\right]\). The graph below plots the empirical densities. The code for performing the simulations is provided in the Appendix.
In the case of spurious regression, the OLS estimator is inconsistent: its empirical density does not concentrate around the true value even after the sample size increases from 100 to 1,000. The vertical line at 0.7 represents the true value of \(\beta\). Moreover, in finite samples, the coefficients in a spurious regression are usually significant with a high \(R^2\). In fact, Phillips (1986) showed that the \(t\) and \(F\) statistics diverge as \(T\to \infty\), thus making any inference invalid.
In the case of cointegration, I artificially introduced serial correlation in the error terms in the data-generating process. This led to a biased estimate as seen in the graph above for a sample size of 100, which improves slightly as the sample size increases to 200. Remarkably, the OLS estimator converges to its true value as we increase the sample size to 1,000. While consistency is an important property, we still cannot perform inference on \(\hat{\beta}\) because of its nonstandard asymptotic distribution.
Testing for cointegration
We saw in the previous section that the OLS estimator is consistent if the series are cointegrated. This is true even when we fail to account for serial correlation in the error terms. To test for cointegration, we can construct residuals from the static regression and test them for the presence of a unit root. If the series are cointegrated, the estimated residuals will be close to stationary. This is the approach of the Engle–Granger two-step method: the first step estimates \(\beta\) in \eqref{eq1} by OLS, and the second tests for a unit root in the residuals. Tests such as the augmented Dickey–Fuller (ADF) or Phillips–Perron (PP) test can be applied to the residuals. Refer to an earlier post for a discussion of these tests.
The null and alternative hypotheses for testing cointegration are as follows:
\[
H_0\colon e_t \sim I(1) \qquad H_a\colon e_t \sim I(0)
\]
The null hypothesis states that \(e_t\) is nonstationary, which implies no cointegration between \(y_t\) and \(x_t\). The alternative hypothesis states that \(e_t\) is stationary and implies cointegration.
If the true cointegrating vector \(\betab\) is known, then we simply plug its value into \eqref{eq1} and obtain the \(e_t\) series. We then apply the ADF test to \(e_t\). Because no parameter is estimated in this step, the test statistic has the standard DF distribution.
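As a minimal illustration of the known-\(\betab\) case, the following sketch uses the simulated coint.dta from the appendix, where the true value \(\beta=0.7\) is known from \eqref{eq3} (the variable name e is, of course, arbitrary):

use coint, clear
* equilibrium error constructed with the known cointegrating vector (1, -0.7)
generate double e = y - 0.7*x
* no parameter was estimated, so standard DF critical values apply
dfuller e, noconstant lags(2)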
In the case of an unknown cointegrating vector \(\betab\), the ADF test statistic based on the estimated residuals does not follow the standard DF distribution under the null hypothesis of no cointegration (Phillips and Ouliaris 1990). Furthermore, Hansen (1992) shows that the distribution of the ADF statistic also depends on whether \(y_t\) and \(x_t\) contain deterministic terms such as constants and time trends. Hamilton (1994, 766) provides critical values for making inference in such cases.
Example
I have two datasets that I saved while performing the MCS in the previous section. spurious.dta consists of two variables, \(y_t\) and \(x_t\), generated according to the spurious-regression equations in (2). coint.dta also consists of \(y_t\) and \(x_t\), generated using the cointegration equations in (3).
First, I test for cointegration in spurious.dta.
. use spurious, clear

. reg y x, nocons

      Source |       SS           df       MS      Number of obs   =       100
-------------+----------------------------------   F(1, 99)        =      6.20
       Model |  63.4507871         1  63.4507871   Prob > F        =    0.0144
    Residual |  1013.32308        99  10.2355867   R-squared       =    0.0589
-------------+----------------------------------   Adj R-squared   =    0.0494
       Total |  1076.77387       100  10.7677387   Root MSE        =    3.1993

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |  -.1257176   .0504933    -2.49   0.014    -.2259073   -.0255279
------------------------------------------------------------------------------
The coefficient on x is negative and significant. I use predict with the resid option to generate the residuals and perform an ADF test using the dfuller command. I use the noconstant option to suppress the constant term in the regression and lags(2) to adjust for serial correlation. The noconstant option in the dfuller command implies fitting a random-walk model.
. predict spurious_resd, resid

. dfuller spurious_resd, nocons lags(2)

Augmented Dickey-Fuller test for unit root         Number of obs   =        97

                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(t)             -1.599            -2.601            -1.950            -1.610
As mentioned earlier, the critical values of the DF distribution may not be used in this case. The 5% critical value provided in the first row of Hamilton (1994, 766) under case 1 in table B.9 is \(-\)2.76. The test statistic of \(-\)1.60 in this case implies failure to reject the null hypothesis of no cointegration.
I perform the same two-step method for coint.dta below.
. use coint, clear

. reg y x, nocons

      Source |       SS           df       MS      Number of obs   =       100
-------------+----------------------------------   F(1, 99)        =   3148.28
       Model |  4411.48377         1  4411.48377   Prob > F        =    0.0000
    Residual |   138.72255        99  1.40123788   R-squared       =    0.9695
-------------+----------------------------------   Adj R-squared   =    0.9692
       Total |  4550.20632       100  45.5020632   Root MSE        =    1.1837

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .7335899   .0130743    56.11   0.000     .7076477    .7595321
------------------------------------------------------------------------------

. predict coint_resd, resid

. dfuller coint_resd, nocons lags(2)

Augmented Dickey-Fuller test for unit root         Number of obs   =        97

                               ---------- Interpolated Dickey-Fuller ---------
                  Test         1% Critical       5% Critical      10% Critical
               Statistic           Value             Value             Value
------------------------------------------------------------------------------
 Z(t)             -5.955            -2.601            -1.950            -1.610
Comparing the DF test statistic of \(-\)5.95 with the appropriate critical value of \(-\)2.76, I reject the null hypothesis of no cointegration at the 5% level.
Conclusion
In this post, I performed MCS to show the consistency of the OLS estimator under cointegration. I used the Engle–Granger two-step approach to test for cointegration in simulated data.
References
Engle, R. F., and C. W. J. Granger. 1987. Co-integration and error correction: Representation, estimation, and testing. Econometrica 55: 251–276.
Hamilton, J. D. 1994. Time Series Analysis. Princeton: Princeton University Press.
Hansen, B. E. 1992. Efficient estimation and testing of cointegrating vectors in the presence of deterministic trends. Journal of Econometrics 53: 87–121.
Phillips, P. C. B. 1986. Understanding spurious regressions in econometrics. Journal of Econometrics 33: 311–340.
Phillips, P. C. B., and S. Ouliaris. 1990. Asymptotic properties of residual based tests for cointegration. Econometrica 58: 165–193.
Appendix
Code for spurious regression
The following code performs an MCS for a sample size of 100.
cscript
set seed 2016
local MC = 1000
quietly postfile spurious beta_t100 using t100, replace
forvalues i=1/`MC' {
        quietly {
                drop _all
                set obs 100
                gen time = _n
                tsset time
                // independent innovations for the two random walks
                gen nu_y = rnormal(0,0.7)
                gen nu_x = rnormal(0,1.5)
                gen err_y = nu_y in 1
                gen err_x = nu_x in 1
                replace err_y = l.err_y + nu_y in 2/l
                replace err_x = l.err_x + nu_x in 2/l
                // x and the error e are independent random walks,
                // so the regression of y on x is spurious
                gen y = err_y in 1
                gen x = err_x
                replace y = 0.7*x + err_y in 2/l
                if (`i'==1) save spurious, replace
                qui reg y x, nocons
        }
        post spurious (_b[x])
}
postclose spurious
Code for cointegration
The following code performs an MCS for a sample size of 100.
cscript
set seed 2016
local MC = 1000
quietly postfile coint beta_t100 using t100, replace
forvalues i=1/`MC' {
        quietly {
                drop _all
                set obs 100
                gen time = _n
                tsset time
                // contemporaneously correlated innovations with covariance V
                matrix V = (1,0.7\0.7,1.5)
                drawnorm nu_y nu_x, cov(V)
                // VMA(1) errors: serially and contemporaneously correlated
                gen err_y = nu_y in 1
                gen err_x = nu_x in 1
                replace err_y = 0.3*l.nu_y + 0.4*l.nu_x ///
                        + nu_y in 2/l
                replace err_x = 0.7*l.nu_y + 0.1*l.nu_x ///
                        + nu_x in 2/l
                // x is a random walk; y is cointegrated with x
                gen x = err_x in 1
                replace x = l.x + err_x in 2/l
                gen y = 0.7*x + err_y
                if (`i'==1) save coint, replace
                qui reg y x, nocons
        }
        post coint (_b[x])
}
postclose coint
Combining graphs
Assuming you have also run the MCS for sample sizes of 200 and 1,000, saving the results as t200.dta and t1000.dta with variables beta_t200 and beta_t1000, you may combine the graphs using the following code.
/*Spurious regression*/
use t100, clear
quietly merge 1:1 _n using t200
drop _merge
quietly merge 1:1 _n using t1000
drop _merge
kdensity beta_t100, n(1000) generate(x_100 f_100)    ///
        kernel(gaussian) nograph
label variable f_100 "T=100"
kdensity beta_t200, n(1000) generate(x_200 f_200)    ///
        kernel(gaussian) nograph
label variable f_200 "T=200"
kdensity beta_t1000, n(1000) generate(x_1000 f_1000) ///
        kernel(gaussian) nograph
label variable f_1000 "T=1000"
graph twoway (line f_100 x_100) (line f_200 x_200)   ///
        (line f_1000 x_1000), legend(rows(1))        ///
        subtitle("Spurious regression")              ///
        saving(spurious, replace) xmlabel(0.7)       ///
        xline(0.7, lcolor(black)) nodraw

/*Cointegration*/
use t100, clear
quietly merge 1:1 _n using t200
drop _merge
quietly merge 1:1 _n using t1000
drop _merge
kdensity beta_t100, n(1000) generate(x_100 f_100)    ///
        kernel(gaussian) nograph
label variable f_100 "T=100"
kdensity beta_t200, n(1000) generate(x_200 f_200)    ///
        kernel(gaussian) nograph
label variable f_200 "T=200"
kdensity beta_t1000, n(1000) generate(x_1000 f_1000) ///
        kernel(gaussian) nograph
label variable f_1000 "T=1000"
graph twoway (line f_100 x_100) (line f_200 x_200)   ///
        (line f_1000 x_1000), legend(rows(1))        ///
        subtitle("Cointegration")                    ///
        saving(cointegration, replace)               ///
        xline(0.7, lcolor(black)) nodraw

graph combine spurious.gph cointegration.gph, ///
        title("Empirical densities of {&beta}")