## xtabond cheat sheet

Random-effects and fixed-effects panel-data models are static: they are not designed to use information from previous periods, such as past values of the outcome, in my model. Dynamic panel-data models use current and past information. For instance, I may model current health outcomes as a function of past health outcomes (a sensible modeling assumption) and of past observable and unobservable characteristics.

Today I will provide information that will help you interpret the estimation and postestimation results from Stata’s Arellano–Bond estimator **xtabond**, the most common linear dynamic panel-data estimator.

**The instruments and the regressors**

We have fictional data for 1,000 people from 1991 to 2000. The outcome of interest is income (**income**), and the explanatory variables are years of schooling (**educ**) and an indicator for marital status (**married**). Below, we fit an Arellano–Bond model using **xtabond**.

```
. xtabond income married educ, vce(robust)

Arellano-Bond dynamic panel-data estimation     Number of obs     =      8,000
Group variable: id                              Number of groups  =      1,000
Time variable: year
                                                Obs per group:
                                                              min =          8
                                                              avg =          8
                                                              max =          8

Number of instruments =     39                  Wald chi2(3)      =    3113.63
                                                Prob > chi2       =     0.0000
One-step results
                                     (Std. Err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |               Robust
      income |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |
         L1. |   .2008311   .0036375    55.21   0.000     .1937018    .2079604
             |
     married |   1.057667   .1006091    10.51   0.000     .8604764    1.254857
        educ |    .057551   .0045863    12.55   0.000     .0485619      .06654
       _cons |   .2645702   .0805474     3.28   0.001     .1067002    .4224403
------------------------------------------------------------------------------
Instruments for differenced equation
        GMM-type: L(2/.).income
        Standard: D.married D.educ
Instruments for level equation
        Standard: _cons
```

A couple of elements in the output table are different from what one would expect. The output includes a coefficient for the lagged value of the dependent variable that we did not specify in the command. Why?

In the Arellano–Bond framework, the value of the dependent variable in the previous period is a predictor of the current value of the dependent variable, so **xtabond** includes it for us. By default, it includes one lag; the **lags()** option changes the number of lags. The header also reports 39 instruments, and a footnote below the table refers to GMM-type and standard instruments. Here a bit of math will help us understand what is going on.

The relationship of interest is given by

\[\begin{equation*}
y_{it} = x_{it}'\beta_1 + y_{i(t-1)}\beta_2 + \alpha_i + \varepsilon_{it}
\end{equation*}\]

In the equation above, \(y_{it}\) is the outcome of interest for individual \(i\) at time \(t\), \(x_{it}\) is a vector of regressors that may include past values, \(y_{i(t-1)}\) is the value of the outcome in the previous period, \(\alpha_i\) is a time-invariant unobservable, and \(\varepsilon_{it}\) is a time-varying unobservable.
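To make the notation concrete, here is a minimal Python sketch that simulates data from this model with a single scalar regressor. The parameter values (beta1 = 0.5, beta2 = 0.2), the standard-normal draws, the 50-period burn-in, and the function name `simulate_dynamic_panel` are hypothetical choices for illustration. They are not the post's fictional income data. The regressor is deliberately drawn to be correlated with \(\alpha_i\), which is the situation that motivates the rest of this post.

```python
import random

def simulate_dynamic_panel(n=1000, t_periods=10, beta1=0.5, beta2=0.2,
                           burn_in=50, seed=1):
    """Simulate y_it = x_it*beta1 + y_i(t-1)*beta2 + alpha_i + eps_it.

    Returns, for each individual, a list of (x_it, y_i(t-1), y_it, alpha_i, eps_it).
    All parameter values are hypothetical.
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)                # time-invariant unobservable alpha_i
        y_prev, rows = 0.0, []
        for t in range(burn_in + t_periods):
            x = 0.5 * alpha + rng.gauss(0.0, 1.0)  # regressor correlated with alpha_i
            eps = rng.gauss(0.0, 1.0)              # time-varying unobservable eps_it
            y = x * beta1 + y_prev * beta2 + alpha + eps
            if t >= burn_in:                       # keep only the last t_periods periods
                rows.append((x, y_prev, y, alpha, eps))
            y_prev = y
        panel.append(rows)
    return panel

panel = simulate_dynamic_panel()
print(len(panel), len(panel[0]))   # 1000 10
```

The burn-in lets the process forget its arbitrary starting value of zero before the 10 retained periods begin.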

As in the fixed-effects framework, we assume the time-invariant unobserved component \(\alpha_i\) is related to the regressors. When unobservables and observables are correlated, we have an endogeneity problem that yields inconsistent parameter estimates if we use a conventional linear panel-data estimator. A natural solution is to remove \(\alpha_i\) by taking first differences of the relationship of interest. In a dynamic model, however, first-differencing alone does not work. Why?

\[\begin{eqnarray*}
\Delta y_{it} &=& \Delta x_{it}'\beta_1 + \Delta y_{i(t-1)}\beta_2 + \Delta \varepsilon_{it} \\
E\left( \Delta y_{i(t-1)} \Delta \varepsilon_{it} \right) &\neq & 0
\end{eqnarray*}\]

In the first equation above, we got rid of \(\alpha_i\), which is correlated with our regressors, but we generated a new endogeneity problem. The second equation above shows that one of our regressors is related to our unobservables: \(\Delta y_{i(t-1)} = y_{i(t-1)} - y_{i(t-2)}\) contains \(\varepsilon_{i(t-1)}\), which also appears in \(\Delta \varepsilon_{it} = \varepsilon_{it} - \varepsilon_{i(t-1)}\). The solution is instrumental variables. Which instrumental variables? Arellano–Bond suggest the second lag of the dependent variable, in levels, and all the feasible lags thereafter. This generates the set of moment conditions defined by

\[\begin{eqnarray*}
E\left( y_{i(t-2)} \Delta \varepsilon_{it} \right) &=& 0 \\
E\left( y_{i(t-3)} \Delta \varepsilon_{it} \right) &=& 0 \\
\ldots & & \\
E\left( y_{i(t-j)} \Delta \varepsilon_{it} \right) &=& 0 \quad j = 2, \ldots, t-1
\end{eqnarray*}\]
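A short simulation can make both the problem and the fix visible. The Python sketch below assumes a simplified model with no \(x_{it}\), \(\beta_2 = 0.5\), and i.i.d. standard-normal \(\varepsilon_{it}\); these values and the helper name `simulate_ar1_panel` are hypothetical. It checks that \(\Delta y_{i(t-1)}\) is correlated with \(\Delta \varepsilon_{it}\), while \(y_{i(t-2)}\) is not. It then uses \(y_{i(t-2)}\) as the single instrument for \(\Delta y_{i(t-1)}\). That is the simpler, just-identified Anderson–Hsiao idea, not the GMM estimator that **xtabond** computes with all available lags.

```python
import random

def simulate_ar1_panel(n=4000, t_periods=10, beta2=0.5, burn_in=50, seed=3):
    """Per-person lists of (y_it, eps_it) from y_it = beta2*y_i(t-1) + alpha_i + eps_it."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        alpha, y_prev, path = rng.gauss(0.0, 1.0), 0.0, []
        for t in range(burn_in + t_periods):
            eps = rng.gauss(0.0, 1.0)          # i.i.d. time-varying unobservable
            y = beta2 * y_prev + alpha + eps
            if t >= burn_in:                   # drop burn-in periods
                path.append((y, eps))
            y_prev = y
        panel.append(path)
    return panel

panel = simulate_ar1_panel()
endog = valid = 0.0      # sums of Dy_i(t-1)*Deps_it and y_i(t-2)*Deps_it
num = den = 0.0          # IV sums with instrument z = y_i(t-2)
count = 0
for path in panel:
    y = [p[0] for p in path]
    e = [p[1] for p in path]
    for t in range(2, len(y)):             # differenced equation usable from the 3rd period
        d_eps = e[t] - e[t - 1]
        d_y_lag = y[t - 1] - y[t - 2]
        z = y[t - 2]
        endog += d_y_lag * d_eps
        valid += z * d_eps
        num += z * (y[t] - y[t - 1])
        den += z * d_y_lag
        count += 1

print(round(endog / count, 2))   # close to -1 (= -Var(eps)): Dy_i(t-1) is endogenous
print(round(valid / count, 2))   # close to 0: y_i(t-2) satisfies the moment condition
print(round(num / den, 2))       # IV estimate, close to the true beta2 = 0.5
```

Arellano–Bond improves on this single-instrument estimator by stacking every valid lag as a moment condition and weighting them optimally.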

In our example, we have 10 time periods. The differenced equation can be used from \(t=3\) onward, and at each period the instruments are the levels of the outcome from the first period through \(t-2\):

\[\begin{eqnarray*}
t&=10& \quad y_{t-9}, y_{t-8}, y_{t-7}, y_{t-6}, y_{t-5}, y_{t-4}, y_{t-3}, y_{t-2} \\
t&=9& \quad y_{t-8}, y_{t-7}, y_{t-6}, y_{t-5}, y_{t-4}, y_{t-3}, y_{t-2} \\
t&=8& \quad y_{t-7}, y_{t-6}, y_{t-5}, y_{t-4}, y_{t-3}, y_{t-2} \\
t&=7& \quad y_{t-6}, y_{t-5}, y_{t-4}, y_{t-3}, y_{t-2} \\
t&=6& \quad y_{t-5}, y_{t-4}, y_{t-3}, y_{t-2} \\
t&=5& \quad y_{t-4}, y_{t-3}, y_{t-2} \\
t&=4& \quad y_{t-3}, y_{t-2} \\
t&=3& \quad y_{t-2}
\end{eqnarray*}\]

This gives us \(8+7+\cdots+1 = 36\) instruments, which are what the output calls GMM-type instruments; the footnote's L(2/.).income denotes the second and all further lags of **income**. GMM has been explored in the blog post Estimating parameters by maximum likelihood and method of moments using mlexp and gmm, and we will talk about it in a later post. The other three instruments are the first differences of the regressors **educ** and **married** and the constant, for a total of \(36 + 3 = 39\). This is no different from two-stage least squares, where we include the exogenous variables as part of our instrument list.
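The count can be checked by enumerating the instruments period by period. This Python sketch only lists labels. The names `y1` to `y8` are hypothetical labels for the outcome in periods 1 through 8, and `gmm_instruments` is an illustrative helper. The sketch does not build the instrument matrix that **xtabond** actually uses.

```python
def gmm_instruments(t_periods):
    """For each period t = 3..T of the differenced equation, list the GMM-type
    instruments: levels of y from period 1 through t - 2."""
    return {t: [f"y{s}" for s in range(1, t - 1)] for t in range(3, t_periods + 1)}

instruments = gmm_instruments(10)                  # 10 periods, as in the example
n_gmm = sum(len(v) for v in instruments.values())  # 1 + 2 + ... + 8
n_standard = 3                                     # D.married, D.educ, _cons
print(instruments[3])              # ['y1']
print(instruments[10])             # ['y1', 'y2', 'y3', 'y4', 'y5', 'y6', 'y7', 'y8']
print(n_gmm, n_gmm + n_standard)   # 36 39
```

In general, with \(T\) periods the number of GMM-type instruments is \((T-1)(T-2)/2\). This grows quickly with \(T\), which is worth keeping in mind for longer panels.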

**Testing for serial correlation**

The key for the Arellano–Bond instrument set to work is that

\[\begin{equation}
E\left( y_{i(t-j)} \Delta \varepsilon_{it} \right) = 0 \quad j \geq 2
\end{equation}\]

We can check an implication of these conditions in Stata using **estat abond**, which tests for serial correlation in the first-differenced errors. In essence, the differenced time-varying unobservable \(\Delta \varepsilon_{it}\) should be unrelated to the second lag of the dependent variable and the lags thereafter. If this is not the case, we are back to the initial problem, endogeneity. Again, a bit of math will help us understand what is going on.

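All is well if \(\varepsilon_{it} = \nu_{it}\), with \(\nu_{it}\) independently and identically distributed, so that

\[\begin{equation}
\Delta \varepsilon_{it} = \Delta \nu_{it}
\end{equation}\]

In this case, \(\Delta \varepsilon_{it} = \nu_{it} - \nu_{i(t-1)}\) and \(\Delta \varepsilon_{i(t-1)}\) share the term \(\nu_{i(t-1)}\). The differenced unobservable is therefore serially correlated of order 1 but not serially correlated of orders 2 or beyond.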
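A quick simulation confirms this pattern. The Python sketch below assumes i.i.d. standard-normal \(\nu_{it}\) for 2,000 hypothetical individuals over 10 periods. It computes the pooled autocorrelation of the differenced errors, which in theory equals \(-0.5\) at order 1 and 0 at order 2. The helper `pooled_autocorr` is illustrative and assumes mean-zero series.

```python
import random

def pooled_autocorr(series_list, lag):
    """Autocorrelation at `lag`, pooled across individuals (assumes mean-zero series)."""
    prods = [s[t] * s[t - lag] for s in series_list for t in range(lag, len(s))]
    squares = [v * v for s in series_list for v in s]
    return (sum(prods) / len(prods)) / (sum(squares) / len(squares))

rng = random.Random(4)
diffs = []
for _ in range(2000):
    nu = [rng.gauss(0.0, 1.0) for _ in range(10)]        # eps_it = nu_it, i.i.d.
    diffs.append([nu[t] - nu[t - 1] for t in range(1, 10)])

print(round(pooled_autocorr(diffs, 1), 2))   # close to -0.5
print(round(pooled_autocorr(diffs, 2), 2))   # close to 0
```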

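But we are in trouble if \(\varepsilon_{it} = \nu_{it} + \nu_{i(t-1)}\), so that

\[\begin{equation}
\Delta \varepsilon_{it} = \Delta \nu_{it} + \Delta \nu_{i(t-1)} = \nu_{it} - \nu_{i(t-2)}
\end{equation}\]

Now \(y_{i(t-2)}\) depends on \(\varepsilon_{i(t-2)}\), and therefore on \(\nu_{i(t-2)}\), which also appears in \(\Delta \varepsilon_{it}\). The second lag of the dependent variable will then be related to the differenced time-varying component \(\Delta \varepsilon_{it}\). Another way of saying this is that the differenced time-varying unobserved component is serially correlated of order 2: \(\Delta \varepsilon_{it}\) and \(\Delta \varepsilon_{i(t-2)} = \nu_{i(t-2)} - \nu_{i(t-4)}\) share \(\nu_{i(t-2)}\).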
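The failure of the moment condition can also be simulated. The Python sketch below assumes the moving-average error \(\varepsilon_{it} = \nu_{it} + \nu_{i(t-1)}\) with i.i.d. standard-normal \(\nu_{it}\), no \(x_{it}\), and a hypothetical \(\beta_2 = 0.5\). In this setup, the sample analogue of \(E(y_{i(t-2)} \Delta \varepsilon_{it})\) should be close to \(-1\) rather than 0. By contrast, \(y_{i(t-3)}\) remains uncorrelated with \(\Delta \varepsilon_{it}\). That second result holds for this particular error structure only and is not a general rule.

```python
import random

rng = random.Random(5)
beta2, n, t_periods, burn_in = 0.5, 4000, 10, 50   # hypothetical values
m2 = m3 = 0.0      # running sums of y_i(t-2)*Deps_it and y_i(t-3)*Deps_it
count = 0
for _ in range(n):
    alpha = rng.gauss(0.0, 1.0)
    nu_prev, y_prev = rng.gauss(0.0, 1.0), 0.0
    y, eps = [], []
    for t in range(burn_in + t_periods):
        nu = rng.gauss(0.0, 1.0)
        e = nu + nu_prev                      # MA(1) error: eps_it = nu_it + nu_i(t-1)
        y_t = beta2 * y_prev + alpha + e
        if t >= burn_in:
            y.append(y_t)
            eps.append(e)
        nu_prev, y_prev = nu, y_t
    for t in range(3, t_periods):             # need y_i(t-3), so start at the 4th period
        d_eps = eps[t] - eps[t - 1]           # equals nu_it - nu_i(t-2)
        m2 += y[t - 2] * d_eps
        m3 += y[t - 3] * d_eps
        count += 1

print(round(m2 / count, 2))   # close to -1: y_i(t-2) is no longer a valid instrument
print(round(m3 / count, 2))   # close to 0 in this particular setup
```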

**estat abond** provides a test for the serial correlation structure. For the example above,

```
. estat abond

Arellano-Bond test for zero autocorrelation in first-differenced errors
  +-----------------------+
  |Order |  z     Prob > z|
  |------+----------------|
  |   1  |-22.975  0.0000 |
  |   2  |-.36763  0.7132 |
  +-----------------------+
   H0: no autocorrelation
```

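We reject the null hypothesis of no autocorrelation of order 1 and cannot reject the null hypothesis of no autocorrelation of order 2. First-order autocorrelation in the differenced errors is expected, because \(\Delta \varepsilon_{it}\) and \(\Delta \varepsilon_{i(t-1)}\) share \(\varepsilon_{i(t-1)}\). The absence of evidence of second-order autocorrelation is consistent with the Arellano–Bond model assumptions. If this were not the case, we would have to look for different instruments. Essentially, we would have to fit a different dynamic model. This is what the **xtdpd** command allows us to do, but it is beyond the scope of this post.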
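To get a feel for what the z statistics in the **estat abond** table measure, here is a simplified analogue in Python. It sums the products of differenced errors at a given order within each individual and scales the total by a cluster-robust standard error. Several simplifications are assumed. It treats the i.i.d. differenced errors as known. The real test in **estat abond** instead works with estimated residuals and accounts for the estimation of the coefficients, so this is not Stata's formula. The helper name `ab_style_z` is hypothetical, and the data are simulated for 1,000 individuals with 8 differenced errors each, mirroring only the dimensions of the example.

```python
import math
import random

def ab_style_z(diffs, order):
    """Simplified z statistic for H0: no autocorrelation of the given order in
    first-differenced errors, clustering on individuals. Unlike estat abond,
    it treats the errors as known rather than estimated."""
    per_person = [sum(d[t] * d[t - order] for t in range(order, len(d))) for d in diffs]
    return sum(per_person) / math.sqrt(sum(s * s for s in per_person))

rng = random.Random(6)
diffs = []
for _ in range(1000):                                    # 1,000 individuals
    nu = [rng.gauss(0.0, 1.0) for _ in range(9)]         # i.i.d. errors in levels
    diffs.append([nu[t] - nu[t - 1] for t in range(1, 9)])   # 8 differenced errors each

z1 = ab_style_z(diffs, 1)
z2 = ab_style_z(diffs, 2)
print(round(z1, 2), round(z2, 2))   # z1 strongly negative; z2 typically small in magnitude
```

Under valid assumptions, the order-1 statistic is large and negative by construction, while the order-2 statistic behaves roughly like a standard normal draw. That is the same qualitative pattern as in the output above.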

**Parting words**

Dynamic panel-data models provide a useful research framework. In this post, I touched on the interpretation of a couple of results from estimation and postestimation from **xtabond** that will help you understand your output.