Fixed effects or random effects: The Mundlak approach
Today I will discuss Mundlak’s (1978) alternative to the Hausman test. Unlike the latter, the Mundlak approach may be used when the errors are heteroskedastic or have intragroup correlation.
What is going on?
Say I want to fit a linear panel-data model and need to decide whether to use a random-effects or fixed-effects estimator. My decision depends on how time-invariant unobservable variables are related to variables in my model. Here are two examples that may yield different answers:
- A panel dataset of individuals endowed with innate ability that does not change over time
- A panel dataset of countries where the time-invariant unobservables in our model are sets of country-specific geographic characteristics
In the first case, innate ability can affect observable characteristics such as the amount of schooling someone pursues, so the unobservable is likely correlated with the regressors. In the second case, geographic characteristics are probably not correlated with the variables in our model. Of course, these are conjectures, and we want a test of whether the unobservables are related to the variables in our model.
First, I will tell you how to compute the test; then, I will explain the theory and intuition behind it.
Computing the test
1. Compute the panel-level averages of your time-varying covariates.
2. Use a random-effects estimator to regress your outcome on your covariates and the panel-level means generated in (1).
3. Test that the coefficients on the panel-level means generated in (1) are jointly zero.
If you reject the null hypothesis that the coefficients on the panel-level means are jointly zero, the test suggests that the time-invariant unobservables are correlated with your regressors; that is, a fixed-effects estimator is appropriate. If you cannot reject the null, the test finds no evidence of such correlation, which is consistent with the random-effects assumptions.
Below I demonstrate the three-step procedure above using simulated data. The data satisfy the fixed-effects assumptions and have two time-varying covariates and one time-invariant covariate.
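The simulated dataset itself is not shown here. One hypothetical way to generate data with this structure, written as a do-file sketch, is the following. The number of panels and periods, the coefficients, the seed, and the choice of x1 as the time-invariant covariate are all illustrative assumptions, so running it will not reproduce the output reported in the steps that follow.

```stata
* Hypothetical data-generating process; every value below is an assumption
clear
set seed 12345
set obs 500                           // 500 panels
generate id    = _n
generate alpha = rnormal()            // time-invariant unobservable
generate x1    = rnormal()            // time-invariant covariate
expand 5                              // 5 periods per panel
bysort id: generate t = _n
generate x2 = 0.5*alpha + rnormal()   // time-varying, correlated with alpha
generate x3 = 0.5*alpha + rnormal()   // time-varying, correlated with alpha
generate y  = 1 + x1 + x2 - x3 + alpha + rnormal()
xtset id t
```

Because x2 and x3 load on alpha, the time-invariant unobservable is correlated with the regressors, which is the situation the fixed-effects estimator is designed for.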
STEP 1
. bysort id: egen mean_x2 = mean(x2)
. bysort id: egen mean_x3 = mean(x3)
STEP 2
. quietly xtreg y x1 x2 x3 mean_x2 mean_x3, vce(robust)
. estimates store mundlak
STEP 3
. test mean_x2 mean_x3

 ( 1)  mean_x2 = 0
 ( 2)  mean_x3 = 0

           chi2(  2) =    8.94
         Prob > chi2 =    0.0114
We reject the null hypothesis. This suggests that time-invariant unobservables are related to our regressors and that the fixed-effects model is appropriate. Note that I used a robust estimator of the variance-covariance matrix. I could not have done this if I had used a Hausman test.
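For comparison, here is a minimal sketch of the conventional Hausman test on the same variables. It assumes the dataset from the example above and uses the default (nonrobust) variance estimator, because `hausman` does not accept estimates fit with `vce(robust)`. The stored-estimate names `fixed` and `random` are arbitrary.

```stata
* Conventional Hausman test with the default variance estimator
quietly xtreg y x1 x2 x3, fe
estimates store fixed
quietly xtreg y x1 x2 x3, re
estimates store random
hausman fixed random
```

The time-invariant x1 drops out of the fixed-effects fit, so `hausman` compares only the coefficients the two models have in common.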
Where all this came from
A linear panel-data model is given by
\[\begin{equation*}
y_{it} = x_{it}\beta + \alpha_i + \varepsilon_{it}
\end{equation*}\]
The index \(i\) denotes the individual and the index \(t\) time. \(y_{it}\) is the outcome of interest, \(x_{it}\) is the set of regressors, \(\varepsilon_{it}\) is the time-varying unobservable, and \(\alpha_i\) is the time-invariant unobservable.
The key to the Mundlak approach is to determine if \(\alpha_i\) and \(x_{it}\) are correlated. We know how to think about this problem from our regression intuition. We can think of the mean of \(\alpha_i\) conditional on the time-invariant part of our regressors in the same way that we think of the mean of our outcome conditional on our covariates.
\[\begin{eqnarray*}
\alpha_i &=& \bar{x}_i\theta + \nu_i \\
E\left(\alpha_i|x_i\right) &=& \bar{x}_i\theta
\end{eqnarray*}\]
In the expression above, \(\bar{x}_i\) is the panel-level mean of \(x_{it}\), and \(\nu_i\) is a time-invariant unobservable that is uncorrelated with the regressors.
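For reference, the panel-level mean is the within-panel average of the regressors, which is what `egen ... = mean()` by `id` computes in Step 1. Writing \(T_i\) for the number of observations on panel \(i\) (a symbol introduced here for illustration), it is
\[\begin{equation*}
\bar{x}_i = \frac{1}{T_i}\sum_{t=1}^{T_i} x_{it}
\end{equation*}\]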
As in regression, if \(\theta = 0\), we know \(\alpha_i\) and the covariates are uncorrelated. This is what we test. The implied model is given by
\[\begin{eqnarray*}
y_{it} &=& x_{it}\beta + \alpha_i + \varepsilon_{it} \\
y_{it} &=& x_{it}\beta + \bar{x}_i\theta + \nu_i + \varepsilon_{it} \\
E\left(y_{it}|x_i\right) &=& x_{it}\beta + \bar{x}_i\theta
\end{eqnarray*}\]
The second equality replaces \(\alpha_i\) by \(\bar{x}_i\theta + \nu_i\). The third equality relies on the unobservables \(\nu_i\) and \(\varepsilon_{it}\) having mean zero conditional on the regressors. The test is given by
\[\begin{equation*}
H_0: \theta = 0
\end{equation*}\]
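Stata's test command reports a Wald chi-squared statistic after xtreg, so the test can be sketched as follows. Here \(\hat{\theta}\) denotes the estimated coefficients on the panel-level means, \(\hat{V}_{\hat{\theta}}\) their estimated (possibly robust) variance matrix, and \(q\) the number of panel-level means in the model; all three symbols are notation assumed here for illustration.
\[\begin{equation*}
W = \hat{\theta}'\,\hat{V}_{\hat{\theta}}^{-1}\,\hat{\theta} \;\overset{a}{\sim}\; \chi^2(q) \quad \text{under } H_0
\end{equation*}\]
In the example above, \(q = 2\), which matches the chi2(2) reported in Step 3.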
Reference
Mundlak, Y. 1978. On the pooling of time series and cross section data. Econometrica 46: 69-85.