Fixed effects or random effects: The Mundlak approach

Today I will discuss Mundlak’s (1978) alternative to the Hausman test. Unlike the latter, the Mundlak approach may be used when the errors are heteroskedastic or have intragroup correlation.

What is going on?

Say I want to fit a linear panel-data model and need to decide whether to use a random-effects or fixed-effects estimator. My decision depends on how time-invariant unobservable variables are related to variables in my model. Here are two examples that may yield different answers:

1. A panel dataset of individuals endowed with innate ability that does not change over time
2. A panel dataset of countries where the time-invariant unobservables in our model are sets of country-specific geographic characteristics

In the first case, innate ability can affect observable characteristics such as the amount of schooling someone pursues. In the second case, geographic characteristics are probably not correlated with the variables in our model. Of course, these are conjectures, and we want a test to check whether the unobservables are related to the variables in our model.

First, I will tell you how to compute the test; then, I will explain the theory and intuition behind it.

Computing the test

1. Compute the panel-level average of your time-varying covariates.
2. Use a random-effects estimator to regress your outcome on your covariates and the panel-level means generated in (1).
3. Test that the coefficients on the panel-level means generated in (1) are jointly zero.

If you reject that the coefficients are jointly zero, the test suggests that the time-invariant unobservables are correlated with your regressors, so the random-effects assumptions fail and a fixed-effects estimator is appropriate. If you cannot reject the null that the coefficients on the generated regressors are zero, the data provide no evidence of correlation between the time-invariant unobservables and your regressors; that is, they are consistent with the random-effects assumptions.

Below I demonstrate the three-step procedure above using simulated data. The data satisfy the fixed-effects assumptions and have two time-varying covariates and one time-invariant covariate.
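
The simulated dataset itself is not shown in this post. As a minimal sketch, assuming a balanced panel of 500 units observed over 5 periods, data with these features could be generated as follows. The seed, sample sizes, coefficients, and the helper variable a (standing in for the time-invariant unobservable) are all illustrative assumptions, not the values behind the output reported in this post.

```stata
* Illustrative data-generating process (all values are assumptions)
clear
set seed 12345
set obs 500
generate id = _n

* Time-invariant unobservable (alpha_i) and time-invariant covariate
generate a  = rnormal()
generate x1 = rnormal()

* Five periods per panel
expand 5
bysort id: generate t = _n

* Time-varying covariates correlated with the unobservable
generate x2 = 0.5*a + rnormal()
generate x3 = 0.5*a + rnormal()

* Outcome; a enters the model but would be unobserved in practice
generate y = 1 + x1 + x2 - x3 + a + rnormal()

xtset id t
```

Because x2 and x3 both depend on a, the random-effects assumption fails by construction, so the test should tend to reject in data generated this way.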

STEP 1

. bysort id: egen mean_x2 = mean(x2)

. bysort id: egen mean_x3 = mean(x3)


STEP 2

. quietly xtreg y x1 x2 x3 mean_x2 mean_x3, vce(robust)

. estimates store mundlak


STEP 3

. test mean_x2 mean_x3

( 1)  mean_x2 = 0
( 2)  mean_x3 = 0

chi2(  2) =    8.94
Prob > chi2 =    0.0114


We reject the null hypothesis. This suggests that time-invariant unobservables are related to our regressors and that the fixed-effects model is appropriate. Note that I used a robust estimator of the variance-covariance matrix. I could not have done this if I had used a Hausman test.
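
For comparison, the conventional Hausman test can be sketched as follows. This assumes the same variables are in memory and the data have been xtset; the names fixed and random are arbitrary labels for the stored estimates. Both models are fit with the default (nonrobust) variance estimator because hausman does not accept results obtained with vce(robust) or vce(cluster).

```stata
* Fixed-effects fit; the time-invariant x1 is dropped automatically
quietly xtreg y x1 x2 x3, fe
estimates store fixed

* Random-effects fit with the default variance estimator
quietly xtreg y x1 x2 x3, re
estimates store random

* Hausman test: consistent estimator first, efficient estimator second
hausman fixed random
```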

Where all this came from

A linear panel-data model is given by

$\begin{equation*} y_{it} = x_{it}\beta + \alpha_i + \varepsilon_{it} \end{equation*}$

The index $$i$$ denotes the individual and the index $$t$$ time. $$y_{it}$$ is the outcome of interest, $$x_{it}$$ is the set of regressors, $$\varepsilon_{it}$$ is the time-varying unobservable, and $$\alpha_i$$ is the time-invariant unobservable.

The key to the Mundlak approach is to determine if $$\alpha_i$$ and $$x_{it}$$ are correlated. We know how to think about this problem from our regression intuition. We can think of the mean of $$\alpha_i$$ conditional on the time-invariant part of our regressors in the same way that we think of the mean of our outcome conditional on our covariates.

$\begin{eqnarray*} \alpha_i &=& \bar{x}_i\theta + \nu_i \\ E\left(\alpha_i|x_i\right) &=& \bar{x}_i\theta \end{eqnarray*}$

In the expression above, $$\bar{x}_i$$ is the panel-level mean of $$x_{it}$$, and $$\nu_i$$ is a time-invariant unobservable that is uncorrelated with the regressors.

As in regression, if $$\theta = 0$$, we know $$\alpha_i$$ and the covariates are uncorrelated. This is what we test. The implied model is given by

$\begin{eqnarray*} y_{it} &=& x_{it}\beta + \alpha_i + \varepsilon_{it} \\ y_{it} &=& x_{it}\beta + \bar{x}_i\theta + \nu_i + \varepsilon_{it} \\ E\left(y_{it}|x_{it}\right) &=& x_{it}\beta + \bar{x}_i\theta \end{eqnarray*}$

The second equality replaces $$\alpha_i$$ by $$\bar{x}_i\theta + \nu_i$$. The third equality relies on the assumption that the unobservables $$\nu_i$$ and $$\varepsilon_{it}$$ are mean independent of the regressors. The test is given by

$\begin{equation*} H_0: \theta = 0 \end{equation*}$
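
A short derivation, using only the equations above, shows what $$\theta$$ measures. Averaging the second equation of the implied model over $$t$$ within each panel, and then subtracting that average from the original equation, gives

$\begin{eqnarray*} \bar{y}_i &=& \bar{x}_i\left(\beta + \theta\right) + \nu_i + \bar{\varepsilon}_i \\ y_{it} - \bar{y}_i &=& \left(x_{it} - \bar{x}_i\right)\beta + \left(\varepsilon_{it} - \bar{\varepsilon}_i\right) \end{eqnarray*}$

Here $$\bar{y}_i$$ and $$\bar{\varepsilon}_i$$ are panel-level means defined in the same way as $$\bar{x}_i$$; this notation is introduced only for this sketch. The first line is the between regression, with slope $$\beta + \theta$$, and the second is the within regression, with slope $$\beta$$. So $$\theta$$ is the difference between the between and within coefficients, and the null hypothesis states that the two coincide.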

Reference

Mundlak, Y. 1978. On the pooling of time series and cross section data. Econometrica 46: 69-85.

• Alfonso Sánchez-Peñalver

Enrique, this is the CRE model you mentioned to me on Statalist, isn’t it? I understand the test and the logic behind it. However, I have several comments. When you estimate the model with both the variables and the means of the variables, you are in effect accounting for the between and within effects. Testing the joint significance of the coefficients on the means tests not only whether the unobserved effects are correlated with the variables but also whether the between and within coefficients are equal. The question that really pops into my mind is this: when the within and between coefficients are different, why would fixed effects be the right estimator? All we really know is that random effects is inconsistent, so it is not valid. But fixed effects does not capture the between effect. As an example, consider giving over time. The panel variable is the household, and you observe it in different periods. Households are very consistent in their giving over time, so age may not have a strong effect within the household, but it may vary strongly across households. How then can we conclude that fixed effects is the right estimator if it’s not capturing everything?

• epinzon

Hello Alfonso,

With regard to estimation, if the assumptions about the form of the time-invariant component (alpha) are satisfied, and you have strict exogeneity, the fixed-effects estimator should give you a consistent estimator of the coefficients on the time-varying regressors. The key point is that we have specified the form of the correlation between alpha and the regressors correctly.

In this post I was not considering estimation but showing the Mundlak approach as an alternative to the Hausman test. However, you are right to point out that this approach can be considered for estimation. In particular, it might be useful in non-linear panel data models. However, as you reflect in your comment, thinking about estimation requires a more careful consideration of the assumptions and models you are pursuing.

Perhaps in the future I will write about this topic, but as I mentioned before, Jeff Wooldridge has a very nice discussion of it at http://www.iza.org/conference_files/…nonlin_iza.pdf for those interested.

• Alfonso Sánchez-Peñalver

Hi Enrique, as I said, I understand the test and agree that it is a good alternative to Hausman’s. My point is that I think it’s more explicit than the Hausman test in testing that the between effect is different from the within effect. The Hausman test tests the differences between the fixed-effects coefficients and those from random effects. Fixed-effects coefficients are the within coefficients, and random-effects coefficients are a weighted average of the within and between coefficients (Rabe-Hesketh and Skrondal explain this very well in Chapter 3 of Volume 1 of their Multilevel and Longitudinal Modeling Using Stata), so the difference between random and fixed effects lies in the difference between the between coefficient and the within coefficient. The coefficients you are testing in this model are exactly that: the difference between the within and between coefficients (again, Rabe-Hesketh and Skrondal’s Chapter 3). My concern with the Hausman test, and with any test that tries to make you decide between a random-effects and a fixed-effects model, is that it doesn’t consider the information that you are losing when selecting the fixed-effects model. As you mention, fixed effects will give a consistent estimate of the within (time-varying) coefficients, but it presents no information on the between (cross-sectional varying) coefficients. By choosing fixed effects, we can only explain what happens across time on average. We cannot explain what happens across panels when a variable changes. And the point is that those effects may be completely different. If you ever do a piece on panel-data estimation that accounts for this, let me know because I’ll be glad to exchange ideas with you.

• epinzon

Hi Alfonso,

I agree that you lose information when you use the fixed-effects model. Also, I agree with your characterization of the scope of the tests I discuss.

• Alfonso Sánchez-Peñalver

Enrique, how are you doing? Today I was reading a discussion on ResearchGate where a member needed to test for fixed or random effects at two levels, not just one. That is, he has two cluster levels (banks and countries) and wants to test for fixed versus random effects at each of these two levels. Assuming that they’re not nested, I immediately thought of this approach. Then I started wondering what would happen if the random effects at the two levels are correlated… should we include that in the test? So I was wondering if you could look into testing between random and fixed effects when you have more than one effect level, preferably covering how to test with nested effects and with cross-effects, maybe in two separate posts. Sorry if I’m giving you more work. Thanks.

• Hi. I wonder whether the Mundlak (1978) procedure to control for level-2 endogeneity has been extended to a three-level multilevel model in a longitudinal setting (namely, time is the first level of the hierarchy, firms are at the second level, and regions are at the third level).

• Martin Paul

In using the CRE for nonlinear models, do you need to average all time-varying variables?