Including covariates in crossed-effects models
The manual entry for xtmixed documents all the official features of the command, along with several applications. However, no manual entry could address every model that can be fitted with this command. Here I want to show you how to include covariates in a crossed-effects model.
Let me start by reviewing the crossed-effects notation for xtmixed. I will use the homework dataset from Kreft and de Leeuw (1998), a subsample of the National Education Longitudinal Study of 1988. You can download the dataset from the web page for Rabe-Hesketh and Skrondal (2008) (http://www.stata-press.com/data/mlmus2.html) and run all the examples in this entry.
If we want to fit a model with math (math grade) as the outcome and two crossed effects, region and urban, the standard syntax is:
(1) xtmixed math ||_all:R.region || _all: R.urban
The underlying model for this syntax is
math_ijk = b + u_i + v_j + eps_ijk
where i indexes regions, j indexes the levels of urban, and k indexes the observations within each region-by-urban cell; the u_i, the v_j, and the eps_ijk are each i.i.d., and all of them are independent of each other.
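As a worked illustration (a sketch, not part of the original entry), here is model (1) with the normal distributions that xtmixed assumes for the random effects and residuals; the variance symbols tau_u^2, tau_v^2, and tau_eps^2 are notation introduced here. The covariances show what makes the effects crossed: two observations are correlated whenever they share a region or an urban category, and are uncorrelated only when they share neither.

```latex
% Model (1) with Gaussian random effects; the tau^2 symbols are introduced for this sketch
\begin{aligned}
\text{math}_{ijk} &= b + u_i + v_j + \epsilon_{ijk},\\
u_i &\sim N(0,\tau_u^2),\quad v_j \sim N(0,\tau_v^2),\quad \epsilon_{ijk} \sim N(0,\tau_\epsilon^2),\\
\operatorname{Var}(\text{math}_{ijk}) &= \tau_u^2 + \tau_v^2 + \tau_\epsilon^2,\\
\operatorname{Cov}(\text{math}_{ijk},\,\text{math}_{ijk'}) &= \tau_u^2 + \tau_v^2 \quad (k \neq k'),\\
\operatorname{Cov}(\text{math}_{ijk},\,\text{math}_{ij'k'}) &= \tau_u^2 \quad (j \neq j'),\\
\operatorname{Cov}(\text{math}_{ijk},\,\text{math}_{i'jk'}) &= \tau_v^2 \quad (i \neq i'),\\
\operatorname{Cov}(\text{math}_{ijk},\,\text{math}_{i'j'k'}) &= 0 \quad (i \neq i',\ j \neq j').
\end{aligned}
```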
The standard xtmixed syntax assumes that levels are nested. To fit non-nested (crossed) models, we use _all to create an artificial level with a single group consisting of all the observations. In addition, we use the notation R.var, which includes an indicator (dummy) variable for each category of var, each with its own random coefficient, and constrains the variances of those coefficients to be the same.
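For example, with region alone (four categories, hence four indicators), the _all: R.region specification can be written as below. Here d_{r,ik}, the indicator that observation k of region i belongs to region r, and tau_u^2 are notation introduced for this sketch. The sum over indicators picks out exactly one coefficient, and an identity-scaled covariance matrix makes the four coefficients i.i.d., so the result is just a random intercept per region.

```latex
% _all: R.region as one group with four equal-variance random coefficients (notation is ours)
\begin{aligned}
\text{math}_{ik} &= b + \sum_{r=1}^{4} u_r\, d_{r,ik} + \epsilon_{ik} = b + u_i + \epsilon_{ik},\\
d_{r,ik} &= \begin{cases} 1, & r = i,\\ 0, & r \neq i,\end{cases}\\
(u_1,\dots,u_4)^{\top} &\sim N\!\left(\mathbf{0},\ \tau_u^2 I_4\right)
\iff u_1,\dots,u_4 \ \text{i.i.d.}\ N(0,\tau_u^2).
\end{aligned}
```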
That is, if we write
xtmixed math ||_all:R.region
we are fitting exactly the same model as
xtmixed math || region:
but in a much less efficient way. In effect, the first command is equivalent to the following:
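generate one = 1                                    // a single group containing every observation
tab region, gen(id_reg)                             // indicators id_reg1-id_reg4, one per region
xtmixed math || one: id_reg*, cov(identity) nocons  // equal-variance random coefficients, no random intercept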
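That is, instead of one random intercept for each region, we have a single group with four random coefficients, one for each region indicator, whose variances are constrained to be equal. Therefore, a more efficient way to fit model (1) is to use the R. notation for only one of the crossed factors: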
xtmixed math ||_all:R.region || urban:
This works because urban is nested within the single group defined by _all. Therefore, if we want to include a covariate with a random coefficient (a random slope) at one of the levels, we just need to place that level last and use the usual random-slope syntax; for example:
xtmixed math public || _all:R.region || urban: public
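Written out (a sketch; b_0, b_1, w_j, and the tau^2 symbols are notation introduced here), this command adds a fixed effect of public and, at the urban level, a random intercept v_j plus a random slope w_j on public. Under xtmixed's default covariance(independent) for the urban equation, v_j and w_j are uncorrelated, and both are independent of the region effects u_i and of the residuals.

```latex
% Model fitted by: xtmixed math public || _all:R.region || urban: public
\begin{aligned}
\text{math}_{ijk} &= b_0 + b_1\,\text{public}_{ijk} + u_i + v_j + w_j\,\text{public}_{ijk} + \epsilon_{ijk},\\
u_i &\sim N(0,\tau_u^2),\quad v_j \sim N(0,\tau_v^2),\quad w_j \sim N(0,\tau_w^2),\quad \epsilon_{ijk} \sim N(0,\tau_\epsilon^2).
\end{aligned}
```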
Now suppose we want to include random coefficients at both levels. How would we do that? The trick is to use the _all notation for the random coefficient itself. For example, if we fit
(2) xtmixed math meanses || region: meanses
we are assuming that meanses (mean SES per school) has a different effect (a random slope) in each region. This model can be expressed as
math_ik = x_ik*b + sigma_i + alpha_i*meanses_ik + eps_ik
where the sigma_i are i.i.d., the alpha_i are i.i.d., the eps_ik are i.i.d. residual errors, and all of them are independent of each other. This model can be fitted by generating the interactions of meanses with each region indicator, giving each interaction a random coefficient alpha_i, and constraining the variances of those coefficients to be equal. In other words, we can also fit model (2) as follows:
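unab idvar: id_reg*                     // region indicators created by: tab region, gen(id_reg)
foreach v of local idvar {
    gen inter`v' = meanses*`v'          // meanses interacted with one region indicator
}
xtmixed math meanses                  ///
    || _all: inter*, cov(identity) nocons ///
    || _all: R.region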
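Why the interaction trick reproduces model (2) can be checked term by term. Writing d_{r,ik} for the region-r indicator (id_reg r in the code), inter_{r,ik} = meanses_ik * d_{r,ik} for the generated interaction, and tau_sigma^2, tau_alpha^2 for the variances (all notation introduced for this sketch), each _all equation collapses to a single region-specific term:

```latex
% Each identity-covariance _all equation reduces to one term per region (notation is ours)
\begin{aligned}
\text{math}_{ik} &= x_{ik} b
  + \underbrace{\sum_{r=1}^{4} \sigma_r\, d_{r,ik}}_{=\,\sigma_i}
  + \underbrace{\sum_{r=1}^{4} \alpha_r\, \text{inter}_{r,ik}}_{=\,\alpha_i\,\text{meanses}_{ik}}
  + \epsilon_{ik},\\
\text{inter}_{r,ik} &= \text{meanses}_{ik}\, d_{r,ik},\qquad
d_{r,ik} = \begin{cases} 1, & r = i,\\ 0, & r \neq i,\end{cases}\\
(\sigma_1,\dots,\sigma_4)^{\top} &\sim N\!\left(\mathbf{0},\ \tau_\sigma^2 I_4\right),\qquad
(\alpha_1,\dots,\alpha_4)^{\top} \sim N\!\left(\mathbf{0},\ \tau_\alpha^2 I_4\right).
\end{aligned}
```

Because the two _all equations are specified separately, the sigmas and alphas are independent of each other, which matches the default covariance(independent) used by || region: meanses.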
Finally, we can combine all these tools to include random coefficients at both levels; for example:
xtmixed math parented meanses public         ///
    || _all: R.region                        ///
    || _all: inter*, cov(identity) nocons    ///
    || urban: public
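Putting the pieces together, the model fitted by this last command can be sketched as follows, with i indexing regions and j the levels of urban. The symbols b_0 to b_3, v_j, w_j, and the tau^2 variances are notation introduced here: sigma_i comes from _all: R.region, alpha_i from _all: inter*, and v_j and w_j from urban: public. All random terms are mutually independent and independent of the residual eps_ijk.

```latex
% Model fitted by the final xtmixed command (notation introduced for this sketch)
\begin{aligned}
\text{math}_{ijk} &= b_0 + b_1\,\text{parented}_{ijk} + b_2\,\text{meanses}_{ijk} + b_3\,\text{public}_{ijk}\\
&\quad + \sigma_i + \alpha_i\,\text{meanses}_{ijk} + v_j + w_j\,\text{public}_{ijk} + \epsilon_{ijk},\\
\sigma_i &\sim N(0,\tau_\sigma^2),\quad \alpha_i \sim N(0,\tau_\alpha^2),\quad
v_j \sim N(0,\tau_v^2),\quad w_j \sim N(0,\tau_w^2),\quad \epsilon_{ijk} \sim N(0,\tau_\epsilon^2).
\end{aligned}
```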
References:
Kreft, I. G. G., and J. de Leeuw. 1998. Introducing Multilevel Modeling. Sage.
Rabe-Hesketh, S., and A. Skrondal. 2008. Multilevel and Longitudinal Modeling Using Stata. 2nd ed. Stata Press.