Posts Tagged ‘SEM’

Using gsem to combine estimation results


gsem is a very flexible command that allows us to fit very sophisticated models. However, it is also useful in situations that involve simple models.

For example, when we want to compare parameters across two or more models, we usually use suest, which combines the estimation results under one parameter vector and creates a simultaneous covariance matrix of the robust type. This covariance estimate is described in the Methods and formulas of [R] suest as the robust variance from a “stacked model”. Actually, gsem can estimate these kinds of “stacked models”, even if the estimation samples are not the same and possibly overlap. By using the option vce(robust), we can replicate the results from suest, provided the models can be fit with gsem. In addition, gsem allows us to combine results from some estimation commands that are not supported by suest, like models including random effects.

 

Example: Comparing parameters from two models

 

Let’s use the childweight dataset, described in [ME] mixed. Consider the following models, where the weights of boys and girls are modeled using age and age squared:

. webuse childweight, clear
(Weight data on Asian children)

. regress  weight age c.age#c.age if girl == 0, noheader
------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   7.985022   .6343855    12.59   0.000     6.725942    9.244101
             |
 c.age#c.age |   -1.74346   .2374504    -7.34   0.000    -2.214733   -1.272187
             |
       _cons |   3.684363   .3217223    11.45   0.000     3.045833    4.322893
------------------------------------------------------------------------------

. regress  weight age c.age#c.age if girl == 1, noheader
------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   7.008066   .5164687    13.57   0.000     5.982746    8.033386
             |
 c.age#c.age |  -1.450582   .1930318    -7.51   0.000    -1.833798   -1.067365
             |
       _cons |   3.480933   .2616616    13.30   0.000     2.961469    4.000397
------------------------------------------------------------------------------

To test whether birthweights are the same for the two groups, we need to test whether the intercepts in the two regressions are the same. Using suest, we would proceed as follows:

. quietly regress weight age c.age#c.age if girl == 0, noheader

. estimates store boys

. quietly regress weight age c.age#c.age if girl == 1, noheader

. estimates store girls

. suest boys girls

Simultaneous results for boys, girls

                                                  Number of obs   =        198

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
boys_mean    |
         age |   7.985022   .4678417    17.07   0.000     7.068069    8.901975
             |
 c.age#c.age |   -1.74346   .2034352    -8.57   0.000    -2.142186   -1.344734
             |
       _cons |   3.684363   .1719028    21.43   0.000      3.34744    4.021286
-------------+----------------------------------------------------------------
boys_lnvar   |
       _cons |   .4770289   .1870822     2.55   0.011     .1103546    .8437032
-------------+----------------------------------------------------------------
girls_mean   |
         age |   7.008066   .4166916    16.82   0.000     6.191365    7.824766
             |
 c.age#c.age |  -1.450582   .1695722    -8.55   0.000    -1.782937   -1.118226
             |
       _cons |   3.480933   .1556014    22.37   0.000      3.17596    3.785906
-------------+----------------------------------------------------------------
girls_lnvar  |
       _cons |   .0097127   .1351769     0.07   0.943    -.2552292    .2746545
------------------------------------------------------------------------------

Replaying the results with the option coeflegend gives us a legend we can use to refer to the parameters in postestimation commands like test.

. suest, coeflegend

Simultaneous results for boys, girls

                                                  Number of obs   =        198

------------------------------------------------------------------------------
             |      Coef.  Legend
-------------+----------------------------------------------------------------
boys_mean    |
         age |   7.985022  _b[boys_mean:age]
             |
 c.age#c.age |   -1.74346  _b[boys_mean:c.age#c.age]
             |
       _cons |   3.684363  _b[boys_mean:_cons]
-------------+----------------------------------------------------------------
boys_lnvar   |
       _cons |   .4770289  _b[boys_lnvar:_cons]
-------------+----------------------------------------------------------------
girls_mean   |
         age |   7.008066  _b[girls_mean:age]
             |
 c.age#c.age |  -1.450582  _b[girls_mean:c.age#c.age]
             |
       _cons |   3.480933  _b[girls_mean:_cons]
-------------+----------------------------------------------------------------
girls_lnvar  |
       _cons |   .0097127  _b[girls_lnvar:_cons]
------------------------------------------------------------------------------

. test  _b[boys_mean:_cons] = _b[girls_mean:_cons]

 ( 1)  [boys_mean]_cons - [girls_mean]_cons = 0

           chi2(  1) =    0.77
         Prob > chi2 =    0.3803

We find no evidence that the intercepts are different.

Now, let’s replicate those results by using the gsem command. We generate the variable weightboy, a copy of weight for boys and missing otherwise, and the variable weightgirl, a copy of weight for girls and missing otherwise.

. quietly generate weightboy = weight if girl == 0

. quietly generate weightgirl = weight if girl == 1

. gsem (weightboy <- age c.age#c.age) (weightgirl <- age c.age#c.age), ///
>      nolog vce(robust)

Generalized structural equation model             Number of obs   =        198
Log pseudolikelihood =  -302.2308

-------------------------------------------------------------------------------
                 |              Robust
                 |      Coef.  Std. Err.     z   P>|z|     [95% Conf. Interval]
-----------------+-------------------------------------------------------------
weightboy <-     |
             age |   7.985022  .4678417   17.07  0.000     7.068069    8.901975
                 |
     c.age#c.age |   -1.74346  .2034352   -8.57  0.000    -2.142186   -1.344734
                 |
           _cons |   3.684363  .1719028   21.43  0.000      3.34744    4.021286
-----------------+-------------------------------------------------------------
weightgirl <-    |
             age |   7.008066  .4166916   16.82  0.000     6.191365    7.824766
                 |
     c.age#c.age |  -1.450582  .1695722   -8.55  0.000    -1.782937   -1.118226
                 |
           _cons |   3.480933  .1556014   22.37  0.000      3.17596    3.785906
-----------------+-------------------------------------------------------------
 var(e.weightboy)|   1.562942  .3014028                    1.071012    2.280821
var(e.weightgirl)|    .978849  .1364603                    .7448187    1.286414
-------------------------------------------------------------------------------

. gsem, coeflegend

Generalized structural equation model             Number of obs   =        198
Log pseudolikelihood =  -302.2308

-------------------------------------------------------------------------------
                 |      Coef.  Legend
-----------------+-------------------------------------------------------------
weightboy <-     |
             age |   7.985022  _b[weightboy:age]
                 |
     c.age#c.age |   -1.74346  _b[weightboy:c.age#c.age]
                 |
           _cons |   3.684363  _b[weightboy:_cons]
-----------------+-------------------------------------------------------------
weightgirl <-    |
             age |   7.008066  _b[weightgirl:age]
                 |
     c.age#c.age |  -1.450582  _b[weightgirl:c.age#c.age]
                 |
           _cons |   3.480933  _b[weightgirl:_cons]
-----------------+-------------------------------------------------------------
 var(e.weightboy)|   1.562942  _b[var(e.weightboy):_cons]
var(e.weightgirl)|    .978849  _b[var(e.weightgirl):_cons]
-------------------------------------------------------------------------------

. test  _b[weightgirl:_cons]=  _b[weightboy:_cons]

 ( 1)  - [weightboy]_cons + [weightgirl]_cons = 0

           chi2(  1) =    0.77
         Prob > chi2 =    0.3803

gsem allowed us to fit models on different subsets simultaneously. By default, the model is assumed to be a linear regression, but several links and families are available; for example, you can combine two Poisson models or a multinomial logistic model with a regular logistic model. See [SEM] sem and gsem for details.
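For instance, here is a minimal sketch of the syntax for mixing families (ycount, ybinary, and x are placeholder names, not variables from the dataset above):

* hypothetical sketch: a Poisson model and a logistic model combined in one call
gsem (ycount <- x, poisson) (ybinary <- x, logit), vce(robust)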

Here, I use the vce(robust) option to replicate the results from suest. However, when the estimation samples don’t overlap, the results from the two estimations are assumed to be independent, and thus the option vce(robust) is not needed. When performing the estimation without the vce(robust) option, the joint covariance matrix will contain two blocks with the covariances from the original models and 0s outside those blocks.
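To see this, here is a small sketch (not part of the original output) that refits the joint model above without vce(robust) and extracts the block of e(V) linking the two equations; it should contain only zeros:

* sketch: without vce(robust), the cross-equation block of e(V) is zero
quietly gsem (weightboy <- age c.age#c.age) (weightgirl <- age c.age#c.age)
matrix V = e(V)
matrix Vbg = V["weightboy:", "weightgirl:"]   // subscript by equation name
matlist Vbg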

 

An example with random effects

 

The childweight dataset contains repeated measures and, in the documentation, is analyzed using the mixed command, which allows us to account for the intra-individual correlation via random effects.

Now, let’s use the techniques described above to combine results from two random-effects models. Here are the two separate models:

. mixed weight age c.age#c.age if girl == 0 || id:, nolog

Mixed-effects ML regression                     Number of obs      =       100
Group variable: id                              Number of groups   =        34

                                                Obs per group: min =         1
                                                               avg =       2.9
                                                               max =         5


                                                Wald chi2(2)       =   1070.28
Log likelihood = -149.05479                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   8.328882   .4601093    18.10   0.000     7.427084    9.230679
             |
 c.age#c.age |  -1.859798   .1722784   -10.80   0.000    -2.197458   -1.522139
             |
       _cons |   3.525929   .2723617    12.95   0.000      2.99211    4.059749
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   .7607779   .2439115      .4058409    1.426133
-----------------------------+------------------------------------------------
               var(Residual) |   .7225673   .1236759      .5166365    1.010582
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =    30.34 Prob >= chibar2 = 0.0000

. mixed weight age c.age#c.age if girl == 1 || id:, nolog

Mixed-effects ML regression                     Number of obs      =        98
Group variable: id                              Number of groups   =        34

                                                Obs per group: min =         1
                                                               avg =       2.9
                                                               max =         5


                                                Wald chi2(2)       =   2141.72
Log likelihood =  -114.3008                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   7.273082   .3167266    22.96   0.000     6.652309    7.893854
             |
 c.age#c.age |  -1.538309    .118958   -12.93   0.000    -1.771462   -1.305156
             |
       _cons |   3.354834   .2111793    15.89   0.000      2.94093    3.768738
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   .6925554   .1967582       .396848    1.208606
-----------------------------+------------------------------------------------
               var(Residual) |   .3034231   .0535359      .2147152    .4287799
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =    47.42 Prob >= chibar2 = 0.0000

Random effects can be included in a gsem model by incorporating latent variables at the group level; these are the latent variables M1[id] and M2[id] below. By default, gsem will try to estimate a covariance when it sees two latent variables at the same level. This is easily handled by restricting this covariance term to 0. The option vce(robust) should be used whenever we want to reproduce the mechanism used by suest.

. gsem (weightboy <- age c.age#c.age M1[id])   ///
>      (weightgirl <- age c.age#c.age M2[id]), ///
>      cov(M1[id]*M2[id]@0) vce(robust) nolog

Generalized structural equation model             Number of obs   =        198
Log pseudolikelihood = -263.35559

 ( 1)  [weightboy]M1[id] = 1
 ( 2)  [weightgirl]M2[id] = 1
                                      (Std. Err. adjusted for clustering on id)
-------------------------------------------------------------------------------
                 |              Robust
                 |      Coef.  Std. Err.     z   P>|z|     [95% Conf. Interval]
-----------------+-------------------------------------------------------------
weightboy <-     |
             age |   8.328882  .4211157   19.78  0.000      7.50351    9.154253
                 |
     c.age#c.age |  -1.859798  .1591742  -11.68  0.000    -2.171774   -1.547823
                 |
          M1[id] |          1 (constrained)
                 |
           _cons |   3.525929  .1526964   23.09  0.000      3.22665    3.825209
-----------------+-------------------------------------------------------------
weightgirl <-    |
             age |   7.273082  .3067378   23.71  0.000     6.671887    7.874277
                 |
     c.age#c.age |  -1.538309   .120155  -12.80  0.000    -1.773808    -1.30281
                 |
          M2[id] |          1 (constrained)
                 |
           _cons |   3.354834  .1482248   22.63  0.000     3.064319     3.64535
-----------------+-------------------------------------------------------------
      var(M1[id])|   .7607774  .2255575                     .4254915    1.360268
      var(M2[id])|   .6925553  .1850283                    .4102429    1.169144
-----------------+-------------------------------------------------------------
 var(e.weightboy)|   .7225674  .1645983                     .4623572    1.129221
var(e.weightgirl)|   .3034231  .0667975                    .1970877    .4671298
-------------------------------------------------------------------------------

Above, we have the joint output from the two models, which would allow us to perform tests among parameters in both models. Notice that option vce(robust) implies that standard errors will be clustered on the groups determined by id.
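For example (a small sketch, output omitted), we could repeat the earlier comparison of the intercepts, now accounting for the within-child correlation:

test _b[weightgirl:_cons] = _b[weightboy:_cons]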

gsem, when called with the vce(robust) option, will complain if there are inconsistencies among the groups in the models (for example, if the random effects in both models were crossed).

 

Checking that you are fitting the same model

 

In the previous model, gsem's default covariance structure included a term that wasn’t in the original two models, so we needed to include an additional restriction. This can be easy to spot in a simple model, but if you don’t want to rely just on a visual inspection, you can write a small loop to make sure that all the estimates in the joint model are actually also in the original models.

Let’s see an example with random effects, this time with overlapping data.

. *fit first model and save the estimates
. gsem (weightboy <- age c.age#c.age M1[id]), nolog

Generalized structural equation model             Number of obs   =        100
Log likelihood = -149.05479

 ( 1)  [weightboy]M1[id] = 1
-------------------------------------------------------------------------------
                |      Coef.  Std. Err.     z    P>|z|     [95% Conf. Interval]
----------------+--------------------------------------------------------------
weightboy <-    |
            age |   8.328882  .4609841   18.07   0.000     7.425369    9.232394
                |
    c.age#c.age |  -1.859798  .1725233  -10.78   0.000    -2.197938   -1.521659
                |
         M1[id] |          1 (constrained)
                |
          _cons |   3.525929  .2726322   12.93   0.000      2.99158    4.060279
----------------+--------------------------------------------------------------
     var(M1[id])|   .7607774  .2439114                     .4058407    1.426132
----------------+--------------------------------------------------------------
var(e.weightboy)|   .7225674  .1236759                     .5166366    1.010582
-------------------------------------------------------------------------------

. mat b1 = e(b)

. *fit second model and save the estimates
. gsem (weight <- age M2[id]), nolog

Generalized structural equation model             Number of obs   =        198
Log likelihood = -348.32402

 ( 1)  [weight]M2[id] = 1
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight <-    |
         age |   3.389281   .1152211    29.42   0.000     3.163452    3.615111
             |
      M2[id] |          1  (constrained)
             |
       _cons |   5.156913   .1803059    28.60   0.000      4.80352    5.510306
-------------+----------------------------------------------------------------
  var(M2[id])|   .6076662   .2040674                      .3146395    1.173591
-------------+----------------------------------------------------------------
var(e.weight)|   1.524052   .1866496                      1.198819    1.937518
------------------------------------------------------------------------------

. mat b2 = e(b)

. *stack estimates from first and second models
. mat stacked = b1, b2

. *estimate joint model and save results
. gsem (weightboy <- age c.age#c.age M1[id]) ///
>      (weight <- age M2[id]), cov(M1[id]*M2[id]@0) vce(robust) nolog

Generalized structural equation model             Number of obs   =        198
Log pseudolikelihood = -497.37881

 ( 1)  [weightboy]M1[id] = 1
 ( 2)  [weight]M2[id] = 1
                                      (Std. Err. adjusted for clustering on id)
-------------------------------------------------------------------------------
                |              Robust
                |      Coef.  Std. Err.     z    P>|z|     [95% Conf. Interval]
----------------+--------------------------------------------------------------
weightboy <-    |
            age |   8.328882  .4211157   19.78   0.000      7.50351    9.154253
                |
    c.age#c.age |  -1.859798  .1591742  -11.68   0.000    -2.171774   -1.547823
                |
         M1[id] |          1 (constrained)
                |
          _cons |   3.525929  .1526964   23.09   0.000      3.22665    3.825209
----------------+--------------------------------------------------------------
weight <-       |
            age |   3.389281  .1157835   29.27   0.000      3.16235    3.616213
                |
         M2[id] |          1 (constrained)
                |
          _cons |   5.156913  .1345701   38.32   0.000      4.89316    5.420665
----------------+--------------------------------------------------------------
     var(M1[id])|   .7607774  .2255575                     .4254915    1.360268
     var(M2[id])|   .6076662     .1974                     .3214791    1.148623
----------------+--------------------------------------------------------------
var(e.weightboy)|   .7225674  .1645983                     .4623572    1.129221
   var(e.weight)|   1.524052  .1705637                     1.223877    1.897849
-------------------------------------------------------------------------------

. mat b = e(b)

. *verify that estimates from the joint model are the same as
. *from models 1 and 2
. local stripes : colfullnames(b)

. foreach l of local stripes{
  2.    matrix  r1 =  b[1,"`l'"]
  3.    matrix r2 = stacked[1,"`l'"]
  4.    assert reldif(el(r1,1,1), el(r2,1,1))<1e-5
  5. }

The loop above verifies that every parameter label in the joint model also appears in the stacked results from the original models and that the corresponding estimates are actually the same. If you omit the restriction for the covariance in the joint model, the loop will produce an error.
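A related check (a small sketch, not in the original post, based on the colfullnames and list extended macro functions) is to display any parameter names that appear in the joint model but not in the stacked original results; the list should be empty:

* sketch: names present in the joint model but missing from the originals
local joint    : colfullnames b
local original : colfullnames stacked
local extra    : list joint - original
display "`extra'"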

 

Technical note

 

As documented in [U] 20.21.2 Correlated errors: Cluster-robust standard errors, the formula for the robust estimator of the variance is

\[
V_{robust} = \hat V(\sum_{j=1}^N u'_ju_j) \hat V
\]

where \(N\) is the number of observations, \(\hat V\) is the conventional estimator of the variance, and for each observation \(j\), \(u_j\) is a row vector (with as many columns as parameters), which represents the contribution of this observation to the gradient. (If we stack the rows \(u_j\), the columns of this matrix are the scores.)

When we apply suest, the matrix \(\hat V\) is constructed as the stacked block-diagonal conventional variance estimates from the original submodels; this is the variance you will see if you apply gsem to the joint model without the vce(robust) option. The \(u_j\) values used by suest are now the values from both estimations, so we have as many \(u_j\) values as the sum of observations in the two original models and each row contains as many columns as the total number of parameters in both models. This is the exact operation that gsem, vce(robust) does.

When random effects are present, standard errors will be clustered on groups. Instead of observation-level contributions to the gradient, we would use cluster-level contributions. This means that observations in the two models would need to be clustered in a consistent manner; observations that are common to the two estimations would need to be in the same cluster in the two estimations.
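As an illustration (a sketch, not part of the original analysis), the clustering can be made explicit: because the latent variables are defined at the id level, refitting the joint random-effects model with vce(cluster id) should reproduce the vce(robust) standard errors shown above.

* sketch: explicit clustering on id should match the vce(robust) results
gsem (weightboy <- age c.age#c.age M1[id])   ///
     (weightgirl <- age c.age#c.age M2[id]), ///
     cov(M1[id]*M2[id]@0) vce(cluster id) nolog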


Fitting ordered probit models with endogenous covariates with Stata’s gsem command


The new command gsem allows us to fit a wide variety of models; among the many possibilities, we can account for endogeneity in different kinds of models. As an example, I will fit an ordinal model with endogenous covariates.

 

Parameterizations for an ordinal probit model

 
The ordinal probit model is used to model ordinal dependent variables. In the usual parameterization, we assume that there is an underlying linear regression, which relates an unobserved continuous variable \(y^*\) to the covariates \(x\).

\[y^*_{i} = x_{i}\gamma + u_i\]

The observed dependent variable \(y\) relates to \(y^*\) through a series of cut-points \(-\infty =\kappa_0<\kappa_1<\dots< \kappa_m=+\infty\) , as follows:

\[y_{i} = j {\mbox{ if }} \kappa_{j-1} < y^*_{i} \leq \kappa_j\]

Because the variance of \(u_i\) can’t be identified from the observed data, it is assumed to be equal to one. However, we can consider a rescaled parameterization of the same model; a straightforward way of seeing this is by noting that, for any positive number \(M\):

\[\kappa_{j-1} < y^*_{i} \leq \kappa_j \iff
M\kappa_{j-1} < M y^*_{i} \leq M\kappa_j
\]

that is,

\[\kappa_{j-1} < x_i\gamma + u_i \leq \kappa_j \iff
M\kappa_{j-1}< x_i(M\gamma) + Mu_i \leq M\kappa_j
\]

In other words, if the model is identified, it can be represented by multiplying the unobserved variable \(y^*\) by a positive number; this means that the standard deviation of the residual component, the coefficients, and the cut-points will all be multiplied by this number.

Let me show you an example; I will first fit a standard ordinal probit model, both with oprobit and with gsem. Then, I will use gsem to fit an ordinal probit model where the residual term for the underlying linear regression has a standard deviation equal to 2. I will do this by introducing a latent variable \(L\), with variance 1, and coefficient \(\sqrt 3\). This will be added to the underlying latent residual, with variance 1; then, the ‘new’ residual term will have variance equal to \(1+((\sqrt 3)^2\times Var(L))= 4\), so the standard deviation will be 2. We will see that as a result, the coefficients, as well as the cut-points, will be multiplied by 2.

. sysuse auto, clear
(1978 Automobile Data)

. oprobit rep mpg disp , nolog

Ordered probit regression                         Number of obs   =         69
                                                  LR chi2(2)      =      14.68
                                                  Prob > chi2     =     0.0006
Log likelihood = -86.352646                       Pseudo R2       =     0.0783

------------------------------------------------------------------------------
       rep78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   .0497185   .0355452     1.40   0.162    -.0199487    .1193858
displacement |  -.0029884   .0021498    -1.39   0.165     -.007202    .0012252
-------------+----------------------------------------------------------------
       /cut1 |  -1.570496   1.146391                      -3.81738    .6763888
       /cut2 |  -.7295982   1.122361                     -2.929386     1.47019
       /cut3 |   .6580529   1.107838                     -1.513269    2.829375
       /cut4 |    1.60884   1.117905                     -.5822132    3.799892
------------------------------------------------------------------------------

. gsem (rep <- mpg disp, oprobit), nolog

Generalized structural equation model             Number of obs   =         69
Log likelihood = -86.352646

--------------------------------------------------------------------------------
               |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
rep78 <-       |
           mpg |   .0497185   .0355452     1.40   0.162    -.0199487    .1193858
  displacement |  -.0029884   .0021498    -1.39   0.165     -.007202    .0012252
---------------+----------------------------------------------------------------
rep78          |
         /cut1 |  -1.570496   1.146391    -1.37   0.171     -3.81738    .6763888
         /cut2 |  -.7295982   1.122361    -0.65   0.516    -2.929386     1.47019
         /cut3 |   .6580529   1.107838     0.59   0.553    -1.513269    2.829375
         /cut4 |    1.60884   1.117905     1.44   0.150    -.5822132    3.799892
--------------------------------------------------------------------------------

. local a = sqrt(3)

. gsem (rep <- mpg disp L@`a'), oprobit var(L@1) nolog

Generalized structural equation model             Number of obs   =         69
Log likelihood = -86.353008

 ( 1)  [rep78]L = 1.732051
 ( 2)  [var(L)]_cons = 1
--------------------------------------------------------------------------------
               |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
rep78 <-       |
           mpg |    .099532     .07113     1.40   0.162    -.0398802    .2389442
  displacement |  -.0059739   .0043002    -1.39   0.165    -.0144022    .0024544
             L |   1.732051  (constrained)
---------------+----------------------------------------------------------------
rep78          |
         /cut1 |  -3.138491   2.293613    -1.37   0.171     -7.63389    1.356907
         /cut2 |  -1.456712   2.245565    -0.65   0.517    -5.857938    2.944513
         /cut3 |   1.318568    2.21653     0.59   0.552     -3.02575    5.662887
         /cut4 |   3.220004   2.236599     1.44   0.150     -1.16365    7.603657
---------------+----------------------------------------------------------------
         var(L)|          1  (constrained)
--------------------------------------------------------------------------------
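As a check (a sketch; the cut-point equation names are assumed to follow the rep78_cut# pattern used later in this post), we can rescale the estimates from this last model back to the standard parameterization by dividing by \(\sqrt{1+3}=2\); the results should match the oprobit estimates up to optimization error:

* sketch: divide by sqrt(1+3) = 2 to recover the standard parameterization
nlcom (mpg:          _b[rep78:mpg]/2)          ///
      (displacement: _b[rep78:displacement]/2) ///
      (cut1:         _b[rep78_cut1:_cons]/2)   ///
      (cut2:         _b[rep78_cut2:_cons]/2)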

 

Ordinal probit model with endogenous covariates

 
This model is defined analogously to the model fitted by -ivprobit- for probit models with endogenous covariates; we assume an underlying model with two equations,

\[
\begin{eqnarray}
y^*_{1i} =& y_{2i} \beta + x_{1i} \gamma + u_i & \\
y_{2i} =& x_{1i} \pi_1 + x_{2i} \pi_2 + v_i & \,\,\,\,\,\, (1)
\end{eqnarray}
\]

where \(u_i \sim N(0, 1) \), \(v_i\sim N(0,s^2) \), and \(corr(u_i, v_i) = \rho\).

We don’t observe \(y^*_{1i}\); instead, we observe a discrete variable \(y_{1i}\), such that, for a set of cut-points (to be estimated) \(\kappa_0 = -\infty < \kappa_1 < \kappa_2 \dots < \kappa_m = +\infty \),

\[y_{1i} = j {\mbox{ if }} \kappa_{j-1} < y^*_{1i} \leq \kappa_j \]

 

The parameterization we will use

 
I will re-scale the first equation, preserving the correlation. That is, I will consider the following system:

\[
\begin{eqnarray}
z^*_{1i} =&
y_{2i}b +x_{1i}c + t_i + \alpha L_i &\\
y_{2i} = &x_{1i}\pi_1 + x_{2i}\pi_2 + w_i + \alpha L_i & \,\,\,\,\,\, (2)
\end{eqnarray}
\]

where \(t_i, w_i, L_i\) are independent, \(t_i \sim N(0, 1)\), \(w_i \sim N(0,\sigma^2)\), and \(L_i \sim N(0, 1)\).

\[y_{1i} = j {\mbox{ if }} \lambda_{j-1} < z^*_{1i} \leq \lambda_j \]

By introducing a latent variable in both equations, I am modeling a correlation between the error terms. The first equation is a rescaled version of the original equation, that is, \(z^*_1 = My^*_1\),

\[ y_{2i}b + x_{1i}c + t_i + \alpha L_i
= M(y_{2i}\beta) + M x_{1i}\gamma + M u_i \]

This implies that
\[M u_i = t_i + \alpha L_i, \]
where \(Var(u_i) = 1\) and \(Var(t_i + \alpha L_i) = 1 + \alpha^2\), so the scale is \(M = \sqrt{1+\alpha^2}\).

The second equation remains the same; we just express \(v_i\) as \(w_i + \alpha L_i\). Now, after estimating the system (2), we can recover the parameters in (1) as follows:

\[\beta = \frac{1}{\sqrt{1+ \alpha^2}} b\]
\[\gamma = \frac{1}{\sqrt{1+ \alpha^2}} c\]
\[\kappa_j = \frac{1}{\sqrt{1+ \alpha^2}} \lambda_j \]

\[V(v_i) = V(w_i + \alpha L_i) = V(w_i) + \alpha^2\]

\[\rho = Corr(t_i + \alpha L_i, w_i + \alpha L_i) =
\frac{\alpha^2}{\sqrt{1+\alpha^2}\,\sqrt{V(w_i)+\alpha^2}}\]

Note: This parameterization assumes that the correlation is positive; for negative values of the correlation, \(L\) should be included in the second equation with a negative sign (that is, L@(-a) instead of L@a). If you try to perform the estimation with the wrong sign, the model most likely won’t achieve convergence; if it does converge, you will see a coefficient for L that is virtually zero. In Stata 13.1 we have included features that allow you to fit the model without this restriction. However, this time we will use the older parameterization, which will allow you to visualize the different components more easily.

 

Simulating data, and performing the estimation

 

clear
set seed 1357
set obs 10000
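* five exogenous covariates whose mean increases with the observation number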
forvalues i = 1(1)5 {
    gen x`i' =2* rnormal() + _n/1000
}

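* correlated error terms: z1 for the ordinal equation, z2 for y2 (correlation .5)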
mat C = [1,.5 \ .5, 1]
drawnorm z1 z2, cov(C)

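* endogenous covariate y2, a function of x1-x5 plus the error z2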
gen y2 = 0
forvalues i = 1(1)5 {
    replace y2 = y2 + x`i'
}
replace y2 = y2 + z2

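* latent outcome y1star and observed ordinal y1 with cut-points -.8, -.3, .3, .8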
gen y1star = y2 + x1 + x2 + z1
gen xb1 = y2 + x1 + x2

gen y1 = 4
replace y1 = 3 if xb1 + z1 <=.8
replace y1 = 2 if xb1 + z1 <=.3
replace y1 = 1 if xb1 + z1 <=-.3
replace y1 = 0 if xb1 + z1 <=-.8

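* joint model: ordered probit for y1, linear regression for y2; the shared
* latent variable L (same coefficient a in both) induces the error correlation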
gsem (y1 <- y2 x1 x2 L@a, oprobit) (y2 <- x1 x2 x3 x4 x5 L@a), var(L@1)

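* build the nlcom expressions that map the gsem estimates back to the
* parameterization in (1), using the formulas derived above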
local y1 y1
local y2 y2

local xaux  x1 x2 x3 x4 x5
local xmain  y2 x1 x2

local s2 sqrt(1+_b[`y1':L]^2)
foreach v in `xmain'{
    local trans `trans' (`y1'_`v': _b[`y1':`v']/`s2')
}

foreach v in `xaux' _cons {
    local trans `trans' (`y2'_`v': _b[`y2':`v'])
}

qui tab `y1' if e(sample)
local ncuts = r(r)-1
forvalues i = 1(1) `ncuts'{
    local trans `trans' (cut_`i': _b[`y1'_cut`i':_cons]/`s2')
}

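* sd of the error in the y2 equation and the implied correlation rho_12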
local s1 sqrt(  _b[var(e.`y2'):_cons]  +_b[`y1':L]^2)

local trans `trans' (sig_2: `s1')
local trans `trans' (rho_12: _b[`y1':L]^2/(`s1'*`s2'))
nlcom `trans'

 

Results

 
This is the output from gsem:

Generalized structural equation model             Number of obs   =      10000
Log likelihood = -14451.117

 ( 1)  [y1]L - [y2]L = 0
 ( 2)  [var(L)]_cons = 1
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1 <-        |
          y2 |   1.379511   .0775028    17.80   0.000     1.227608    1.531414
          x1 |   1.355687   .0851558    15.92   0.000     1.188785    1.522589
          x2 |   1.346323   .0833242    16.16   0.000      1.18301    1.509635
           L |   .7786594   .0479403    16.24   0.000     .6846982    .8726206
-------------+----------------------------------------------------------------
y2 <-        |
          x1 |   .9901353   .0044941   220.32   0.000      .981327    .9989435
          x2 |   1.006836   .0044795   224.76   0.000      .998056    1.015615
          x3 |   1.004249   .0044657   224.88   0.000     .9954963    1.013002
          x4 |   .9976541   .0044783   222.77   0.000     .9888767    1.006431
          x5 |   .9987587   .0044736   223.26   0.000     .9899907    1.007527
           L |   .7786594   .0479403    16.24   0.000     .6846982    .8726206
       _cons |   .0002758   .0192417     0.01   0.989    -.0374372    .0379887
-------------+----------------------------------------------------------------
y1           |
       /cut1 |  -1.131155   .1157771    -9.77   0.000    -1.358074   -.9042358
       /cut2 |  -.5330973   .1079414    -4.94   0.000    -.7446585    -.321536
       /cut3 |   .2722794   .1061315     2.57   0.010     .0642654    .4802933
       /cut4 |     .89394   .1123013     7.96   0.000     .6738334    1.114047
-------------+----------------------------------------------------------------
       var(L)|          1  (constrained)
-------------+----------------------------------------------------------------
    var(e.y2)|   .3823751    .074215                      .2613848    .5593696
------------------------------------------------------------------------------

These are the results we obtain when we transform the values reported by gsem to the original parameterization:

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       y1_y2 |   1.088455   .0608501    17.89   0.000     .9691909    1.207719
       y1_x1 |   1.069657   .0642069    16.66   0.000      .943814    1.195501
       y1_x2 |   1.062269   .0619939    17.14   0.000      .940763    1.183774
       y2_x1 |   .9901353   .0044941   220.32   0.000      .981327    .9989435
       y2_x2 |   1.006836   .0044795   224.76   0.000      .998056    1.015615
       y2_x3 |   1.004249   .0044657   224.88   0.000     .9954963    1.013002
       y2_x4 |   .9976541   .0044783   222.77   0.000     .9888767    1.006431
       y2_x5 |   .9987587   .0044736   223.26   0.000     .9899907    1.007527
    y2__cons |   .0002758   .0192417     0.01   0.989    -.0374372    .0379887
       cut_1 |   -.892498   .0895971    -9.96   0.000    -1.068105   -.7168909
       cut_2 |  -.4206217   .0841852    -5.00   0.000    -.5856218   -.2556217
       cut_3 |   .2148325   .0843737     2.55   0.011     .0494632    .3802018
       cut_4 |    .705332   .0905974     7.79   0.000     .5277644    .8828997
       sig_2 |   .9943267    .007031   141.42   0.000     .9805462    1.008107
      rho_12 |   .4811176   .0477552    10.07   0.000     .3875191     .574716
------------------------------------------------------------------------------

The estimates are quite close to the values used for the simulation. If you try to perform the estimation with the wrong sign for the coefficient for L, you will get a number that is virtually zero (if you get convergence at all). In this case, the evaluator is telling us that the best value it can find, given the restrictions we have imposed, is zero. If you see such results, you may want to try the opposite sign. If both signs give a zero coefficient, it means that this is the solution, and there is no endogeneity at all. If one of them is not zero, it means that the non-zero value is the solution. As stated before, in Stata 13.1, the model can be fitted without this restriction.

Using Stata’s SEM features to model the Beck Depression Inventory

I just got back from the 2012 Stata Conference in San Diego, where I gave a talk on Psychometric Analysis Using Stata, and from the 2012 American Psychological Association Meeting in Orlando. Stata’s structural equation modeling (SEM) builder was popular at both meetings, and I wanted to show you how easy it is to use. If you are not familiar with the basics of SEM, please refer to the references at the end of the post. My goal is simply to show you how to use the SEM builder, assuming that you already know something about SEM. If you would like to view a video demonstration of the SEM builder, please use the YouTube link at the end of this post.

The data used here and for the silly examples in my talk were simulated to resemble one of the most commonly used measures of depression: the Beck Depression Inventory (BDI). If you find these data too silly or not relevant to your own research, you could instead imagine it being a set of questions to measure mathematical ability, the ability to use a statistical package, or whatever you wanted.

The Beck Depression Inventory

Originally published by Aaron Beck and colleagues in 1961, the BDI marked an important change in the conceptualization of depression from a psychoanalytic perspective to a cognitive/behavioral perspective. It was also a landmark in the measurement of depression, shifting from lengthy, expensive interviews with a psychiatrist to a brief, inexpensive questionnaire that could be scored and quantified. The original inventory consisted of 21 questions, each allowing ordinal responses of increasing symptom severity from 0-3. The sum of the responses could then be used to classify a respondent’s depressive symptoms as none, mild, moderate, or severe. Many studies have demonstrated that the BDI has good psychometric properties, such as high test-retest reliability, and that its scores correlate well with the assessments of psychiatrists and psychologists. The 21 questions can also be grouped into two subscales. The affective scale includes questions like “I feel sad” and “I feel like a failure” that quantify emotional symptoms of depression. The somatic or physical scale includes questions like “I have lost my appetite” and “I have trouble sleeping” that quantify physical symptoms of depression. Since its original publication, the BDI has undergone two revisions in response to the American Psychiatric Association’s (APA) Diagnostic and Statistical Manuals (DSM), and the BDI-II remains very popular.

The Stata Depression Inventory

Since the BDI is a copyrighted psychometric instrument, I created a fictitious instrument called the “Stata Depression Inventory”. It consists of 20 questions each beginning with the phrase “My statistical software makes me…”. The individual questions are listed in the variable labels below.

. describe qu1-qu20

variable  storage  display    value
 name       type   format     label      variable label
------------------------------------------------------------------------------
qu1         byte   %16.0g     response   ...feel sad
qu2         byte   %16.0g     response   ...feel pessimistic about the future
qu3         byte   %16.0g     response   ...feel like a failure
qu4         byte   %16.0g     response   ...feel dissatisfied
qu5         byte   %16.0g     response   ...feel guilty or unworthy
qu6         byte   %16.0g     response   ...feel that I am being punished
qu7         byte   %16.0g     response   ...feel disappointed in myself
qu8         byte   %16.0g     response   ...feel am very critical of myself
qu9         byte   %16.0g     response   ...feel like harming myself
qu10        byte   %16.0g     response   ...feel like crying more than usual
qu11        byte   %16.0g     response   ...become annoyed or irritated easily
qu12        byte   %16.0g     response   ...have lost interest in other people
qu13        byte   %16.0g     qu13_t1    ...have trouble making decisions
qu14        byte   %16.0g     qu14_t1    ...feel unattractive
qu15        byte   %16.0g     qu15_t1    ...feel like not working
qu16        byte   %16.0g     qu16_t1    ...have trouble sleeping
qu17        byte   %16.0g     qu17_t1    ...feel tired or fatigued
qu18        byte   %16.0g     qu18_t1    ...makes my appetite lower than usual
qu19        byte   %16.0g     qu19_t1    ...concerned about my health
qu20        byte   %16.0g     qu20_t1    ...experience decreased libido

The responses consist of a 5-point Likert scale ranging from 1 (Strongly Disagree) to 5 (Strongly Agree). Questions 1-10 form the affective scale of the inventory and questions 11-20 form the physical scale. Data were simulated for 1000 imaginary people and included demographic variables such as age, sex and race. The responses can be summarized succinctly in a matrix of bar graphs:

Classical statistical analysis

The beginning of a classical statistical analysis of these data might consist of summing the responses for questions 1-10 and referring to them as the “Affective Depression Score” and summing questions 11-20 and referring to them as the “Physical Depression Score”.

egen Affective = rowtotal(qu1-qu10)
label var Affective "Affective Depression Score"
egen physical = rowtotal(qu11-qu20)
label var physical "Physical Depression Score"

We could be more sophisticated and use principal components to create the affective and physical depression score:

pca qu1-qu20, components(2)
predict Affective Physical
label var Affective "Affective Depression Score"
label var Physical "Physical Depression Score"

We could then ask questions such as “Are there differences in affective and physical depression scores by sex?” and test these hypotheses using multivariate statistics such as Hotelling’s T-squared statistic. The problem with this analysis strategy is that it treats the depression scores as though they were measured without error and can lead to inaccurate p-values for our test statistics.
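For instance, here is a minimal sketch of that classical test, using the principal-component scores created above (it assumes the sex variable mentioned earlier is in the dataset):

* sketch: Hotelling's T-squared test for group differences by sex
hotelling Affective Physical, by(sex)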

Structural equation modeling

Structural equation modeling (SEM) is an ideal way to analyze data where the outcome of interest is a scale or scales derived from a set of measured variables. The affective and physical scores are treated as latent variables in the model, resulting in accurate p-values, and, best of all, these models are very easy to fit using Stata! We begin by selecting the SEM builder from the Statistics menu:

In the SEM builder, we can select the “Add Measurement Component” icon:

which will open the following dialog box:

In the box labeled “Latent Variable Name” we can type “Affective” (red arrow below) and we can select the variables qu1-qu10 in the “Measured variables” box (blue arrow below).

When we click “OK”, the affective measurement component appears in the builder:

We can repeat this process to create a measurement component for our physical depression scale (images not shown). We can also allow for covariance/correlation between our affective and physical depression scales using the “Add Covariance” icon on the toolbar (red arrow below).

I’ll omit the intermediate steps to build the full model shown below but it’s easy to use the “Add Observed Variable” and “Add Path” icons to create the full model:

Now we’re ready to estimate the parameters for our model. To do this, we click the “Estimate” icon on the toolbar (duh!):

And the following dialog box appears:

Let’s ignore the estimation options for now and use the default settings. Click “OK” and the parameter estimates will appear in the diagram:

Some of the parameter estimates are difficult to read in this form but it is easy to rearrange the placement and formatting of the estimates to make them easier to read.

If we look at Stata’s output window and scroll up, we’ll notice that the SEM Builder automatically generated the command for our model:

sem (Affective -> qu1) (Affective -> qu2) (Affective -> qu3)
    (Affective -> qu4) (Affective -> qu5) (Affective -> qu6)
    (Affective -> qu7) (Affective -> qu8) (Affective -> qu9)
    (Affective -> qu10) (Physical -> qu11) (Physical -> qu12)
    (Physical -> qu13) (Physical -> qu14) (Physical -> qu15)
    (Physical -> qu16) (Physical -> qu17) (Physical -> qu18)
    (Physical -> qu19) (Physical -> qu20) (sex -> Affective)
    (sex -> Physical), latent(Affective Physical) cov(e.Physical*e.Affective)

We can gather terms and abbreviate some things to make the command much easier to read:

sem (Affective -> qu1-qu10) ///
    (Physical -> qu11-qu20) /// 
    (sex -> Affective Physical) ///
    , latent(Affective Physical ) ///
    cov( e.Physical*e.Affective)

We could then calculate a Wald statistic to test the null hypothesis that there is no association between sex and our affective and physical depression scales.

test sex

 ( 1)  [Affective]sex = 0
 ( 2)  [Physical]sex = 0

           chi2(  2) =    2.51
         Prob > chi2 =    0.2854

Final thoughts
This is an admittedly oversimplified example – we haven’t considered the fit of the model or any alternative models, and we have only included one dichotomous independent variable. We might prefer to use a likelihood ratio test or a score test. Those are all very important issues and should not be ignored in a proper data analysis. But my goal was to demonstrate how easy it is to use Stata’s SEM builder to model data such as those arising from the Beck Depression Inventory. Incidentally, if these data were collected using a complex survey design, it would not be difficult to incorporate the sampling structure and sample weights into the analysis. Missing data can be handled easily as well using Full Information Maximum Likelihood (FIML), but those are topics for another day.
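For instance, a likelihood-ratio version of the Wald test above might be sketched as follows (an illustration only; it assumes the full model with sex has just been fit):

* sketch: likelihood-ratio test of the sex paths
estimates store full
sem (Affective -> qu1-qu10)    ///
    (Physical -> qu11-qu20),   ///
    latent(Affective Physical) ///
    cov(e.Physical*e.Affective)
estimates store reduced
lrtest full reduced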

If you would like to view the slides from my talk, download the data used in this example, or view a video demonstration of Stata’s SEM builder using these data, please use the links below. For the dataset, you can also type use followed by the URL for the data to load it directly into Stata.

Slides:
http://stata.com/meeting/sandiego12/materials/sd12_huber.pdf

Data:
http://stata.com/meeting/sandiego12/materials/Huber_2012SanDiego.dta

YouTube video demonstration:
http://www.youtube.com/watch?v=Xj0gBlqwYHI

References

Beck AT, Ward CH, Mendelson M, Mock J, Erbaugh J (June 1961). An inventory for measuring depression. Arch. Gen. Psychiatry 4 (6): 561–71.

Beck AT, Steer RA, Ball R, Ranieri W (December 1996). Comparison of Beck Depression Inventories -IA and -II in psychiatric outpatients. Journal of Personality Assessment 67 (3): 588–97

Bollen, KA. (1989). Structural Equations With Latent Variables. New York, NY: John Wiley and Sons

Kline, RB (2011). Principles and Practice of Structural Equation Modeling. New York, NY: Guilford Press

Raykov, T & Marcoulides, GA (2006). A First Course in Structural Equation Modeling. Mahwah, NJ: Lawrence Erlbaum

Schumacker, RE & Lomax, RG (2012) A Beginner’s Guide to Structural Equation Modeling, 3rd Ed. New York, NY: Routledge


Multilevel random effects in xtmixed and sem — the long and wide of it

xtmixed was built from the ground up for dealing with multilevel random effects — that is its raison d’être. sem was built for multivariate outcomes, for handling latent variables, and for estimating structural equations (also called simultaneous systems or models with endogeneity). Can sem also handle multilevel random effects (REs)? Do we care?

This would be a short entry if either answer were “no”, so let’s get after the first question.

Can sem handle multilevel REs?

A good place to start is to simulate some multilevel RE data. Let’s create data for the 3-level regression model

\[ y_{ijk} = \beta x_{ijk} + \mu_k + \nu_{jk} + \epsilon_{ijk} \]

where the classical multilevel regression assumption holds that mu_k, nu_jk, and epsilon_ijk are distributed i.i.d. normal and are uncorrelated.

This represents a model of i nested within j nested within k. An example would be students nested within schools nested within counties. We have random intercepts at the 2nd and 3rd levels — mu_k and nu_jk. Because these are random effects, we need only estimate the variance of mu_k, nu_jk, and epsilon_ijk.

For our simulated data, let’s assume there are 3 groups at the 3rd level, 2 groups at the 2nd level within each 3rd level group, and 2 individuals within each 2nd level group. Or, K=3, J=2, and I=2. Having only 3 groups at the 3rd level is silly. It gives us only 3 observations to estimate the variance of mu_k. But with only 3*2*2 observations, we will be able to easily see our entire dataset, and the concepts scale to any number of 3rd-level groups.

First, create our 3rd-level random effects — mu_k.

. set obs 3
. gen k = _n
. gen Uk = rnormal()

  k   Uk
  1   mu_1
  2   mu_2
  3   mu_3

There are only 3 mu_k in our dataset.

I am showing the effects symbolically in the table rather than showing numeric values. It is the pattern of unique effects that will become interesting, not their actual values.

Now, create our 2nd-level random effects — nu_jk — by doubling this data and creating 2nd-level effects.

. expand 2
. by k, sort: gen j = _n
. gen Vjk = rnormal()

  k   Uk     j   Vjk
  1   mu_1   1   nu_1
  1   mu_1   2   nu_2
  2   mu_2   1   nu_3
  2   mu_2   2   nu_4
  3   mu_3   1   nu_5
  3   mu_3   2   nu_6

We have 6 unique values of our 2nd-level effects and the same 3 unique values of our 3rd-level effects. Our original 3rd-level effects just appear twice each.

Now, create our 1st-level random effects — epsilon_ijk — which we typically just call errors.

. expand 2
. by k j, sort: gen i = _n
. gen Eijk = rnormal()

  k   Uk     j   Vjk    i   Eijk
  1   mu_1   1   nu_1   1   epsilon_1
  1   mu_1   1   nu_1   2   epsilon_2
  1   mu_1   2   nu_2   1   epsilon_3
  1   mu_1   2   nu_2   2   epsilon_4
  2   mu_2   1   nu_3   1   epsilon_5
  2   mu_2   1   nu_3   2   epsilon_6
  2   mu_2   2   nu_4   1   epsilon_7
  2   mu_2   2   nu_4   2   epsilon_8
  3   mu_3   1   nu_5   1   epsilon_9
  3   mu_3   1   nu_5   2   epsilon_10
  3   mu_3   2   nu_6   1   epsilon_11
  3   mu_3   2   nu_6   2   epsilon_12

There are still only 3 unique mu_k in our dataset, and only 6 unique nu_jk.

Finally, we create our regression data, using beta = 2,

. gen xijk = runiform()
. gen yijk = 2 * xijk + Uk + Vjk + Eijk

  k   Uk     j   Vjk    i   Eijk         xijk   yijk
  1   mu_1   1   nu_1   1   epsilon_1    x_1    y_1
  1   mu_1   1   nu_1   2   epsilon_2    x_2    y_2
  1   mu_1   2   nu_2   1   epsilon_3    x_3    y_3
  1   mu_1   2   nu_2   2   epsilon_4    x_4    y_4
  2   mu_2   1   nu_3   1   epsilon_5    x_5    y_5
  2   mu_2   1   nu_3   2   epsilon_6    x_6    y_6
  2   mu_2   2   nu_4   1   epsilon_7    x_7    y_7
  2   mu_2   2   nu_4   2   epsilon_8    x_8    y_8
  3   mu_3   1   nu_5   1   epsilon_9    x_9    y_9
  3   mu_3   1   nu_5   2   epsilon_10   x_10   y_10
  3   mu_3   2   nu_6   1   epsilon_11   x_11   y_11
  3   mu_3   2   nu_6   2   epsilon_12   x_12   y_12

We could estimate our multilevel RE model on this data by typing,

. xtmixed yijk xijk || k: || j:

xtmixed uses the index variables k and j to deeply understand the multilevel structure of our data. sem has no such understanding of multilevel data. What it does have is an understanding of multivariate data and a comfortable willingness to apply constraints.

Let’s restructure our data so that sem can be made to understand its multilevel structure.

First some renaming so that the results of our restructuring will be easier to interpret.

. rename Uk U
. rename Vjk V
. rename Eijk E
. rename xijk x
. rename yijk y

We reshape to turn our multilevel data into multivariate data that sem has a chance of understanding. First, we reshape wide on our 2nd-level identifier j. Before that, we use egen to create a unique identifier for each combination of i and k, which identifies an observation within each of the two groups defined by j.

. egen ik = group(i k)
. reshape wide y x E V, i(ik) j(j)

  k   U      i   V1     E1           x1     y1     V2     E2           x2     y2
  1   mu_1   1   nu_1   epsilon_1    x_1    y_1    nu_2   epsilon_3    x_3    y_3
  1   mu_1   2   nu_1   epsilon_2    x_2    y_2    nu_2   epsilon_4    x_4    y_4
  2   mu_2   1   nu_3   epsilon_5    x_5    y_5    nu_4   epsilon_7    x_7    y_7
  2   mu_2   2   nu_3   epsilon_6    x_6    y_6    nu_4   epsilon_8    x_8    y_8
  3   mu_3   1   nu_5   epsilon_9    x_9    y_9    nu_6   epsilon_11   x_11   y_11
  3   mu_3   2   nu_5   epsilon_10   x_10   y_10   nu_6   epsilon_12   x_12   y_12

We now have a y variable for each group in j (y1 and y2). Likewise, we have two x variables, two residuals, and, most importantly, two 2nd-level random effects, V1 and V2. These are the same data; we have merely created a set of variables for every level of j. We have gone from multilevel to multivariate.

We still have a multilevel component. There are still two levels of i in our dataset. We must reshape wide again to remove any remnant of multilevel structure.

. drop ik
. reshape wide y* x* E*, i(k) j(i)

  k   U      V1     V2     E11          x11    y11    E12          x12    y12    E21          x21    y21    E22          x22    y22
  1   mu_1   nu_1   nu_2   epsilon_1    x_1    y_1    epsilon_2    x_2    y_2    epsilon_3    x_3    y_3    epsilon_4    x_4    y_4
  2   mu_2   nu_3   nu_4   epsilon_5    x_5    y_5    epsilon_6    x_6    y_6    epsilon_7    x_7    y_7    epsilon_8    x_8    y_8
  3   mu_3   nu_5   nu_6   epsilon_9    x_9    y_9    epsilon_10   x_10   y_10   epsilon_11   x_11   y_11   epsilon_12   x_12   y_12

That is a wide table, but it is the structure that is important, not the values. We now have 4 y’s, one for each combination of the 1st- and 2nd-level identifiers — i and j. Likewise for the x’s and E’s.

We can think of each xji yji pair of columns as representing a regression for a specific combination of j and i — y11 on x11, y12 on x12, y21 on x21, and y22 on x22. Or, more explicitly,

\[
\begin{eqnarray}
y_{11} =& \beta x_{11} + \mu + \nu_1 + \epsilon_{11} &\\
y_{12} =& \beta x_{12} + \mu + \nu_1 + \epsilon_{12} &\\
y_{21} =& \beta x_{21} + \mu + \nu_2 + \epsilon_{21} &\\
y_{22} =& \beta x_{22} + \mu + \nu_2 + \epsilon_{22} &
\end{eqnarray}
\]

So, rather than a univariate multilevel regression with 4 nested observation sets, (J=2) * (I=2), we now have 4 regressions that are all related through mu, and each of the two pairs is related through its nu_j. Oh, and all share the same coefficient beta. Oh, and the epsilon_ji all have identical variances. Oh, and the nu_j also have identical variances. Luckily, both the sem command and the SEM Builder (the GUI for sem) make setting constraints easy.

There is one other thing we haven’t addressed. xtmixed understands random effects. Does sem? Random effects are just unobserved (latent) variables and sem clearly understands those. So, yes, sem does understand random effects.

Many SEMers would represent this model in a path diagram by drawing.

There is a lot of information in that diagram. Each regression is represented by one of the x boxes being connected by a path to a y box. That each of the four paths is labeled with B means that we have constrained the regressions to have the same coefficient. The y21 and y22 boxes also receive input from the random latent variable V2 (representing our 2nd-level random effects). The other two y boxes receive input from V1 (also our 2nd-level random effects). For this to match how xtmixed handles random effects, V1 and V2 must be constrained to have the same variance. This was done in the path diagram by “locking” them to have the same variance — S_v. To match xtmixed, each of the four residuals must also have the same variance — shown in the diagram as S_e. The residuals and random effect variables also have their paths constrained to 1. That is to say, they do not have coefficients.

We do not need any of the U, V, or E variables. We kept these only to make clear how the multilevel data was restructured into multivariate data. We might “follow the money” in a criminal investigation, but with simulated multilevel data it is best to “follow the effects”. Seeing how these effects were distributed in our reshaped data made it clear how they entered our multivariate model.

Just to prove that this all works, here are the results from a simulated dataset (K=100 rather than the 3 that we have been using). The xtmixed results are,

. xtmixed yijk xijk || k: || j: , mle var

  (log omitted)

Mixed-effects ML regression                     Number of obs      =       400

-----------------------------------------------------------
                |   No. of       Observations per Group
 Group Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
              k |      100          4        4.0          4
              j |      200          2        2.0          2
-----------------------------------------------------------

                                                Wald chi2(1)       =     61.84
Log likelihood = -768.96733                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
        yijk |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        xijk |   1.792529   .2279392     7.86   0.000     1.345776    2.239282
       _cons |    .460124   .2242677     2.05   0.040     .0205673    .8996807
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
k: Identity                  |
                  var(_cons) |   2.469012   .5386108      1.610034    3.786268
-----------------------------+------------------------------------------------
j: Identity                  |
                  var(_cons) |   1.858889    .332251      1.309522    2.638725
-----------------------------+------------------------------------------------
               var(Residual) |   .9140237   .0915914      .7510369    1.112381
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(2) =   259.16   Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

The sem results are,

sem (y11 <- x11@bx _cons@c V1@1 U@1)
    (y12 <- x12@bx _cons@c V1@1 U@1)
    (y21 <- x21@bx _cons@c V2@1 U@1)
    (y22 <- x22@bx _cons@c V2@1 U@1) ,
        covstruct(_lexog, diagonal) cov(_lexog*_oexog@0)  
        cov( V1@S_v V2@S_v  e.y11@S_e e.y12@S_e e.y21@S_e e.y22@S_e)
  
  (notes omitted)

Endogenous variables

Observed:  y11 y12 y21 y22

Exogenous variables

Observed:  x11 x12 x21 x22
Latent:    V1 U V2
  
  (iteration log omitted)

Structural equation model                       Number of obs      =       100
Estimation method  = ml
Log likelihood     = -826.63615
  
  (constraint listing omitted)
------------------------------------------------------------------------------
             |                 OIM
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural   |  
  y11 <-     |
         x11 |   1.792529   .2356323     7.61   0.000     1.330698     2.25436
          V1 |          1   7.68e-17  1.3e+16   0.000            1           1
           U |          1   2.22e-18  4.5e+17   0.000            1           1
       _cons |    .460124    .226404     2.03   0.042     .0163802    .9038677
  -----------+----------------------------------------------------------------
  y12 <-     |
         x12 |   1.792529   .2356323     7.61   0.000     1.330698     2.25436
          V1 |          1   2.00e-22  5.0e+21   0.000            1           1
           U |          1   5.03e-17  2.0e+16   0.000            1           1
       _cons |    .460124    .226404     2.03   0.042     .0163802    .9038677
  -----------+----------------------------------------------------------------
  y21 <-     |
         x21 |   1.792529   .2356323     7.61   0.000     1.330698     2.25436
           U |          1   5.70e-46  1.8e+45   0.000            1           1
          V2 |          1   5.06e-45  2.0e+44   0.000            1           1
       _cons |    .460124    .226404     2.03   0.042     .0163802    .9038677
  -----------+----------------------------------------------------------------
  y22 <-     |
         x22 |   1.792529   .2356323     7.61   0.000     1.330698     2.25436
           U |          1  (constrained)
          V2 |          1  (constrained) 
       _cons |    .460124    .226404     2.03   0.042     .0163802    .9038677
-------------+----------------------------------------------------------------
Variance     |
       e.y11 |   .9140239    .091602                        .75102    1.112407
       e.y12 |   .9140239    .091602                        .75102    1.112407
       e.y21 |   .9140239    .091602                        .75102    1.112407
       e.y22 |   .9140239    .091602                        .75102    1.112407
          V1 |   1.858889   .3323379                      1.309402    2.638967
           U |   2.469011   .5386202                      1.610021    3.786296
          V2 |   1.858889   .3323379                      1.309402    2.638967
-------------+----------------------------------------------------------------
Covariance   |
  x11        |
          V1 |          0  (constrained)
           U |          0  (constrained)
          V2 |          0  (constrained)
  -----------+----------------------------------------------------------------
  x12        |
          V1 |          0  (constrained)
           U |          0  (constrained)
          V2 |          0  (constrained)
  -----------+----------------------------------------------------------------
  x21        |
          V1 |          0  (constrained)
           U |          0  (constrained)
          V2 |          0  (constrained)
  -----------+----------------------------------------------------------------
  x22        |
          V1 |          0  (constrained)
           U |          0  (constrained)
          V2 |          0  (constrained)
  -----------+----------------------------------------------------------------
  V1         |
           U |          0  (constrained)
          V2 |          0  (constrained)
  -----------+----------------------------------------------------------------
  U          |
          V2 |          0  (constrained)
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(25)  =     22.43, Prob > chi2 = 0.6110

And here is the path diagram after estimation.

The standard errors of the two estimation methods are asymptotically equivalent, but will differ in finite samples.

Sidenote: Those familiar with multilevel modeling will be wondering whether sem can handle unbalanced data, that is to say, a different number of observations or subgroups within groups. It can. Simply let reshape create missing values where it will and then add the method(mlmv) option to your sem command. mlmv stands for maximum likelihood with missing values. And, as strange as it may seem, with this option the multivariate sem representation and the multilevel xtmixed representation are the same.
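Concretely, that is just one more option on the command we already fit. A sketch, identical to the command above except for the final option:

* Sketch: same model as above; method(mlmv) uses all available information
* when the reshapes have left missing values in some of the wide variables.
sem (y11 <- x11@bx _cons@c V1@1 U@1) (y12 <- x12@bx _cons@c V1@1 U@1)   ///
    (y21 <- x21@bx _cons@c V2@1 U@1) (y22 <- x22@bx _cons@c V2@1 U@1),  ///
    covstruct(_lexog, diagonal) cov(_lexog*_oexog@0)                    ///
    cov(V1@S_v V2@S_v e.y11@S_e e.y12@S_e e.y21@S_e e.y22@S_e)          ///
    method(mlmv)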

Do we care?

You will have noticed that the sem command was, well, it was really long. (I wrote a little loop to get all the constraints right.) You will also have noticed that there is a lot of redundant output because our SEM model has so many constraints. Why would anyone go to all this trouble to do something that is so simple with xtmixed? The answer lies in all of those constraints. With sem we can relax any of those constraints we wish!
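For the curious, here is a sketch of what such a loop might look like. The macro names (eqns, vlist) and the loop itself are my own reconstruction, not the code used for this post, but the command they assemble is the one shown above.

* Sketch: build the long sem specification in local macros instead of
* typing every equation and constraint by hand.
local eqns ""
local vlist ""
forvalues j = 1/2 {
    forvalues i = 1/2 {
        local eqns  "`eqns' (y`j'`i' <- x`j'`i'@bx _cons@c V`j'@1 U@1)"
        local vlist "`vlist' e.y`j'`i'@S_e"
    }
    local vlist "`vlist' V`j'@S_v"
}
sem `eqns', covstruct(_lexog, diagonal) cov(_lexog*_oexog@0) cov(`vlist')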

Relax the constraint that the V# have the same variance and you can introduce heteroskedasticity in the 2nd-level effects. That seems a little silly when there are only two 2nd-level groups, but imagine there were 10.

Add a covariance between the V# and you introduce correlation between the 2nd-level groups within each 3rd-level group.
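In terms of the command above, both of these relaxations are small edits to the cov() option. A sketch, with everything else left as before:

* Sketch: dropping @S_v lets V1 and V2 have different variances, and
* cov(V1*V2) frees their covariance, which covstruct(_lexog, diagonal)
* would otherwise constrain to 0.
sem (y11 <- x11@bx _cons@c V1@1 U@1) (y12 <- x12@bx _cons@c V1@1 U@1)   ///
    (y21 <- x21@bx _cons@c V2@1 U@1) (y22 <- x22@bx _cons@c V2@1 U@1),  ///
    covstruct(_lexog, diagonal) cov(_lexog*_oexog@0)                    ///
    cov(V1*V2 e.y11@S_e e.y12@S_e e.y21@S_e e.y22@S_e)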

What’s more, the pattern of heteroskedasticity and correlation can be arbitrary. Here is our path diagram redrawn to represent children within schools within counties and increasing the number of groups in the 2nd level.

We have 5 counties at the 3rd level and two schools within each county at the 2nd level — for a total of 10 dimensions in our multivariate regression. The diagram does not change based on the number of children drawn from each school.

Our regression coefficients have been organized horizontally down the center of the diagram to allow room along the left and right for the random effects. Taken as a multilevel model, we have only a single covariate — x. Just to be clear, we could generalize this to multiple covariates by adding more boxes with covariates for each dependent variable in the diagram.

The labels are chosen carefully. The 3rd-level effects N1, N2, and N3 are for northern counties, and the remaining 3rd-level effects S1 and S2 are for southern counties. There is a separate dependent variable and associated error for each school. We have 4 public schools (pub1, pub2, pub3, and pub4); 3 private schools (prv1, prv2, and prv3); and 3 church-sponsored schools (chr1, chr2, and chr3).

The multivariate structure seen in the diagram makes it clear that we can relax some constraints that the multilevel model imposes. Because the sem representation of the model breaks the county-level effect into a separate effect for each county, we can apply a structure to those effects. Consider the path diagram below.

We have correlated the effects for the 3 northern counties. We did this by drawing curved lines between the effects. We have also correlated the effects of the two southern counties. xtmixed does not allow these types of correlations. Had we wished, we could have constrained the correlations of the 3 northern counties to be the same.

We could also have allowed the northern and southern counties to have different variances. We did just that in the diagram below by constraining the northern counties' variances to be N and the southern counties' variances to be S.

In this diagram we have also correlated the errors for the 4 public schools. As drawn, each correlation is free to take on its own value, but we could just as easily constrain each public school to be equally correlated with all other public schools. Likewise, to keep the diagram readable, we did not correlate the private schools with each other or the church schools with each other. We could have done that.

There is one thing that xtmixed can do that sem cannot. It can put a structure on the residual correlations within the 2nd level groups. xtmixed has a special option, residuals(), for just this purpose.

With xtmixed and sem you get,

  • robust and cluster-robust SEs
  • survey data

With sem you also get

  • endogenous covariates
  • estimation by GMM
  • missing data — MAR (also called missing on observables)
  • heteroskedastic effects at any level
  • correlated effects at any level
  • easy score tests using estat scoretests (a brief sketch follows this list)
    • are the beta coefficients truly the same across all equations/levels?
    • are effects or sets of effects uncorrelated?
    • are effects within a grouping homoskedastic?
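As a sketch of that last point: after fitting the constrained model above, testing whether the imposed restrictions are supported by the data is a single postestimation command.

* Sketch: score (Lagrange multiplier) tests of the constraints imposed when
* the sem model above was fit, e.g. the equal-slope and equal-variance
* restrictions.
estat scoretests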

Whether you view this rethinking of multilevel random-effects models as multivariate structural equation models (SEMs) as interesting or merely as an academic exercise depends on whether your model calls for any of the items in the second list.

Stata 12 Announced

We are pleased to announce a new version of Stata: Stata 12. You can order it today, it starts shipping on July 25, and you can find out about it at www.stata.com/stata12/.

Here are the highlights of what's new:

  (list of highlights omitted)

There are other things that are new, too. Things like functions for Tukey’s Studentized range and Dunnett’s multiple range, baseline odds for logistic regression, truncated count-data regressions, probability predictions, robust and cluster-robust SEs for fixed-effects Poisson regression, and the like under General Statistics. Or under Survey Data, support for SEM, bootstrap and successive difference replicate (SDR) weights, goodness of fit after binary models, coefficient of variation, and more. Or under Panel Data, probability predictions, multiple imputation support, and more. Or under Survival Data, a goodness-of-fit statistic that is robust to censoring. Or PDF export of results and graphs.

We could go on, but you get the idea. We think Stata 12 is worth a look.