Archive

Posts Tagged ‘gsem’

Using gsem to combine estimation results


gsem is a very flexible command that allows us to fit very sophisticated models. However, it is also useful in situations that involve simple models.

For example, when we want to compare parameters among two or more models, we usually use suest, which combines the estimation results under one parameter vector and creates a simultaneous covariance matrix of the robust type. This covariance estimate is described in the Methods and formulas of [R] suest as the robust variance from a “stacked model”. Actually, gsem can estimate these kinds of “stacked models”, even if the estimation samples are not the same and eventually overlap. By using the option vce(robust), we can replicate the results from suest if the models are available for gsem. In addition, gsem allows us to combine results from some estimation commands that are not supported by suest, like models including random effects.

 

Example: Comparing parameters from two models

 

Let’s consider the childweight dataset, described in [ME] mixed. Consider the following models, where weights of boys and girls are modeled using the age and the age-squared:

. webuse childweight, clear
(Weight data on Asian children)

. regress  weight age c.age#c.age if girl == 0, noheader
------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   7.985022   .6343855    12.59   0.000     6.725942    9.244101
             |
 c.age#c.age |   -1.74346   .2374504    -7.34   0.000    -2.214733   -1.272187
             |
       _cons |   3.684363   .3217223    11.45   0.000     3.045833    4.322893
------------------------------------------------------------------------------

. regress  weight age c.age#c.age if girl == 1, noheader
------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   7.008066   .5164687    13.57   0.000     5.982746    8.033386
             |
 c.age#c.age |  -1.450582   .1930318    -7.51   0.000    -1.833798   -1.067365
             |
       _cons |   3.480933   .2616616    13.30   0.000     2.961469    4.000397
------------------------------------------------------------------------------

To test whether birthweights are the same for the two groups, we need to test whether the intercepts in the two regressions are the same. Using suest, we would proceed as follows:

. quietly regress weight age c.age#c.age if girl == 0, noheader

. estimates store boys

. quietly regress weight age c.age#c.age if girl == 1, noheader

. estimates store girls

. suest boys girls

Simultaneous results for boys, girls

                                                  Number of obs   =        198

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
boys_mean    |
         age |   7.985022   .4678417    17.07   0.000     7.068069    8.901975
             |
 c.age#c.age |   -1.74346   .2034352    -8.57   0.000    -2.142186   -1.344734
             |
       _cons |   3.684363   .1719028    21.43   0.000      3.34744    4.021286
-------------+----------------------------------------------------------------
boys_lnvar   |
       _cons |   .4770289   .1870822     2.55   0.011     .1103546    .8437032
-------------+----------------------------------------------------------------
girls_mean   |
         age |   7.008066   .4166916    16.82   0.000     6.191365    7.824766
             |
 c.age#c.age |  -1.450582   .1695722    -8.55   0.000    -1.782937   -1.118226
             |
       _cons |   3.480933   .1556014    22.37   0.000      3.17596    3.785906
-------------+----------------------------------------------------------------
girls_lnvar  |
       _cons |   .0097127   .1351769     0.07   0.943    -.2552292    .2746545
------------------------------------------------------------------------------

Invoking an estimation command with the option coeflegend will give us a legend we can use to refer to the parameters when we use postestimation commands like test.

. suest, coeflegend

Simultaneous results for boys, girls

                                                  Number of obs   =        198

------------------------------------------------------------------------------
             |      Coef.  Legend
-------------+----------------------------------------------------------------
boys_mean    |
         age |   7.985022  _b[boys_mean:age]
             |
 c.age#c.age |   -1.74346  _b[boys_mean:c.age#c.age]
             |
       _cons |   3.684363  _b[boys_mean:_cons]
-------------+----------------------------------------------------------------
boys_lnvar   |
       _cons |   .4770289  _b[boys_lnvar:_cons]
-------------+----------------------------------------------------------------
girls_mean   |
         age |   7.008066  _b[girls_mean:age]
             |
 c.age#c.age |  -1.450582  _b[girls_mean:c.age#c.age]
             |
       _cons |   3.480933  _b[girls_mean:_cons]
-------------+----------------------------------------------------------------
girls_lnvar  |
       _cons |   .0097127  _b[girls_lnvar:_cons]
------------------------------------------------------------------------------

. test  _b[boys_mean:_cons] = _b[girls_mean:_cons]

 ( 1)  [boys_mean]_cons - [girls_mean]_cons = 0

           chi2(  1) =    0.77
         Prob > chi2 =    0.3803

We find no evidence that the intercepts are different.

Now, let’s replicate those results by using the gsem command. We generate the variable weightboy, a copy of weight for boys and missing otherwise, and the variable weightgirl, a copy of weight for girls and missing otherwise.

. quietly generate weightboy = weight if girl == 0

. quietly generate weightgirl = weight if girl == 1

. gsem (weightboy <- age c.age#c.age) (weightgirl <- age c.age#c.age), ///
>      nolog vce(robust)

Generalized structural equation model             Number of obs   =        198
Log pseudolikelihood =  -302.2308

-------------------------------------------------------------------------------
                 |              Robust
                 |      Coef.  Std. Err.     z   P>|z|     [95% Conf. Interval]
-----------------+-------------------------------------------------------------
weightboy <-     |
             age |   7.985022  .4678417   17.07  0.000     7.068069    8.901975
                 |
     c.age#c.age |   -1.74346  .2034352   -8.57  0.000    -2.142186   -1.344734
                 |
           _cons |   3.684363  .1719028   21.43  0.000      3.34744    4.021286
-----------------+-------------------------------------------------------------
weightgirl <-    |
             age |   7.008066  .4166916   16.82  0.000     6.191365    7.824766
                 |
     c.age#c.age |  -1.450582  .1695722   -8.55  0.000    -1.782937   -1.118226
                 |
           _cons |   3.480933  .1556014   22.37  0.000      3.17596    3.785906
-----------------+-------------------------------------------------------------
 var(e.weightboy)|   1.562942  .3014028                    1.071012    2.280821
var(e.weightgirl)|    .978849  .1364603                    .7448187    1.286414
-------------------------------------------------------------------------------

. gsem, coeflegend

Generalized structural equation model             Number of obs   =        198
Log pseudolikelihood =  -302.2308

-------------------------------------------------------------------------------
                 |      Coef.  Legend
-----------------+-------------------------------------------------------------
weightboy <-     |
             age |   7.985022  _b[weightboy:age]
                 |
     c.age#c.age |   -1.74346  _b[weightboy:c.age#c.age]
                 |
           _cons |   3.684363  _b[weightboy:_cons]
-----------------+-------------------------------------------------------------
weightgirl <-    |
             age |   7.008066  _b[weightgirl:age]
                 |
     c.age#c.age |  -1.450582  _b[weightgirl:c.age#c.age]
                 |
           _cons |   3.480933  _b[weightgirl:_cons]
-----------------+-------------------------------------------------------------
 var(e.weightboy)|   1.562942  _b[var(e.weightboy):_cons]
var(e.weightgirl)|    .978849  _b[var(e.weightgirl):_cons]
-------------------------------------------------------------------------------

. test  _b[weightgirl:_cons]=  _b[weightboy:_cons]

 ( 1)  - [weightboy]_cons + [weightgirl]_cons = 0

           chi2(  1) =    0.77
         Prob > chi2 =    0.3803

gsem allowed us to fit models on different subsets simultaneously. By default, the model is assumed to be a linear regression, but several links and families are available; for example, you can combine two Poisson models or a multinomial logistic model with a regular logistic model. See [SEM] sem and gsem for details.

Here, I use the vce(robust) option to replicate the results for suest. However, when estimation samples don’t overlap, results from both estimations are assumed to be independent, and thus the option vce(robust) is not needed. When performing the estimation without the vce(robust) option, the joint covariance matrix will contain two blocks with the covariances from the original models and 0s outside those blocks.

 

An example with random effects

 

The childweight dataset contains repeated measures, and it is, in the documentation, analyzed used the mixed command, which allows us to account for the intra-individual correlation via random effects.

Now, let’s use the techniques described above to combine results from two random-effects models. Here are the two separate models:

. mixed weight age c.age#c.age if girl == 0 || id:, nolog

Mixed-effects ML regression                     Number of obs      =       100
Group variable: id                              Number of groups   =        34

                                                Obs per group: min =         1
                                                               avg =       2.9
                                                               max =         5


                                                Wald chi2(2)       =   1070.28
Log likelihood = -149.05479                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   8.328882   .4601093    18.10   0.000     7.427084    9.230679
             |
 c.age#c.age |  -1.859798   .1722784   -10.80   0.000    -2.197458   -1.522139
             |
       _cons |   3.525929   .2723617    12.95   0.000      2.99211    4.059749
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   .7607779   .2439115      .4058409    1.426133
-----------------------------+------------------------------------------------
               var(Residual) |   .7225673   .1236759      .5166365    1.010582
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =    30.34 Prob >= chibar2 = 0.0000

. mixed weight age c.age#c.age if girl == 1 || id:, nolog

Mixed-effects ML regression                     Number of obs      =        98
Group variable: id                              Number of groups   =        34

                                                Obs per group: min =         1
                                                               avg =       2.9
                                                               max =         5


                                                Wald chi2(2)       =   2141.72
Log likelihood =  -114.3008                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   7.273082   .3167266    22.96   0.000     6.652309    7.893854
             |
 c.age#c.age |  -1.538309    .118958   -12.93   0.000    -1.771462   -1.305156
             |
       _cons |   3.354834   .2111793    15.89   0.000      2.94093    3.768738
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   .6925554   .1967582       .396848    1.208606
-----------------------------+------------------------------------------------
               var(Residual) |   .3034231   .0535359      .2147152    .4287799
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =    47.42 Prob >= chibar2 = 0.0000

Random effects can be included in a gsem model by incorporating latent variables at the group level; these are the latent variables M1[id] and M2[id] below. By default, gsem will try to estimate a covariance when it sees two latent variables at the same level. This can be easily solved by restricting this covariance term to 0. Option vce(robust) should be used whenever we want to produce the mechanism used by suest.

. gsem (weightboy <- age c.age#c.age M1[id])   ///
>      (weightgirl <- age c.age#c.age M2[id]), ///
>      cov(M1[id]*M2[id]@0) vce(robust) nolog

Generalized structural equation model             Number of obs   =        198
Log pseudolikelihood = -263.35559

 ( 1)  [weightboy]M1[id] = 1
 ( 2)  [weightgirl]M2[id] = 1
                                      (Std. Err. adjusted for clustering on id)
-------------------------------------------------------------------------------
                 |              Robust
                 |      Coef.  Std. Err.     z   P>|z|     [95% Conf. Interval]
-----------------+-------------------------------------------------------------
weightboy <-     |
             age |   8.328882  .4211157   19.78  0.000      7.50351    9.154253
                 |
     c.age#c.age |  -1.859798  .1591742  -11.68  0.000    -2.171774   -1.547823
                 |
          M1[id] |          1 (constrained)
                 |
           _cons |   3.525929  .1526964   23.09  0.000      3.22665    3.825209
-----------------+-------------------------------------------------------------
weightgirl <-    |
             age |   7.273082  .3067378   23.71  0.000     6.671887    7.874277
                 |
     c.age#c.age |  -1.538309   .120155  -12.80  0.000    -1.773808    -1.30281
                 |
          M2[id] |          1 (constrained)
                 |
           _cons |   3.354834  .1482248   22.63  0.000     3.064319     3.64535
-----------------+-------------------------------------------------------------
      var(M1[id])|   .7607774  .2255575                     .4254915    1.360268
      var(M2[id])|   .6925553  .1850283                    .4102429    1.169144
-----------------+-------------------------------------------------------------
 var(e.weightboy)|   .7225674  .1645983                     .4623572    1.129221
var(e.weightgirl)|   .3034231  .0667975                    .1970877    .4671298
-------------------------------------------------------------------------------

Above, we have the joint output from the two models, which would allow us to perform tests among parameters in both models. Notice that option vce(robust) implies that standard errors will be clustered on the groups determined by id.

gsem, when called with the vce(robust) option, will complain if there are inconsistencies among the groups in the models (for example, if the random effects in both models were crossed).

 

Checking that you are fitting the same model

 

In the previous model, gsem‘s default covariance structure included a term that wasn’t in the original two models, so we needed to include an additional restriction. This can be easy to spot in a simple model, but if you don’t want to rely just on a visual inspection, you can write a small loop to make sure that all the estimates in the joint model are actually also in the original models.

Let’s see an example with random effects, this time with overlapping data.

. *fit first model and save the estimates
. gsem (weightboy <- age c.age#c.age M1[id]), nolog

Generalized structural equation model             Number of obs   =        100
Log likelihood = -149.05479

 ( 1)  [weightboy]M1[id] = 1
-------------------------------------------------------------------------------
                |      Coef.  Std. Err.     z    P>|z|     [95% Conf. Interval]
----------------+--------------------------------------------------------------
weightboy <-    |
            age |   8.328882  .4609841   18.07   0.000     7.425369    9.232394
                |
    c.age#c.age |  -1.859798  .1725233  -10.78   0.000    -2.197938   -1.521659
                |
         M1[id] |          1 (constrained)
                |
          _cons |   3.525929  .2726322   12.93   0.000      2.99158    4.060279
----------------+--------------------------------------------------------------
     var(M1[id])|   .7607774  .2439114                     .4058407    1.426132
----------------+--------------------------------------------------------------
var(e.weightboy)|   .7225674  .1236759                     .5166366    1.010582
-------------------------------------------------------------------------------

. mat b1 = e(b)

. *fit second model and save the estimates
. gsem (weight <- age M2[id]), nolog

Generalized structural equation model             Number of obs   =        198
Log likelihood = -348.32402

 ( 1)  [weight]M2[id] = 1
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight <-    |
         age |   3.389281   .1152211    29.42   0.000     3.163452    3.615111
             |
      M2[id] |          1  (constrained)
             |
       _cons |   5.156913   .1803059    28.60   0.000      4.80352    5.510306
-------------+----------------------------------------------------------------
  var(M2[id])|   .6076662   .2040674                      .3146395    1.173591
-------------+----------------------------------------------------------------
var(e.weight)|   1.524052   .1866496                      1.198819    1.937518
------------------------------------------------------------------------------

. mat b2 = e(b)

. *stack estimates from first and second models
. mat stacked = b1, b2

. *estimate joint model and save results
. gsem (weightboy <- age c.age#c.age M1[id]) ///
>      (weight <- age M2[id]), cov(M1[id]*M2[id]@0) vce(robust) nolog

Generalized structural equation model             Number of obs   =        198
Log pseudolikelihood = -497.37881

 ( 1)  [weightboy]M1[id] = 1
 ( 2)  [weight]M2[id] = 1
                                      (Std. Err. adjusted for clustering on id)
-------------------------------------------------------------------------------
                |              Robust
                |      Coef.  Std. Err.     z    P>|z|     [95% Conf. Interval]
----------------+--------------------------------------------------------------
weightboy <-    |
            age |   8.328882  .4211157   19.78   0.000      7.50351    9.154253
                |
    c.age#c.age |  -1.859798  .1591742  -11.68   0.000    -2.171774   -1.547823
                |
         M1[id] |          1 (constrained)
                |
          _cons |   3.525929  .1526964   23.09   0.000      3.22665    3.825209
----------------+--------------------------------------------------------------
weight <-       |
            age |   3.389281  .1157835   29.27   0.000      3.16235    3.616213
                |
         M2[id] |          1 (constrained)
                |
          _cons |   5.156913  .1345701   38.32   0.000      4.89316    5.420665
----------------+--------------------------------------------------------------
     var(M1[id])|   .7607774  .2255575                     .4254915    1.360268
     var(M2[id])|   .6076662     .1974                     .3214791    1.148623
----------------+--------------------------------------------------------------
var(e.weightboy)|   .7225674  .1645983                     .4623572    1.129221
   var(e.weight)|   1.524052  .1705637                     1.223877    1.897849
-------------------------------------------------------------------------------

. mat b = e(b)

. *verify that estimates from the joint model are the same as
. *from models 1 and 2
. local stripes : colfullnames(b)

. foreach l of local stripes{
  2.    matrix  r1 =  b[1,"`l'"]
  3.    matrix r2 = stacked[1,"`l'"]
  4.    assert reldif(el(r1,1,1), el(r2,1,1))<1e-5
  5. }

The loop above verifies that all the labels in the second model correspond to estimates in the first and that the estimates are actually the same. If you omit the restriction for the variance in the joint model, the assert command will produce an error.

 

Technical note

 

As documented in [U] 20.21.2 Correlated errors: Cluster-robust standard errors, the formula for the robust estimator of the variance is

\[
V_{robust} = \hat V(\sum_{j=1}^N u'_ju_j) \hat V
\]

where \(N\) is the number of observations, \(\hat V\) is the conventional estimator of the variance, and for each observation \(j\), \(u_j\) is a row vector (with as many columns as parameters), which represents the contribution of this observation to the gradient. (If we stack the rows \(u_j\), the columns of this matrix are the scores.)

When we apply suest, the matrix \(\hat V\) is constructed as the stacked block-diagonal conventional variance estimates from the original submodels; this is the variance you will see if you apply gsem to the joint model without the vce(robust) option. The \(u_j\) values used by suest are now the values from both estimations, so we have as many \(u_j\) values as the sum of observations in the two original models and each row contains as many columns as the total number of parameters in both models. This is the exact operation that gsem, vce(robust) does.

When random effects are present, standard errors will be clustered on groups. Instead of observation-level contributions to the gradient, we would use cluster-level contributions. This means that observations in the two models would need to be clustered in a consistent manner; observations that are common to the two estimations would need to be in the same cluster in the two estimations.

Categories: Statistics Tags: , , ,

Fitting ordered probit models with endogenous covariates with Stata’s gsem command


The new command gsem allows us to fit a wide variety of models; among the many possibilities, we can account for endogeneity on different models. As an example, I will fit an ordinal model with endogenous covariates.

 

Parameterizations for an ordinal probit model

 
The ordinal probit model is used to model ordinal dependent variables. In the usual parameterization, we assume that there is an underlying linear regression, which relates an unobserved continuous variable \(y^*\) to the covariates \(x\).

\[y^*_{i} = x_{i}\gamma + u_i\]

The observed dependent variable \(y\) relates to \(y^*\) through a series of cut-points \(-\infty =\kappa_0<\kappa_1<\dots< \kappa_m=+\infty\) , as follows:

\[y_{i} = j {\mbox{ if }} \kappa_{j-1} < y^*_{i} \leq \kappa_j\]

Provided that the variance of \(u_i\) can’t be identified from the observed data, it is assumed to be equal to one. However, we can consider a re-scaled parameterization for the same model; a straightforward way of seeing this, is by noting that, for any positive number \(M\):

\[\kappa_{j-1} < y^*_{i} \leq \kappa_j \iff
M\kappa_{j-1} < M y^*_{i} \leq M\kappa_j
\]

that is,

\[\kappa_{j-1} < x_i\gamma + u_i \leq \kappa_j \iff
M\kappa_{j-1}< x_i(M\gamma) + Mu_i \leq M\kappa_j
\]

In other words, if the model is identified, it can be represented by multiplying the unobserved variable \(y\) by a positive number, and this will mean that the standard error of the residual component, the coefficients, and the cut-points will be multiplied by this number.

Let me show you an example; I will first fit a standard ordinal probit model, both with oprobit and with gsem. Then, I will use gsem to fit an ordinal probit model where the residual term for the underlying linear regression has a standard deviation equal to 2. I will do this by introducing a latent variable \(L\), with variance 1, and coefficient \(\sqrt 3\). This will be added to the underlying latent residual, with variance 1; then, the ‘new’ residual term will have variance equal to \(1+((\sqrt 3)^2\times Var(L))= 4\), so the standard deviation will be 2. We will see that as a result, the coefficients, as well as the cut-points, will be multiplied by 2.

. sysuse auto, clear
(1978 Automobile Data)

. oprobit rep mpg disp , nolog

Ordered probit regression                         Number of obs   =         69
                                                  LR chi2(2)      =      14.68
                                                  Prob > chi2     =     0.0006
Log likelihood = -86.352646                       Pseudo R2       =     0.0783

------------------------------------------------------------------------------
       rep78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   .0497185   .0355452     1.40   0.162    -.0199487    .1193858
displacement |  -.0029884   .0021498    -1.39   0.165     -.007202    .0012252
-------------+----------------------------------------------------------------
       /cut1 |  -1.570496   1.146391                      -3.81738    .6763888
       /cut2 |  -.7295982   1.122361                     -2.929386     1.47019
       /cut3 |   .6580529   1.107838                     -1.513269    2.829375
       /cut4 |    1.60884   1.117905                     -.5822132    3.799892
------------------------------------------------------------------------------

. gsem (rep <- mpg disp, oprobit), nolog

Generalized structural equation model             Number of obs   =         69
Log likelihood = -86.352646

--------------------------------------------------------------------------------
               |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
rep78 <-       |
           mpg |   .0497185   .0355452     1.40   0.162    -.0199487    .1193858
  displacement |  -.0029884   .0021498    -1.39   0.165     -.007202    .0012252
---------------+----------------------------------------------------------------
rep78          |
         /cut1 |  -1.570496   1.146391    -1.37   0.171     -3.81738    .6763888
         /cut2 |  -.7295982   1.122361    -0.65   0.516    -2.929386     1.47019
         /cut3 |   .6580529   1.107838     0.59   0.553    -1.513269    2.829375
         /cut4 |    1.60884   1.117905     1.44   0.150    -.5822132    3.799892
--------------------------------------------------------------------------------

. local a = sqrt(3)

. gsem (rep <- mpg disp L@`a'), oprobit var(L@1) nolog

Generalized structural equation model             Number of obs   =         69
Log likelihood = -86.353008

 ( 1)  [rep78]L = 1.732051
 ( 2)  [var(L)]_cons = 1
--------------------------------------------------------------------------------
               |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
rep78 <-       |
           mpg |    .099532     .07113     1.40   0.162    -.0398802    .2389442
  displacement |  -.0059739   .0043002    -1.39   0.165    -.0144022    .0024544
             L |   1.732051  (constrained)
---------------+----------------------------------------------------------------
rep78          |
         /cut1 |  -3.138491   2.293613    -1.37   0.171     -7.63389    1.356907
         /cut2 |  -1.456712   2.245565    -0.65   0.517    -5.857938    2.944513
         /cut3 |   1.318568    2.21653     0.59   0.552     -3.02575    5.662887
         /cut4 |   3.220004   2.236599     1.44   0.150     -1.16365    7.603657
---------------+----------------------------------------------------------------
         var(L)|          1  (constrained)
--------------------------------------------------------------------------------

 

Ordinal probit model with endogenous covariates

 
This model is defined analogously to the model fitted by -ivprobit- for probit models with endogenous covariates; we assume an underlying model with two equations,

\[
\begin{eqnarray}
y^*_{1i} =& y_{2i} \beta + x_{1i} \gamma + u_i & \\
y_{2i} =& x_{1i} \pi_1 + x_{2i} \pi_2 + v_i & \,\,\,\,\,\, (1)
\end{eqnarray}
\]

where \(u_i \sim N(0, 1) \), \(v_i\sim N(0,s^2) \), and \(corr(u_i, v_i) = \rho\).

We don’t observe \(y^*_{1i}\); instead, we observe a discrete variable \(y_{1i}\), such as, for a set of cut-points (to be estimated) \(\kappa_0 = -\infty < \kappa_1 < \kappa_2 \dots < \kappa_m = +\infty \),

\[y_{1i} = j {\mbox{ if }} \kappa_{j-1} < y^*_{1i} \leq \kappa_j \]

 

The parameterization we will use

 
I will re-scale the first equation, preserving the correlation. That is, I will consider the following system:

\[
\begin{eqnarray}
z^*_{1i} =&
y_{2i}b +x_{1i}c + t_i + \alpha L_i &\\
y_{2i} = &x_{1i}\pi_1 + x_{2i}\pi_2 + w_i + \alpha L_i & \,\,\,\,\,\, (2)
\end{eqnarray}
\]

where \(t_i, w_i, L_i\) are independent, \(t_i \sim N(0, 1)\) , \(w_i \sim N(0,\sigma^2)\), \(L_i \sim N(0, 1)\)

\[y_{1i} = j {\mbox{ if }} \lambda_{j-1} < z^*_{1i} \leq \lambda_j \]

By introducing a latent variable in both equations, I am modeling a correlation between the error terms. The fist equation is a re-scaled version of the original equation, that is, \(z^*_1 = My^*_1\),

\[ y_{2i}b +x_{1i}c + t_i + \alpha_i L_i
= M(y_{2i}\beta) +M x_{1i}\gamma + M u_i \]

This implies that
\[M u_i = t_i + \alpha_i L_i, \]
where \(Var(u_i) = 1\) and \(Var(t_i + \alpha L_i) = 1 + \alpha^2\), so the scale is \(M = \sqrt{1+\alpha^2} \).

The second equation remains the same, we just express \(v_i\) as \(w_i + \alpha L_i\). Now, after estimating the system (2), we can recover the parameters in (1) as follows:

\[\beta = \frac{1}{\sqrt{1+ \alpha^2}} b\]
\[\gamma = \frac{1}{\sqrt{1+ \alpha^2}} c\]
\[\kappa_j = \frac{1}{\sqrt{1+ \alpha^2}} \lambda_j \]

\[V(v_i) = V(w_i + \alpha L_i) =V(w_i) + \alpha^2\].

\[\rho = Cov(t_i + \alpha L_i, w_i + \alpha L_i) =
\frac{\alpha^2}{(\sqrt{1+\alpha^2}\sqrt{V(w_i)+\alpha^2)}}\]

Note: This parameterization assumes that the correlation is positive; for negative values of the correlation, \(L\) should be included in the second equation with a negative sign (that is, L@(-a) instead of L@a). When trying to perform the estimation with the wrong sign, the model most likely won’t achieve convergence. Otherwise, you will see a coefficient for L that is virtually zero. In Stata 13.1 we have included features that allow you to fit the model without this restriction. However, this time we will use the older parameterization, which will allow you to visualize the different components more easily.

 

Simulating data, and performing the estimation

 

clear
set seed 1357
set obs 10000
forvalues i = 1(1)5 {
    gen x`i' =2* rnormal() + _n/1000
}

mat C = [1,.5 \ .5, 1]
drawnorm z1 z2, cov(C)

gen y2 = 0
forvalues i = 1(1)5 {
    replace y2 = y2 + x`i'
}
replace y2 = y2 + z2

gen y1star = y2 + x1 + x2 + z1
gen xb1 = y2 + x1 + x2

gen y1 = 4
replace y1 = 3 if xb1 + z1 <=.8
replace y1 = 2 if xb1 + z1 <=.3
replace y1 = 1 if xb1 + z1 <=-.3
replace y1 = 0 if xb1 + z1 <=-.8

gsem (y1 <- y2 x1 x2 L@a, oprobit) (y2 <- x1 x2 x3 x4 x5 L@a), var(L@1)

local y1 y1
local y2 y2

local xaux  x1 x2 x3 x4 x5
local xmain  y2 x1 x2

local s2 sqrt(1+_b[`y1':L]^2)
foreach v in `xmain'{
    local trans `trans' (`y1'_`v': _b[`y1':`v']/`s2')
}

foreach v in `xaux' _cons {
    local trans `trans' (`y2'_`v': _b[`y2':`v'])
}

qui tab `y1' if e(sample)
local ncuts = r(r)-1
forvalues i = 1(1) `ncuts'{
    local trans `trans' (cut_`i': _b[`y1'_cut`i':_cons]/`s2')
}

local s1 sqrt(  _b[var(e.`y2'):_cons]  +_b[`y1':L]^2)

local trans `trans' (sig_2: `s1')
local trans `trans' (rho_12: _b[`y1':L]^2/(`s1'*`s2'))
nlcom `trans'

 

Results

 
This is the output from gsem:

Generalized structural equation model             Number of obs   =      10000
Log likelihood = -14451.117

 ( 1)  [y1]L - [y2]L = 0
 ( 2)  [var(L)]_cons = 1
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1 <-        |
          y2 |   1.379511   .0775028    17.80   0.000     1.227608    1.531414
          x1 |   1.355687   .0851558    15.92   0.000     1.188785    1.522589
          x2 |   1.346323   .0833242    16.16   0.000      1.18301    1.509635
           L |   .7786594   .0479403    16.24   0.000     .6846982    .8726206
-------------+----------------------------------------------------------------
y2 <-        |
          x1 |   .9901353   .0044941   220.32   0.000      .981327    .9989435
          x2 |   1.006836   .0044795   224.76   0.000      .998056    1.015615
          x3 |   1.004249   .0044657   224.88   0.000     .9954963    1.013002
          x4 |   .9976541   .0044783   222.77   0.000     .9888767    1.006431
          x5 |   .9987587   .0044736   223.26   0.000     .9899907    1.007527
           L |   .7786594   .0479403    16.24   0.000     .6846982    .8726206
       _cons |   .0002758   .0192417     0.01   0.989    -.0374372    .0379887
-------------+----------------------------------------------------------------
y1           |
       /cut1 |  -1.131155   .1157771    -9.77   0.000    -1.358074   -.9042358
       /cut2 |  -.5330973   .1079414    -4.94   0.000    -.7446585    -.321536
       /cut3 |   .2722794   .1061315     2.57   0.010     .0642654    .4802933
       /cut4 |     .89394   .1123013     7.96   0.000     .6738334    1.114047
-------------+----------------------------------------------------------------
       var(L)|          1  (constrained)
-------------+----------------------------------------------------------------
    var(e.y2)|   .3823751    .074215                      .2613848    .5593696
------------------------------------------------------------------------------

These are the results we obtain when we transform the values reported by gsem to the original parameterization:

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       y1_y2 |   1.088455   .0608501    17.89   0.000     .9691909    1.207719
       y1_x1 |   1.069657   .0642069    16.66   0.000      .943814    1.195501
       y1_x2 |   1.062269   .0619939    17.14   0.000      .940763    1.183774
       y2_x1 |   .9901353   .0044941   220.32   0.000      .981327    .9989435
       y2_x2 |   1.006836   .0044795   224.76   0.000      .998056    1.015615
       y2_x3 |   1.004249   .0044657   224.88   0.000     .9954963    1.013002
       y2_x4 |   .9976541   .0044783   222.77   0.000     .9888767    1.006431
       y2_x5 |   .9987587   .0044736   223.26   0.000     .9899907    1.007527
    y2__cons |   .0002758   .0192417     0.01   0.989    -.0374372    .0379887
       cut_1 |   -.892498   .0895971    -9.96   0.000    -1.068105   -.7168909
       cut_2 |  -.4206217   .0841852    -5.00   0.000    -.5856218   -.2556217
       cut_3 |   .2148325   .0843737     2.55   0.011     .0494632    .3802018
       cut_4 |    .705332   .0905974     7.79   0.000     .5277644    .8828997
       sig_2 |   .9943267    .007031   141.42   0.000     .9805462    1.008107
      rho_12 |   .4811176   .0477552    10.07   0.000     .3875191     .574716
------------------------------------------------------------------------------

The estimates are quite close to the values used for the simulation. If you try to perform the estimation with the wrong sign for the coefficient for L, you will get a number that is virtually zero (if you get convergence at all). In this case, the evaluator is telling us that the best value it can find, provided the restrictions we have imposed, is zero. If you see such results, you may want to try the opposite sign. If both give a zero coefficient, it means that this is the solution, and there is not endogeneity at all. If one of them is not zero, it means that the non-zero value is the solution. As stated before, in Stata 13.1, the model can be fitted without this restriction.