Archive

Author Archive

Using gsem to combine estimation results


gsem is a very flexible command that allows us to fit very sophisticated models. However, it is also useful in situations that involve simple models.

For example, when we want to compare parameters among two or more models, we usually use suest, which combines the estimation results under one parameter vector and creates a simultaneous covariance matrix of the robust type. This covariance estimate is described in the Methods and formulas of [R] suest as the robust variance from a “stacked model”. Actually, gsem can estimate these kinds of “stacked models”, even if the estimation samples are not the same and eventually overlap. By using the option vce(robust), we can replicate the results from suest if the models are available for gsem. In addition, gsem allows us to combine results from some estimation commands that are not supported by suest, like models including random effects.

 

Example: Comparing parameters from two models

 

Let’s consider the childweight dataset, described in [ME] mixed. Consider the following models, where weights of boys and girls are modeled using the age and the age-squared:

. webuse childweight, clear
(Weight data on Asian children)

. regress  weight age c.age#c.age if girl == 0, noheader
------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   7.985022   .6343855    12.59   0.000     6.725942    9.244101
             |
 c.age#c.age |   -1.74346   .2374504    -7.34   0.000    -2.214733   -1.272187
             |
       _cons |   3.684363   .3217223    11.45   0.000     3.045833    4.322893
------------------------------------------------------------------------------

. regress  weight age c.age#c.age if girl == 1, noheader
------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   7.008066   .5164687    13.57   0.000     5.982746    8.033386
             |
 c.age#c.age |  -1.450582   .1930318    -7.51   0.000    -1.833798   -1.067365
             |
       _cons |   3.480933   .2616616    13.30   0.000     2.961469    4.000397
------------------------------------------------------------------------------

To test whether birthweights are the same for the two groups, we need to test whether the intercepts in the two regressions are the same. Using suest, we would proceed as follows:

. quietly regress weight age c.age#c.age if girl == 0, noheader

. estimates store boys

. quietly regress weight age c.age#c.age if girl == 1, noheader

. estimates store girls

. suest boys girls

Simultaneous results for boys, girls

                                                  Number of obs   =        198

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
boys_mean    |
         age |   7.985022   .4678417    17.07   0.000     7.068069    8.901975
             |
 c.age#c.age |   -1.74346   .2034352    -8.57   0.000    -2.142186   -1.344734
             |
       _cons |   3.684363   .1719028    21.43   0.000      3.34744    4.021286
-------------+----------------------------------------------------------------
boys_lnvar   |
       _cons |   .4770289   .1870822     2.55   0.011     .1103546    .8437032
-------------+----------------------------------------------------------------
girls_mean   |
         age |   7.008066   .4166916    16.82   0.000     6.191365    7.824766
             |
 c.age#c.age |  -1.450582   .1695722    -8.55   0.000    -1.782937   -1.118226
             |
       _cons |   3.480933   .1556014    22.37   0.000      3.17596    3.785906
-------------+----------------------------------------------------------------
girls_lnvar  |
       _cons |   .0097127   .1351769     0.07   0.943    -.2552292    .2746545
------------------------------------------------------------------------------

Invoking an estimation command with the option coeflegend will give us a legend we can use to refer to the parameters when we use postestimation commands like test.

. suest, coeflegend

Simultaneous results for boys, girls

                                                  Number of obs   =        198

------------------------------------------------------------------------------
             |      Coef.  Legend
-------------+----------------------------------------------------------------
boys_mean    |
         age |   7.985022  _b[boys_mean:age]
             |
 c.age#c.age |   -1.74346  _b[boys_mean:c.age#c.age]
             |
       _cons |   3.684363  _b[boys_mean:_cons]
-------------+----------------------------------------------------------------
boys_lnvar   |
       _cons |   .4770289  _b[boys_lnvar:_cons]
-------------+----------------------------------------------------------------
girls_mean   |
         age |   7.008066  _b[girls_mean:age]
             |
 c.age#c.age |  -1.450582  _b[girls_mean:c.age#c.age]
             |
       _cons |   3.480933  _b[girls_mean:_cons]
-------------+----------------------------------------------------------------
girls_lnvar  |
       _cons |   .0097127  _b[girls_lnvar:_cons]
------------------------------------------------------------------------------

. test  _b[boys_mean:_cons] = _b[girls_mean:_cons]

 ( 1)  [boys_mean]_cons - [girls_mean]_cons = 0

           chi2(  1) =    0.77
         Prob > chi2 =    0.3803

We find no evidence that the intercepts are different.

Now, let’s replicate those results by using the gsem command. We generate the variable weightboy, a copy of weight for boys and missing otherwise, and the variable weightgirl, a copy of weight for girls and missing otherwise.

. quietly generate weightboy = weight if girl == 0

. quietly generate weightgirl = weight if girl == 1

. gsem (weightboy <- age c.age#c.age) (weightgirl <- age c.age#c.age), ///
>      nolog vce(robust)

Generalized structural equation model             Number of obs   =        198
Log pseudolikelihood =  -302.2308

-------------------------------------------------------------------------------
                 |              Robust
                 |      Coef.  Std. Err.     z   P>|z|     [95% Conf. Interval]
-----------------+-------------------------------------------------------------
weightboy <-     |
             age |   7.985022  .4678417   17.07  0.000     7.068069    8.901975
                 |
     c.age#c.age |   -1.74346  .2034352   -8.57  0.000    -2.142186   -1.344734
                 |
           _cons |   3.684363  .1719028   21.43  0.000      3.34744    4.021286
-----------------+-------------------------------------------------------------
weightgirl <-    |
             age |   7.008066  .4166916   16.82  0.000     6.191365    7.824766
                 |
     c.age#c.age |  -1.450582  .1695722   -8.55  0.000    -1.782937   -1.118226
                 |
           _cons |   3.480933  .1556014   22.37  0.000      3.17596    3.785906
-----------------+-------------------------------------------------------------
 var(e.weightboy)|   1.562942  .3014028                    1.071012    2.280821
var(e.weightgirl)|    .978849  .1364603                    .7448187    1.286414
-------------------------------------------------------------------------------

. gsem, coeflegend

Generalized structural equation model             Number of obs   =        198
Log pseudolikelihood =  -302.2308

-------------------------------------------------------------------------------
                 |      Coef.  Legend
-----------------+-------------------------------------------------------------
weightboy <-     |
             age |   7.985022  _b[weightboy:age]
                 |
     c.age#c.age |   -1.74346  _b[weightboy:c.age#c.age]
                 |
           _cons |   3.684363  _b[weightboy:_cons]
-----------------+-------------------------------------------------------------
weightgirl <-    |
             age |   7.008066  _b[weightgirl:age]
                 |
     c.age#c.age |  -1.450582  _b[weightgirl:c.age#c.age]
                 |
           _cons |   3.480933  _b[weightgirl:_cons]
-----------------+-------------------------------------------------------------
 var(e.weightboy)|   1.562942  _b[var(e.weightboy):_cons]
var(e.weightgirl)|    .978849  _b[var(e.weightgirl):_cons]
-------------------------------------------------------------------------------

. test  _b[weightgirl:_cons]=  _b[weightboy:_cons]

 ( 1)  - [weightboy]_cons + [weightgirl]_cons = 0

           chi2(  1) =    0.77
         Prob > chi2 =    0.3803

gsem allowed us to fit models on different subsets simultaneously. By default, the model is assumed to be a linear regression, but several links and families are available; for example, you can combine two Poisson models or a multinomial logistic model with a regular logistic model. See [SEM] sem and gsem for details.

Here, I use the vce(robust) option to replicate the results for suest. However, when estimation samples don’t overlap, results from both estimations are assumed to be independent, and thus the option vce(robust) is not needed. When performing the estimation without the vce(robust) option, the joint covariance matrix will contain two blocks with the covariances from the original models and 0s outside those blocks.

 

An example with random effects

 

The childweight dataset contains repeated measures, and it is, in the documentation, analyzed used the mixed command, which allows us to account for the intra-individual correlation via random effects.

Now, let’s use the techniques described above to combine results from two random-effects models. Here are the two separate models:

. mixed weight age c.age#c.age if girl == 0 || id:, nolog

Mixed-effects ML regression                     Number of obs      =       100
Group variable: id                              Number of groups   =        34

                                                Obs per group: min =         1
                                                               avg =       2.9
                                                               max =         5


                                                Wald chi2(2)       =   1070.28
Log likelihood = -149.05479                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   8.328882   .4601093    18.10   0.000     7.427084    9.230679
             |
 c.age#c.age |  -1.859798   .1722784   -10.80   0.000    -2.197458   -1.522139
             |
       _cons |   3.525929   .2723617    12.95   0.000      2.99211    4.059749
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   .7607779   .2439115      .4058409    1.426133
-----------------------------+------------------------------------------------
               var(Residual) |   .7225673   .1236759      .5166365    1.010582
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =    30.34 Prob >= chibar2 = 0.0000

. mixed weight age c.age#c.age if girl == 1 || id:, nolog

Mixed-effects ML regression                     Number of obs      =        98
Group variable: id                              Number of groups   =        34

                                                Obs per group: min =         1
                                                               avg =       2.9
                                                               max =         5


                                                Wald chi2(2)       =   2141.72
Log likelihood =  -114.3008                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   7.273082   .3167266    22.96   0.000     6.652309    7.893854
             |
 c.age#c.age |  -1.538309    .118958   -12.93   0.000    -1.771462   -1.305156
             |
       _cons |   3.354834   .2111793    15.89   0.000      2.94093    3.768738
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   .6925554   .1967582       .396848    1.208606
-----------------------------+------------------------------------------------
               var(Residual) |   .3034231   .0535359      .2147152    .4287799
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =    47.42 Prob >= chibar2 = 0.0000

Random effects can be included in a gsem model by incorporating latent variables at the group level; these are the latent variables M1[id] and M2[id] below. By default, gsem will try to estimate a covariance when it sees two latent variables at the same level. This can be easily solved by restricting this covariance term to 0. Option vce(robust) should be used whenever we want to produce the mechanism used by suest.

. gsem (weightboy <- age c.age#c.age M1[id])   ///
>      (weightgirl <- age c.age#c.age M2[id]), ///
>      cov(M1[id]*M2[id]@0) vce(robust) nolog

Generalized structural equation model             Number of obs   =        198
Log pseudolikelihood = -263.35559

 ( 1)  [weightboy]M1[id] = 1
 ( 2)  [weightgirl]M2[id] = 1
                                      (Std. Err. adjusted for clustering on id)
-------------------------------------------------------------------------------
                 |              Robust
                 |      Coef.  Std. Err.     z   P>|z|     [95% Conf. Interval]
-----------------+-------------------------------------------------------------
weightboy <-     |
             age |   8.328882  .4211157   19.78  0.000      7.50351    9.154253
                 |
     c.age#c.age |  -1.859798  .1591742  -11.68  0.000    -2.171774   -1.547823
                 |
          M1[id] |          1 (constrained)
                 |
           _cons |   3.525929  .1526964   23.09  0.000      3.22665    3.825209
-----------------+-------------------------------------------------------------
weightgirl <-    |
             age |   7.273082  .3067378   23.71  0.000     6.671887    7.874277
                 |
     c.age#c.age |  -1.538309   .120155  -12.80  0.000    -1.773808    -1.30281
                 |
          M2[id] |          1 (constrained)
                 |
           _cons |   3.354834  .1482248   22.63  0.000     3.064319     3.64535
-----------------+-------------------------------------------------------------
      var(M1[id])|   .7607774  .2255575                     .4254915    1.360268
      var(M2[id])|   .6925553  .1850283                    .4102429    1.169144
-----------------+-------------------------------------------------------------
 var(e.weightboy)|   .7225674  .1645983                     .4623572    1.129221
var(e.weightgirl)|   .3034231  .0667975                    .1970877    .4671298
-------------------------------------------------------------------------------

Above, we have the joint output from the two models, which would allow us to perform tests among parameters in both models. Notice that option vce(robust) implies that standard errors will be clustered on the groups determined by id.

gsem, when called with the vce(robust) option, will complain if there are inconsistencies among the groups in the models (for example, if the random effects in both models were crossed).

 

Checking that you are fitting the same model

 

In the previous model, gsem‘s default covariance structure included a term that wasn’t in the original two models, so we needed to include an additional restriction. This can be easy to spot in a simple model, but if you don’t want to rely just on a visual inspection, you can write a small loop to make sure that all the estimates in the joint model are actually also in the original models.

Let’s see an example with random effects, this time with overlapping data.

. *fit first model and save the estimates
. gsem (weightboy <- age c.age#c.age M1[id]), nolog

Generalized structural equation model             Number of obs   =        100
Log likelihood = -149.05479

 ( 1)  [weightboy]M1[id] = 1
-------------------------------------------------------------------------------
                |      Coef.  Std. Err.     z    P>|z|     [95% Conf. Interval]
----------------+--------------------------------------------------------------
weightboy <-    |
            age |   8.328882  .4609841   18.07   0.000     7.425369    9.232394
                |
    c.age#c.age |  -1.859798  .1725233  -10.78   0.000    -2.197938   -1.521659
                |
         M1[id] |          1 (constrained)
                |
          _cons |   3.525929  .2726322   12.93   0.000      2.99158    4.060279
----------------+--------------------------------------------------------------
     var(M1[id])|   .7607774  .2439114                     .4058407    1.426132
----------------+--------------------------------------------------------------
var(e.weightboy)|   .7225674  .1236759                     .5166366    1.010582
-------------------------------------------------------------------------------

. mat b1 = e(b)

. *fit second model and save the estimates
. gsem (weight <- age M2[id]), nolog

Generalized structural equation model             Number of obs   =        198
Log likelihood = -348.32402

 ( 1)  [weight]M2[id] = 1
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight <-    |
         age |   3.389281   .1152211    29.42   0.000     3.163452    3.615111
             |
      M2[id] |          1  (constrained)
             |
       _cons |   5.156913   .1803059    28.60   0.000      4.80352    5.510306
-------------+----------------------------------------------------------------
  var(M2[id])|   .6076662   .2040674                      .3146395    1.173591
-------------+----------------------------------------------------------------
var(e.weight)|   1.524052   .1866496                      1.198819    1.937518
------------------------------------------------------------------------------

. mat b2 = e(b)

. *stack estimates from first and second models
. mat stacked = b1, b2

. *estimate joint model and save results
. gsem (weightboy <- age c.age#c.age M1[id]) ///
>      (weight <- age M2[id]), cov(M1[id]*M2[id]@0) vce(robust) nolog

Generalized structural equation model             Number of obs   =        198
Log pseudolikelihood = -497.37881

 ( 1)  [weightboy]M1[id] = 1
 ( 2)  [weight]M2[id] = 1
                                      (Std. Err. adjusted for clustering on id)
-------------------------------------------------------------------------------
                |              Robust
                |      Coef.  Std. Err.     z    P>|z|     [95% Conf. Interval]
----------------+--------------------------------------------------------------
weightboy <-    |
            age |   8.328882  .4211157   19.78   0.000      7.50351    9.154253
                |
    c.age#c.age |  -1.859798  .1591742  -11.68   0.000    -2.171774   -1.547823
                |
         M1[id] |          1 (constrained)
                |
          _cons |   3.525929  .1526964   23.09   0.000      3.22665    3.825209
----------------+--------------------------------------------------------------
weight <-       |
            age |   3.389281  .1157835   29.27   0.000      3.16235    3.616213
                |
         M2[id] |          1 (constrained)
                |
          _cons |   5.156913  .1345701   38.32   0.000      4.89316    5.420665
----------------+--------------------------------------------------------------
     var(M1[id])|   .7607774  .2255575                     .4254915    1.360268
     var(M2[id])|   .6076662     .1974                     .3214791    1.148623
----------------+--------------------------------------------------------------
var(e.weightboy)|   .7225674  .1645983                     .4623572    1.129221
   var(e.weight)|   1.524052  .1705637                     1.223877    1.897849
-------------------------------------------------------------------------------

. mat b = e(b)

. *verify that estimates from the joint model are the same as
. *from models 1 and 2
. local stripes : colfullnames(b)

. foreach l of local stripes{
  2.    matrix  r1 =  b[1,"`l'"]
  3.    matrix r2 = stacked[1,"`l'"]
  4.    assert reldif(el(r1,1,1), el(r2,1,1))<1e-5
  5. }

The loop above verifies that all the labels in the second model correspond to estimates in the first and that the estimates are actually the same. If you omit the restriction for the variance in the joint model, the assert command will produce an error.

 

Technical note

 

As documented in [U] 20.21.2 Correlated errors: Cluster-robust standard errors, the formula for the robust estimator of the variance is

\[
V_{robust} = \hat V(\sum_{j=1}^N u'_ju_j) \hat V
\]

where \(N\) is the number of observations, \(\hat V\) is the conventional estimator of the variance, and for each observation \(j\), \(u_j\) is a row vector (with as many columns as parameters), which represents the contribution of this observation to the gradient. (If we stack the rows \(u_j\), the columns of this matrix are the scores.)

When we apply suest, the matrix \(\hat V\) is constructed as the stacked block-diagonal conventional variance estimates from the original submodels; this is the variance you will see if you apply gsem to the joint model without the vce(robust) option. The \(u_j\) values used by suest are now the values from both estimations, so we have as many \(u_j\) values as the sum of observations in the two original models and each row contains as many columns as the total number of parameters in both models. This is the exact operation that gsem, vce(robust) does.

When random effects are present, standard errors will be clustered on groups. Instead of observation-level contributions to the gradient, we would use cluster-level contributions. This means that observations in the two models would need to be clustered in a consistent manner; observations that are common to the two estimations would need to be in the same cluster in the two estimations.

Categories: Statistics Tags: , , ,

Using resampling methods to detect influential points


As stated in the documentation for jackknife, an often forgotten utility for this command is the detection of overly influential observations.

Some commands, like logit or stcox, come with their own set of prediction tools to detect influential points. However, these kinds of predictions can be computed for virtually any regression command. In particular, we will see that the dfbeta statistics can be easily computed for any command that accepts the jackknife prefix. dfbeta statistics allow us to visualize how influential some observations are compared with the rest, concerning a specific parameter.

We will also compute Cook’s likelihood displacement, which is an overall measure of influence, and it can also be compared with a specific threshold.

 

Using jackknife to compute dfbeta

 

The main task of jackknife is to fit the model while suppressing one observation at a time, which allows us to see how much results change when each observation is suppressed; in other words, it allows us to see how much each observation influences the results. A very intuitive measure of influence is dfbeta, which is the amount that a particular parameter changes when an observation is suppressed. There will be one dfbeta variable for each parameter. If \(\hat\beta\) is the estimate for parameter \(\beta\) obtained from the full data and \( \hat\beta_{(i)} \) is the corresponding estimate obtained when the \(i\)th observation is suppressed, then the \(i\)th element of variable dfbeta is obtained as

\[dfbeta = \hat\beta - \hat\beta_{(i)}\]

Parameters \(\hat\beta\) are saved by the estimation commands in matrix e(b) and also can be obtained using the _b notation, as we will show below. The leave-one-out values \(\hat\beta_{(i)}\) can be saved in a new file by using the option saving() with jackknife. With these two elements, we can compute the dfbeta values for each variable.

Let’s see an example with the probit command.

. sysuse auto, clear
(1978 Automobile Data)

. *preserve original dataset
. preserve

. *generate a variable with the original observation number
. gen obs =_n

. probit foreign mpg weight

Iteration 0:   log likelihood =  -45.03321
Iteration 1:   log likelihood = -27.914626
Iteration 2:   log likelihood = -26.858074
Iteration 3:   log likelihood = -26.844197
Iteration 4:   log likelihood = -26.844189
Iteration 5:   log likelihood = -26.844189

Probit regression                                 Number of obs   =         74
                                                  LR chi2(2)      =      36.38
                                                  Prob > chi2     =     0.0000
Log likelihood = -26.844189                       Pseudo R2       =     0.4039

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.1039503   .0515689    -2.02   0.044    -.2050235   -.0028772
      weight |  -.0023355   .0005661    -4.13   0.000     -.003445   -.0012261
       _cons |   8.275464   2.554142     3.24   0.001     3.269437    13.28149
------------------------------------------------------------------------------

. *keep the estimation sample so each observation will be matched
. *with the corresponding replication
. keep if e(sample)
(0 observations deleted)

. *use jackknife to generate the replications, and save the values in
. *file b_replic
. jackknife, saving(b_replic, replace):  probit foreign mpg weight
(running probit on estimation sample)

Jackknife replications (74)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
........................

Probit regression                               Number of obs      =        74
                                                Replications       =        74
                                                F(   2,     73)    =     10.36
                                                Prob > F           =    0.0001
Log likelihood = -26.844189                     Pseudo R2          =    0.4039

------------------------------------------------------------------------------
             |              Jackknife
     foreign |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.1039503   .0831194    -1.25   0.215     -.269607    .0617063
      weight |  -.0023355   .0006619    -3.53   0.001    -.0036547   -.0010164
       _cons |   8.275464   3.506085     2.36   0.021     1.287847    15.26308
------------------------------------------------------------------------------

. *verify that all the replications were successful
. assert e(N_misreps) ==0

. merge 1:1 _n using b_replic

    Result                           # of obs.
    -----------------------------------------
    not matched                             0
    matched                                74  (_merge==3)
    -----------------------------------------

. *see how values from replications are stored
. describe, fullnames

Contains data from .../auto.dta
  obs:            74                          1978 Automobile Data
 vars:            17                          13 Apr 2013 17:45
 size:         4,440                          (_dta has notes)
--------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------
make            str18   %-18s                 Make and Model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair Record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g                 Turn Circle (ft.)
displacement    int     %8.0g                 Displacement (cu. in.)
gear_ratio      float   %6.2f                 Gear Ratio
foreign         byte    %8.0g      origin     Car type
obs             float   %9.0g
foreign_b_mpg   float   %9.0g                 [foreign]_b[mpg]
foreign_b_weight
                float   %9.0g                 [foreign]_b[weight]
foreign_b_cons  float   %9.0g                 [foreign]_b[_cons]
_merge          byte    %23.0g     _merge
--------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. *compute the dfbeta for each covariate
. foreach var in mpg weight {
  2.  gen dfbeta_`var' = (_b[`var'] -foreign_b_`var')
  3. }

. gen dfbeta_cons = (_b[_cons] - foreign_b_cons)

. label var obs "observation number"
. label var dfbeta_mpg "dfbeta for mpg"
. label var dfbeta_weight "dfbeta for weight"
. label var dfbeta_cons "dfbeta for the constant"

. *plot dfbeta values for variable mpg
. scatter dfbeta_mpg obs, mlabel(obs) title("dfbeta values for variable mpg")

. *restore original dataset
. restore

dfbeta_mpg

Based on the impact on the coefficient for variable mpg, observation 71 seems to be the most influential. We could create a similar plot for each parameter.

jackknife prints a dot for each successful replication and an ‘x’ for each replication that ends with an error. By looking at the output immediately following the jackknife command, we can see that all the replications were successful. However, we added an assert line in the code to avoid relying on visual inspection. If some replications failed, we would need to explore the reasons.

 

A computational shortcut to obtain the dfbeta values

 

The command jackknife allows us to save the leave-one-out values in a different file. To use these, we would need to do some data management and merge the two files. On the other hand, the same command called with the option keep saves pseudovalues, which are defined as follows:

\[\hat{\beta}_i^* = N\hat\beta - (N-1)\hat\beta_{(i)} \]

where \(N\) is the number of observations involved in the computation, returned as e(N). Therefore, using the pseudovalues, \(\beta_{(i)}\) values can be computed as \[\hat\beta_{(i)} = \frac{ N \hat\beta - \hat\beta^*_i}{N-1} \]

Also, dfbeta values can be computed directly from the pseudovalues as \[ \hat\beta - \hat\beta_{(i)} = \frac{\hat\beta_{i}^* -\hat\beta} {N-1} \]

Using the pseudovalues instead of the leave-one-out values simplifies our program because we don’t have to worry about matching each pseudovalue to the correct observation.

Let’s reproduce the previous example.

. sysuse auto, clear
(1978 Automobile Data)

. jackknife, keep: probit foreign  mpg weight
(running probit on estimation sample)

Jackknife replications (74)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
........................

Probit regression                               Number of obs      =        74
                                                Replications       =        74
                                                F(   2,     73)    =     10.36
                                                Prob > F           =    0.0001
Log likelihood = -26.844189                     Pseudo R2          =    0.4039

------------------------------------------------------------------------------
             |              Jackknife
     foreign |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.1039503   .0831194    -1.25   0.215     -.269607    .0617063
      weight |  -.0023355   .0006619    -3.53   0.001    -.0036547   -.0010164
       _cons |   8.275464   3.506085     2.36   0.021     1.287847    15.26308
------------------------------------------------------------------------------

. *see how pseudovalues are stored
. describe, fullnames

Contains data from /Users/isabelcanette/Desktop/stata_mar18/309/ado/base/a/auto.
> dta
  obs:            74                          1978 Automobile Data
 vars:            15                          13 Apr 2013 17:45
 size:         4,070                          (_dta has notes)
--------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------
make            str18   %-18s                 Make and Model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair Record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g                 Length (in.)
turn            int     %8.0g                 Turn Circle (ft.)
displacement    int     %8.0g                 Displacement (cu. in.)
gear_ratio      float   %6.2f                 Gear Ratio
foreign         byte    %8.0g      origin     Car type
foreign_b_mpg   float   %9.0g                 pseudovalues: [foreign]_b[mpg]
foreign_b_weight
                float   %9.0g                 pseudovalues: [foreign]_b[weight]
foreign_b_cons  float   %9.0g                 pseudovalues: [foreign]_b[_cons]
--------------------------------------------------------------------------------
Sorted by:  foreign
     Note:  dataset has changed since last saved

. *verify that all the replications were successful
. assert e(N_misreps)==0

. *compute the dfbeta for each covariate
. local N = e(N)

. foreach var in  mpg weight {
  2. gen dfbeta_`var' = (foreign_b_`var' - _b[`var'])/(`N'-1)
  3. }

. gen dfbeta_`cons' = (foreign_b_cons - _b[_cons])/(`N'-1)

. *plot deff values for variable weight
. gen obs = _n

. label var obs "observation number"

. label var dfbeta_mpg "dfbeta for mpg"

. scatter dfbeta_mpg obs, mlabel(obs) title("dfbeta values for variable mpg")

dfbeta2_mpg

 

Dfbeta for grouped data

 

If you have panel data or a situation where each individual is represented by a group of observations (for example, conditional logit or survival models), you might be interested in influential groups. In this case, you would look at the changes on the parameters when each group is suppressed. Let’s see an example with xtlogit.

. webuse towerlondon, clear
. xtset family

. jackknife, cluster(family) idcluster(newclus) keep: xtlogit dtlm difficulty
. assert e(N_misreps)==0

The group-level pseudovalues will be saved on the first observations corresponding to each group, and there will be missing values on the rest. To compute the dfbeta value for the coefficient for difficulty, we type

. local N = e(N_clust)
. gen dfbeta_difficulty = (dtlm_b_difficulty - _b[difficulty])/(`N'-1)

We can then plot those values:

. scatter dfbeta_difficulty newclus, mlabel(family) ///
       title("dfbeta values for variable difficulty") xtitle("family")

dfbeta_group

Option idcluster() for jackknife generates a new variable that assigns consecutive integers to the clusters; using this variable produces a plot where families are equally spaced on the horizontal axis.

As before, we can see that some groups are more influential than others. It would require some research to find out whether this is a problem.

 

Likelihood displacement

 

If we want a global measure of influence (that is, not tied to a particular parameter), we can compute the likelihood displacement values. We consider the likelihood displacement value as defined by Cook (1986):

\[LD_i = 2[L(\hat\theta) - L(\hat\theta_{(i)})] \]

where \(L\) is the log-likelihood function (evaluated on the full dataset), \(\hat\theta\) is the set of parameter estimates obtained from the full dataset, and \(\hat\theta_{(i)}\) is the set of the parameter estimates obtained when leaving out the \(i\)th observation. Notice that what changes is the parameter vector. The log-likelihood function is always evaluated on the whole sample; provided that \(\hat\theta\) is the set of parameters that maximizes the log likelihood, the log-likelihood displacement is always positive. Cook suggested, as a confidence region for this value, the interval \([0, \chi^2_p(\alpha))\), where \(\chi^2_p(\alpha)\) is the (\(1-\alpha\)) quantile from a chi-squared distribution with \(p\) degrees of freedom, and \(p\) is the number of parameters in \(\theta\).

To perform our assessment based on the likelihood displacement, we will need to do the following:

  1. Create an \(N\times p\) matrix B, where the \(i\)th row contains the vector of parameter estimates obtained by leaving the \(i\)th observation out.
  2. Create a new variable L1 such that its \(i\)th observation contains the log likelihood evaluated at the parameter estimates in the \(i\)th row of matrix B.
  3. Use variable L1 to obtain the LD matrix, containing the likelihood displacement values.
  4. Construct a plot for the values in LD, and add the \(\chi^2_p(\alpha)\) as a reference.

Let’s do it with our probit model.

 

Step 1.

We first create the macro cmdline containing the command line for the model we want to use. We fit the model and save the original log likelihood in macro ll0.

With a loop, the leave-one-out parameters are saved in consecutive rows of matrix B. It is useful to have those values in a matrix, because we will then extract each row to evaluate the log likelihood at those values.

**********Step 1
sysuse auto, clear
set more off
local cmdline probit foreign weight mpg
`cmdline'
keep if e(sample)
local ll0 = e(ll)
mat b0 = e(b)
mat b = b0

local N = _N

forvalues i = 1(1)`N'{
   `cmdline' if _n !=`i'
   mat b1 = e(b)
   mat b = b \ b1
}

mat B = b[2...,1...]
mat list B

 

Step 2.

In each iteration of a loop, a row from B is stored as matrix b. To evaluate the log likelihood at these values, the trick is to use them as initial values and invoke the command with 0 iterations. This can be done for any command that is based on ml.

**********Step 2

gen L1 = .

forvalues i = 1(1)`N'{
    mat b = B[`i',1...]
    `cmdline', from(b) iter(0)
    local ll = e(ll)
    replace L1 = `ll' in `i'
}

 

Step 3.

Using variable L1 and the macro with the original log likelihood, we compute Cook’s likehood displacement.

**********Step 3

gen LD = 2*(`ll0' - L1)

 

Step 4.

Create the plot, using as a reference the 90% quantile for the \(\chi^2\) distribution. \(p\) is the number of columns in matrix b0 (or equivalently, the number of columns in matrix B).

**********Step 4

local k = colsof(b0)
gen upper_bound = invchi2tail(`k', .1)
gen n = _n

twoway scatter LD n, mlabel(n) || line upper_bound n, ///
title("Likelihood displacement")

ldispl

We can see that observation 71 is the most influential, and its likelihood displacement value is within the range we would normally expect.

Reference

Cook, D. 1986. Assessment of local influence. Journal of the Royal Statistical Society, Series B 48: 133–169.

Fitting ordered probit models with endogenous covariates with Stata’s gsem command


The new command gsem allows us to fit a wide variety of models; among the many possibilities, we can account for endogeneity on different models. As an example, I will fit an ordinal model with endogenous covariates.

 

Parameterizations for an ordinal probit model

 
The ordinal probit model is used to model ordinal dependent variables. In the usual parameterization, we assume that there is an underlying linear regression, which relates an unobserved continuous variable \(y^*\) to the covariates \(x\).

\[y^*_{i} = x_{i}\gamma + u_i\]

The observed dependent variable \(y\) relates to \(y^*\) through a series of cut-points \(-\infty =\kappa_0<\kappa_1<\dots< \kappa_m=+\infty\) , as follows:

\[y_{i} = j {\mbox{ if }} \kappa_{j-1} < y^*_{i} \leq \kappa_j\]

Provided that the variance of \(u_i\) can’t be identified from the observed data, it is assumed to be equal to one. However, we can consider a re-scaled parameterization for the same model; a straightforward way of seeing this, is by noting that, for any positive number \(M\):

\[\kappa_{j-1} < y^*_{i} \leq \kappa_j \iff
M\kappa_{j-1} < M y^*_{i} \leq M\kappa_j
\]

that is,

\[\kappa_{j-1} < x_i\gamma + u_i \leq \kappa_j \iff
M\kappa_{j-1}< x_i(M\gamma) + Mu_i \leq M\kappa_j
\]

In other words, if the model is identified, it can be represented by multiplying the unobserved variable \(y\) by a positive number, and this will mean that the standard error of the residual component, the coefficients, and the cut-points will be multiplied by this number.

Let me show you an example; I will first fit a standard ordinal probit model, both with oprobit and with gsem. Then, I will use gsem to fit an ordinal probit model where the residual term for the underlying linear regression has a standard deviation equal to 2. I will do this by introducing a latent variable \(L\), with variance 1, and coefficient \(\sqrt 3\). This will be added to the underlying latent residual, with variance 1; then, the ‘new’ residual term will have variance equal to \(1+((\sqrt 3)^2\times Var(L))= 4\), so the standard deviation will be 2. We will see that as a result, the coefficients, as well as the cut-points, will be multiplied by 2.

. sysuse auto, clear
(1978 Automobile Data)

. oprobit rep mpg disp , nolog

Ordered probit regression                         Number of obs   =         69
                                                  LR chi2(2)      =      14.68
                                                  Prob > chi2     =     0.0006
Log likelihood = -86.352646                       Pseudo R2       =     0.0783

------------------------------------------------------------------------------
       rep78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |   .0497185   .0355452     1.40   0.162    -.0199487    .1193858
displacement |  -.0029884   .0021498    -1.39   0.165     -.007202    .0012252
-------------+----------------------------------------------------------------
       /cut1 |  -1.570496   1.146391                      -3.81738    .6763888
       /cut2 |  -.7295982   1.122361                     -2.929386     1.47019
       /cut3 |   .6580529   1.107838                     -1.513269    2.829375
       /cut4 |    1.60884   1.117905                     -.5822132    3.799892
------------------------------------------------------------------------------

. gsem (rep <- mpg disp, oprobit), nolog

Generalized structural equation model             Number of obs   =         69
Log likelihood = -86.352646

--------------------------------------------------------------------------------
               |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
rep78 <-       |
           mpg |   .0497185   .0355452     1.40   0.162    -.0199487    .1193858
  displacement |  -.0029884   .0021498    -1.39   0.165     -.007202    .0012252
---------------+----------------------------------------------------------------
rep78          |
         /cut1 |  -1.570496   1.146391    -1.37   0.171     -3.81738    .6763888
         /cut2 |  -.7295982   1.122361    -0.65   0.516    -2.929386     1.47019
         /cut3 |   .6580529   1.107838     0.59   0.553    -1.513269    2.829375
         /cut4 |    1.60884   1.117905     1.44   0.150    -.5822132    3.799892
--------------------------------------------------------------------------------

. local a = sqrt(3)

. gsem (rep <- mpg disp L@`a'), oprobit var(L@1) nolog

Generalized structural equation model             Number of obs   =         69
Log likelihood = -86.353008

 ( 1)  [rep78]L = 1.732051
 ( 2)  [var(L)]_cons = 1
--------------------------------------------------------------------------------
               |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
rep78 <-       |
           mpg |    .099532     .07113     1.40   0.162    -.0398802    .2389442
  displacement |  -.0059739   .0043002    -1.39   0.165    -.0144022    .0024544
             L |   1.732051  (constrained)
---------------+----------------------------------------------------------------
rep78          |
         /cut1 |  -3.138491   2.293613    -1.37   0.171     -7.63389    1.356907
         /cut2 |  -1.456712   2.245565    -0.65   0.517    -5.857938    2.944513
         /cut3 |   1.318568    2.21653     0.59   0.552     -3.02575    5.662887
         /cut4 |   3.220004   2.236599     1.44   0.150     -1.16365    7.603657
---------------+----------------------------------------------------------------
         var(L)|          1  (constrained)
--------------------------------------------------------------------------------

 

Ordinal probit model with endogenous covariates

 
This model is defined analogously to the model fitted by -ivprobit- for probit models with endogenous covariates; we assume an underlying model with two equations,

\[
\begin{eqnarray}
y^*_{1i} =& y_{2i} \beta + x_{1i} \gamma + u_i & \\
y_{2i} =& x_{1i} \pi_1 + x_{2i} \pi_2 + v_i & \,\,\,\,\,\, (1)
\end{eqnarray}
\]

where \(u_i \sim N(0, 1) \), \(v_i\sim N(0,s^2) \), and \(corr(u_i, v_i) = \rho\).

We don’t observe \(y^*_{1i}\); instead, we observe a discrete variable \(y_{1i}\), such as, for a set of cut-points (to be estimated) \(\kappa_0 = -\infty < \kappa_1 < \kappa_2 \dots < \kappa_m = +\infty \),

\[y_{1i} = j {\mbox{ if }} \kappa_{j-1} < y^*_{1i} \leq \kappa_j \]

 

The parameterization we will use

 
I will re-scale the first equation, preserving the correlation. That is, I will consider the following system:

\[
\begin{eqnarray}
z^*_{1i} =&
y_{2i}b +x_{1i}c + t_i + \alpha L_i &\\
y_{2i} = &x_{1i}\pi_1 + x_{2i}\pi_2 + w_i + \alpha L_i & \,\,\,\,\,\, (2)
\end{eqnarray}
\]

where \(t_i, w_i, L_i\) are independent, \(t_i \sim N(0, 1)\) , \(w_i \sim N(0,\sigma^2)\), \(L_i \sim N(0, 1)\)

\[y_{1i} = j {\mbox{ if }} \lambda_{j-1} < z^*_{1i} \leq \lambda_j \]

By introducing a latent variable in both equations, I am modeling a correlation between the error terms. The fist equation is a re-scaled version of the original equation, that is, \(z^*_1 = My^*_1\),

\[ y_{2i}b +x_{1i}c + t_i + \alpha_i L_i
= M(y_{2i}\beta) +M x_{1i}\gamma + M u_i \]

This implies that
\[M u_i = t_i + \alpha_i L_i, \]
where \(Var(u_i) = 1\) and \(Var(t_i + \alpha L_i) = 1 + \alpha^2\), so the scale is \(M = \sqrt{1+\alpha^2} \).

The second equation remains the same, we just express \(v_i\) as \(w_i + \alpha L_i\). Now, after estimating the system (2), we can recover the parameters in (1) as follows:

\[\beta = \frac{1}{\sqrt{1+ \alpha^2}} b\]
\[\gamma = \frac{1}{\sqrt{1+ \alpha^2}} c\]
\[\kappa_j = \frac{1}{\sqrt{1+ \alpha^2}} \lambda_j \]

\[V(v_i) = V(w_i + \alpha L_i) =V(w_i) + \alpha^2\].

\[\rho = Cov(t_i + \alpha L_i, w_i + \alpha L_i) =
\frac{\alpha^2}{(\sqrt{1+\alpha^2}\sqrt{V(w_i)+\alpha^2)}}\]

Note: This parameterization assumes that the correlation is positive; for negative values of the correlation, \(L\) should be included in the second equation with a negative sign (that is, L@(-a) instead of L@a). When trying to perform the estimation with the wrong sign, the model most likely won’t achieve convergence. Otherwise, you will see a coefficient for L that is virtually zero. In Stata 13.1 we have included features that allow you to fit the model without this restriction. However, this time we will use the older parameterization, which will allow you to visualize the different components more easily.

 

Simulating data, and performing the estimation

 

clear
set seed 1357
set obs 10000
forvalues i = 1(1)5 {
    gen x`i' =2* rnormal() + _n/1000
}

mat C = [1,.5 \ .5, 1]
drawnorm z1 z2, cov(C)

gen y2 = 0
forvalues i = 1(1)5 {
    replace y2 = y2 + x`i'
}
replace y2 = y2 + z2

gen y1star = y2 + x1 + x2 + z1
gen xb1 = y2 + x1 + x2

gen y1 = 4
replace y1 = 3 if xb1 + z1 <=.8
replace y1 = 2 if xb1 + z1 <=.3
replace y1 = 1 if xb1 + z1 <=-.3
replace y1 = 0 if xb1 + z1 <=-.8

gsem (y1 <- y2 x1 x2 L@a, oprobit) (y2 <- x1 x2 x3 x4 x5 L@a), var(L@1)

local y1 y1
local y2 y2

local xaux  x1 x2 x3 x4 x5
local xmain  y2 x1 x2

local s2 sqrt(1+_b[`y1':L]^2)
foreach v in `xmain'{
    local trans `trans' (`y1'_`v': _b[`y1':`v']/`s2')
}

foreach v in `xaux' _cons {
    local trans `trans' (`y2'_`v': _b[`y2':`v'])
}

qui tab `y1' if e(sample)
local ncuts = r(r)-1
forvalues i = 1(1) `ncuts'{
    local trans `trans' (cut_`i': _b[`y1'_cut`i':_cons]/`s2')
}

local s1 sqrt(  _b[var(e.`y2'):_cons]  +_b[`y1':L]^2)

local trans `trans' (sig_2: `s1')
local trans `trans' (rho_12: _b[`y1':L]^2/(`s1'*`s2'))
nlcom `trans'

 

Results

 
This is the output from gsem:

Generalized structural equation model             Number of obs   =      10000
Log likelihood = -14451.117

 ( 1)  [y1]L - [y2]L = 0
 ( 2)  [var(L)]_cons = 1
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1 <-        |
          y2 |   1.379511   .0775028    17.80   0.000     1.227608    1.531414
          x1 |   1.355687   .0851558    15.92   0.000     1.188785    1.522589
          x2 |   1.346323   .0833242    16.16   0.000      1.18301    1.509635
           L |   .7786594   .0479403    16.24   0.000     .6846982    .8726206
-------------+----------------------------------------------------------------
y2 <-        |
          x1 |   .9901353   .0044941   220.32   0.000      .981327    .9989435
          x2 |   1.006836   .0044795   224.76   0.000      .998056    1.015615
          x3 |   1.004249   .0044657   224.88   0.000     .9954963    1.013002
          x4 |   .9976541   .0044783   222.77   0.000     .9888767    1.006431
          x5 |   .9987587   .0044736   223.26   0.000     .9899907    1.007527
           L |   .7786594   .0479403    16.24   0.000     .6846982    .8726206
       _cons |   .0002758   .0192417     0.01   0.989    -.0374372    .0379887
-------------+----------------------------------------------------------------
y1           |
       /cut1 |  -1.131155   .1157771    -9.77   0.000    -1.358074   -.9042358
       /cut2 |  -.5330973   .1079414    -4.94   0.000    -.7446585    -.321536
       /cut3 |   .2722794   .1061315     2.57   0.010     .0642654    .4802933
       /cut4 |     .89394   .1123013     7.96   0.000     .6738334    1.114047
-------------+----------------------------------------------------------------
       var(L)|          1  (constrained)
-------------+----------------------------------------------------------------
    var(e.y2)|   .3823751    .074215                      .2613848    .5593696
------------------------------------------------------------------------------

These are the results we obtain when we transform the values reported by gsem to the original parameterization:

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       y1_y2 |   1.088455   .0608501    17.89   0.000     .9691909    1.207719
       y1_x1 |   1.069657   .0642069    16.66   0.000      .943814    1.195501
       y1_x2 |   1.062269   .0619939    17.14   0.000      .940763    1.183774
       y2_x1 |   .9901353   .0044941   220.32   0.000      .981327    .9989435
       y2_x2 |   1.006836   .0044795   224.76   0.000      .998056    1.015615
       y2_x3 |   1.004249   .0044657   224.88   0.000     .9954963    1.013002
       y2_x4 |   .9976541   .0044783   222.77   0.000     .9888767    1.006431
       y2_x5 |   .9987587   .0044736   223.26   0.000     .9899907    1.007527
    y2__cons |   .0002758   .0192417     0.01   0.989    -.0374372    .0379887
       cut_1 |   -.892498   .0895971    -9.96   0.000    -1.068105   -.7168909
       cut_2 |  -.4206217   .0841852    -5.00   0.000    -.5856218   -.2556217
       cut_3 |   .2148325   .0843737     2.55   0.011     .0494632    .3802018
       cut_4 |    .705332   .0905974     7.79   0.000     .5277644    .8828997
       sig_2 |   .9943267    .007031   141.42   0.000     .9805462    1.008107
      rho_12 |   .4811176   .0477552    10.07   0.000     .3875191     .574716
------------------------------------------------------------------------------

The estimates are quite close to the values used for the simulation. If you try to perform the estimation with the wrong sign for the coefficient for L, you will get a number that is virtually zero (if you get convergence at all). In this case, the evaluator is telling us that the best value it can find, provided the restrictions we have imposed, is zero. If you see such results, you may want to try the opposite sign. If both give a zero coefficient, it means that this is the solution, and there is not endogeneity at all. If one of them is not zero, it means that the non-zero value is the solution. As stated before, in Stata 13.1, the model can be fitted without this restriction.

A tip to debug your nl/nlsur function evaluator program

If you have a bug in your evaluator program, nl will produce, most probably, the following error:

your program returned 198
verify that your program is a function evaluator program
r(198);

The error indicates that your program cannot be evaluated.

The best way to spot any issues in your evaluator program is to run it interactively. You just need to define your sample (usually observations where none of the variables are missing), and a matrix with values for your parameters. Let me show you an example with nlces2. This is the code to fit the CES production function, from the documentation for the nl command:

cscript
program nlces2
version 12
syntax varlist(min=3 max=3) if, at(name)
local logout : word 1 of `varlist'
local capital : word 2 of `varlist'
local labor : word 3 of `varlist'
// Retrieve parameters out of at matrix
tempname b0 rho delta
scalar `b0' = `at'[1, 1]
scalar `rho' = `at'[1, 2]
scalar `delta' = `at'[1, 3]
tempvar kterm lterm
generate double `kterm' = `delta'*`capital'^(-1*`rho') `if'
generate double `lterm' = (1-`delta')*`labor'^(-1*`rho') `if'
// Fill in dependent variable
replace `logout' = `b0' - 1/`rho'*ln(`kterm' + `lterm') `if'
end

webuse production, clear
nl ces2 @ lnoutput capital labor, parameters(b0 rho delta) ///
               initial(b0 0 rho 1 delta 0.5)

Now, let me show you how to run it interactively:

webuse production, clear
*generate a variable to restrict my sample to observations
*with non-missing values in my variables
egen u = rowmiss(lnoutput capital labor)

*generate a matrix with parameters where I will evaluate my function
mat M = (0,1,.5)
gen nloutput_new = 1 
nlces2 nloutput_new capital labor if u==0, at(M)  

This will evaluate the program only once, using the parameters in matrix M. Notice that I generated a new variable to use as my dependent variable. This is because the program nlces2, when run by itself, will modify the dependent variable.
When you run this program by itself, you will obtain a more specific error message. You can add debugging code to this program, and you can also use the trace setting to see how each step is executed. Type help trace to learn about this setting.

Another possible source of error (which will generate error r(480) when run from nl) is when an evaluator function produces missing values for observations in the sample. If this is the case, you will see those missing values in the variable nloutput_new, i.e., in the variable you entered as dependent when running your evaluator by itself. You can then add debugging code, for example, using codebook or summarize to examine the different parts that contribute to the substitution performed in the dependent variable.

For example, after the line that generates `kterm’, I could write

summarize `kterm' if u == 0

to see if this variable contains any missing values in my sample.

This method can also be used to debug your function evaluator programs for nlsur. In order to preserve your dataset, you need to use copies for all the dependent variables in your model.

Categories: Programming Tags: , , ,

Positive log-likelihood values happen

From time to time, we get a question from a user puzzled about getting a positive log likelihood for a certain estimation. We get so used to seeing negative log-likelihood values all the time that we may wonder what caused them to be positive.

First, let me point out that there is nothing wrong with a positive log likelihood.

The likelihood is the product of the density evaluated at the observations. Usually, the density takes values that are smaller than one, so its logarithm will be negative. However, this is not true for every distribution.

For example, let’s think of the density of a normal distribution with a small standard deviation, let’s say 0.1.

. di normalden(0,0,.1)
3.9894228

This density will concentrate a large area around zero, and therefore will take large values around this point. Naturally, the logarithm of this value will be positive.

. di log(3.9894228)
1.3836466

In model estimation, the situation is a bit more complex. When you fit a model to a dataset, the log likelihood will be evaluated at every observation. Some of these evaluations may turn out to be positive, and some may turn out to be negative. The sum of all of them is reported. Let me show you an example.

I will start by simulating a dataset appropriate for a linear model.

clear
program drop _all
set seed 1357
set obs 100
gen x1 = rnormal()
gen x2 = rnormal()
gen y = 2*x1 + 3*x2 +1 + .06*rnormal()

I will borrow the code for mynormal_lf from the book Maximum Likelihood Estimation with Stata (W. Gould, J. Pitblado, and B. Poi, 2010, Stata Press) in order to fit my model via maximum likelihood.

program mynormal_lf
        version 11.1
        args lnf mu lnsigma
        quietly replace `lnf' = ln(normalden($ML_y1,`mu',exp(`lnsigma')))
end

ml model lf  mynormal_lf  (y = x1 x2) (lnsigma:)
ml max, nolog

The following table will be displayed:

.   ml max, nolog

                                                  Number of obs   =        100
                                                  Wald chi2(2)    =  456919.97
Log likelihood =  152.37127                       Prob > chi2     =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
eq1          |   
          x1 |   1.995834    .005117   390.04   0.000     1.985805    2.005863
          x2 |   3.014579   .0059332   508.08   0.000      3.00295    3.026208
       _cons |   .9990202   .0052961   188.63   0.000       .98864      1.0094
-------------+----------------------------------------------------------------
lnsigma      |  
       _cons |  -2.942651   .0707107   -41.62   0.000    -3.081242   -2.804061
------------------------------------------------------------------------------

We can see that the estimates are close enough to our original parameters, and also that the log likelihood is positive.

We can obtain the log likelihood for each observation by substituting the estimates in the log-likelihood formula:

. predict double xb

. gen double lnf = ln(normalden(y, xb, exp([lnsigma]_b[_cons])))

. summ lnf, detail

                             lnf
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -1.360689      -1.574499
 5%    -.0729971       -1.14688
10%     .4198644      -.3653152       Obs                 100
25%     1.327405      -.2917259       Sum of Wgt.         100

50%     1.868804                      Mean           1.523713
                        Largest       Std. Dev.      .7287953
75%     1.995713       2.023528
90%     2.016385       2.023544       Variance       .5311426
95%     2.021751       2.023676       Skewness      -2.035996
99%     2.023691       2.023706       Kurtosis       7.114586

. di r(sum)
152.37127

. gen f = exp(lnf)

. summ f, detail

                              f
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .2623688       .2071112
 5%     .9296673       .3176263
10%      1.52623       .6939778       Obs                 100
25%     3.771652       .7469733       Sum of Wgt.         100

50%     6.480548                      Mean           5.448205
                        Largest       Std. Dev.      2.266741
75%     7.357449       7.564968
90%      7.51112        7.56509       Variance       5.138117
95%     7.551539       7.566087       Skewness      -.8968159
99%     7.566199        7.56631       Kurtosis       2.431257

We can see that some values for the log likelihood are negative, but most are positive, and that the sum is the value we already know. In the same way, most of the values of the likelihood are greater than one.

As an exercise, try the commands above with a bigger variance, say, 1. Now the density will be flatter, and there will be no values greater than one.

In short, if you have a positive log likelihood, there is nothing wrong with that, but if you check your dispersion parameters, you will find they are small.

Categories: Statistics Tags:

Including covariates in crossed-effects models

The manual entry for xtmixed documents all the official features in the command, and several applications. However, it would be impossible to address all the models that can be fitted with this command in a manual entry. I want to show you how to include covariates in a crossed-effects model.

Let me start by reviewing the crossed-effects notation for xtmixed. I will use the homework dataset from Kreft and de Leeuw (1998) (a subsample from the National Education Longitudinal Study of 1988). You can download the dataset from the webpage for Rabe-Hesketh & Skrondal (2008) (http://www.stata-press.com/data/mlmus2.html), and run all the examples in this entry.

If we want to fit a model with variable math (math grade) as outcome, and two crossed effects: variable region and variable urban, the standard syntax would be:

(1)   xtmixed math ||_all:R.region || _all: R.urban

The underlying model for this syntax is

math_ijk = b + u_i + v_j + eps_ijk

where i represents the region and j represents the level of variable urban, u_i are i.i.d, v_j are i.i.d, and eps_ijk are i.i.d, and all of them are independent from each other.

The standard notation for xtmixed assumes that levels are always nested. In order to fit non-nested models, we create an artificial level with only one category consisting of all the observations; in addition, we use the notation R.var, which indicates that we are including dummies for each category of variable var, while constraining the variances to be the same.

That is, if we write

xtmixed math  ||_all:R.region

we are just fitting the model:

xtmixed math || region:

but we are doing it in a very inefficient way. What we are doing is exactly the following:

generate one = 1
tab region, gen(id_reg)
xtmixed math || one: id_reg*, cov(identity) nocons

That is, instead of estimating one variance parameter, we are estimating four, and constraining them to be equal. Therefore, a more efficient way to fit our mixed model (1), would be:

xtmixed  math  ||_all:R.region || urban:

This will work because urban is nested in one. Therefore, if we want to include a covariate (also known as random slope) in one of the levels, we just need to place that level at the end and use the usual syntax for random slope, for example:

xtmixed math public || _all:R.region || urban: public

Now let’s assume that we want to include random coefficients in both levels; how would we do that? The trick is to use the _all notation to include a random coefficient in the model. For example, if we want to fit

(2) xtmixed math meanses || region: meanses

we are assuming that variable meanses (mean SES per school) has a different effect (random slope) for each region. This model can be expressed as

math_ik = x_ik*b + sigma_i + alpha_i*meanses_ik

where sigma_i are i.i.d, alpha_i are i.i.d, and sigmas and alphas are independent from each other. This model can be fitted by generating all the interactions of meanses with the regions, including a random alpha_i for each interaction, and restricting their variances to be equal. In other words, we can fit model (2) also as follows:

unab idvar: id_reg* 
foreach v of local idvar{
    gen inter`v' = meanses*`v'
}

xtmixed math  meanses ///
  || _all:inter*, cov(identity) nocons ///
  || _all: R.region

Finally, we can use all these tools to include random coefficients in both levels, for example:

xtmixed math parented meanses public || _all: R.region || ///
   _all:inter*, cov(identity) nocons || urban: public

References:
Kreft, I.G.G and de J. Leeuw. 1998. Introducing Multilevel Modeling. Sage.
Rabe-Hesketh, S. and A. Skrondal. 2008. Multilevel and Longitudinal Modeling Using Stata, Second Edition. Stata Press

Categories: Statistics Tags: ,