Using gmm to solve two-step estimation problems

Two-step estimation problems can be solved using the gmm command.

Two-step estimators use first-step estimates to estimate the parameters of interest in a second step. For instance, inverse-probability weighted (IPW) estimators are weighted averages in which the weights are estimated in the first step. When such an estimator produces consistent point estimates but inconsistent standard errors, we have what is known as the two-step-estimation problem. The problem arises because the second step ignores the estimation error in the first step.

One solution is to convert the two-step estimator into a one-step estimator. My favorite way to do this conversion is to stack the equations solved by each of the two estimators and solve them jointly. This one-step approach produces consistent point estimates and consistent standard errors. There is no two-step problem because all the computations are performed jointly. Newey (1984) derives and justifies this approach.
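As a sketch of what stacking means, suppose (in notation introduced here only for illustration) that the first step estimates \(\boldsymbol{\gamma}\) by solving moment conditions built from \({\bf g}_1\), and the second step estimates \(\boldsymbol{\theta}\) by solving moment conditions built from \({\bf g}_2\), which also depend on \(\boldsymbol{\gamma}\). With \({\bf z}_i\) denoting the data for observation \(i\), the stacked estimator solves both sets of equations at once:

\[
\frac{1}{N}\sum_{i=1}^N {\bf g}({\bf z}_i,\boldsymbol{\gamma},\boldsymbol{\theta})
= \frac{1}{N}\sum_{i=1}^N
\begin{pmatrix}
{\bf g}_1({\bf z}_i,\boldsymbol{\gamma}) \\
{\bf g}_2({\bf z}_i,\boldsymbol{\gamma},\boldsymbol{\theta})
\end{pmatrix}
= {\bf 0}
\]

When the number of equations equals the number of parameters, the usual robust variance estimate for the stacked system is the sandwich

\[
\widehat{\rm Var}
\begin{pmatrix}\widehat{\boldsymbol{\gamma}} \\ \widehat{\boldsymbol{\theta}}\end{pmatrix}
= \frac{1}{N}\,\widehat{\bf G}^{-1}\,\widehat{\bf S}\,\left(\widehat{\bf G}^{-1}\right)^{\prime},
\qquad
\widehat{\bf G} = \frac{1}{N}\sum_{i=1}^N
\frac{\partial {\bf g}({\bf z}_i,\widehat{\boldsymbol{\gamma}},\widehat{\boldsymbol{\theta}})}{\partial(\boldsymbol{\gamma}^{\prime},\boldsymbol{\theta}^{\prime})},
\qquad
\widehat{\bf S} = \frac{1}{N}\sum_{i=1}^N
{\bf g}({\bf z}_i,\widehat{\boldsymbol{\gamma}},\widehat{\boldsymbol{\theta}})\,
{\bf g}({\bf z}_i,\widehat{\boldsymbol{\gamma}},\widehat{\boldsymbol{\theta}})^{\prime}
\]

Because \(\widehat{\bf G}\) contains the derivative of \({\bf g}_2\) with respect to \(\boldsymbol{\gamma}\), the first-step estimation error is carried into the standard errors of \(\widehat{\boldsymbol{\theta}}\).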

I’m going to illustrate this approach with the IPW example, but it can be used with any two-step problem as long as the moment conditions solved in each step are continuously differentiable.

IPW estimators are frequently used to estimate the mean that would be observed if everyone in a population received a specified treatment, a quantity known as a potential-outcome mean (POM). A difference of POMs is called the average treatment effect (ATE). Aside from all that, it is the mechanics of the two-step IPW estimator that interest me here. IPW estimators are weighted averages of the outcome, and the weights are estimated in a first step. The weights used in the second step are the inverse of the estimated probability of treatment.

Let’s imagine we are analyzing an extract of the birthweight data used by Cattaneo (2010). In this dataset, bweight is the baby’s weight at birth, mbsmoke is 1 if the mother smoked while pregnant (and 0 otherwise), mmarried is 1 if the mother is married, and prenatal1 is 1 if the mother had a prenatal visit in the first trimester.

Let’s imagine we want to estimate the mean when all pregnant women smoked, which is to say, the POM for smoking. If we were doing substantive research, we would also estimate the POM when no pregnant women smoked. The difference between these estimated POMs would then estimate the ATE of smoking.

In the IPW estimator, we begin by estimating the probability weights for smoking. We fit a probit model of mbsmoke as a function of mmarried and prenatal1.

. use cattaneo2
(Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)

. probit mbsmoke mmarried prenatal1, vce(robust)

Iteration 0:   log pseudolikelihood = -2230.7484
Iteration 1:   log pseudolikelihood = -2102.6994
Iteration 2:   log pseudolikelihood = -2102.1437
Iteration 3:   log pseudolikelihood = -2102.1436

Probit regression                                 Number of obs   =       4642
                                                  Wald chi2(2)    =     259.42
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -2102.1436                 Pseudo R2       =     0.0577

------------------------------------------------------------------------------
             |               Robust
     mbsmoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    mmarried |  -.6365472   .0478037   -13.32   0.000    -.7302407   -.5428537
   prenatal1 |  -.2144569   .0547583    -3.92   0.000    -.3217811   -.1071327
       _cons |  -.3226297   .0471906    -6.84   0.000    -.4151215   -.2301379
------------------------------------------------------------------------------

The results indicate that both mmarried and prenatal1 significantly predict whether the mother smoked while pregnant.

We want to calculate the inverse probabilities. We begin by getting the probabilities:

. predict double pr, pr

Now, we can obtain the inverse probabilities by typing

. generate double ipw = (mbsmoke==1)/pr

We can now perform the second step: calculate the mean for smokers by using the IPWs.

. mean bweight [pw=ipw] if mbsmoke==1

Mean estimation                     Number of obs    =     864

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
     bweight |   3162.868   21.71397      3120.249    3205.486
--------------------------------------------------------------

The point estimate reported by mean is consistent; the reported standard error is not. The standard error is inconsistent because mean treats the weights as fixed when they were in fact estimated.

The stacked two-step approach, which uses gmm to solve the two-step-estimation problem, instead creates a one-step estimator that solves both steps simultaneously.

To do that, we have to find and then code the moment conditions.

So what are the moment conditions for the first-step maximum-likelihood probit? Maximum likelihood (ML) estimators obtain their parameter estimates by finding the parameters that set the means of the first derivatives with respect to each parameter to 0. The means of the first derivatives are the moments.

The moment conditions are that the means of the first derivatives equal 0. We can obtain those first derivatives for ourselves, or we can copy them from the Methods and formulas section of [R] probit:

\[
\frac{1}{N}\sum_{i=1}^N\frac{ \phi({\bf x}_i\boldsymbol{\beta}')
\left\{d_i-\Phi\left({\bf x}_i\boldsymbol{\beta}'\right)\right\}}
{\Phi\left({\bf x}_i\boldsymbol{\beta}'\right)
\left\{1-\Phi\left({\bf x}_i\boldsymbol{\beta}'\right)\right\}}{\bf x}_i' = {\bf 0}
\]

where \(\phi()\) is the density function of the standard normal distribution, \(d_i\) is the binary variable that is 1 for treated individuals (and 0 otherwise), and \(\Phi()\) is the cumulative distribution function of the standard normal.

What’s the point of these moment conditions? We are going to use the generalized method of moments (GMM) to solve for the ML probit estimates. GMM is an estimation framework that defines estimators that solve moment conditions. The GMM estimator that sets the mean of the first derivatives of the ML probit to 0 produces the same point estimates as the ML probit estimator.

Stata’s GMM estimator is the gmm command; see [R] gmm for an introduction.

The structure of these moment conditions greatly simplifies the problem. For each observation, the left-hand side is the product of a scalar subexpression, namely,

\[
\frac{\phi({\bf x}_i\boldsymbol{\beta}')\{d_i-\Phi({\bf x}_i\boldsymbol{\beta}')\}}
{\Phi({\bf x}_i\boldsymbol{\beta}')\{1-\Phi({\bf x}_i\boldsymbol{\beta}')\}}
\]

and the covariates \({\bf x}_i\). In GMM parlance, the variables that multiply the scalar expression are called instruments.
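For readers who prefer to derive the first derivatives rather than copy them, here is a brief sketch. The symbol \(\ell_i\) for the log-likelihood contribution of observation \(i\) is introduced here for illustration, and \(\Phi()\) denotes the standard normal cumulative distribution function. The probit log likelihood for one observation is

\[
\ell_i = d_i \ln \Phi({\bf x}_i\boldsymbol{\beta}') + (1-d_i)\ln\left\{1-\Phi({\bf x}_i\boldsymbol{\beta}')\right\}
\]

Differentiating with respect to \(\boldsymbol{\beta}\) and putting the two terms over a common denominator gives

\[
\frac{\partial \ell_i}{\partial \boldsymbol{\beta}'}
= \left[\frac{d_i\,\phi({\bf x}_i\boldsymbol{\beta}')}{\Phi({\bf x}_i\boldsymbol{\beta}')}
- \frac{(1-d_i)\,\phi({\bf x}_i\boldsymbol{\beta}')}{1-\Phi({\bf x}_i\boldsymbol{\beta}')}\right]{\bf x}_i'
= \frac{\phi({\bf x}_i\boldsymbol{\beta}')\left\{d_i-\Phi({\bf x}_i\boldsymbol{\beta}')\right\}}
{\Phi({\bf x}_i\boldsymbol{\beta}')\left\{1-\Phi({\bf x}_i\boldsymbol{\beta}')\right\}}\,{\bf x}_i'
\]

The second equality uses \(d_i\{1-\Phi\} - (1-d_i)\Phi = d_i - \Phi\). Averaging over observations and setting the result to \({\bf 0}\) reproduces the moment conditions, with the scalar factor multiplying the instruments \({\bf x}_i\).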

The gmm command that will solve these moment conditions is

. generate double cons = 1

. gmm (normalden({xb:mmarried prenatal1 cons})*(mbsmoke - normal({xb:}))/ ///
>         (normal({xb:})*(1-normal({xb:})) )),                            ///
>         instruments(mmarried prenatal1 )  winitial(identity) onestep

Step 1
Iteration 0:   GMM criterion Q(b) =  .61413428
Iteration 1:   GMM criterion Q(b) =  .00153235
Iteration 2:   GMM criterion Q(b) =  1.652e-06
Iteration 3:   GMM criterion Q(b) =  1.217e-12
Iteration 4:   GMM criterion Q(b) =  7.162e-25

GMM estimation

Number of parameters =   3
Number of moments    =   3
Initial weight matrix: Identity                       Number of obs  =    4642

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
/xb_mmarried |  -.6365472   .0477985   -13.32   0.000    -.7302306   -.5428638
/xb_prenat~1 |  -.2144569   .0547524    -3.92   0.000    -.3217696   -.1071442
    /xb_cons |  -.3226297   .0471855    -6.84   0.000    -.4151115   -.2301479
------------------------------------------------------------------------------
Instruments for equation 1: mmarried prenatal1 _cons

With gmm, we specify in parentheses the scalar expression, and we specify the covariates in the instruments() option. The unknown parameters are the implied coefficients on the variables specified in {xb:mmarried prenatal1 cons}. Note that we subsequently refer to the linear combination as {xb:}.

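The winitial(identity) and onestep options simplify the computation. Because the number of parameters equals the number of moment conditions, the estimates set the sample moments exactly to 0 (note that the GMM criterion Q(b) converges to 0 in the iteration log), so the choice of weight matrix does not affect the point estimates. An identity initial weight matrix and the one-step estimator are therefore sufficient.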

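The point estimates produced by the gmm command match those reported by probit. The standard errors agree to about four decimal places (for example, .0477985 versus .0478037 for mmarried); the tiny remaining difference is consistent with the N/(N-1) finite-sample adjustment in probit's robust variance estimate.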

Now that we can use gmm to obtain our first-step estimates, we need to add the moment condition that defines the weighted average for the POM for smokers. The IPW estimate of the POM for smokers is the weighted average of bweight,

\[
{\rm POM} = \frac{\sum_{i=1}^{N}\frac{{\bf mbsmoke}_i}{\Phi({\bf x}_i\boldsymbol{\beta}')}\,{\bf bweight}_i}
{\sum_{i=1}^{N}\frac{{\bf mbsmoke}_i}{\Phi({\bf x}_i\boldsymbol{\beta}')}}
\]

Recall that the inverse-probability weights are \(1/\Phi({\bf x}_i\boldsymbol{\beta}')\) for smokers. When we solved this problem using a two-step estimator, we performed the second step only for smokers. We typed mean bweight [pw=ipw] if mbsmoke==1. We cannot use if mbsmoke==1 in the gmm command because the first step has to be performed over all the data. Instead, we set the weights to 0 in the second step for the nonsmokers. Multiplying \(1/\Phi({\bf x}_i\boldsymbol{\beta}')\) by \({\bf mbsmoke}_i\) does that.

Multiplying both sides of the POM equation by its denominator, collecting terms, and dividing by \(N\) gives the moment condition

\[
\frac{1}{N}\sum_{i=1}^{N}\frac{{\bf mbsmoke}_i}{\Phi({\bf x}_i\boldsymbol{\beta}')}\left({\bf bweight}_i - {\rm POM}\right) = 0
\]

In the gmm command below, I call the scalar expression for the probit moment conditions eq1, and I call the scalar expression for the POM weighted-average equation eq2. Both moment conditions have the scalar-expression-times-instrument structure, but the weighted-average moment expression is multiplied by a constant that is included as an instrument by default. In the weighted-average moment condition, parameter pom is the POM we wish to estimate.

. gmm (eq1: normalden({xb:mmarried prenatal1 cons})*                     ///
>         (mbsmoke - normal({xb:}))/(normal({xb:})*(1-normal({xb:})) ))  ///
>     (eq2: (mbsmoke/normal({xb:}))*(bweight - {pom})),                  ///
>     instruments(eq1:mmarried prenatal1 )                               ///
>     instruments(eq2: )                                                 ///
>     winitial(identity) onestep

Step 1
Iteration 0:   GMM criterion Q(b) =  1364234.7
Iteration 1:   GMM criterion Q(b) =  141803.69
Iteration 2:   GMM criterion Q(b) =  84836.523
Iteration 3:   GMM criterion Q(b) =  1073.6829
Iteration 4:   GMM criterion Q(b) =  .01215102
Iteration 5:   GMM criterion Q(b) =  1.196e-13
Iteration 6:   GMM criterion Q(b) =  2.815e-27

GMM estimation

Number of parameters =   4
Number of moments    =   4
Initial weight matrix: Identity                       Number of obs  =    4642

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
/xb_mmarried |  -.6365472   .0477985   -13.32   0.000    -.7302306   -.5428638
/xb_prenat~1 |  -.2144569   .0547524    -3.92   0.000    -.3217696   -.1071442
    /xb_cons |  -.3226297   .0471855    -6.84   0.000    -.4151115   -.2301479
        /pom |   3162.868   21.65827   146.04   0.000     3120.418    3205.317
------------------------------------------------------------------------------
Instruments for equation 1: mmarried prenatal1 _cons
Instruments for equation 2: _cons

In this output, both the point estimates and the standard errors are consistent!

They are consistent because we converted our two-step estimator into a one-step estimator.
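The same stacking idea should extend to the POM for nonsmokers mentioned earlier. The following is a minimal sketch, not something run for this post: it assumes the dataset and the cons variable from above are still in memory, it introduces a hypothetical parameter name pom0 and equation name eq3, and it weights nonsmokers by the inverse of one minus the estimated probability of smoking.

```stata
* Sketch only: assumes cattaneo2 is loaded and cons was generated as above.
* eq3 and pom0 are hypothetical names for the nonsmoker POM equation.
gmm (eq1: normalden({xb:mmarried prenatal1 cons})*                     ///
        (mbsmoke - normal({xb:}))/(normal({xb:})*(1-normal({xb:})) ))  ///
    (eq2: (mbsmoke/normal({xb:}))*(bweight - {pom}))                   ///
    (eq3: ((1-mbsmoke)/(1-normal({xb:})))*(bweight - {pom0})),         ///
    instruments(eq1:mmarried prenatal1 )                               ///
    instruments(eq2: )                                                 ///
    instruments(eq3: )                                                 ///
    winitial(identity) onestep
```

If this runs as intended, the system has five parameters and five moment conditions, so it remains exactly identified, and the estimate of pom0 can be checked against the nonsmoker POM that teffects reports.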

Stata has a teffects command

What we have just done is reimplement Stata’s teffects command in a particular case. Results are identical:

. teffects ipw (bweight) (mbsmoke mmarried prenatal1, probit) , pom

Iteration 0:   EE criterion =  5.387e-22
Iteration 1:   EE criterion =  3.332e-27

Treatment-effects estimation                    Number of obs      =      4642
Estimator      : inverse-probability weights
Outcome model  : weighted mean
Treatment model: probit
------------------------------------------------------------------------------
             |               Robust
     bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
POmeans      |
     mbsmoke |
  nonsmoker  |   3401.441   9.528643   356.97   0.000     3382.765    3420.117
     smoker  |   3162.868   21.65827   146.04   0.000     3120.418    3205.317
------------------------------------------------------------------------------
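Because the ATE is the difference between the two POMs, teffects can also report it directly. A minimal sketch, assuming the same dataset is in memory, replaces the pom option with ate:

```stata
* Sketch only: same model as above, requesting the ATE instead of the POMs.
teffects ipw (bweight) (mbsmoke mmarried prenatal1, probit), ate
```

The point estimate should agree with the difference between the smoker and nonsmoker POMs shown above, which is about -238.6 by subtraction of the displayed values.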

Conclusion

To which problems can you apply this stacked two-step approach?

This approach of stacking the moment conditions is designed for two-step problems in which the number of parameters equals the number of sample moment conditions in each step. Such estimators are called exactly identified because the number of parameters is the same as the number of equations that they solve.

For exactly identified estimators, the point estimates produced by the stacked GMM are identical to the point estimates produced by the two-step estimator. The stacked GMM, however, produces consistent standard errors.
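A short sketch shows why both statements hold; the notation is introduced here for illustration only. Let \({\bf g}_{1i}(\boldsymbol{\gamma})\) be the first-step moment function for observation \(i\) and \({\bf g}_{2i}(\boldsymbol{\gamma},\boldsymbol{\theta})\) the second-step moment function. Because \({\bf g}_{1i}\) does not depend on \(\boldsymbol{\theta}\), the matrix of derivatives of the stacked moments is block lower triangular:

\[
{\bf G} =
\begin{pmatrix}
{\bf G}_{11} & {\bf 0} \\
{\bf G}_{21} & {\bf G}_{22}
\end{pmatrix},
\qquad
{\bf G}_{11} = E\left[\frac{\partial {\bf g}_{1i}}{\partial \boldsymbol{\gamma}'}\right],\quad
{\bf G}_{21} = E\left[\frac{\partial {\bf g}_{2i}}{\partial \boldsymbol{\gamma}'}\right],\quad
{\bf G}_{22} = E\left[\frac{\partial {\bf g}_{2i}}{\partial \boldsymbol{\theta}'}\right]
\]

The first block of equations can be solved for \(\widehat{\boldsymbol{\gamma}}\) on its own, and the second block then yields \(\widehat{\boldsymbol{\theta}}\) given \(\widehat{\boldsymbol{\gamma}}\), which is exactly what the two-step estimator does. The asymptotic variance of \(\widehat{\boldsymbol{\theta}}\) implied by the stacked system is

\[
{\bf G}_{22}^{-1}\,
E\left[\left({\bf g}_{2i} - {\bf G}_{21}{\bf G}_{11}^{-1}{\bf g}_{1i}\right)
\left({\bf g}_{2i} - {\bf G}_{21}{\bf G}_{11}^{-1}{\bf g}_{1i}\right)'\right]
\left({\bf G}_{22}^{-1}\right)'
\]

whereas treating the first-step estimates as fixed amounts to using \({\bf G}_{22}^{-1}E\left[{\bf g}_{2i}{\bf g}_{2i}'\right]\left({\bf G}_{22}^{-1}\right)'\). The two agree only when the term involving \({\bf G}_{21}\) drops out.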

For estimators with more conditions than parameters, the stacked GMM also corrects the standard errors, but there are caveats that I’m not going to discuss here.

The stacked GMM requires that the moment conditions be continuously differentiable and satisfy standard regularity conditions. Smooth, regular ML estimators and least-squares estimators meet these requirements; see Newey (1984) for details.

The main practical hurdle is getting the moment conditions for the estimators in the different steps. If the steps involve ML, those first-derivative conditions can be directly translated to moment conditions. The calculus part is worked out in many textbooks, and sometimes even in the Stata manuals.

See [R] gmm for more information on how to use the gmm command.

References

Cattaneo, M. D. 2010. Efficient semiparametric estimation of multi-valued treatment effects under ignorability. Journal of Econometrics 155: 138–154.

Newey, W. K. 1984. A method of moments interpretation of sequential estimators. Economics Letters 14: 201–206.