## Using gmm to solve two-step estimation problems

Two-step estimation problems can be solved using the **gmm** command.

Two-step estimators use first-step estimates to estimate the parameters of interest in a second step. For instance, inverse-probability weighted (IPW) estimators are weighted averages in which the weights are estimated in the first step. When a two-step estimator produces consistent point estimates but inconsistent standard errors, we have what is known as the two-step-estimation problem. The problem arises because the second step ignores the estimation error in the first step.

One solution is to convert the two-step estimator into a one-step estimator. My favorite way to do this conversion is to stack the equations solved by each of the two estimators and solve them jointly. This one-step approach produces consistent point estimates and consistent standard errors. There is no two-step problem because all the computations are performed jointly. Newey (1984) derives and justifies this approach.
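To sketch why stacking repairs the standard errors, here is the usual sandwich-variance argument for an exactly identified estimator. The notation is an assumed shorthand for illustration, not Newey's: \(\boldsymbol{\beta}\) holds the first-step parameters, \(\mu\) is the second-step parameter, \({\bf s}_i\) collects the first-step moment functions, \(m_i\) is the second-step moment function, and \(\boldsymbol{\theta} = (\boldsymbol{\beta}, \mu)\).

\[
{\bf g}_i(\boldsymbol{\theta}) =
\begin{pmatrix} {\bf s}_i(\boldsymbol{\beta}) \\ m_i(\boldsymbol{\beta},\mu) \end{pmatrix},
\qquad
{\bf G} = E\left[\frac{\partial {\bf g}_i}{\partial \boldsymbol{\theta}'}\right] =
\begin{pmatrix} {\bf G}_{\beta\beta} & {\bf 0} \\ {\bf G}_{\mu\beta} & G_{\mu\mu} \end{pmatrix},
\qquad
\widehat{\rm Var}(\widehat{\boldsymbol{\theta}}) = \frac{1}{N}\,\widehat{\bf G}^{-1}\,\widehat{\bf S}\,(\widehat{\bf G}^{-1})'
\]

Here \({\bf S} = E[{\bf g}_i{\bf g}_i']\). The block \({\bf G}_{\mu\beta}\) records how the second-step moment depends on the first-step parameters. Working through the inverse of the block-triangular \({\bf G}\) gives

\[
{\rm Var}(\widehat{\mu}) \approx \frac{1}{N}\,
\frac{E\left[\left(m_i - {\bf G}_{\mu\beta}{\bf G}_{\beta\beta}^{-1}{\bf s}_i\right)^2\right]}{G_{\mu\mu}^{2}}
\]

A second step that takes the first-step estimates as fixed in effect sets \({\bf G}_{\mu\beta}\) to zero, which is where the inconsistency in its standard error comes from.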

I’m going to illustrate this approach with the IPW example, but it can be used with any two-step problem as long as the moment conditions in each step are continuously differentiable.

IPW estimators are frequently used to estimate the mean that would be observed if everyone in a population received a specified treatment, a quantity known as a potential-outcome mean (POM). A difference of POMs is called the average treatment effect (ATE). Aside from all that, it is the mechanics of the two-step IPW estimator that interest me here. IPW estimators are weighted averages of the outcome, and the weights are estimated in a first step. The weights used in the second step are the inverse of the estimated probability of treatment.

Let’s imagine we are analyzing an extract of the birthweight data used by Cattaneo (2010). In this dataset, **bweight** is the baby’s weight at birth, **mbsmoke** is 1 if the mother smoked while pregnant (and 0 otherwise), **mmarried** is 1 if the mother is married, and **prenatal1** is 1 if the mother had a prenatal visit in the first trimester.

Let’s imagine we want to estimate the mean when all pregnant women smoked, which is to say, the POM for smoking. If we were doing substantive research, we would also estimate the POM when no pregnant women smoked. The difference between these estimated POMs would then estimate the ATE of smoking.

In the IPW estimator, we begin by estimating the probability weights for smoking. We fit a probit model of **mbsmoke** as a function of **mmarried** and **prenatal1**.

    . use cattaneo2
    (Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)

    . probit mbsmoke mmarried prenatal1, vce(robust)

    Iteration 0:   log pseudolikelihood = -2230.7484
    Iteration 1:   log pseudolikelihood = -2102.6994
    Iteration 2:   log pseudolikelihood = -2102.1437
    Iteration 3:   log pseudolikelihood = -2102.1436

    Probit regression                                 Number of obs   =       4642
                                                      Wald chi2(2)    =     259.42
                                                      Prob > chi2     =     0.0000
    Log pseudolikelihood = -2102.1436                 Pseudo R2       =     0.0577

    ------------------------------------------------------------------------------
                 |               Robust
         mbsmoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        mmarried |  -.6365472   .0478037   -13.32   0.000    -.7302407   -.5428537
       prenatal1 |  -.2144569   .0547583    -3.92   0.000    -.3217811   -.1071327
           _cons |  -.3226297   .0471906    -6.84   0.000    -.4151215   -.2301379
    ------------------------------------------------------------------------------

The results indicate that both **mmarried** and **prenatal1** significantly predict whether the mother smoked while pregnant.

We want to calculate the inverse probabilities. We begin by getting the probabilities:

    . predict double pr, pr

Now, we can obtain the inverse probabilities by typing

    . generate double ipw = (mbsmoke==1)/pr

We can now perform the second step: calculate the mean for smokers by using the IPWs.

    . mean bweight [pw=ipw]

    Mean estimation                     Number of obs    =     864

    --------------------------------------------------------------
                 |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
         bweight |   3162.868   21.71397      3120.249    3205.486
    --------------------------------------------------------------

The point estimate reported by **mean** is consistent; the reported standard error is not. The standard error is inconsistent because **mean** takes the weights as fixed when they were in fact estimated.
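The same two steps can be repeated for the other POM mentioned earlier, the mean when no pregnant women smoked. This is only a sketch: the variable name **ipw0** is hypothetical, and it assumes the predicted probability **pr** from the probit above is still in memory.

```stata
* Weights for the nonsmoker POM: 1/(1 - Pr(smoke)) for nonsmokers, 0 for smokers
generate double ipw0 = (mbsmoke==0)/(1 - pr)

* Weighted mean of birthweight using the nonsmoker weights
* (its standard error also treats the estimated weights as fixed)
mean bweight [pw=ipw0]
```

The standard error from this second call has the same two-step problem as the one for smokers.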

The stacked approach, which uses **gmm** to solve the two-step-estimation problem, instead creates a one-step estimator that solves both steps simultaneously.

To do that, we have to find and then code the moment conditions.

So what are the moment conditions for the first-step maximum-likelihood probit? Maximum likelihood (ML) estimators obtain their parameter estimates by finding the parameters that set the means of the first derivatives of the log-likelihood function with respect to each parameter to 0. The means of the first derivatives are the moments.

The moment conditions are that the means of the first derivatives equal 0. We can obtain those first derivatives for ourselves, or we can copy them from the *Methods and formulas* section of **[R] probit**:

\[
1/N\sum_{i=1}^N\frac{\phi({\bf x}_i\boldsymbol{\beta}')\left\{d_i-\Phi\left({\bf x}_i\boldsymbol{\beta}'\right)\right\}}{\Phi\left({\bf x}_i\boldsymbol{\beta}'\right)\left\{1-\Phi\left({\bf x}_i\boldsymbol{\beta}'\right)\right\}}{\bf x}_i' = {\bf 0}
\]

where \(\phi()\) is the density function of the standard normal distribution, \(d_i\) is the binary variable that is 1 for treated individuals (and 0 otherwise), and \(\Phi()\) is the cumulative distribution function of the standard normal. In our example, \(d_i\) is **mbsmoke**.

What’s the point of these moment conditions? We are going to use the generalized method of moments (GMM) to solve for the ML probit estimates. GMM is an estimation framework that defines estimators that solve moment conditions. The GMM estimator that sets the mean of the first derivatives of the ML probit to 0 produces the same point estimates as the ML probit estimator.

Stata’s GMM estimator is the **gmm** command; see **[R] gmm** for an introduction.

The structure of these moment conditions greatly simplifies the problem. For each observation, the left-hand side is the product of a scalar subexpression, namely,

\[
\frac{\phi({\bf x}_i\boldsymbol{\beta}')\{d_i-\Phi({\bf x}_i\boldsymbol{\beta}')\}}{\Phi({\bf x}_i\boldsymbol{\beta}')\{1-\Phi({\bf x}_i\boldsymbol{\beta}')\}}
\]

and the covariates \({\bf x}_i\). In GMM parlance, the variables that multiply the scalar expression are called instruments.
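One way to see this structure at work is to evaluate the sample moments at the probit estimates we already have; each should be zero up to the optimizer's tolerance. The sketch below assumes the cattaneo2 data are in memory, and the variable names **xbhat**, **s**, **s_mmarried**, and **s_prenatal1** are hypothetical.

```stata
* Refit the probit quietly so that predict refers to it
quietly probit mbsmoke mmarried prenatal1
predict double xbhat, xb

* Scalar expression from the probit moment conditions
generate double s = normalden(xbhat)*(mbsmoke - normal(xbhat)) / ///
    (normal(xbhat)*(1 - normal(xbhat)))

* Multiply the scalar expression by each instrument;
* s itself is the moment for the constant
generate double s_mmarried  = s*mmarried
generate double s_prenatal1 = s*prenatal1

* The three sample means are the sample moments
summarize s s_mmarried s_prenatal1
```

The means reported by **summarize** are the three sample moments that GMM drives to zero.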

The **gmm** command that will solve these moment conditions is

    . generate double cons = 1

    . gmm (normalden({xb:mmarried prenatal1 cons})*(mbsmoke - normal({xb:}))/  ///
    >         (normal({xb:})*(1-normal({xb:})) )),                             ///
    >         instruments(mmarried prenatal1 ) winitial(identity) onestep

    Step 1
    Iteration 0:   GMM criterion Q(b) =  .61413428
    Iteration 1:   GMM criterion Q(b) =  .00153235
    Iteration 2:   GMM criterion Q(b) =  1.652e-06
    Iteration 3:   GMM criterion Q(b) =  1.217e-12
    Iteration 4:   GMM criterion Q(b) =  7.162e-25

    GMM estimation

    Number of parameters =   3
    Number of moments    =   3
    Initial weight matrix: Identity                       Number of obs  =    4642

    ------------------------------------------------------------------------------
                 |               Robust
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    /xb_mmarried |  -.6365472   .0477985   -13.32   0.000    -.7302306   -.5428638
    /xb_prenat~1 |  -.2144569   .0547524    -3.92   0.000    -.3217696   -.1071442
        /xb_cons |  -.3226297   .0471855    -6.84   0.000    -.4151115   -.2301479
    ------------------------------------------------------------------------------
    Instruments for equation 1: mmarried prenatal1 _cons

With **gmm**, we specify in parentheses the scalar expression, and we specify the covariates in the **instruments()** option. The unknown parameters are the implied coefficients on the variables specified in **{xb:mmarried prenatal1 cons}**. Note that we subsequently refer to the linear combination as **{xb:}**.

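The model is exactly identified (three parameters, three moments), so the choice of weight matrix does not affect the solution. We therefore specify **winitial(identity)** and **onestep**, which tell **gmm** to solve the moment conditions in a single step using an identity weight matrix.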

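The point estimates produced by the **gmm** command match those reported by **probit**, and the standard errors agree to four decimal places. The remaining differences in the standard errors are consistent with the N/(N-1) finite-sample adjustment that **probit** applies to its robust variance estimate and **gmm** does not.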

Now that we can use **gmm** to obtain our first-step estimates, we need to add the moment condition that defines the weighted average of the POM for smokers. The weighted average that estimates the POM for smokers is

\[
{\rm POM} = {\sum_{i=1}^{N} w_i\,{\bf bweight}_i \over \sum_{i=1}^{N} w_i}, \qquad w_i = {{\bf mbsmoke}_i\over{\Phi({\bf x}_i\boldsymbol{\beta})}}
\]

where \(\Phi({\bf x}_i\boldsymbol{\beta})\) is the probit probability of smoking.

Recall that the inverse-probability weights are \(1/\Phi({\bf x}_i\boldsymbol{\beta})\) for smokers. When we solved this problem using a two-step estimator, we performed the second step only for smokers: the variable **ipw** is 0 for nonsmokers, so **mean bweight [pw=ipw]** used only the 864 smokers, just as if we had typed **mean bweight [pw=ipw] if mbsmoke==1**. We cannot use **if mbsmoke==1** in the **gmm** command because the first step has to be performed over all the data. Instead, we set the weights to 0 in the second step for the nonsmokers. Multiplying \(1/\Phi({\bf x}_i\boldsymbol{\beta})\) by \({\bf mbsmoke}_i\) does that.

Multiplying both sides of the POM equation by \(1/N\sum_{i=1}^{N} w_i\) and collecting terms gives the moment condition

\[
1/N\sum_{i=1}^{N}{{\bf mbsmoke}_i\over{\Phi({\bf x}_i\boldsymbol{\beta})}}\left({\bf bweight}_i - {\rm POM}\right) = 0
\]

In the **gmm** command below, I call the scalar expression for the probit moment conditions **eq1**, and I call the scalar expression for the POM weighted-average equation **eq2**. Both moment conditions have the scalar-expression-times-instrument structure, but the weighted-average moment expression is multiplied only by a constant, which is included as an instrument by default. That is why **instruments(eq2: )** is left empty. In the weighted-average moment condition, parameter **pom** is the POM we wish to estimate.

    . gmm (eq1: normalden({xb:mmarried prenatal1 cons})*                       ///
    >         (mbsmoke - normal({xb:}))/(normal({xb:})*(1-normal({xb:})) ))    ///
    >         (eq2: (mbsmoke/normal({xb:}))*(bweight - {pom})),                ///
    >         instruments(eq1:mmarried prenatal1 )                             ///
    >         instruments(eq2: )                                               ///
    >         winitial(identity) onestep

    Step 1
    Iteration 0:   GMM criterion Q(b) =  1364234.7
    Iteration 1:   GMM criterion Q(b) =  141803.69
    Iteration 2:   GMM criterion Q(b) =  84836.523
    Iteration 3:   GMM criterion Q(b) =  1073.6829
    Iteration 4:   GMM criterion Q(b) =  .01215102
    Iteration 5:   GMM criterion Q(b) =  1.196e-13
    Iteration 6:   GMM criterion Q(b) =  2.815e-27

    GMM estimation

    Number of parameters =   4
    Number of moments    =   4
    Initial weight matrix: Identity                       Number of obs  =    4642

    ------------------------------------------------------------------------------
                 |               Robust
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    /xb_mmarried |  -.6365472   .0477985   -13.32   0.000    -.7302306   -.5428638
    /xb_prenat~1 |  -.2144569   .0547524    -3.92   0.000    -.3217696   -.1071442
        /xb_cons |  -.3226297   .0471855    -6.84   0.000    -.4151115   -.2301479
            /pom |   3162.868   21.65827   146.04   0.000     3120.418    3205.317
    ------------------------------------------------------------------------------
    Instruments for equation 1: mmarried prenatal1 _cons
    Instruments for equation 2: _cons

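In this output, both the point estimates and the standard errors are consistent! The estimate of **pom**, 3162.868, is the same as the one reported by **mean**, but its standard error is now 21.65827 instead of 21.71397.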

They are consistent because we converted our two-step estimator into a one-step estimator.
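The same stacking extends to the ATE discussed earlier. The sketch below adds a third moment condition for the nonsmoker POM and then takes the difference of the two POMs. The names **eq3**, **pom0**, and **ate** are hypothetical, the variable **cons** is assumed to exist from the earlier **generate**, and the **_b[/pom]** references assume the parameter names shown in the output above (typing **matrix list e(b)** after **gmm** shows the names your version of Stata uses).

```stata
* Stack the probit moments with one weighted-average moment per POM
gmm (eq1: normalden({xb:mmarried prenatal1 cons})*                      ///
        (mbsmoke - normal({xb:}))/(normal({xb:})*(1-normal({xb:}))))    ///
    (eq2: (mbsmoke/normal({xb:}))*(bweight - {pom}))                    ///
    (eq3: ((1-mbsmoke)/(1-normal({xb:})))*(bweight - {pom0})),          ///
    instruments(eq1: mmarried prenatal1)                                ///
    instruments(eq2: )                                                  ///
    instruments(eq3: )                                                  ///
    winitial(identity) onestep

* ATE = POM for smokers minus POM for nonsmokers;
* nlcom uses the joint variance estimate from the stacked system
nlcom (ate: _b[/pom] - _b[/pom0])
```

Because all five parameters are estimated jointly, the standard error that **nlcom** reports for the difference reflects the estimation of the probit coefficients as well as the covariance between the two POM estimates.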

## Stata has a teffects command

What we have just done is reimplement Stata’s **teffects** command in a particular case. Results are identical:

    . teffects ipw (bweight) (mbsmoke mmarried prenatal1, probit) , pom

    Iteration 0:   EE criterion =  5.387e-22
    Iteration 1:   EE criterion =  3.332e-27

    Treatment-effects estimation                    Number of obs      =      4642
    Estimator      : inverse-probability weights
    Outcome model  : weighted mean
    Treatment model: probit
    ------------------------------------------------------------------------------
                 |               Robust
         bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    POmeans      |
         mbsmoke |
      nonsmoker  |   3401.441   9.528643   356.97   0.000     3382.765    3420.117
         smoker  |   3162.868   21.65827   146.04   0.000     3120.418    3205.317
    ------------------------------------------------------------------------------
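If the ATE rather than the individual POMs is the quantity of interest, the same command can report it directly. A minimal sketch, using the same model specification as above:

```stata
* ATE of smoking on birthweight from the same IPW specification
* (ate is the default statistic, so the option could be omitted)
teffects ipw (bweight) (mbsmoke mmarried prenatal1, probit), ate
```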

## Conclusion

To which problems can you apply this stacked two-step approach?

This approach of stacking the moment conditions is designed for two-step problems in which the number of parameters equals the number of sample moment conditions in each step. Such estimators are called exactly identified because the number of parameters is the same as the number of equations that they solve.

For exactly identified estimators, the point estimates produced by the stacked GMM are identical to the point estimates produced by the two-step estimator. The stacked GMM, however, produces consistent standard errors.

For estimators with more conditions than parameters, the stacked GMM also corrects the standard errors, but there are caveats that I’m not going to discuss here.

The stacked GMM requires that the moment conditions be continuously differentiable and satisfy standard regularity conditions. Smooth, regular ML estimators and least-squares estimators meet these requirements; see Newey (1984) for details.

The main practical hurdle is getting the moment conditions for the estimators in the different steps. If the steps involve ML, those first-derivative conditions can be directly translated to moment conditions. The calculus part is worked out in many textbooks, and sometimes even in the Stata manuals.
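For the probit used above, the calculus is short. As a sketch, write \(\Phi()\) for the standard normal cumulative distribution function, \(\phi()\) for its density, and \(d_i\) for the binary outcome. The log-likelihood contribution of observation \(i\) is

\[
\ln L_i = d_i\ln\Phi({\bf x}_i\boldsymbol{\beta}') + (1-d_i)\ln\left\{1-\Phi({\bf x}_i\boldsymbol{\beta}')\right\}
\]

Differentiating with respect to the parameters and putting the two terms over a common denominator gives

\[
\frac{\partial \ln L_i}{\partial \boldsymbol{\beta}'} =
\left\{\frac{d_i\,\phi({\bf x}_i\boldsymbol{\beta}')}{\Phi({\bf x}_i\boldsymbol{\beta}')} -
\frac{(1-d_i)\,\phi({\bf x}_i\boldsymbol{\beta}')}{1-\Phi({\bf x}_i\boldsymbol{\beta}')}\right\}{\bf x}_i'
= \frac{\phi({\bf x}_i\boldsymbol{\beta}')\left\{d_i-\Phi({\bf x}_i\boldsymbol{\beta}')\right\}}{\Phi({\bf x}_i\boldsymbol{\beta}')\left\{1-\Phi({\bf x}_i\boldsymbol{\beta}')\right\}}{\bf x}_i'
\]

Averaging this expression over the sample and setting it to \({\bf 0}\) reproduces the probit moment conditions used in the first step.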

See **[R] gmm** for more information on how to use the **gmm** command.

### References

Cattaneo, M. D. 2010. Efficient semiparametric estimation of multi-valued treatment effects under ignorability. *Journal of Econometrics* 155: 138–154.

Newey, W. K. 1984. A method of moments interpretation of sequential estimators. *Economics Letters* 14: 201–206.