## Using gmm to solve two-step estimation problems

Two-step estimation problems can be solved using the **gmm** command.

When a two-step estimator produces consistent point estimates but inconsistent standard errors, it is known as the two-step-estimation problem. For instance, inverse-probability weighted (IPW) estimators are a weighted average in which the weights are estimated in the first step. Two-step estimators use first-step estimates to estimate the parameters of interest in a second step. The two-step-estimation problem arises because the second step ignores the estimation error in the first step.

One solution is to convert the two-step estimator into a one-step estimator. My favorite way to do this conversion is to stack the equations solved by each of the two estimators and solve them jointly. This one-step approach produces consistent point estimates and consistent standard errors. There is no two-step problem because all the computations are performed jointly. Newey (1984) derives and justifies this approach.

I’m going to illustrate this approach with the IPW example, but it can be used with any two-step problem as long as each step is continuous.

IPW estimators are frequently used to estimate the mean that would be observed if everyone in a population received a specified treatment, a quantity known as a potential-outcome mean (POM). A difference of POMs is called the average treatment effect (ATE). Aside from all that, it is the mechanics of the two-step IPW estimator that interest me here. IPW estimators are weighted averages of the outcome, and the weights are estimated in a first step. The weights used in the second step are the inverse of the estimated probability of treatment.

Let’s imagine we are analyzing an extract of the birthweight data used by Cattaneo (2010). In this dataset, **bweight** is the baby’s weight at birth, **mbsmoke** is 1 if the mother smoked while pregnant (and 0 otherwise), **mmarried** is 1 if the mother is married, and **prenatal1** is 1 if the mother had a prenatal visit in the first trimester.

Let’s imagine we want to estimate the mean when all pregnant women smoked, which is to say, the POM for smoking. If we were doing substantive research, we would also estimate the POM when no pregnant women smoked. The difference between these estimated POMs would then estimate the ATE of smoking.

In the IPW estimator, we begin by estimating the probability weights for smoking. We fit a probit model of **mbsmoke** as a function of **mmarried** and **prenatal1**.

. use cattaneo2 (Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154) . probit mbsmoke mmarried prenatal1, vce(robust) Iteration 0: log pseudolikelihood = -2230.7484 Iteration 1: log pseudolikelihood = -2102.6994 Iteration 2: log pseudolikelihood = -2102.1437 Iteration 3: log pseudolikelihood = -2102.1436 Probit regression Number of obs = 4642 Wald chi2(2) = 259.42 Prob > chi2 = 0.0000 Log pseudolikelihood = -2102.1436 Pseudo R2 = 0.0577 ------------------------------------------------------------------------------ | Robust mbsmoke | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- mmarried | -.6365472 .0478037 -13.32 0.000 -.7302407 -.5428537 prenatal1 | -.2144569 .0547583 -3.92 0.000 -.3217811 -.1071327 _cons | -.3226297 .0471906 -6.84 0.000 -.4151215 -.2301379 ------------------------------------------------------------------------------

The results indicate that both **mmarried** and **prenatal1** significantly predict whether the mother smoked while pregnant.

We want to calculate the inverse probabilities. We begin by getting the probabilities:

. predict double pr, pr

Now, we can obtain the inverse probabilities by typing

. generate double ipw = (mbsmoke==1)/pr

We can now perform the second step: calculate the mean for smokers by using the IPWs.

. mean bweight [pw=ipw] Mean estimation Number of obs = 864 -------------------------------------------------------------- | Mean Std. Err. [95% Conf. Interval] -------------+------------------------------------------------ bweight | 3162.868 21.71397 3120.249 3205.486 -------------------------------------------------------------- . mean bweight [pw=ipw] if mbsmoke

The point estimate reported by **mean** is consistent; the reported standard error Read more…