Author Archive

Using gmm to solve two-step estimation problems

Two-step estimation problems can be solved using the gmm command.

When a two-step estimator produces consistent point estimates but inconsistent standard errors, it is known as the two-step-estimation problem. For instance, inverse-probability weighted (IPW) estimators are a weighted average in which the weights are estimated in the first step. Two-step estimators use first-step estimates to estimate the parameters of interest in a second step. The two-step-estimation problem arises because the second step ignores the estimation error in the first step.

One solution is to convert the two-step estimator into a one-step estimator. My favorite way to do this conversion is to stack the equations solved by each of the two estimators and solve them jointly. This one-step approach produces consistent point estimates and consistent standard errors. There is no two-step problem because all the computations are performed jointly. Newey (1984) derives and justifies this approach.

I’m going to illustrate this approach with the IPW example, but it can be used with any two-step problem as long as each step is continuous.

IPW estimators are frequently used to estimate the mean that would be observed if everyone in a population received a specified treatment, a quantity known as a potential-outcome mean (POM). A difference of POMs is called the average treatment effect (ATE). Aside from all that, it is the mechanics of the two-step IPW estimator that interest me here. IPW estimators are weighted averages of the outcome, and the weights are estimated in a first step. The weights used in the second step are the inverse of the estimated probability of treatment.

Let’s imagine we are analyzing an extract of the birthweight data used by Cattaneo (2010). In this dataset, bweight is the baby’s weight at birth, mbsmoke is 1 if the mother smoked while pregnant (and 0 otherwise), mmarried is 1 if the mother is married, and prenatal1 is 1 if the mother had a prenatal visit in the first trimester.

Let’s imagine we want to estimate the mean when all pregnant women smoked, which is to say, the POM for smoking. If we were doing substantive research, we would also estimate the POM when no pregnant women smoked. The difference between these estimated POMs would then estimate the ATE of smoking.

In the IPW estimator, we begin by estimating the probability weights for smoking. We fit a probit model of mbsmoke as a function of mmarried and prenatal1.

. use cattaneo2
(Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)

. probit mbsmoke mmarried prenatal1, vce(robust)

Iteration 0:   log pseudolikelihood = -2230.7484
Iteration 1:   log pseudolikelihood = -2102.6994
Iteration 2:   log pseudolikelihood = -2102.1437
Iteration 3:   log pseudolikelihood = -2102.1436

Probit regression                                 Number of obs   =       4642
                                                  Wald chi2(2)    =     259.42
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -2102.1436                 Pseudo R2       =     0.0577

             |               Robust
     mbsmoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    mmarried |  -.6365472   .0478037   -13.32   0.000    -.7302407   -.5428537
   prenatal1 |  -.2144569   .0547583    -3.92   0.000    -.3217811   -.1071327
       _cons |  -.3226297   .0471906    -6.84   0.000    -.4151215   -.2301379

The results indicate that both mmarried and prenatal1 significantly predict whether the mother smoked while pregnant.

We want to calculate the inverse probabilities. We begin by getting the probabilities:

. predict double pr, pr

Now, we can obtain the inverse probabilities by typing

. generate double ipw = (mbsmoke==1)/pr

We can now perform the second step: calculate the mean for smokers by using the IPWs.

. mean bweight [pw=ipw]

Mean estimation                     Number of obs    =     864

             |       Mean   Std. Err.     [95% Conf. Interval]
     bweight |   3162.868   21.71397      3120.249    3205.486
. mean bweight [pw=ipw] if mbsmoke

The point estimate reported by mean is consistent; the reported standard error Read more…

New Wooldridge edition just made available

Insiders have been waiting for the second edition of Econometric Analysis of Cross Section and Panel Data by Jeffrey M. Wooldridge. I have a copy and really recommend it; later I will write a review as to why.

The book is available at the Stata bookstore and the MIT Press bookstore. It is $84 at our bookstore and $94 at MIT. The book is not yet available from Amazon.