
Testing model specification and using the program version of gmm

This post was written jointly with Joerg Luedicke, Senior Social Scientist and Statistician, StataCorp.

The command gmm is used to estimate the parameters of a model using the generalized method of moments (GMM). GMM can be used to estimate the parameters of models that have more identification conditions than parameters; such models are said to be overidentified. The specification of these models can be evaluated using Hansen’s J statistic (Hansen, 1982).

We use gmm to estimate the parameters of a Poisson model with an endogenous regressor. More instruments than regressors are available, so the model is overidentified. We then use estat overid to calculate Hansen’s J statistic and test the validity of the overidentification restrictions.

In previous posts (see Estimating parameters by maximum likelihood and method of moments using mlexp and gmm and Understanding the generalized method of moments (GMM): A simple example), the interactive version of gmm has been used to estimate simple single-equation models. For more complex models, it can be easier to use the moment-evaluator program version of gmm. We demonstrate how to use this version of gmm.

Poisson model with endogenous regressors

In this post, the Poisson regression of \(y_i\) on exogenous \({\bf x}_i\) and endogenous \({\bf y}_{2,i}\) has the form
\begin{equation*}
E(y_i \vert {\bf x}_i,{\bf y}_{2,i},\epsilon_i)= \exp({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i}) + \epsilon_i
\end{equation*}
where \(\epsilon_i\) is a zero-mean error term. The endogenous regressors \({\bf y}_{2,i}\) may be correlated with \(\epsilon_i\). This is the same formulation used by ivpoisson with additive errors; see [R] ivpoisson for more details. For more information on Poisson models with endogenous regressors, see Mullahy (1997), Cameron and Trivedi (2013), Windmeijer and Santos Silva (1997), and Wooldridge (2010).

Moment conditions are expected values that specify the model parameters in terms of the true moments. GMM finds the parameter values that are closest to satisfying the sample equivalent of the moment conditions. In this model, we define moment conditions using an error function,
\begin{equation*}
u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2) = y_i - \exp({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i})
\end{equation*}

Let \({\bf x}_{2,i}\) be additional exogenous variables. These are not correlated with \(\epsilon_i\), but are correlated with \({\bf y}_{2,i}\). Combining them with \({\bf x}_i\), we have the instruments \({\bf z}_i = (\begin{matrix} {\bf x}_{i} & {\bf x}_{2,i}\end{matrix})\). So the moment conditions are
\begin{equation*}
E({\bf z}_i u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)) = {\bf 0}
\end{equation*}

Suppose there are \(k\) parameters in \({\boldsymbol \beta}_1\) and \({\boldsymbol \beta}_2\) and \(q\) instruments. When \(q>k\), there are more moment conditions than parameters, and the model is overidentified. In this case, the sample moment conditions cannot all be set to zero at once, so GMM finds the parameter values that bring them as close to zero as possible. Specifically, GMM minimizes
\[
Q({{\boldsymbol \beta}_1},{\boldsymbol \beta}_2) = \left\{\frac{1}{N}\sum\nolimits_i {{\bf z}}_i
u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)\right\}
{\bf W}
\left\{\frac{1}{N}\sum\nolimits_i {{\bf z}}_i u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)\right\}'
\]
for \(q\times q\) weight matrix \({\bf W}\).

Overidentification test

When the model is correctly specified,
\begin{equation*}
E({\bf z}_i u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)) = {\bf 0}
\end{equation*}

In this case, the optimal weight matrix \({\bf W}\) (the one that yields the smallest asymptotic variance of the estimator) is the inverse of the covariance matrix of the moment conditions. Here we have
\[
{\bf W}^{-1} = E\{{\bf z}_i' u_{i}({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)
u_{i}({\boldsymbol \beta}_1,{\boldsymbol \beta}_2) {\bf z}_i\}
\]

Hansen’s test evaluates the null hypothesis that an overidentified model is correctly specified. The test statistic is \(J = N Q(\hat{\boldsymbol \beta}_1, \hat{\boldsymbol \beta}_2)\), the sample size times the minimized GMM criterion. If \({\bf W}\) is an optimal weight matrix, then under the null hypothesis, Hansen’s J statistic has a \(\chi^2(q-k)\) distribution.

The two-step and iterated estimators used by gmm provide estimates of the optimal \({\bf W}\). For overidentified models, the estat overid command calculates Hansen’s J statistic after these estimators are used.
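
In practice, the optimal weight matrix is unknown and must be estimated. A sketch of the two-step approach: gmm first obtains consistent estimates using an initial weight matrix (the unadjusted matrix, by default), computes the residuals \(\hat{u}_i\) from those estimates, and then estimates the optimal weight matrix by the sample analogue
\begin{equation*}
\widehat{\bf W}^{-1} = \frac{1}{N}\sum\nolimits_i {\bf z}_i' \hat{u}_i^2 {\bf z}_i
\end{equation*}
The second step reestimates the parameters using \(\widehat{\bf W}\); the iterated estimator repeats these two steps until the estimates converge.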

Moment-evaluator program

We define a program that can be called by gmm in calculating the moment conditions for Poisson models with endogenous regressors. See Programming an estimation command in Stata: A map to posted entries for more information about programming in Stata. The program calculates the error function \(u_i\), and gmm generates the moment conditions by multiplying by the instruments \({\bf z}_i\).

To solve the weighted moment conditions, gmm must take the derivative of the moment conditions with respect to the parameters. Using the chain rule, these are the derivatives of the error functions multiplied by the instruments. Users may specify these derivatives themselves, or gmm will calculate the derivatives numerically. Users can gain speed and numeric stability by properly specifying the derivatives themselves.

When linear forms of the parameters are estimated, users may specify derivatives to gmm in terms of the linear form (prediction). The chain rule is then used by gmm to determine the derivatives of the error function \(u_i\) with respect to the parameters. Our error function \(u_i\) is a function of the linear prediction \({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i}\).
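
For our error function, this derivative is easy to write down. Differentiating \(u_i\) with respect to the linear prediction gives
\begin{equation*}
\frac{\partial u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)}{\partial({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i})} = -\exp({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i})
\end{equation*}
which is exactly what our program will return when gmm requests derivatives.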

The program gmm_ivpois calculates the error function \(u_i\) and the derivative of \(u_i\) in terms of the linear prediction \({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i}\).

program gmm_ivpois
    version 14.1
    syntax varlist [if], at(name) depvar(varlist) rhs(varlist) ///
           [derivatives(varlist)]
    tempvar m
    quietly gen double `m' = 0 `if'
    local i = 1
    foreach var of varlist `rhs' {
        quietly replace `m' = `m' + `var'*`at'[1,`i'] `if'
        local i = `i' + 1
    }
    quietly replace `m' = `m' + `at'[1,`i'] `if'
    quietly replace `varlist' = `depvar' - exp(`m') `if'
    if "`derivatives'" == "" {
         exit
    }
    quietly replace `derivatives' = -exp(`m') `if'
end

Lines 3–4 of gmm_ivpois contain the syntax statement that parses the arguments to the program. All moment-evaluator programs must accept a varlist, the if condition, and the at() option. The varlist corresponds to variables that store the values of the error functions. The program gmm_ivpois will calculate the error function and store it in the specified varlist. The at() option is specified with the name of a matrix that contains the model parameters. The if condition specifies the observations for which estimation is performed.

The program also requires the options depvar() and rhs(). The name of the dependent variable is specified in the depvar() option. The regressors are specified in the rhs() option.

On line 4, derivatives() is optional. The variable name specified here corresponds to the derivative of the error function with respect to the linear prediction.

The linear prediction is computed from the regressors and the constant and stored in the temporary variable m over lines 6–12. On line 13, we store the value of the error function in the variable specified in varlist. Lines 14–16 exit the program if derivatives() is not specified. Otherwise, on line 17, we store the derivative of the error function with respect to the linear prediction in the variable specified in derivatives().

The data

We simulate data from a Poisson regression with an endogenous covariate, and then we use gmm and the gmm_ivpois program to estimate the parameters of the regression. We will then use estat overid to check the specification of the model. We simulate a random sample of 3,000 observations.

. set seed  45

. set obs 3000
number of observations (_N) was 0, now 3,000

. generate x = rnormal()*.8 + .5

. generate z = rchi2(1)

. generate w = rnormal()*.5

. matrix cm = (1, .9 \ .9, 1)

. matrix sd = (.5,.8)

. drawnorm e u, corr(cm) sd(sd)

We generate the exogenous covariates \(x\), \(z\), and \(w\). The variable \(x\) will be a regressor, while \(z\) and \(w\) will be extra instruments. Then we use drawnorm to draw the errors \(e\) and \(u\). The errors are positively correlated.
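
With the specified correlation of 0.9 and standard deviations of 0.5 and 0.8, the covariance between the errors is
\begin{equation*}
\text{Cov}(e,u) = 0.9 \times 0.5 \times 0.8 = 0.36
\end{equation*}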

. generate y2 = exp(.2*x + .1*z + .3*w -1 + u)

. generate y = exp(.5*x + .2*y2+1) + e

We generate the endogenous regressor \(y2\) as an exponential function of the instruments and the normal error \(u\), so \(y2\) is lognormal conditional on the instruments. The outcome of interest \(y\) has an exponential conditional mean in \(x\) and \(y2\), with \(e\) as an additive error. Because \(e\) is correlated with \(u\), and \(y2\) depends on \(u\), \(y2\) is correlated with \(e\).
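
As an optional check, not part of the estimation itself, we can call gmm_ivpois directly with trial parameter values now that the data are in memory. The matrix b0 and the variable resid below are names of our own choosing, and we do not show the output. Because the trial values equal the true values used in the simulation, the error function recovers the additive error \(e\), so resid should have a mean close to zero.

. matrix b0 = (.5, .2, 1)

. quietly generate double resid = .

. gmm_ivpois resid, at(b0) depvar(y) rhs(x y2)

. summarize resid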

Estimating the model parameters

Now we use gmm to estimate the parameters of the Poisson regression with endogenous covariates. The name of our moment-evaluator program is listed to the right of gmm. The instruments that gmm will use to form the moment conditions are listed in instruments(). We specify the options depvar() and rhs() with the appropriate variables. They will be passed on to gmm_ivpois.

The parameters are specified in the parameters() option as a linear combination named y, and we specify haslfderivatives to tell gmm that gmm_ivpois supplies derivatives with respect to this linear form rather than with respect to the individual parameters. The option nequations() tells gmm how many error functions to expect.

. gmm gmm_ivpois, depvar(y) rhs(x y2)             ///
>         haslfderivatives instruments(x z w)     ///                            
>         parameters({y: x y2 _cons}) nequations(1)

Step 1
Iteration 0:   GMM criterion Q(b) =  14.960972
Iteration 1:   GMM criterion Q(b) =  3.3038486
Iteration 2:   GMM criterion Q(b) =  .59045217
Iteration 3:   GMM criterion Q(b) =  .00079862
Iteration 4:   GMM criterion Q(b) =  .00001419
Iteration 5:   GMM criterion Q(b) =  .00001418

Step 2
Iteration 0:   GMM criterion Q(b) =   .0000567
Iteration 1:   GMM criterion Q(b) =  .00005648
Iteration 2:   GMM criterion Q(b) =  .00005648

GMM estimation

Number of parameters =   3
Number of moments    =   4
Initial weight matrix: Unadjusted                 Number of obs   =      3,000
GMM weight matrix:     Robust

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5006366   .0033273   150.46   0.000     .4941151     .507158
          y2 |   .2007893   .0075153    26.72   0.000     .1860597    .2155189
       _cons |   1.000717   .0063414   157.81   0.000      .988288    1.013146
------------------------------------------------------------------------------
Instruments for equation 1: x z w _cons

All three coefficients are significant, and the point estimates are close to the true values used in the simulation (.5, .2, and 1). Even so, the model could still be misspecified.
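
For comparison, the same model could also be fit with the interactive version of gmm used in the previous posts by writing the error function as a substitutable expression. A sketch, with parameter names xb and b0 of our own choosing and output not shown:

. gmm (y - exp({xb: x y2} + {b0})), instruments(x z w)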

Overidentification test

We use estat overid to compute Hansen’s J statistic.

. estat overid

  Test of overidentifying restriction:

  Hansen's J chi2(1) = .169449 (p = 0.6806)

The J statistic equals 0.17. In addition to computing Hansen’s J, estat overid reports a test of the null hypothesis that the model is correctly specified. We have four moment conditions and three parameters, so the J statistic has a \(\chi^2(1)\) distribution under the null. The probability of obtaining a \(\chi^2(1)\) value greater than 0.17 is given in parentheses. This probability, the p-value of the test, is large, so we fail to reject the null hypothesis that the model is correctly specified.
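
We can also see where this value comes from. As noted above, \(J\) is the sample size times the minimized GMM criterion, which is reported in the final iteration of the log:
\begin{equation*}
J = N\,Q(\hat{\boldsymbol \beta}_1,\hat{\boldsymbol \beta}_2) = 3{,}000 \times 0.00005648 \approx 0.169
\end{equation*}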

Conclusion

We have demonstrated how to estimate the parameters of a Poisson regression with an endogenous regressor using the moment-evaluator program version of gmm. We have also demonstrated how to use estat overid to test for model misspecification after estimation of an overidentified model in gmm. See [R] gmm and [R] gmm postestimation for more information.

References

Cameron, A. C., and P. K. Trivedi. 2013. Regression Analysis of Count Data. 2nd ed. New York: Cambridge University Press.

Hansen, L. P. 1982. Large sample properties of generalized method of moments estimators. Econometrica 50: 1029–1054.

Mullahy, J. 1997. Instrumental-variable estimation of count data models: Applications to models of cigarette smoking behavior. Review of Economics and Statistics 79: 586–593.

Windmeijer, F., and J. M. C. Santos Silva. 1997. Endogeneity in count data models: An application to demand for health care. Journal of Applied Econometrics 12: 281–294.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.