Estimating parameters by maximum likelihood and method of moments using mlexp and gmm

\(\newcommand{\epsilonb}{\boldsymbol{\epsilon}}
\newcommand{\ebi}{\boldsymbol{\epsilon}_i}
\newcommand{\Sigmab}{\boldsymbol{\Sigma}}
\newcommand{\Omegab}{\boldsymbol{\Omega}}
\newcommand{\Lambdab}{\boldsymbol{\Lambda}}
\newcommand{\betab}{\boldsymbol{\beta}}
\newcommand{\gammab}{\boldsymbol{\gamma}}
\newcommand{\Gammab}{\boldsymbol{\Gamma}}
\newcommand{\deltab}{\boldsymbol{\delta}}
\newcommand{\xib}{\boldsymbol{\xi}}
\newcommand{\iotab}{\boldsymbol{\iota}}
\newcommand{\xb}{{\bf x}}
\newcommand{\xbit}{{\bf x}_{it}}
\newcommand{\xbi}{{\bf x}_{i}}
\newcommand{\zb}{{\bf z}}
\newcommand{\zbi}{{\bf z}_i}
\newcommand{\wb}{{\bf w}}
\newcommand{\yb}{{\bf y}}
\newcommand{\ub}{{\bf u}}
\newcommand{\Gb}{{\bf G}}
\newcommand{\Hb}{{\bf H}}
\newcommand{\thetab}{\boldsymbol{\theta}}
\newcommand{\XBI}{{\bf x}_{i1},\ldots,{\bf x}_{iT}}
\newcommand{\Sb}{{\bf S}} \newcommand{\Xb}{{\bf X}}
\newcommand{\Xtb}{\tilde{\bf X}}
\newcommand{\Wb}{{\bf W}}
\newcommand{\Ab}{{\bf A}}
\newcommand{\Bb}{{\bf B}}
\newcommand{\Zb}{{\bf Z}}
\newcommand{\Eb}{{\bf E}}\)

This post was written jointly with Joerg Luedicke, Senior Social Scientist and Statistician, StataCorp.

Overview

We provide an introduction to parameter estimation by maximum likelihood and method of moments using mlexp and gmm, respectively (see [R] mlexp and [R] gmm). We include some background about these estimation techniques; see Pawitan (2001), Casella and Berger (2002), Cameron and Trivedi (2005), and Wooldridge (2010) for more details.

Maximum likelihood (ML) estimation finds the parameter values that make the observed data most probable under the model. Formally, the ML estimates maximize the log-likelihood function, which is the log of the probability (or density) of the observed data, viewed as a function of the parameters.

Method of moments (MM) estimators specify population moment conditions and find the parameters that solve the equivalent sample moment conditions. MM estimators usually place fewer restrictions on the model than ML estimators, which typically makes MM estimators less efficient but more robust to misspecification than ML estimators.
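In generic notation, the two estimators can be sketched as follows. The symbols \(f\), \({\bf m}\), and \(\thetab\) are introduced here only for illustration and are not used elsewhere in this post.

\[\begin{aligned}
\hat{\thetab}_{ML} &= \arg\max_{\thetab} \sum_{i=1}^N \ln f(y_i|\xb_i;\thetab) \\
{\bf 0} &= \frac{1}{N}\sum_{i=1}^N {\bf m}(y_i,\xb_i;\hat{\thetab}_{MM})
\end{aligned}\]

Here \(f\) is the conditional density (or probability) of \(y\) given \(\xb\) implied by the model, and \({\bf m}\) is a vector of functions whose population expectation is zero at the true parameter values. The second line assumes there are as many moment functions as parameters, so the sample moment conditions can be solved exactly.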

Using mlexp to estimate probit model parameters

A probit model for the binary dependent variable \(y\) conditional on covariates \(\xb\) with coefficients \(\betab\) is

\[\begin{equation}
y = \begin{cases}
1 & \mbox{ if } \xb\betab' + \epsilon > 0\\
0 & \mbox{ otherwise }
\end{cases}
\end{equation}\]

where \(\epsilon\) has a standard normal distribution. The log-likelihood function for the probit model is

\[\begin{equation}\label{E:b1}
\ln\{L(\betab;\xb,y)\}= \sum_{i=1}^N y_i \ln\Phi(\xb_{i}\betab')
+ (1-y_i) \ln\Phi(-\xb_{i}\betab')
\end{equation}\]

where \(\Phi\) denotes the cumulative standard normal.
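
The two terms in the log-likelihood function come from the two outcome probabilities implied by the model. As a short derivation, using only the symmetry of the standard normal distribution:

\[\begin{aligned}
\Pr(y=1|\xb) &= \Pr(\epsilon > -\xb\betab') = 1-\Phi(-\xb\betab') = \Phi(\xb\betab') \\
\Pr(y=0|\xb) &= 1-\Phi(\xb\betab') = \Phi(-\xb\betab')
\end{aligned}\]

An observation with \(y_i=1\) therefore contributes \(\ln\Phi(\xb_{i}\betab')\) to the sum, and an observation with \(y_i=0\) contributes \(\ln\Phi(-\xb_{i}\betab')\).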

We now use mlexp to estimate the coefficients of a probit model. We have data on whether an individual belongs to a union (union), the individual’s age (age), and the highest grade completed (grade).

. webuse union
(NLS Women 14-24 in 1968)

. mlexp ( union*lnnormal({b1}*age + {b2}*grade + {b0})    ///
>         + (1-union)*lnnormal(-({b1}*age + {b2}*grade + {b0})) )

initial:       log likelihood = -18160.456
alternative:   log likelihood = -1524604.4
rescale:       log likelihood = -14097.135
rescale eq:    log likelihood =  -14063.38
Iteration 0:   log likelihood =  -14063.38  
Iteration 1:   log likelihood = -13796.715  
Iteration 2:   log likelihood = -13796.336  
Iteration 3:   log likelihood = -13796.336  

Maximum likelihood estimation

Log likelihood = -13796.336                     Number of obs     =     26,200

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         /b1 |   .0051821   .0013471     3.85   0.000     .0025418    .0078224
         /b2 |   .0373899   .0035814    10.44   0.000     .0303706    .0444092
         /b0 |  -1.404697   .0587797   -23.90   0.000    -1.519903   -1.289491
------------------------------------------------------------------------------

Defining a linear combination of the covariates makes it easier to specify the model and to read the output:

. mlexp ( union*lnnormal({xb:age grade _cons}) + (1-union)*lnnormal(-{xb:}) )

initial:       log likelihood = -18160.456
alternative:   log likelihood = -14355.672
rescale:       log likelihood = -14220.454
Iteration 0:   log likelihood = -14220.454  
Iteration 1:   log likelihood = -13797.767  
Iteration 2:   log likelihood = -13796.336  
Iteration 3:   log likelihood = -13796.336  

Maximum likelihood estimation

Log likelihood = -13796.336                     Number of obs     =     26,200

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0051821   .0013471     3.85   0.000     .0025418    .0078224
       grade |   .0373899   .0035814    10.44   0.000     .0303706    .0444092
       _cons |  -1.404697   .0587797   -23.90   0.000    -1.519903   -1.289491
------------------------------------------------------------------------------
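
As a quick check, and assuming the union data are still in memory, the same estimates should be obtainable from Stata's built-in probit command (see [R] probit), which maximizes the same log-likelihood function. The command is shown without output:

. probit union age grade

The coefficients on age, grade, and _cons should agree with the mlexp estimates above up to convergence tolerance.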

Using gmm to estimate parameters by MM

ML specifies a functional form for the distribution of \(y\) conditional on \(\xb\). Specifying \(\Eb[y|\xb]=\Phi(\xb\betab')\) is less restrictive because it imposes structure only on the first conditional moment instead of on all the conditional moments. Under correct model specification, the ML estimator is more efficient than the MM estimator because it correctly specifies the conditional mean and all other conditional moments.

The model assumption \(\Eb[y|\xb]=\Phi(\xb\betab')\) implies the moment conditions \(\Eb[\{y-\Phi(\xb\betab')\}\xb] = {\bf 0}\). The sample moment equivalent is

\[\sum_{i=1}^N [\{y_i-\Phi(\xb_i\betab')\}\xb_i] = {\bf 0}\]
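
The population moment conditions follow from the conditional mean assumption by the law of iterated expectations, a step sketched here:

\[\Eb[\{y-\Phi(\xb\betab')\}\xb]
= \Eb\left[\Eb[\{y-\Phi(\xb\betab')\}|\xb]\,\xb\right]
= \Eb[\{\Eb[y|\xb]-\Phi(\xb\betab')\}\xb] = {\bf 0}\]

In the example that follows, \(\xb\) contains age, grade, and a constant, which gives three moment conditions for three parameters.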

In the gmm command below, we specify the residuals \(y_i-\Phi(\xb_i\betab')\) inside the parentheses and the variables that multiply them, known as instruments, in the option instruments().

. gmm ( union - normal({xb:age grade _cons}) ), instruments(age grade) onestep

Step 1
Iteration 0:   GMM criterion Q(b) =  .07831137  
Iteration 1:   GMM criterion Q(b) =  .00004813  
Iteration 2:   GMM criterion Q(b) =  5.333e-09  
Iteration 3:   GMM criterion Q(b) =  5.789e-17  

note: model is exactly identified

GMM estimation 

Number of parameters =   3
Number of moments    =   3
Initial weight matrix: Unadjusted                 Number of obs   =     26,200

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0051436   .0013349     3.85   0.000     .0025272      .00776
       grade |   .0383185   .0038331    10.00   0.000     .0308058    .0458312
       _cons |  -1.415623   .0609043   -23.24   0.000    -1.534994   -1.296253
------------------------------------------------------------------------------
Instruments for equation 1: age grade _cons

The point estimates are close to the ML estimates, as expected when the conditional mean is correctly specified, because both estimators are then consistent for the same parameters.

Using gmm to estimate parameters by ML

When we maximize a log-likelihood function, we find the parameters that set the first derivative to 0. For example, setting the first derivative of the probit log-likelihood function with respect to \(\betab\) to 0 in the sample yields

\[\begin{equation}\label{E:b2}
\frac{\partial \ln\{L(\betab;\xb,y)\}}{\partial \betab} =
\sum_{i=1}^N \left\{y_i \frac{\phi(\xb_{i}\betab')}{\Phi(\xb_{i}\betab')}
- (1-y_i) \frac{\phi(-\xb_{i}\betab')}{\Phi(-\xb_{i}\betab')}\right\}
\xb_{i} = {\bf 0}
\end{equation}\]
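
These first-order conditions can be rewritten in a form that makes the link to the MM estimator explicit. Using \(\phi(-z)=\phi(z)\) and \(\Phi(-z)=1-\Phi(z)\), a short rearrangement gives

\[\sum_{i=1}^N
\frac{\phi(\xb_{i}\betab')}{\Phi(\xb_{i}\betab')\{1-\Phi(\xb_{i}\betab')\}}
\{y_i-\Phi(\xb_{i}\betab')\}\xb_{i} = {\bf 0}\]

so the ML score equations use the same residuals \(y_i-\Phi(\xb_{i}\betab')\) as the MM estimator, reweighted by the ratio of the normal density to the conditional variance of \(y_i\). Both forms define the same sample moment conditions.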

Below, we use gmm to find the parameters that solve these sample moment conditions:

. gmm ( union*normalden({xb:age grade _cons})/normal({xb:})       ///
>         -(1-union)*normalden(-{xb:})/normal(-{xb:}) ),          ///
>         instruments(age grade) onestep

Step 1
Iteration 0:   GMM criterion Q(b) =  .19941827  
Iteration 1:   GMM criterion Q(b) =  .00012506  
Iteration 2:   GMM criterion Q(b) =  2.260e-09  
Iteration 3:   GMM criterion Q(b) =  7.369e-19  

note: model is exactly identified

GMM estimation 

Number of parameters =   3
Number of moments    =   3
Initial weight matrix: Unadjusted                 Number of obs   =     26,200

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0051821    .001339     3.87   0.000     .0025577    .0078065
       grade |   .0373899   .0037435     9.99   0.000     .0300528     .044727
       _cons |  -1.404697   .0601135   -23.37   0.000    -1.522517   -1.286876
------------------------------------------------------------------------------
Instruments for equation 1: age grade _cons

The point estimates match those reported by mlexp. The standard errors differ because gmm reports robust standard errors, whereas mlexp by default reports standard errors based on the observed information matrix.
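
To put the standard errors on a comparable footing, one option is to request robust standard errors from mlexp through its vce(robust) option. This is a sketch, shown without output and assuming the union data are still in memory:

. mlexp ( union*lnnormal({xb:age grade _cons}) + (1-union)*lnnormal(-{xb:}) ), vce(robust)

The resulting standard errors should be close to those reported by gmm. Small differences may remain because the two commands need not apply the same finite-sample adjustment.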

Summary

We showed how to easily estimate the probit model parameters by ML and by MM using mlexp and gmm, respectively. We also showed that you can estimate these parameters using restrictions imposed by conditional distributions or using weaker conditional moment restrictions. Finally, we illustrated that the equations imposed by the conditional distributions can be viewed as sample moment restrictions.

References

Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. 1st ed. New York: Cambridge University Press.

Casella, G., and R. L. Berger. 2002. Statistical Inference. 2nd ed. Pacific Grove, CA: Duxbury.

Pawitan, Y. 2001. In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford: Oxford University Press.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.