Estimating parameters by maximum likelihood and method of moments using mlexp and gmm
\(\newcommand{\epsilonb}{\boldsymbol{\epsilon}}
\newcommand{\ebi}{\boldsymbol{\epsilon}_i}
\newcommand{\Sigmab}{\boldsymbol{\Sigma}}
\newcommand{\Omegab}{\boldsymbol{\Omega}}
\newcommand{\Lambdab}{\boldsymbol{\Lambda}}
\newcommand{\betab}{\boldsymbol{\beta}}
\newcommand{\gammab}{\boldsymbol{\gamma}}
\newcommand{\Gammab}{\boldsymbol{\Gamma}}
\newcommand{\deltab}{\boldsymbol{\delta}}
\newcommand{\xib}{\boldsymbol{\xi}}
\newcommand{\iotab}{\boldsymbol{\iota}}
\newcommand{\xb}{{\bf x}}
\newcommand{\xbit}{{\bf x}_{it}}
\newcommand{\xbi}{{\bf x}_{i}}
\newcommand{\zb}{{\bf z}}
\newcommand{\zbi}{{\bf z}_i}
\newcommand{\wb}{{\bf w}}
\newcommand{\yb}{{\bf y}}
\newcommand{\ub}{{\bf u}}
\newcommand{\Gb}{{\bf G}}
\newcommand{\Hb}{{\bf H}}
\newcommand{\thetab}{\boldsymbol{\theta}}
\newcommand{\XBI}{{\bf x}_{i1},\ldots,{\bf x}_{iT}}
\newcommand{\Sb}{{\bf S}} \newcommand{\Xb}{{\bf X}}
\newcommand{\Xtb}{\tilde{\bf X}}
\newcommand{\Wb}{{\bf W}}
\newcommand{\Ab}{{\bf A}}
\newcommand{\Bb}{{\bf B}}
\newcommand{\Zb}{{\bf Z}}
\newcommand{\Eb}{{\bf E}}\)
This post was written jointly with Joerg Luedicke, Senior Social Scientist and Statistician, StataCorp.
Overview
We provide an introduction to parameter estimation by maximum likelihood and method of moments using mlexp and gmm, respectively (see [R] mlexp and [R] gmm). We include some background about these estimation techniques; see Pawitan (2001), Casella and Berger (2002), Cameron and Trivedi (2005), and Wooldridge (2010) for more details.
Maximum likelihood (ML) estimation finds the parameter values that make the observed data most probable. The estimates maximize the log-likelihood function, which specifies the log of the probability of observing a particular set of data given a model.
Method of moments (MM) estimators specify population moment conditions and find the parameters that solve the equivalent sample moment conditions. MM estimators usually place fewer restrictions on the model than ML estimators, which implies that MM estimators are less efficient but more robust to misspecification than ML estimators.
Using mlexp to estimate probit model parameters
A probit model for the binary dependent variable \(y\) conditional on covariates \(\xb\) with coefficients \(\betab\) is
\[\begin{equation}
y = \begin{cases}
1 & \mbox{ if } \xb\betab' + \epsilon > 0\\
0 & \mbox{ otherwise }
\end{cases}
\end{equation}\]
where \(\epsilon\) has a standard normal distribution. The log-likelihood function for the probit model is
\[\begin{equation}\label{E:b1}
\ln\{L(\betab;\xb,y)\}= \sum_{i=1}^N y_i \ln\Phi(\xb_{i}\betab')
+ (1-y_i) \ln\Phi(-\xb_{i}\betab')
\end{equation}\]
where \(\Phi\) denotes the standard normal cumulative distribution function.
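As a brief sketch of where the two terms in the log likelihood come from, the model above and the symmetry of the standard normal distribution give the two outcome probabilities. The subscripted \(\epsilon_i\) and the assumption that observations are independent are ours, added for illustration:
\[\begin{aligned}
\Pr(y_i=1|\xb_i) &= \Pr(\epsilon_i > -\xb_i\betab') = 1-\Phi(-\xb_i\betab') = \Phi(\xb_i\betab')\\
\Pr(y_i=0|\xb_i) &= 1-\Phi(\xb_i\betab') = \Phi(-\xb_i\betab')
\end{aligned}\]
Under independence, observation \(i\) contributes \(\Phi(\xb_i\betab')^{y_i}\,\Phi(-\xb_i\betab')^{1-y_i}\) to the likelihood, and taking logs and summing over \(i\) yields the expression above.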
We now use mlexp to estimate the coefficients of a probit model. We have data on whether an individual belongs to a union (union), the individual’s age (age), and the highest grade completed (grade).
. webuse union
(NLS Women 14-24 in 1968)

. mlexp ( union*lnnormal({b1}*age + {b2}*grade + {b0})    ///
>         + (1-union)*lnnormal(-({b1}*age + {b2}*grade + {b0})) )

initial:       log likelihood = -18160.456
alternative:   log likelihood = -1524604.4
rescale:       log likelihood = -14097.135
rescale eq:    log likelihood =  -14063.38
Iteration 0:   log likelihood =  -14063.38
Iteration 1:   log likelihood = -13796.715
Iteration 2:   log likelihood = -13796.336
Iteration 3:   log likelihood = -13796.336

Maximum likelihood estimation

Log likelihood = -13796.336                     Number of obs     =     26,200

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         /b1 |   .0051821   .0013471     3.85   0.000     .0025418    .0078224
         /b2 |   .0373899   .0035814    10.44   0.000     .0303706    .0444092
         /b0 |  -1.404697   .0587797   -23.90   0.000    -1.519903   -1.289491
------------------------------------------------------------------------------
Defining a linear combination of the covariates makes it easier to specify the model and to read the output:
. mlexp ( union*lnnormal({xb:age grade _cons}) + (1-union)*lnnormal(-{xb:}) )

initial:       log likelihood = -18160.456
alternative:   log likelihood = -14355.672
rescale:       log likelihood = -14220.454
Iteration 0:   log likelihood = -14220.454
Iteration 1:   log likelihood = -13797.767
Iteration 2:   log likelihood = -13796.336
Iteration 3:   log likelihood = -13796.336

Maximum likelihood estimation

Log likelihood = -13796.336                     Number of obs     =     26,200

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0051821   .0013471     3.85   0.000     .0025418    .0078224
       grade |   .0373899   .0035814    10.44   0.000     .0303706    .0444092
       _cons |  -1.404697   .0587797   -23.90   0.000    -1.519903   -1.289491
------------------------------------------------------------------------------
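One way to cross-check the mlexp results is to fit the same model with Stata's built-in probit command. The lines below are a minimal sketch, assuming the union dataset is available through webuse as in the session above. We have not reproduced its output here, but because probit maximizes the same log-likelihood function, its coefficients should agree with the mlexp estimates.

```stata
* Reload the example data (assumes access to the Stata example datasets)
webuse union, clear

* Built-in probit estimator for the same model; the coefficients on age,
* grade, and _cons can be compared with the mlexp table above
probit union age grade
```

The advantage of mlexp is not in this particular model, for which a dedicated command exists, but in models whose log-likelihood function has to be written out by hand.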
Using gmm to estimate parameters by MM
ML specifies a functional form for the distribution of \(y\) conditional on \(\xb\). Specifying \(\Eb[y|\xb]=\Phi(\xb\betab')\) is less restrictive because it imposes structure only on the first conditional moment instead of on all the conditional moments. Under correct model specification, the ML estimator is more efficient than the MM estimator because it correctly specifies the conditional mean and all other conditional moments.
The model assumption \(\Eb[y|\xb]=\Phi(\xb\betab')\) implies the moment conditions \(\Eb[\{y-\Phi(\xb\betab')\}\xb] = {\bf 0}\). The sample moment equivalent is
\[\sum_{i=1}^N [\{y_i-\Phi(\xb_i\betab')\}\xb_i] = {\bf 0}\]
In the gmm command below, we specify the residuals \(y_i-\Phi(\xb_i\betab')\) inside the parentheses and the variables that multiply them, known as instruments, in the option instruments().
. gmm ( union - normal({xb:age grade _cons}) ), instruments(age grade) onestep

Step 1
Iteration 0:   GMM criterion Q(b) =  .07831137
Iteration 1:   GMM criterion Q(b) =  .00004813
Iteration 2:   GMM criterion Q(b) =  5.333e-09
Iteration 3:   GMM criterion Q(b) =  5.789e-17

note: model is exactly identified

GMM estimation

Number of parameters =   3
Number of moments    =   3
Initial weight matrix: Unadjusted                 Number of obs   =     26,200

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0051436   .0013349     3.85   0.000     .0025272      .00776
       grade |   .0383185   .0038331    10.00   0.000     .0308058    .0458312
       _cons |  -1.415623   .0609043   -23.24   0.000    -1.534994   -1.296253
------------------------------------------------------------------------------
Instruments for equation 1: age grade _cons
The point estimates are similar to the ML estimates because both estimators are consistent.
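The iteration log above reports a criterion \(Q(b)\) that falls to essentially 0. As a sketch of why, write the averaged sample moments and the criterion as follows, where \(\bar{\bf g}\) and the weight matrix \(\Wb\) are notation we introduce for illustration:
\[\begin{aligned}
\bar{\bf g}(\betab) &= \frac{1}{N}\sum_{i=1}^N \{y_i-\Phi(\xb_i\betab')\}\xb_i'\\
Q(\betab) &= \bar{\bf g}(\betab)'\,\Wb\,\bar{\bf g}(\betab)
\end{aligned}\]
Here there are three moment conditions (one each for age, grade, and the constant) and three parameters, so the model is exactly identified. The sample moment conditions can then be solved exactly, \(\bar{\bf g}(\betab)={\bf 0}\), which makes \(Q(\betab)=0\) up to numerical tolerance for any positive-definite \(\Wb\). The choice of weight matrix therefore does not affect the point estimates in this example.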
Using gmm to estimate parameters by ML
When we maximize a log-likelihood function, we find the parameters that set the first derivative to 0. For example, setting the first derivative of the probit log-likelihood function with respect to \(\betab\) to 0 in the sample yields
\[\begin{equation}\label{E:b2}
\frac{\partial \ln\{L(\betab;\xb,y)\}}{\partial \betab} =
\sum_{i=1}^N \left\{y_i \frac{\phi(\xb_{i}\betab')}{\Phi(\xb_{i}\betab')}
- (1-y_i) \frac{\phi(-\xb_{i}\betab')}{\Phi(-\xb_{i}\betab')}\right\}
\xb_{i} = {\bf 0}
\end{equation}\]
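These first-order conditions, where \(\phi\) is the standard normal density, can be related to the MM conditions of the previous section. As a short sketch using only the identities \(\phi(-z)=\phi(z)\) and \(\Phi(-z)=1-\Phi(z)\), the term in braces for observation \(i\) simplifies to
\[\begin{aligned}
y_i \frac{\phi(\xb_i\betab')}{\Phi(\xb_i\betab')}
- (1-y_i) \frac{\phi(\xb_i\betab')}{1-\Phi(\xb_i\betab')}
&= \frac{\phi(\xb_i\betab')\,[y_i\{1-\Phi(\xb_i\betab')\} - (1-y_i)\Phi(\xb_i\betab')]}{\Phi(\xb_i\betab')\{1-\Phi(\xb_i\betab')\}}\\
&= \frac{\phi(\xb_i\betab')}{\Phi(\xb_i\betab')\{1-\Phi(\xb_i\betab')\}}\,\{y_i-\Phi(\xb_i\betab')\}
\end{aligned}\]
Viewed this way, the ML conditions use the same residuals \(y_i-\Phi(\xb_i\betab')\) as the MM conditions but weight each observation by \(\phi(\xb_i\betab')/[\Phi(\xb_i\betab')\{1-\Phi(\xb_i\betab')\}]\).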
Below, we use gmm to find the parameters that solve these sample moment conditions:
. gmm ( union*normalden({xb:age grade _cons})/normal({xb:})       ///
>         -(1-union)*normalden(-{xb:})/normal(-{xb:}) ),           ///
>         instruments(age grade) onestep

Step 1
Iteration 0:   GMM criterion Q(b) =  .19941827
Iteration 1:   GMM criterion Q(b) =  .00012506
Iteration 2:   GMM criterion Q(b) =  2.260e-09
Iteration 3:   GMM criterion Q(b) =  7.369e-19

note: model is exactly identified

GMM estimation

Number of parameters =   3
Number of moments    =   3
Initial weight matrix: Unadjusted                 Number of obs   =     26,200

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0051821    .001339     3.87   0.000     .0025577    .0078065
       grade |   .0373899   .0037435     9.99   0.000     .0300528     .044727
       _cons |  -1.404697   .0601135   -23.37   0.000    -1.522517   -1.286876
------------------------------------------------------------------------------
Instruments for equation 1: age grade _cons
The point estimates match those reported by mlexp. The standard errors differ because gmm reports robust standard errors.
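To compare like with like, one could ask mlexp for robust standard errors as well. The following is a minimal sketch, assuming the union dataset is loaded through webuse; we have not run it for this post, so no output is shown. The point estimates are unaffected by the vce() option, and the standard errors should be close to those reported by gmm, although small differences may remain if the two commands apply different finite-sample adjustments.

```stata
* Reload the example data (assumes access to the Stata example datasets)
webuse union, clear

* Same probit log likelihood as before, now with robust (sandwich)
* standard errors instead of the default
mlexp ( union*lnnormal({xb:age grade _cons})      ///
        + (1-union)*lnnormal(-{xb:}) ), vce(robust)
```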
Summary
We showed how to easily estimate the probit model parameters by ML and by MM using mlexp and gmm, respectively. We also showed that you can estimate these parameters using restrictions imposed by conditional distributions or using weaker conditional moment restrictions. Finally, we illustrated that the equations imposed by the conditional distributions can be viewed as sample moment restrictions.
References
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. 1st ed. New York: Cambridge University Press.
Casella, G., and R. L. Berger. 2002. Statistical Inference. 2nd ed. Pacific Grove, CA: Duxbury.
Pawitan, Y. 2001. In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford: Oxford University Press.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.