Maximum likelihood estimation by mlexp: A chi-squared example
Overview
In this post, I show how to use mlexp to estimate the degrees-of-freedom parameter of a chi-squared distribution by maximum likelihood (ML). One example is unconditional, and another models the parameter as a function of covariates. I also show how to generate data from chi-squared distributions, and I illustrate how to use simulation methods to understand an estimation technique.
The data
I want to show how to draw data from a \(\chi^2\) distribution, and I want to illustrate that the ML estimator produces estimates close to the truth, so I use simulated data.
In the output below, I draw a random sample of \(2,000\) observations from a \(\chi^2\) distribution with \(2\) degrees of freedom, denoted \(\chi^2(2)\), and I summarize the results.
Example 1: Generating \(\chi^2(2)\) data
. drop _all

. set obs 2000
number of observations (_N) was 0, now 2,000

. set seed 12345

. generate y = rchi2(2)

. summarize y

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           y |      2,000    2.030865    1.990052   .0028283   13.88213
The mean and variance of the \(\chi^2(2)\) distribution are \(2\) and \(4\), respectively. The sample mean of \(2.03\) and the sample variance of \(3.96=1.99^2\) are close to the true values. I set the random-number seed to \(12345\) so that you can replicate my example; type help seed for details.
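The same experiment can be sketched in Python with NumPy as a cross-check (this is my illustration, not part of the original Stata workflow; the Python seed does not reproduce Stata's draws, so the sample moments will differ slightly):

```python
import numpy as np

# Draw 2,000 observations from a chi-squared distribution with 2
# degrees of freedom, mirroring the Stata example above.
rng = np.random.default_rng(12345)
y = rng.chisquare(df=2, size=2000)

# The chi2(2) distribution has mean 2 and variance 4; the sample
# moments should be close to those values.
print(y.mean(), y.var(ddof=1))
```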
mlexp and the log-likelihood function
The log-likelihood function for the ML estimator of the degrees-of-freedom parameter \(d\) of a \(\chi^2(d)\) distribution is
\[
{\mathcal L}(d) = \sum_{i=1}^N \ln[f(y_i,d)]
\]
where \(f(y_i,d)\) is the density function of the \(\chi^2(d)\) distribution. See Cameron and Trivedi (2005) and Wooldridge (2010) for introductions to ML.
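For reference, the \(\chi^2(d)\) density is the standard result

\[
f(y,d) = \frac{y^{d/2-1}\,e^{-y/2}}{2^{d/2}\,\Gamma(d/2)}, \qquad y > 0,
\]

where \(\Gamma(\cdot)\) is the gamma function; this is the function that chi2den() evaluates.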
The mlexp command estimates parameters by maximizing the specified log-likelihood function. You specify the contribution of an observation to the log-likelihood function inside parentheses, and you enclose parameters inside the curly braces \(\{\) and \(\}\). I use mlexp to estimate \(d\) in example 2.
Example 2: Using mlexp to estimate \(d\)
. mlexp ( ln(chi2den({d},y)) )

initial:      log likelihood = -(could not be evaluated)
feasible:     log likelihood = -5168.1594
rescale:      log likelihood = -3417.1592
Iteration 0:  log likelihood = -3417.1592
Iteration 1:  log likelihood = -3416.7063
Iteration 2:  log likelihood = -3416.7063

Maximum likelihood estimation

Log likelihood = -3416.7063                     Number of obs     =      2,000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          /d |   2.033457   .0352936    57.62   0.000     1.964283    2.102631
------------------------------------------------------------------------------
The estimate of \(d\) is very close to the true value of \(2.0\), as expected.
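The same maximization can be sketched in Python with SciPy as a cross-check (my illustration, not the author's method; scipy.stats.chi2.logpdf plays the role of ln(chi2den()), and the estimate differs slightly from Stata's because the simulated draws differ):

```python
import numpy as np
from scipy import optimize, stats

# Simulate chi2(2) data, then estimate d by maximizing the
# log-likelihood sum_i ln f(y_i, d), as mlexp does above.
rng = np.random.default_rng(12345)
y = rng.chisquare(df=2, size=2000)

def negloglik(d):
    # Negative log likelihood, since SciPy minimizes.
    return -np.sum(stats.chi2.logpdf(y, df=d))

res = optimize.minimize_scalar(negloglik, bounds=(0.1, 10), method="bounded")
print(res.x)  # ML estimate of d; the true value is 2
```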
Modeling the degree of freedom as a function of a covariate
When using ML in applied research, we almost always want to model the parameters of a distribution as functions of covariates. Below, I draw a covariate \(x\) from a Uniform(0,3) distribution, specify that \(d=1+x\), and draw \(y\) from a \(\chi^2(d)\) distribution conditional on \(x\). Having drawn data from this data-generating process (DGP), I estimate the parameters using mlexp.
Example 3: Using mlexp to estimate \(d=a+b x_i\)
. drop _all

. set obs 2000
number of observations (_N) was 0, now 2,000

. set seed 12345

. generate x = runiform(0,3)

. generate d = 1 + x

. generate y = rchi2(d)

. mlexp ( ln(chi2den({b}*x + {a},y)) )

initial:      log likelihood = -(could not be evaluated)
feasible:     log likelihood = -4260.0685
rescale:      log likelihood = -3597.6271
rescale eq:   log likelihood = -3597.6271
Iteration 0:  log likelihood = -3597.6271
Iteration 1:  log likelihood = -3596.5383
Iteration 2:  log likelihood = -3596.538

Maximum likelihood estimation

Log likelihood = -3596.538                      Number of obs     =      2,000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          /b |   1.061621   .0430846    24.64   0.000     .9771766    1.146065
          /a |   .9524136   .0545551    17.46   0.000     .8454876     1.05934
------------------------------------------------------------------------------
The estimates of \(1.06\) and \(0.95\) are close to their true values.
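The conditional model can also be sketched in Python as a cross-check (again my illustration under the same DGP, not the original Stata code; the starting values and the positivity guard on the degrees-of-freedom parameter are my choices):

```python
import numpy as np
from scipy import optimize, stats

# Simulate the conditional model: x ~ Uniform(0,3), d = 1 + x,
# and y | x ~ chi2(d).
rng = np.random.default_rng(12345)
n = 2000
x = rng.uniform(0, 3, size=n)
y = rng.chisquare(df=1 + x)

def negloglik(params):
    a, b = params
    d = a + b * x
    if np.any(d <= 0):          # degrees of freedom must stay positive
        return np.inf
    return -np.sum(stats.chi2.logpdf(y, df=d))

# Maximize the log likelihood over (a, b) by minimizing its negative.
res = optimize.minimize(negloglik, x0=[2.0, 0.5], method="Nelder-Mead")
print(res.x)  # estimates of (a, b); the true values are a = 1, b = 1
```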
mlexp makes this process easier by forming a linear combination of variables that you specify.
Example 4: A linear combination in mlexp
. mlexp ( ln(chi2den({xb: x _cons},y)) )

initial:      log likelihood = -(could not be evaluated)
feasible:     log likelihood = -5916.7648
rescale:      log likelihood = -3916.6106
Iteration 0:  log likelihood = -3916.6106
Iteration 1:  log likelihood = -3621.2905
Iteration 2:  log likelihood = -3596.5845
Iteration 3:  log likelihood = -3596.538
Iteration 4:  log likelihood = -3596.538

Maximum likelihood estimation

Log likelihood = -3596.538                      Number of obs     =      2,000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.061621   .0430846    24.64   0.000     .9771766    1.146065
       _cons |   .9524138   .0545551    17.46   0.000     .8454878     1.05934
------------------------------------------------------------------------------
The estimates are the same as in example 3, but the command was easier to write and the output is easier to read.
Done and undone
I have shown how to generate data from a \(\chi^2(d)\) distribution when \(d\) is a fixed number or a linear function of a covariate, and how to estimate \(d\), or the parameters of the model for \(d\), by using mlexp.
The examples discussed above show how to use mlexp and illustrate an example of conditional maximum likelihood estimation.
mlexp can do much more than I have discussed here; see [R] mlexp for more details. Estimating the parameters of a conditional distribution is only the beginning of any research project. I will discuss interpreting these parameters in a future post.
References
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and applications. Cambridge: Cambridge University Press.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.