Maximum likelihood estimation by mlexp: A chi-squared example

Overview

In this post, I show how to use mlexp to estimate the degrees-of-freedom parameter of a chi-squared distribution by maximum likelihood (ML). One example is unconditional, and another models the parameter as a function of covariates. I also show how to generate data from chi-squared distributions, and I illustrate how simulation methods can be used to understand an estimation technique.

The data

I want to show how to draw data from a \(\chi^2\) distribution, and I want to illustrate that the ML estimator produces estimates close to the truth, so I use simulated data.

In the output below, I draw a random sample of \(2,000\) observations from a \(\chi^2\) distribution with \(2\) degrees of freedom, denoted by \(\chi^2(2)\), and I summarize the results.

Example 1: Generating \(\chi^2(2)\) data

. drop _all

. set obs 2000
number of observations (_N) was 0, now 2,000

. set seed 12345

. generate y = rchi2(2)

. summarize y

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           y |      2,000    2.030865    1.990052   .0028283   13.88213

The mean and variance of the \(\chi^2(2)\) distribution are \(2\) and \(4\), respectively. The sample mean of \(2.03\) and the sample variance of \(3.96=1.99^2\) are close to the true values. I set the random-number seed to \(12345\) so that you can replicate my example; type help seed for details.
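
As a quick check, the sample variance can be read directly from the results that summarize stores in r(), instead of squaring the standard deviation by hand; a minimal sketch is

. quietly summarize y

. display "sample variance = " r(Var)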

mlexp and the log-likelihood function

The ML estimator of the degrees-of-freedom parameter \(d\) of a \(\chi^2(d)\) distribution maximizes the log-likelihood function

\[
{\mathcal L}(d) = \sum_{i=1}^N \ln[f(y_i,d)]
\]

where \(f(y_i,d)\) is the density function of the \(\chi^2(d)\) distribution. See Cameron and Trivedi (2005) and Wooldridge (2010) for introductions to ML estimation.
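
For reference, the \(\chi^2(d)\) density that the Stata function chi2den() evaluates is

\[
f(y,d) = \frac{y^{d/2-1}\,e^{-y/2}}{2^{d/2}\,\Gamma(d/2)}, \qquad y > 0
\]

so each observation contributes \(\ln f(y_i,d) = (d/2-1)\ln y_i - y_i/2 - (d/2)\ln 2 - \ln\Gamma(d/2)\) to the log likelihood.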

The mlexp command estimates parameters by maximizing the specified log-likelihood function. You specify the contribution of an observation to the log-likelihood function inside parentheses, and you enclose parameters inside the curly braces \(\{\) and \(\}\). I use mlexp to estimate \(d\) in example 2.

Example 2: Using mlexp to estimate \(d\)

. mlexp ( ln(chi2den({d},y)) )

initial:       log likelihood =     -<inf>  (could not be evaluated)
feasible:      log likelihood = -5168.1594
rescale:       log likelihood = -3417.1592
Iteration 0:   log likelihood = -3417.1592  
Iteration 1:   log likelihood = -3416.7063  
Iteration 2:   log likelihood = -3416.7063  

Maximum likelihood estimation

Log likelihood = -3416.7063                     Number of obs     =      2,000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          /d |   2.033457   .0352936    57.62   0.000     1.964283    2.102631
------------------------------------------------------------------------------

The estimate of \(d\) is very close to the true value of \(2.0\), as expected.
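
After mlexp, the free parameter can be referenced by name in postestimation commands; a minimal sketch (assuming a Stata release that, as in the output above, labels the parameter /d) is

. display "ML estimate of d = " _b[/d]

. test _b[/d] = 2

The second command performs a Wald test of the hypothesis that \(d\) equals its true value of \(2\).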

Modeling the degree of freedom as a function of a covariate

When using ML in applied research, we almost always want to model the parameters of a distribution as functions of covariates. Below, I draw a covariate \(x\) from a uniform(0,3) distribution, specify that \(d=1+x\), and draw \(y\) from a \(\chi^2(d)\) distribution conditional on \(x\). Having drawn data from this data-generating process (DGP), I estimate the parameters by using mlexp.

Example 3: Using mlexp to estimate \(d=a+b x_i\)

. drop _all

. set obs 2000
number of observations (_N) was 0, now 2,000

. set seed 12345

. generate x = runiform(0,3)

. generate d = 1 + x

. generate y = rchi2(d)

. mlexp ( ln(chi2den({b}*x +{a},y)) )

initial:       log likelihood =     -<inf>  (could not be evaluated)
feasible:      log likelihood = -4260.0685
rescale:       log likelihood = -3597.6271
rescale eq:    log likelihood = -3597.6271
Iteration 0:   log likelihood = -3597.6271  
Iteration 1:   log likelihood = -3596.5383  
Iteration 2:   log likelihood =  -3596.538  

Maximum likelihood estimation

Log likelihood =  -3596.538                     Number of obs     =      2,000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          /b |   1.061621   .0430846    24.64   0.000     .9771766    1.146065
          /a |   .9524136   .0545551    17.46   0.000     .8454876     1.05934
------------------------------------------------------------------------------

The estimates of \(1.06\) and \(0.95\) are close to the true values of \(b=1\) and \(a=1\) implied by \(d=1+x\).

mlexp makes this process easier by forming a linear combination of variables that you specify.

Example 4: A linear combination in mlexp

. mlexp ( ln(chi2den({xb: x _cons},y)) )

initial:       log likelihood =     -<inf>  (could not be evaluated)
feasible:      log likelihood = -5916.7648
rescale:       log likelihood = -3916.6106
Iteration 0:   log likelihood = -3916.6106  
Iteration 1:   log likelihood = -3621.2905  
Iteration 2:   log likelihood = -3596.5845  
Iteration 3:   log likelihood =  -3596.538  
Iteration 4:   log likelihood =  -3596.538  

Maximum likelihood estimation

Log likelihood =  -3596.538                     Number of obs     =      2,000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.061621   .0430846    24.64   0.000     .9771766    1.146065
       _cons |   .9524138   .0545551    17.46   0.000     .8454878     1.05934
------------------------------------------------------------------------------

The estimates are the same as in example 3, but the command was easier to write and the output is easier to read.
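
To check how well the fitted model tracks the true degrees of freedom, you can compute the fitted linear combination from the stored coefficients and compare it with the d used to generate the data; a minimal sketch (the variable name dhat is my own) is

. generate dhat = _b[xb:x]*x + _b[xb:_cons]

. summarize d dhat

The coefficients are referenced by the equation name xb specified inside the curly braces.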

Done and undone

I have shown how to generate data from a \(\chi^2(d)\) distribution when \(d\) is a fixed number or a linear function of a covariate, and how to estimate \(d\) or the parameters of the model for \(d\) by using mlexp.

The examples discussed above show how to use mlexp and illustrate conditional maximum likelihood estimation.

mlexp can do much more than I have discussed here; see [R] mlexp for more details. Estimating the parameters of a conditional distribution is only the beginning of any research project. I will discuss interpreting these parameters in a future post.

References

Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.