## Maximum likelihood estimation by mlexp: A chi-squared example

Overview

In this post, I show how to use mlexp to estimate the degrees-of-freedom parameter of a chi-squared distribution by maximum likelihood (ML). One example is unconditional, and another example models the parameter as a function of covariates. I also show how to generate data from chi-squared distributions, and I illustrate how to use simulation methods to understand an estimation technique.

The data

I want to show how to draw data from a $$\chi^2$$ distribution, and I want to illustrate that the ML estimator produces estimates close to the truth, so I use simulated data.

In the output below, I draw a random sample of $$2,000$$ observations from a $$\chi^2$$ distribution with $$2$$ degrees of freedom, denoted by $$\chi^2(2)$$, and I summarize the results.

Example 1: Generating $$\chi^2(2)$$ data

. drop _all

. set obs 2000
number of observations (_N) was 0, now 2,000

. set seed 12345

. generate y = rchi2(2)

. summarize y

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           y |      2,000    2.030865    1.990052   .0028283   13.88213


The mean and variance of the $$\chi^2(2)$$ distribution are $$2$$ and $$4$$, respectively. The sample mean of $$2.03$$ and the sample variance of $$3.96=1.99^2$$ are close to the true values. I set the random-number seed to $$12345$$ so that you can replicate my example; type help seed for details.
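The comparison above can be reproduced from the results that summarize leaves behind. The following is a minimal sketch, assuming the variable y from example 1 is still in memory; it only displays the stored sample mean and sample variance next to the theoretical values.

```stata
* Assumes y from example 1 is in memory
quietly summarize y

* summarize stores the sample mean in r(mean) and the sample variance in r(Var)
display "sample mean     = " r(mean) "  (theoretical value: 2)"
display "sample variance = " r(Var)  "  (theoretical value: 4)"
```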

mlexp and the log-likelihood function

The log-likelihood function for the ML estimator for the degrees-of-freedom parameter $$d$$ of a $$\chi^2(d)$$ distribution is

$${\mathcal L}(d) = \sum_{i=1}^N \ln[f(y_i,d)]$$

where $$f(y_i,d)$$ is the density function for the $$\chi^2(d)$$ distribution. See Cameron and Trivedi (2005) and Wooldridge (2010) for introductions to ML.
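For reference, the density and the contribution of one observation to the log likelihood can be written out explicitly. This is the standard textbook form of the $$\chi^2(d)$$ density for $$y_i>0$$, with $$\Gamma(\cdot)$$ denoting the gamma function; it is included here only to make the notation concrete.

$$f(y_i,d) = \frac{y_i^{d/2-1}\, e^{-y_i/2}}{2^{d/2}\,\Gamma(d/2)}$$

$$\ln[f(y_i,d)] = \left(\frac{d}{2}-1\right)\ln(y_i) - \frac{y_i}{2} - \frac{d}{2}\ln(2) - \ln\Gamma\left(\frac{d}{2}\right)$$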

The mlexp command estimates parameters by maximizing the specified log-likelihood function. You specify the contribution of an observation to the log-likelihood function inside parentheses, and you enclose parameters inside the curly braces $$\{$$ and $$\}$$. I use mlexp to estimate $$d$$ in example 2.

Example 2: Using mlexp to estimate $$d$$

. mlexp ( ln(chi2den({d},y)) )

initial:       log likelihood =     -<inf>  (could not be evaluated)
feasible:      log likelihood = -5168.1594
rescale:       log likelihood = -3417.1592
Iteration 0:   log likelihood = -3417.1592
Iteration 1:   log likelihood = -3416.7063
Iteration 2:   log likelihood = -3416.7063

Maximum likelihood estimation

Log likelihood = -3416.7063                     Number of obs     =      2,000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          /d |   2.033457   .0352936    57.62   0.000     1.964283    2.102631
------------------------------------------------------------------------------


The estimate of $$d$$, $$2.03$$, is very close to the true value of $$2.0$$, and the 95% confidence interval of $$[1.96, 2.10]$$ contains the true value, as expected.
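A single draw cannot show how the estimator behaves across repeated samples, so a small Monte Carlo simulation is a natural next step. The following is a minimal sketch, not a definitive implementation: the program name chi2sim, the result name d_hat, and the choice of 200 replications are my own illustrative assumptions. Each replication redraws the data as in example 1, reruns mlexp, and returns the point estimate.

```stata
* Hypothetical helper program: one Monte Carlo replication
capture program drop chi2sim
program define chi2sim, rclass
    drop _all
    quietly set obs 2000
    generate y = rchi2(2)
    quietly mlexp ( ln(chi2den({d},y)) )
    * Pull the single estimated parameter out of e(b)
    tempname b
    matrix `b' = e(b)
    return scalar d_hat = `b'[1,1]
end

set seed 12345
* 200 replications is an arbitrary illustrative choice
simulate d_hat = r(d_hat), reps(200) nodots: chi2sim
summarize d_hat
```

If the estimator behaves as theory suggests, the mean of d_hat should be near 2, and its standard deviation across replications can be compared with the standard error of about 0.035 reported in example 2.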

Modeling the degrees of freedom as a function of a covariate

When using ML in applied research, we almost always want to model the parameters of a distribution as a function of covariates. Below, I draw a covariate $$x$$ from a Uniform(0,3) distribution, specify that $$d=1+x$$, and draw $$y$$ from a $$\chi^2(d)$$ distribution conditional on $$x$$. Having drawn data from this data-generating process (DGP), I estimate the parameters using mlexp.

Example 3: Using mlexp to estimate $$d=a+b x_i$$

. drop _all

. set obs 2000
number of observations (_N) was 0, now 2,000

. set seed 12345

. generate x = runiform(0,3)

. generate d = 1 + x

. generate y = rchi2(d)

. mlexp ( ln(chi2den({b}*x +{a},y)) )

initial:       log likelihood =     -<inf>  (could not be evaluated)
feasible:      log likelihood = -4260.0685
rescale:       log likelihood = -3597.6271
rescale eq:    log likelihood = -3597.6271
Iteration 0:   log likelihood = -3597.6271
Iteration 1:   log likelihood = -3596.5383
Iteration 2:   log likelihood =  -3596.538

Maximum likelihood estimation

Log likelihood =  -3596.538                     Number of obs     =      2,000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          /b |   1.061621   .0430846    24.64   0.000     .9771766    1.146065
          /a |   .9524136   .0545551    17.46   0.000     .8454876     1.05934
------------------------------------------------------------------------------


The estimates of $$1.06$$ for $$b$$ and $$0.95$$ for $$a$$ are close to their true values of $$1$$ and $$1$$, and both 95% confidence intervals contain the true values.
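Because the mean of a $$\chi^2(d)$$ distribution is $$d$$, this DGP implies $$E[y|x]=1+x$$, so a linear regression of $$y$$ on $$x$$ offers a quick, independent check on the simulated data. The following is a minimal sketch, assuming the data from example 3 are still in memory. The variance of $$y$$ given $$x$$ is $$2(1+x)$$, which changes with $$x$$, so I request robust standard errors.

```stata
* Assumes x and y from example 3 are in memory
* Under the DGP, E[y|x] = 1 + x, so both coefficients should be near 1
regress y x, vce(robust)
```

This regression uses only the conditional mean, whereas the ML estimator uses the full conditional distribution, so the two sets of estimates need not coincide.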

mlexp makes this process easier by forming a linear combination of variables that you specify.

Example 4: A linear combination in mlexp

. mlexp ( ln(chi2den({xb: x _cons},y)) )

initial:       log likelihood =     -<inf>  (could not be evaluated)
feasible:      log likelihood = -5916.7648
rescale:       log likelihood = -3916.6106
Iteration 0:   log likelihood = -3916.6106
Iteration 1:   log likelihood = -3621.2905
Iteration 2:   log likelihood = -3596.5845
Iteration 3:   log likelihood =  -3596.538
Iteration 4:   log likelihood =  -3596.538

Maximum likelihood estimation

Log likelihood =  -3596.538                     Number of obs     =      2,000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   1.061621   .0430846    24.64   0.000     .9771766    1.146065
       _cons |   .9524138   .0545551    17.46   0.000     .8454878     1.05934
------------------------------------------------------------------------------


The estimates match those in example 3 (the constant differs only in the seventh decimal place), but the command was easier to write and the output is easier to read.
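A linear combination can also be reused within one expression: after it has been defined once as {xb: x _cons}, later references are written as {xb:}. The following is a minimal sketch, assuming the data from example 3 are still in memory. It writes out the $$\chi^2(d)$$ log density by hand with lngamma() in place of ln(chi2den()), so the linear combination appears three times.

```stata
* Assumes x and y from example 3 are in memory
* Hand-coded chi-squared log density with d = {xb: x _cons}:
*   (d/2 - 1)*ln(y) - y/2 - (d/2)*ln(2) - lngamma(d/2)
mlexp ( ({xb: x _cons}/2 - 1)*ln(y) - y/2 - ({xb:}/2)*ln(2) - lngamma({xb:}/2) )
```

Because this is the same log-likelihood function written a different way, I would expect the estimates to agree with those in example 4 up to numerical precision, although the iteration log may differ.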

Done and undone

I have shown how to generate data from a $$\chi^2(d)$$ distribution when $$d$$ is a fixed number or a linear function of a covariate and how to estimate $$d$$ or the parameters of the model for $$d$$ by using mlexp.

The examples discussed above show how to use mlexp and illustrate conditional maximum likelihood estimation.

mlexp can do much more than I have discussed here; see [R] mlexp for more details. Estimating the parameters of a conditional distribution is only the beginning of any research project. I will discuss interpreting these parameters in a future post.

References

Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.

