
Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters

\(
\newcommand{\xb}{{\bf x}}
\newcommand{\betab}{\boldsymbol{\beta}}\)I show how to use optimize() in Mata to maximize a Poisson log-likelihood function and to obtain estimators of the variance–covariance of the estimator (VCE) based on independent and identically distributed (IID) observations or on robust methods.

This is the eighteenth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

Using optimize()

There are many optional choices that one may make when solving a nonlinear optimization problem, but there are very few that one must make. The optimize*() functions in Mata handle this problem by making a set of default choices for you, requiring that you specify a few things, and allowing you to change any of the default choices.

When I use optimize() to solve a nonlinear optimization problem, I do four steps.

  1. I create an optimize() object

      
    : S = optimize_init()

    which contains all the default choices

  2. I use some of the optimize_init_*(S) functions to put information about my optimization problem into S.
  3. I use

      
    : betahat = optimize(S)

    to perform the optimization.

  4. I use some of the optimize_result_*(S) functions to get the results, which optimize(S) stored in S.
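
The four steps above have a loose analogue in most numerical optimizers. As a rough cross-language sketch (in Python with scipy, since optimize() itself is Mata-only; the data here are simulated, not accident3.dta), the create/specify/solve/retrieve pattern looks like this:

```python
# A rough Python analogue of the four optimize() steps, using
# scipy.optimize.minimize on simulated Poisson data (hypothetical data,
# not accident3.dta; optimize() itself is Mata-only).
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(size=200), np.ones(200)])  # covariate + constant
y = rng.poisson(np.exp(X @ np.array([0.5, 1.0])))          # true parameters

def neg_pll(b, y, X):
    # Negative Poisson log-likelihood: minimize() minimizes,
    # so negating the objective maximizes the log-likelihood.
    xb = X @ b
    return -np.sum(y * xb - np.exp(xb) - gammaln(y + 1))

# Steps 1-3: there is no separate problem object; the evaluator, the extra
# data arguments, and the starting values all go into the minimize() call.
res = minimize(neg_pll, x0=np.full(2, 0.01), args=(y, X), method="BFGS")

# Step 4: retrieve the results stored in the result object.
bh = res.x
```

In scipy the args= tuple plays roughly the role that optimize_init_argument() plays in Mata.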

Consider maximizing the log-likelihood function of a Poisson model. The contribution of the \(i\)th observation to the log-likelihood is

\[
f_i(\betab) = y_i{\bf x}_i\betab' - \exp({\bf x}_i\betab') - \ln( y_i !)
\]

where \(y_i\) is the dependent variable, \({\bf x}_i\) is the vector of covariates, and \(\betab\) is the row vector of parameters that we select to maximize the log-likelihood function given by \(F(\betab) =\sum_i f_i(\betab)\). I could drop \(\ln(y_i!)\), because it does not depend on the parameters. I include it to make the value of the log-likelihood function the same as that reported by Stata. Stata includes these terms so that the values of the log-likelihood functions are comparable across models.
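
As a quick sanity check on this expression (a Python snippet with hypothetical values, not part of the Stata code), the contribution evaluated at mean mu = exp(xb) should equal scipy's Poisson log probability mass:

```python
# Check the contribution formula f_i = y*xb - exp(xb) - ln(y!) against
# scipy's Poisson log-pmf with mean mu = exp(xb).
# (Hypothetical values y=3, xb=0.7; this only verifies the algebra.)
import numpy as np
from scipy.stats import poisson
from scipy.special import gammaln

y, xb = 3, 0.7
f_i = y * xb - np.exp(xb) - gammaln(y + 1)   # ln(y!) = gammaln(y + 1)
assert np.isclose(f_i, poisson.logpmf(y, np.exp(xb)))
```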

Code block 1 copies the data from Stata to Mata and computes the Poisson log-likelihood function at the vector of parameter values b, which has been set to the arbitrary starting value of .01 for each parameter.

Code block 1: Computing the Poisson log-likelihood in Mata

clear all
use accident3
mata:
     y = st_data(., "accidents")
     X = st_data(., "cvalue kids traffic")
     X = X,J(rows(X), 1, 1)
     b = J(1, cols(X), .01)
    xb = X*b'
    f  = sum(-exp(xb) + y:*xb - lnfactorial(y))
end

The Mata function plleval() in code block 2 puts the value of the Poisson log-likelihood function at the vector of parameter values b into val.

Code block 2: An evaluator function for the Poisson log-likelihood

mata:
void plleval(real scalar todo, real vector b, val, grad, hess)
{
    real vector  y, xb
    real matrix  X

     y = st_data(., "accidents")
     X = st_data(., "cvalue kids traffic")
     X = X,J(rows(X), 1, 1)
    xb = X*b'
   val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}
end

plleval() has the default syntax of an evaluator function that optimize() can call. Evaluator functions follow a fixed syntax so that optimize() can call them, which it must do repeatedly to find the maximum. After describing the default syntax, I show how to use evaluators with extra arguments.

plleval() is void; it returns nothing. The real scalar todo allows optimize() to tell the evaluator function what it must compute. The real vector b is the current value of the parameter vector. val is not typed because, no matter what it contains on input, it will contain the value of the objective function on output. grad is not typed because on output it will optionally contain the vector of first derivatives of the objective function at the current value of b. hess is not typed because on output it will optionally contain the matrix of second derivatives of the objective function at the current value of b. As plleval() illustrates, the evaluator function must put the value of the objective function into the third argument, but it need not compute either the vector of first derivatives or the matrix of second derivatives.
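
In Python terms (a hypothetical analogue, not Mata code), the todo protocol amounts to an evaluator that fills in derivatives only when asked; here the gradient and Hessian use the standard closed forms for the Poisson log-likelihood:

```python
# Sketch of the evaluator protocol in Python (hypothetical analogue;
# Mata's optimize() passes `todo` to request 0, 1, or 2 orders of
# derivatives and receives them through grad and hess).
import numpy as np
from scipy.special import gammaln

def plleval_py(todo, b, y, X):
    xb = X @ b
    mu = np.exp(xb)
    # Value of the objective function (always computed).
    val = np.sum(y * xb - mu - gammaln(y + 1))
    # First derivatives, only when todo >= 1: sum_i (y_i - mu_i) x_i.
    grad = (y - mu) @ X if todo >= 1 else None
    # Second derivatives, only when todo >= 2: -X' diag(mu) X.
    hess = -X.T @ (mu[:, None] * X) if todo >= 2 else None
    return val, grad, hess
```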

In example 1, I use optimize() to maximize the Poisson log-likelihood function computed in plleval().

Example 1: Using optimize() to estimate Poisson parameters

(Uses accident3.dta)

. clear all

. use accident3

. mata:
------------------------------------------------- mata (type end to exit) ------
: void plleval(real scalar todo, real vector b, val, grad, hess)
> {
>     real vector  y, xb
>     real matrix  X
> 
>      y = st_data(., "accidents")
>      X = st_data(., "cvalue kids traffic")
>      X = X,J(rows(X), 1, 1)
>     xb = X*b'
>    val = sum(-exp(xb) + y:*xb - lnfactorial(y))
> }
note: argument todo unused
note: argument grad unused
note: argument hess unused

: 
: S  = optimize_init()

: optimize_init_evaluator(S, &plleval())

: optimize_init_params(S, J(1, 4, .01))

: bh = optimize(S)
Iteration 0:   f(p) = -851.18669  
Iteration 1:   f(p) = -556.66874  
Iteration 2:   f(p) = -555.81708  
Iteration 3:   f(p) = -555.81538  
Iteration 4:   f(p) = -555.81538  

: bh
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871399   -1.009017051    .1467114648    .5743542793  |
    +-------------------------------------------------------------+

: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871399   -1.009017051    .1467114648    .5743542793  |
    +-------------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_oim(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .0706483931   .0807960852   .0313761961   .2839519366  |
    +---------------------------------------------------------+

: end
--------------------------------------------------------------------------------

After defining plleval(), I use optimize_init() to create the optimize() object S. I must put information about how to call plleval() and the vector of starting values into S. Typing

optimize_init_evaluator(S, &plleval())

puts the address of the evaluator function plleval() into S; preceding a function name with an ampersand (&) yields its address. optimize() requires the address rather than the name because the address speeds up finding the function. Typing

optimize_init_params(S, J(1, 4, .01))

puts the vector of starting values, J(1, 4, .01), into S.

Typing

bh = optimize(S)

causes optimize() to solve the optimization problem described in S, and it causes optimize() to put the vector of optimal parameters in bh. optimize() produces the default iteration log, because we did not change the default specification in S.

When optimize() has completed, the results are in S. For example, I display the bh returned by optimize() and use optimize_result_params(S) to display the result stored in S. I further illustrate by displaying the standard errors; optimize_result_V_oim() retrieves the observed-information-matrix (OIM) estimator of the variance–covariance of the estimator (VCE). Many other results are stored in S; type help mf optimize and look at the optimize_result*() functions for details.
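
For the Poisson likelihood, the OIM VCE that optimize_result_V_oim() returns has a simple closed form: the inverse of the negative Hessian, X' diag(exp(Xb)) X, evaluated at the solution. A minimal Python sketch on simulated data (accident3.dta is not reproduced here, and the parameter vector is a stand-in for the maximizer):

```python
# Sketch of the OIM VCE for Poisson: V = (-H)^{-1}, where the Hessian of
# the log-likelihood is H = -X' diag(exp(Xb)) X (standard result, shown
# here on simulated data with a stand-in parameter vector).
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(size=500), np.ones(500)])
b = np.array([0.5, 1.0])                        # stand-in for the maximizer
mu = np.exp(X @ b)
V_oim = np.linalg.inv(X.T @ (mu[:, None] * X))  # (-H)^{-1}
se = np.sqrt(np.diag(V_oim))                    # OIM standard errors
```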

Comparing the results in examples 1 and 2 shows that the estimates and standard errors computed by optimize() match those reported by poisson.

Example 2: Results from poisson

. poisson accidents cvalue kids traffic

Iteration 0:   log likelihood = -555.86605
Iteration 1:   log likelihood =  -555.8154
Iteration 2:   log likelihood = -555.81538

Poisson regression                              Number of obs     =        505
                                                LR chi2(3)        =     340.20
                                                Prob > chi2       =     0.0000
Log likelihood = -555.81538                     Pseudo R2         =     0.2343

------------------------------------------------------------------------------
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506594
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |    .574354   .2839515     2.02   0.043     .0178193    1.130889
------------------------------------------------------------------------------

plleval() is slow because it copies the data from Stata to Mata every time optimize() calls it. I would much rather pass the data to the evaluator function, but this requires putting information about the syntax of the new evaluator function into S. Code block 3 contains the new evaluator function plleval2(). In example 3, I use optimize_init_argument() to put information into S about the extra arguments accepted by plleval2().

Code block 3: Passing data to the Poisson evaluator function

mata:
void plleval2(real scalar todo, real vector b,     ///
              real vector y,    real matrix X,     ///
              val, grad, hess)
{
    real vector  xb

    xb = X*b'
   val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}
end

The second line of plleval2() declares the extra arguments: the real vector y and the real matrix X. The extra arguments come between the always-present inputs, the real scalar todo and the real vector b, and the always-present outputs: val, grad, and hess.

Example 3 uses optimize() to maximize the Poisson objective function coded in plleval2().

Example 3: Using optional arguments to pass data

. mata:
------------------------------------------------- mata (type end to exit) ------
: void plleval2(real scalar todo, real vector b,     ///
>               real vector y,    real matrix X,     ///
>               val, grad, hess)
> {
>     real vector  xb
>
>     xb = X*b'
>    val = sum(-exp(xb) + y:*xb - lnfactorial(y))
> }
note: argument todo unused
note: argument grad unused
note: argument hess unused

:
: y = st_data(., "accidents")

: X = st_data(., "cvalue kids traffic")

: X = X,J(rows(X), 1, 1)

:
: S  = optimize_init()

: optimize_init_argument(S, 1, y)

: optimize_init_argument(S, 2, X)

: optimize_init_evaluator(S, &plleval2())

: optimize_init_params(S, J(1, 4, .01))

:
: bh = optimize(S)
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66874
Iteration 2:   f(p) = -555.81708
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538

: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871399   -1.009017051    .1467114648    .5743542793  |
    +-------------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_oim(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .0706483931   .0807960852   .0313761961   .2839519366  |
    +---------------------------------------------------------+

: end
--------------------------------------------------------------------------------

After defining plleval2(), I copy the data from Stata to Mata, and I use optimize_init() to put the default choices into the optimize() object S. When I typed

optimize_init_argument(S, 1, y)

I put information into S specifying that optimize() should pass y as the first extra argument to the evaluator function. When I typed

optimize_init_argument(S, 2, X)

I put information into S specifying that optimize() should pass X as the second extra argument to the evaluator function.

Analogous to example 1, typing

optimize_init_evaluator(S, &plleval2())

puts the address of plleval2() into S, and typing

optimize_init_params(S, J(1, 4, .01))

puts the vector of starting values, J(1, 4, .01), in S.

The results are the same as those in example 1.

Vector of observation-level contributions and robust VCE estimation

Robust estimators for the VCE of an estimator use the structure of observation-level contributions; see Wooldridge (2010, chapters 12 and 13) or Cameron and Trivedi (2005, chapter 5). When the evaluator function gives optimize() a vector of observation-level contributions, instead of a scalar summation, optimize() can use this structure to compute robust or cluster-robust estimators of the VCE.
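
Concretely, the robust estimator is the familiar sandwich A^{-1} B A^{-1}, where A is the negative Hessian and B sums the outer products of the observation-level scores; for the Poisson log-likelihood the score of observation i is (y_i - mu_i) x_i. A Python sketch on simulated data (an illustration of the textbook formula, not optimize()'s internal code; the parameter vector stands in for the MLE):

```python
# Sketch of the robust (sandwich) VCE for a Poisson MLE:
# V_robust = A^{-1} B A^{-1}, with A = -Hessian = X' diag(mu) X and
# B the sum of outer products of the observation-level scores
# s_i = (y_i - mu_i) x_i. Simulated data; b stands in for the MLE.
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([rng.normal(size=500), np.ones(500)])
y = rng.poisson(np.exp(X @ np.array([0.5, 1.0])))
b = np.array([0.5, 1.0])                      # stand-in for the MLE
mu = np.exp(X @ b)
A = X.T @ (mu[:, None] * X)                   # negative Hessian
scores = (y - mu)[:, None] * X                # one score row per observation
B = scores.T @ scores                         # sum of outer products
Ainv = np.linalg.inv(A)
V_robust = Ainv @ B @ Ainv                    # the sandwich
```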

Consider plleval3(), which puts the vector of observation-level contributions into val.

Code block 4: A vector of observation-level contributions

mata:
void plleval3(real scalar todo, real vector b,     ///
              real vector y,    real matrix X,     ///
              val, grad, hess)
{
    real vector  xb

    xb = X*b'
   val = -exp(xb) + y:*xb - lnfactorial(y)
}
end

To use plleval3(), I must put information in the optimize() object stating that the evaluator function computes a vector of observation-level contributions. In example 4, I use optimize_init_evaluatortype() to put this information into the optimize() object S.

Example 4: Robust VCE estimation

. mata:
------------------------------------------------- mata (type end to exit) ------
: void plleval3(real scalar todo, real vector b,     ///
>               real vector y,    real matrix X,     ///
>               val, grad, hess)
> {
>     real vector  xb
>
>     xb = X*b'
>    val = -exp(xb) + y:*xb - lnfactorial(y)
> }
note: argument todo unused
note: argument grad unused
note: argument hess unused

:
:
: y = st_data(., "accidents")

: X = st_data(., "cvalue kids traffic")

: X = X,J(rows(X), 1, 1)

:
: S  = optimize_init()

: optimize_init_argument(S, 1, y)

: optimize_init_argument(S, 2, X)

: optimize_init_evaluator(S, &plleval3())

: optimize_init_evaluatortype(S, "gf0")

: optimize_init_params(S, J(1, 4, .01))

:
: bh = optimize(S)
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66874
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538

: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871527   -1.009017051    .1467114658    .5743542978  |
    +-------------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_oim(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .0706483832   .0807960809    .031376176   .2839517337  |
    +---------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_robust(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .1096020124    .188666044    .092431746   .6045057623  |
    +---------------------------------------------------------+

: end
--------------------------------------------------------------------------------

After defining plleval3(), I copy the data, create the optimize() object S, put the specifications for the extra arguments y and X in S, and put the address of plleval3() into S. Typing

optimize_init_evaluatortype(S, "gf0")

puts into S the information that the evaluator function returns a vector of observation-level contributions and that it computes zero derivatives; that is, the evaluator function is of type "gf0". Given the vector structure, I can type

optimize_result_V_robust(S)

to compute a robust estimator of the VCE.

sqrt(diagonal(optimize_result_V_robust(S)))'

returns the robust standard errors, which are the same as those reported by poisson in example 5.

Example 5: Robust VCE estimation by poisson

. poisson accidents cvalue kids traffic, vce(robust)

Iteration 0:   log pseudolikelihood = -555.86605
Iteration 1:   log pseudolikelihood =  -555.8154
Iteration 2:   log pseudolikelihood = -555.81538

Poisson regression                              Number of obs     =        505
                                                Wald chi2(3)      =      99.76
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -555.81538               Pseudo R2         =     0.2343

------------------------------------------------------------------------------
             |               Robust
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .1096019    -5.98   0.000    -.8707029   -.4410712
        kids |  -1.009017    .188666    -5.35   0.000    -1.378795   -.6392382
     traffic |   .1467115   .0924316     1.59   0.112    -.0344512    .3278741
       _cons |    .574354   .6045047     0.95   0.342    -.6104535    1.759162
------------------------------------------------------------------------------

Done and undone

I showed how to use optimize() to maximize a Poisson log-likelihood function. I also showed how to obtain a robust estimator of the VCE by coding the evaluator function to compute a vector of observation-level contributions. In my next post, I show how to write a Stata command that uses Mata to estimate the parameters of a Poisson regression model.

References

Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.