## Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters

\(\newcommand{\xb}{{\bf x}}
\newcommand{\betab}{\boldsymbol{\beta}}\)I show how to use **optimize()** in Mata to maximize a Poisson log-likelihood function and to obtain estimators of the variance–covariance of the estimator (**VCE**) based on independent and identically distributed (**IID**) observations or on robust methods.

This is the eighteenth post in the series **Programming an estimation command in Stata**. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

**Using optimize()**

There are many optional choices that one may make when solving a nonlinear optimization problem, but there are very few that one must make. The **optimize*()** functions in Mata handle this problem by making a set of default choices for you, requiring that you specify a few things, and allowing you to change any of the default choices.

When I use **optimize()** to solve a nonlinear optimization problem, I do four steps.

- I create an **optimize()** object, **S = optimize_init()**, which contains all the default choices.
- I use some of the **optimize_init_*(S)** functions to put information about my optimization problem into **S**.
- I type **betahat = optimize(S)** to perform the optimization.
- I use some of the **optimize_result_*(S)** functions to get the results, which **optimize(S)** stored in **S**.

Consider maximizing the log-likelihood function of a Poisson model. The contribution of the \(i\)th observation to the log-likelihood is

\[
f_i(\betab) = y_i\xb_i\betab' - \exp(\xb_i\betab') - \ln(y_i!)
\]

where \(y_i\) is the dependent variable, \(\xb_i\) is the vector of covariates, and \(\betab\) is the row vector of parameters that we select to maximize the log-likelihood function given by \(F(\betab) =\sum_i f_i(\betab)\). I could drop \(\ln(y_i!)\), because it does not depend on the parameters. I include it to make the value of the log-likelihood function the same as that reported by Stata. Stata includes these terms so that the values of the log-likelihood functions are comparable across models.
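Although the evaluators below leave the derivative computations to **optimize()**, the analytic derivatives of the Poisson log-likelihood function are straightforward; writing them out clarifies what an evaluator that supplies derivatives would compute:

\[
\frac{\partial f_i(\betab)}{\partial \betab} = \big(y_i - \exp(\xb_i\betab')\big)\,\xb_i ,
\qquad
\frac{\partial^2 F(\betab)}{\partial \betab' \partial \betab} = -\sum_i \exp(\xb_i\betab')\,\xb_i'\xb_i
\]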

Code block 1 copies the data from Stata to Mata and computes the Poisson log-likelihood function at the vector of parameter values **b**, which has been set to the arbitrary starting values of .01 for each parameter.

**Code block 1: Computing the Poisson log-likelihood**

```stata
clear all
use accident3

mata:
y  = st_data(., "accidents")
X  = st_data(., "cvalue kids traffic")
X  = X,J(rows(X), 1, 1)
b  = J(1, cols(X), .01)
xb = X*b'
f  = sum(-exp(xb) + y:*xb - lnfactorial(y))
end
```

The Mata function **plleval()** in code block 2 puts the value of the Poisson log-likelihood function at the vector of parameter values **b** into **val**.

**Code block 2: An evaluator function for the Poisson log-likelihood**

```stata
mata:
void plleval(real scalar todo, real vector b, val, grad, hess)
{
    real vector  y, xb
    real matrix  X

    y  = st_data(., "accidents")
    X  = st_data(., "cvalue kids traffic")
    X  = X,J(rows(X), 1, 1)
    xb = X*b'
    val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}
end
```

**plleval()** has the default syntax of an evaluator function that **optimize()** can call. Evaluator functions have a default syntax so that **optimize()** can call them, which it must do to find the maximum. After describing the default syntax, I will show how to use evaluators with extra arguments.

**plleval()** is **void**; it returns nothing. The **real scalar todo** allows **optimize()** to tell the evaluator function what it must compute. The **real vector b** is the current value of the parameter vector. **val** is not typed because, no matter what it contains on input, it will contain the value of the objective function on output. **grad** is not typed because it will optionally contain the vector of first derivatives of the objective function at the current value of **b** on output. **hess** is not typed because it will optionally contain the matrix of second derivatives of the objective function at the current value of **b** on output. As **plleval()** illustrates, the evaluator function must put the value of the objective function into the third argument, but it need not compute either the vector of first derivatives or the matrix of second derivatives.

In example 1, I use **optimize()** to maximize the Poisson log-likelihood function computed in **plleval()**.

**Example 1: Using optimize() to estimate Poisson parameters**

(Uses accident3.dta)

```stata
. clear all

. use accident3

. mata:
------------------------------------------------- mata (type end to exit) ------
: void plleval(real scalar todo, real vector b, val, grad, hess)
> {
>         real vector  y, xb
>         real matrix  X
>
>         y  = st_data(., "accidents")
>         X  = st_data(., "cvalue kids traffic")
>         X  = X,J(rows(X), 1, 1)
>         xb = X*b'
>         val = sum(-exp(xb) + y:*xb - lnfactorial(y))
> }
note: argument todo unused
note: argument grad unused
note: argument hess unused

: S = optimize_init()

: optimize_init_evaluator(S, &plleval())

: optimize_init_params(S, J(1, 4, .01))

: bh = optimize(S)
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66874
Iteration 2:   f(p) = -555.81708
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538

: bh
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871399   -1.009017051    .1467114648    .5743542793  |
    +-------------------------------------------------------------+

: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871399   -1.009017051    .1467114648    .5743542793  |
    +-------------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_oim(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .0706483931   .0807960852   .0313761961   .2839519366  |
    +---------------------------------------------------------+

: end
--------------------------------------------------------------------------------
```

After defining **plleval()**, I use **optimize_init()** to create the **optimize()** object **S**. I must put information about how to call **plleval()** and the vector of starting values into **S**. Typing

**optimize_init_evaluator(S, &plleval())**

puts the address of the evaluator function **plleval()** into **S**; preceding the function name with an ampersand (**&**) is what produces the address. **optimize()** requires the address of the function rather than its name because having the address speeds up finding the function. Typing

**optimize_init_params(S, J(1, 4, .01))**

puts the vector of starting values, **J(1, 4, .01)**, into **S**.

Typing

**bh = optimize(S)**

causes **optimize()** to solve the optimization problem described in **S**, and it causes **optimize()** to put the vector of optimal parameters in **bh**. **optimize()** produces the default iteration log, because we did not change the default specification in **S**.

When **optimize()** has completed, the results are in **S**. For example, I display the **bh** returned by **optimize()** and use **optimize_result_params(S)** to display the result stored in **S**. I further illustrate by displaying the standard errors; **optimize_result_V_oim()** retrieves the observed-information-matrix (**OIM**) estimator of the variance–covariance of the estimator (**VCE**). Many other results are stored in **S**; type **help mf_optimize** and look at the **optimize_result_*()** functions for details.
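For instance, after example 1 has run, a few more of the result functions could be queried (a sketch listing some functions documented in the **optimize()** help file, not output from this post's session):

```stata
: optimize_result_value(S)        // value of the objective at the optimum
: optimize_result_converged(S)    // 1 if convergence was declared, 0 otherwise
: optimize_result_iterations(S)   // number of iterations performed
: optimize_result_gradient(S)     // gradient vector at the optimum
```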

Comparing the results in examples 1 and 2 shows that they are correct.

**Example 2: Results from poisson**

```stata
. poisson accidents cvalue kids traffic

Iteration 0:   log likelihood = -555.86605
Iteration 1:   log likelihood =  -555.8154
Iteration 2:   log likelihood = -555.81538

Poisson regression                                Number of obs   =        505
                                                  LR chi2(3)      =     340.20
                                                  Prob > chi2     =     0.0000
Log likelihood = -555.81538                       Pseudo R2       =     0.2343

------------------------------------------------------------------------------
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506594
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |    .574354   .2839515     2.02   0.043     .0178193    1.130889
------------------------------------------------------------------------------
```

**plleval()** is slow because it copies the data from Stata to Mata every time **optimize()** calls it. I would much rather pass the data to the evaluator function, but this requires putting information about the syntax of the new evaluator function in **S**. For example, I would like to use the evaluator function **plleval2()**. In example 3, I use **optimize_init_argument()** to put information into **S** about the extra arguments accepted by the new evaluator function **plleval2()**.

**Code block 3: Passing data to the Poisson evaluator function**

```stata
mata:
void plleval2(real scalar todo, real vector b,  ///
              real vector y, real matrix X,     ///
              val, grad, hess)
{
    real vector  xb

    xb  = X*b'
    val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}
end
```

The second line of the declaration specifies the extra arguments, the **real vector y** and the **real matrix X**. The extra arguments come between the inputs that must always be present (the **real scalar todo** and the **real vector b**) and the always-present outputs (**val**, **grad**, and **hess**).

Example 3 uses **optimize()** to maximize the Poisson objective function coded in **plleval2()**.

**Example 3: Using optional arguments to pass data**

```stata
. mata:
------------------------------------------------- mata (type end to exit) ------
: void plleval2(real scalar todo, real vector b,  ///
>               real vector y, real matrix X,     ///
>               val, grad, hess)
> {
>         real vector  xb
>
>         xb  = X*b'
>         val = sum(-exp(xb) + y:*xb - lnfactorial(y))
> }
note: argument todo unused
note: argument grad unused
note: argument hess unused

: y = st_data(., "accidents")

: X = st_data(., "cvalue kids traffic")

: X = X,J(rows(X), 1, 1)

: S = optimize_init()

: optimize_init_argument(S, 1, y)

: optimize_init_argument(S, 2, X)

: optimize_init_evaluator(S, &plleval2())

: optimize_init_params(S, J(1, 4, .01))

: bh = optimize(S)
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66874
Iteration 2:   f(p) = -555.81708
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538

: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871399   -1.009017051    .1467114648    .5743542793  |
    +-------------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_oim(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .0706483931   .0807960852   .0313761961   .2839519366  |
    +---------------------------------------------------------+

: end
--------------------------------------------------------------------------------
```

After defining **plleval2()**, I copy the data from Stata to Mata, and I use **optimize_init()** to put the default choices into the **optimize()** object **S**. When I typed

**optimize_init_argument(S, 1, y)**

I put information into **S** specifying that **optimize()** should pass **y** as the first extra argument to the evaluator function. When I typed

**optimize_init_argument(S, 2, X)**

I put information into **S** specifying that **optimize()** should pass **X** as the second extra argument to the evaluator function.

Analogous to example 1, typing

**optimize_init_evaluator(S, &plleval2())**

puts the address of **plleval2()** into **S**, and typing

**optimize_init_params(S, J(1, 4, .01))**

puts the vector of starting values, **J(1, 4, .01)**, in **S**.

The results are the same as those in example 1.

**Vector of observation-level contributions and robust VCE estimation**

Robust estimators for the **VCE** of an estimator use the structure of observation-level contributions; see Wooldridge (2010, chapters 12 and 13) or Cameron and Trivedi (2005, chapter 5). When the evaluator function gives **optimize()** a vector of observation-level contributions, instead of a scalar summation, **optimize()** can use this structure to compute robust or cluster-robust estimators of the **VCE**.
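Concretely, letting \(s_i\) denote the observation-level score \(\partial f_i/\partial\betab\) and \(H\) the Hessian of the objective function, both evaluated at the optimum, the robust estimator has the standard sandwich form

\[
\widehat{V}_{\rm robust} = (-H)^{-1}\left(\sum_{i=1}^N s_i' s_i\right)(-H)^{-1}
\]

where \((-H)^{-1}\) is itself the **OIM** estimator of the **VCE**. Computing the middle term requires the \(s_i\), which is why the observation-level structure matters.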

Consider **plleval3()**, which puts the vector of observation-level contributions into **val**.

**Code block 4: A vector of observation-level contributions**

```stata
mata:
void plleval3(real scalar todo, real vector b,  ///
              real vector y, real matrix X,     ///
              val, grad, hess)
{
    real vector  xb

    xb  = X*b'
    val = -exp(xb) + y:*xb - lnfactorial(y)
}
end
```

To use **plleval3()**, I must put information in the **optimize()** object stating that the evaluator function computes a vector of observation-level contributions. In example 4, I use **optimize_init_evaluatortype()** to put this information into the **optimize()** object **S**.

**Example 4: Robust VCE estimation**

```stata
. mata:
------------------------------------------------- mata (type end to exit) ------
: void plleval3(real scalar todo, real vector b,  ///
>               real vector y, real matrix X,     ///
>               val, grad, hess)
> {
>         real vector  xb
>
>         xb  = X*b'
>         val = -exp(xb) + y:*xb - lnfactorial(y)
> }
note: argument todo unused
note: argument grad unused
note: argument hess unused

: y = st_data(., "accidents")

: X = st_data(., "cvalue kids traffic")

: X = X,J(rows(X), 1, 1)

: S = optimize_init()

: optimize_init_argument(S, 1, y)

: optimize_init_argument(S, 2, X)

: optimize_init_evaluator(S, &plleval3())

: optimize_init_evaluatortype(S, "gf0")

: optimize_init_params(S, J(1, 4, .01))

: bh = optimize(S)
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66874
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538

: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871527   -1.009017051    .1467114658    .5743542978  |
    +-------------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_oim(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .0706483832   .0807960809    .031376176   .2839517337  |
    +---------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_robust(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .1096020124    .188666044    .092431746   .6045057623  |
    +---------------------------------------------------------+

: end
--------------------------------------------------------------------------------
```

After defining **plleval3()**, I copy the data, create the **optimize()** object **S**, put the specifications for the extra arguments **y** and **X** in **S**, and put the address of **plleval3()** into **S**. Typing

**optimize_init_evaluatortype(S, "gf0")**

puts into **S** the information that the evaluator function returns a vector of observation-level contributions and that it computes no derivatives; that is, the evaluator function is of type "gf0". Given the vector structure, I can type

**optimize_result_V_robust(S)**

to compute a robust estimator of the **VCE**.

**sqrt(diagonal(optimize_result_V_robust(S)))'**

returns the robust standard errors, which are the same as those reported by **poisson** in example 5.
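As an aside, an evaluator that also supplies analytic observation-level scores would be of type "gf1", with **grad** filled in as a matrix containing one row of scores per observation. A sketch, using the Poisson score \((y_i - \exp(\xb_i\betab'))\xb_i\); the function name **plleval4()** and this variant are mine, not part of the examples above:

```stata
mata:
void plleval4(real scalar todo, real vector b,  ///
              real vector y, real matrix X,     ///
              val, grad, hess)
{
    real vector  xb, mu

    xb  = X*b'
    mu  = exp(xb)
    // column vector of observation-level contributions
    val = -mu + y:*xb - lnfactorial(y)
    if (todo >= 1) {
        // one row of scores per observation: (y_i - mu_i)*x_i
        grad = (y - mu):*X
    }
}
end
```

This evaluator would be registered with **optimize_init_evaluatortype(S, "gf1")**; everything else proceeds as in example 4.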

**Example 5: Robust VCE estimation by poisson**

```stata
. poisson accidents cvalue kids traffic, vce(robust)

Iteration 0:   log pseudolikelihood = -555.86605
Iteration 1:   log pseudolikelihood =  -555.8154
Iteration 2:   log pseudolikelihood = -555.81538

Poisson regression                                Number of obs   =        505
                                                  Wald chi2(3)    =      99.76
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -555.81538                 Pseudo R2       =     0.2343

------------------------------------------------------------------------------
             |               Robust
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .1096019    -5.98   0.000    -.8707029   -.4410712
        kids |  -1.009017    .188666    -5.35   0.000    -1.378795   -.6392382
     traffic |   .1467115   .0924316     1.59   0.112    -.0344512    .3278741
       _cons |    .574354   .6045047     0.95   0.342    -.6104535    1.759162
------------------------------------------------------------------------------
```

**Done and undone**

I showed how to use **optimize()** to maximize a Poisson log-likelihood function. I also showed how to obtain a robust estimator of the **VCE** by coding the evaluator function to compute a vector of observation-level contributions. In my next post, I show how to write a Stata command that uses Mata to estimate the parameters of a Poisson regression model.

**References**

Cameron, A. C., and P. K. Trivedi. 2005. *Microeconometrics: Methods and Applications*. Cambridge: Cambridge University Press.

Wooldridge, J. M. 2010. *Econometric Analysis of Cross Section and Panel Data*. 2nd ed. Cambridge, Massachusetts: MIT Press.