## Programming an estimation command in Stata: A poisson command using Mata

\(

\newcommand{\xb}{{\bf x}}

\newcommand{\betab}{\boldsymbol{\beta}}\)I discuss **mypoisson1**, which computes Poisson-regression results in Mata. The code in **mypoisson1.ado** is remarkably similar to the code in **myregress11.ado**, which computes ordinary least-squares (OLS) results in Mata, as I discussed in Programming an estimation command in Stata: An OLS command using Mata.

I build on previous posts. I use the structure of Stata programs that use Mata work functions that I discussed previously in Programming an estimation command in Stata: A first ado-command using Mata and Programming an estimation command in Stata: An OLS command using Mata. You should be familiar with Poisson regression and using **optimize()**, which I discussed in Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters.

**A poisson command with Mata computations**

The Stata command **mypoisson1** computes the results in Mata. The syntax of the **mypoisson1** command is

**mypoisson1** *depvar* *indepvars* [if] [in] [**,** ** noconstant**]

where *indepvars* can contain time-series variables. **mypoisson1** does not allow for factor variables because they complicate the program. I discuss these complications, and present solutions, in my next post.

In the remainder of this post, I discuss the code for **mypoisson1.ado**. I recommend that you click on the file name to download the code. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.

**Code block 1: mypoisson1.ado**

*! version 1.0.0 31Jan2016 program define mypoisson1, eclass sortpreserve version 14.1 syntax varlist(numeric ts min=2) [if] [in] [, noCONStant ] marksample touse gettoken depvar indepvars : varlist _rmcoll `indepvars', `constant' forcedrop local indepvars "`r(varlist)'" tempname b V N rank mata: mywork("`depvar'", "`indepvars'", "`touse'", "`constant'", /// "`b'", "`V'", "`N'", "`rank'") if "`constant'" == "" { local cnames "`indepvars' _cons" } else { local cnames "`indepvars'" } matrix colnames `b' = `cnames' matrix colnames `V' = `cnames' matrix rownames `V' = `cnames' ereturn post `b' `V', esample(`touse') buildfvinfo ereturn scalar N = `N' ereturn scalar rank = `rank' ereturn local cmd "mypoisson1" ereturn display end mata: void mywork( string scalar depvar, string scalar indepvars, string scalar touse, string scalar constant, string scalar bname, string scalar Vname, string scalar nname, string scalar rname) { real vector y, b, mo_v, cv real matrix X, V, Cm real scalar n, p, rank, ko y = st_data(., depvar, touse) n = rows(y) X = st_data(., indepvars, touse) if (constant == "") { X = X,J(n, 1, 1) } p = cols(X) S = optimize_init() optimize_init_argument(S, 1, y) optimize_init_argument(S, 2, X) optimize_init_evaluator(S, &plleval2()) optimize_init_params(S, J(1, p, .01)) b = optimize(S) V = optimize_result_V_oim(S) rank = p - diag0cnt(invsym(V)) st_matrix(bname, b) st_matrix(Vname, V) st_numscalar(nname, n) st_numscalar(rname, rank) } void plleval2(real scalar todo, real vector b, /// real vector y, real matrix X, /// val, grad, hess) { real vector xb xb = X*b' val = sum(-exp(xb) + y:*xb - lnfactorial(y)) } end

**mypoisson1.ado** has the structure of Stata programs that compute their results in Mata, which I discussed in Programming an estimation command in Stata: A first ado-command using Mata and Programming an estimation command in Stata: An OLS command using Mata. Lines 1–35 define the ado-command **mypoisson1**. Lines 37–83 define the Mata work function **mywork()** used in **mypoisson1** and the evaluator function **plleval2()** used in **mywork()**.

The ado-command **mypoisson1** has four parts:

- Lines 5–13 parse what the user typed, identify the sample, drop collinear variables from the list of independent variables, and create temporary names for Stata objects returned by our Mata work function.
- Lines 15–16 call the Mata work function.
- Lines 18–31 post the results returned by the Mata work function to
**e()**. - Line 33 displays the results.

The Mata work function **mywork()** has four parts.

- Lines 39–42 parse the arguments.
- Lines 45–47 declare vectors, matrices, and scalars that are local to
**mywork()**. - Lines 49–65 compute the results.
- Lines 67–70 copy the computed results to Stata, using the names that were passed in arguments.

Now, I discuss the ado-code in some detail. Lines 2–35 are almost the same as lines 2–35 of **myregress11**, which I discussed in Programming an estimation command in Stata: An OLS command using Mata. That **myregress11** handles factor variables but **mypoisson1** does not causes most of the differences. That **myregress11** stores the residual degrees of freedom but **mypoisson1** does not causes two minor differences.

**myregress11** uses the matrix inverter to handle cases of collinear independent variables. In the Poisson-regression case discussed here, collinear independent variables define an unconstrained problem without a unique solution, so **optimize()** cannot converge. **mypoisson1** drops collinear independent variables to avoid this problem. I discuss a better solution in my next post.

Lines 10–11 use **_rmcoll** to drop the collinear independent variables and store the list of linearly independent variables in the local macro **indepvars**.

Now, I discuss the Mata work function **mywork()** in some detail. The **mywork()** defined on lines 39–71 is remarkably similar to the **mywork()** function defined on lines 39–72 of **myregress11.ado**, discussed in Programming an estimation command in Stata: An OLS command using Mata. In **mypoisson1.ado**, lines 57–65 use **optimize()** to compute the results. In **myregress11.ado**, lines 58–64 use matrix computations to compute the results.

Lines 73–81 define the evaluator function **plleval2()**, which is used by **optimize()** to compute the results, as I discussed in Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters.

Examples 1 and 2 illustrate that **mypoisson1** produces the same results as **poisson**.

**Example 1: mypoisson1**

**(Uses accident3.dta)**

. clear all . use accident3 . mypoisson1 accidents cvalue kids traffic Iteration 0: f(p) = -851.18669 Iteration 1: f(p) = -556.66874 Iteration 2: f(p) = -555.81708 Iteration 3: f(p) = -555.81538 Iteration 4: f(p) = -555.81538 ------------------------------------------------------------------------------ | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- cvalue | -.6558871 .0706484 -9.28 0.000 -.7943554 -.5174188 kids | -1.009017 .0807961 -12.49 0.000 -1.167374 -.8506596 traffic | .1467115 .0313762 4.68 0.000 .0852153 .2082077 _cons | .5743543 .2839519 2.02 0.043 .0178187 1.13089 ------------------------------------------------------------------------------

**Example 2: poisson**

poisson accidents cvalue kids traffic Iteration 0: log likelihood = -555.86605 Iteration 1: log likelihood = -555.8154 Iteration 2: log likelihood = -555.81538 Poisson regression Number of obs = 505 LR chi2(3) = 340.20 Prob > chi2 = 0.0000 Log likelihood = -555.81538 Pseudo R2 = 0.2343 ------------------------------------------------------------------------------ accidents | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- cvalue | -.6558871 .0706484 -9.28 0.000 -.7943553 -.5174188 kids | -1.009017 .0807961 -12.49 0.000 -1.167374 -.8506594 traffic | .1467115 .0313762 4.68 0.000 .0852153 .2082076 _cons | .574354 .2839515 2.02 0.043 .0178193 1.130889 ------------------------------------------------------------------------------

**Done and undone**

I discussed **mypoisson1**, which computes Poisson-regression results in Mata. I highlighted how similar the code in **mypoisson1.ado** is to the code in **myregress11.ado**.

I also discussed that **mypoisson1** drops collinear independent variables and noted that **mypoisson1** does not handle factor variables. In my next post, I discuss a solution to the problems caused by collinear independent variables and discuss a command that handles factor variables.