Programming an estimation command in Stata: A poisson command using Mata
\(
\newcommand{\xb}{{\bf x}}
\newcommand{\betab}{\boldsymbol{\beta}}\)I discuss mypoisson1, which computes Poisson-regression results in Mata. The code in mypoisson1.ado is remarkably similar to the code in myregress11.ado, which computes ordinary least-squares (OLS) results in Mata, as I discussed in Programming an estimation command in Stata: An OLS command using Mata.
I build on previous posts. I use the structure of Stata programs that use Mata work functions that I discussed previously in Programming an estimation command in Stata: A first ado-command using Mata and Programming an estimation command in Stata: An OLS command using Mata. You should be familiar with Poisson regression and using optimize(), which I discussed in Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters.
This is the nineteenth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.
A poisson command with Mata computations
The Stata command mypoisson1 computes the results in Mata. The syntax of the mypoisson1 command is
mypoisson1 depvar indepvars [if] [in] [, noconstant]
where indepvars can contain time-series variables. mypoisson1 does not allow for factor variables because they complicate the program. I discuss these complications, and present solutions, in my next post.
In the remainder of this post, I discuss the code for mypoisson1.ado. I recommend that you click on the file name to download the code. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.
Code block 1: mypoisson1.ado
*! version 1.0.0 31Jan2016 program define mypoisson1, eclass sortpreserve version 14.1 syntax varlist(numeric ts min=2) [if] [in] [, noCONStant ] marksample touse gettoken depvar indepvars : varlist _rmcoll `indepvars', `constant' forcedrop local indepvars "`r(varlist)'" tempname b V N rank mata: mywork("`depvar'", "`indepvars'", "`touse'", "`constant'", /// "`b'", "`V'", "`N'", "`rank'") if "`constant'" == "" { local cnames "`indepvars' _cons" } else { local cnames "`indepvars'" } matrix colnames `b' = `cnames' matrix colnames `V' = `cnames' matrix rownames `V' = `cnames' ereturn post `b' `V', esample(`touse') buildfvinfo ereturn scalar N = `N' ereturn scalar rank = `rank' ereturn local cmd "mypoisson1" ereturn display end mata: void mywork( string scalar depvar, string scalar indepvars, string scalar touse, string scalar constant, string scalar bname, string scalar Vname, string scalar nname, string scalar rname) { real vector y, b, mo_v, cv real matrix X, V, Cm real scalar n, p, rank, ko y = st_data(., depvar, touse) n = rows(y) X = st_data(., indepvars, touse) if (constant == "") { X = X,J(n, 1, 1) } p = cols(X) S = optimize_init() optimize_init_argument(S, 1, y) optimize_init_argument(S, 2, X) optimize_init_evaluator(S, &plleval2()) optimize_init_params(S, J(1, p, .01)) b = optimize(S) V = optimize_result_V_oim(S) rank = p - diag0cnt(invsym(V)) st_matrix(bname, b) st_matrix(Vname, V) st_numscalar(nname, n) st_numscalar(rname, rank) } void plleval2(real scalar todo, real vector b, /// real vector y, real matrix X, /// val, grad, hess) { real vector xb xb = X*b' val = sum(-exp(xb) + y:*xb - lnfactorial(y)) } end
mypoisson1.ado has the structure of Stata programs that compute their results in Mata, which I discussed in Programming an estimation command in Stata: A first ado-command using Mata and Programming an estimation command in Stata: An OLS command using Mata. Lines 1–35 define the ado-command mypoisson1. Lines 37–83 define the Mata work function mywork() used in mypoisson1 and the evaluator function plleval2() used in mywork().
The ado-command mypoisson1 has four parts:
- Lines 5–13 parse what the user typed, identify the sample, drop collinear variables from the list of independent variables, and create temporary names for Stata objects returned by our Mata work function.
- Lines 15–16 call the Mata work function.
- Lines 18–31 post the results returned by the Mata work function to e().
- Line 33 displays the results.
The Mata work function mywork() has four parts.
- Lines 39–42 parse the arguments.
- Lines 45–47 declare vectors, matrices, and scalars that are local to mywork().
- Lines 49–65 compute the results.
- Lines 67–70 copy the computed results to Stata, using the names that were passed in arguments.
Now, I discuss the ado-code in some detail. Lines 2–35 are almost the same as lines 2–35 of myregress11, which I discussed in Programming an estimation command in Stata: An OLS command using Mata. That myregress11 handles factor variables but mypoisson1 does not causes most of the differences. That myregress11 stores the residual degrees of freedom but mypoisson1 does not causes two minor differences.
myregress11 uses the matrix inverter to handle cases of collinear independent variables. In the Poisson-regression case discussed here, collinear independent variables define an unconstrained problem without a unique solution, so optimize() cannot converge. mypoisson1 drops collinear independent variables to avoid this problem. I discuss a better solution in my next post.
Lines 10–11 use _rmcoll to drop the collinear independent variables and store the list of linearly independent variables in the local macro indepvars.
Now, I discuss the Mata work function mywork() in some detail. The mywork() defined on lines 39–71 is remarkably similar to the mywork() function defined on lines 39–72 of myregress11.ado, discussed in Programming an estimation command in Stata: An OLS command using Mata. In mypoisson1.ado, lines 57–65 use optimize() to compute the results. In myregress11.ado, lines 58–64 use matrix computations to compute the results.
Lines 73–81 define the evaluator function plleval2(), which is used by optimize() to compute the results, as I discussed in Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters.
Examples 1 and 2 illustrate that mypoisson1 produces the same results as poisson.
Example 1: mypoisson1
(Uses accident3.dta)
. clear all . use accident3 . mypoisson1 accidents cvalue kids traffic Iteration 0: f(p) = -851.18669 Iteration 1: f(p) = -556.66874 Iteration 2: f(p) = -555.81708 Iteration 3: f(p) = -555.81538 Iteration 4: f(p) = -555.81538 ------------------------------------------------------------------------------ | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- cvalue | -.6558871 .0706484 -9.28 0.000 -.7943554 -.5174188 kids | -1.009017 .0807961 -12.49 0.000 -1.167374 -.8506596 traffic | .1467115 .0313762 4.68 0.000 .0852153 .2082077 _cons | .5743543 .2839519 2.02 0.043 .0178187 1.13089 ------------------------------------------------------------------------------
Example 2: poisson
poisson accidents cvalue kids traffic Iteration 0: log likelihood = -555.86605 Iteration 1: log likelihood = -555.8154 Iteration 2: log likelihood = -555.81538 Poisson regression Number of obs = 505 LR chi2(3) = 340.20 Prob > chi2 = 0.0000 Log likelihood = -555.81538 Pseudo R2 = 0.2343 ------------------------------------------------------------------------------ accidents | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- cvalue | -.6558871 .0706484 -9.28 0.000 -.7943553 -.5174188 kids | -1.009017 .0807961 -12.49 0.000 -1.167374 -.8506594 traffic | .1467115 .0313762 4.68 0.000 .0852153 .2082076 _cons | .574354 .2839515 2.02 0.043 .0178193 1.130889 ------------------------------------------------------------------------------
Done and undone
I discussed mypoisson1, which computes Poisson-regression results in Mata. I highlighted how similar the code in mypoisson1.ado is to the code in myregress11.ado.
I also discussed that mypoisson1 drops collinear independent variables and noted that mypoisson1 does not handle factor variables. In my next post, I discuss a solution to the problems caused by collinear independent variables and discuss a command that handles factor variables.