## Programming an estimation command in Stata: Handling factor variables in a poisson command using Mata

**mypoisson2.ado** handles factor variables and computes its Poisson regression results in Mata. I discuss the code for **mypoisson2.ado**, which I obtained by adding the method for handling factor variables discussed in Programming an estimation command in Stata: Handling factor variables in optimize() to **mypoisson1.ado**, discussed in Programming an estimation command in Stata: A poisson command using Mata.

This is the twenty-first post in the series **Programming an estimation command in Stata**. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

**A Poisson command with Mata computations**

**mypoisson2** computes Poisson regression results in Mata. The syntax of the **mypoisson2** command is

**mypoisson2** *depvar indepvars* [*if*] [*in*] [**,** ** noconstant**]

where *indepvars* can contain factor variables or time-series variables.

In the remainder of this post, I discuss the code for **mypoisson2.ado**. I recommend that you click on the filename to download the code. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.

**Code block 1: mypoisson2.ado**

*! version 2.0.0 07Feb2016 program define mypoisson2, eclass sortpreserve version 14 syntax varlist(numeric ts fv min=2) [if] [in] [, noCONStant ] marksample touse gettoken depvar indepvars : varlist _fv_check_depvar `depvar' tempname b mo V N rank getcinfo `indepvars' , `constant' local cnames "`r(cnames)'" matrix `mo' = r(mo) mata: mywork("`depvar'", "`cnames'", "`touse'", "`constant'", /// "`b'", "`V'", "`N'", "`rank'", "`mo'") if "`constant'" == "" { local cnames "`cnames' _cons" } matrix colnames `b' = `cnames' matrix colnames `V' = `cnames' matrix rownames `V' = `cnames' ereturn post `b' `V', esample(`touse') buildfvinfo ereturn scalar N = `N' ereturn scalar rank = `rank' ereturn local cmd "mypoisson1" ereturn display end program getcinfo, rclass syntax varlist(ts fv), [ noCONStant ] _rmcoll `varlist' , `constant' expand local cnames `r(varlist)' local p : word count `cnames' if "`constant'" == "" { local p = `p' + 1 local cons _cons } tempname b mo matrix `b' = J(1, `p', 0) matrix colnames `b' = `cnames' `cons' _ms_omit_info `b' matrix `mo' = r(omit) return local cnames "`cnames'" return matrix mo = `mo' end mata: void mywork( string scalar depvar, string scalar indepvars, string scalar touse, string scalar constant, string scalar bname, string scalar Vname, string scalar nname, string scalar rname, string scalar mo) { real vector y, b real matrix X, V, Ct real scalar n, p, rank y = st_data(., depvar, touse) n = rows(y) X = st_data(., indepvars, touse) if (constant == "") { X = X,J(n, 1, 1) } p = cols(X) Ct = makeCt(mo) S = optimize_init() optimize_init_argument(S, 1, y) optimize_init_argument(S, 2, X) optimize_init_evaluator(S, &plleval2()) optimize_init_params(S, J(1, p, .01)) optimize_init_constraints(S, Ct) b = optimize(S) V = optimize_result_V_oim(S) rank = p - diag0cnt(invsym(V)) st_matrix(bname, b) st_matrix(Vname, V) st_numscalar(nname, n) st_numscalar(rname, rank) } real matrix makeCt(string scalar mo) { real vector mo_v real scalar ko, j, p mo_v = st_matrix(mo) p = cols(mo_v) ko = sum(mo_v) if (ko>0) { Ct = J(0, p, .) for(j=1; j<=p; j++) { if (mo_v[j]==1) { Ct = Ct \ e(j, p) } } Ct = Ct, J(ko, 1, 0) } else { Ct = J(0,p+1,.) } return(Ct) } void plleval2(real scalar todo, real vector b, /// real vector y, real matrix X, /// val, grad, hess) { real vector xb xb = X*b' val = sum(-exp(xb) + y:*xb - lnfactorial(y)) } end

As with programs that I have previously discussed, there is an ado part and Mata part. Lines 2–56 are the ado part; they define **mypoisson2** and the subroutine **getcinfo**. Lines 58–133 are the Mata part; they define the Mata work function **mywork()** used in **mypoisson2**, the **makeCt** function used in **mywork()**, and the evaluator function **plleval2()** used in **mywork()**.

The ado-command **mypoisson2** has the following parts:

- Lines 5–11 parse what the user typed, identify the sample, and create temporary names for Stata objects used in the computations or returned by our Mata work function.
- Lines 13–15 use the subroutine
**getcinfo**to get information about the user-specified covariates and then store this information in the local**cnames**and a Stata matrix. - Lines 17–18 call the Mata work function.
- Lines 20–30 post the results returned by the Mata work function to
**e()**. - Line 32 displays the results.

The Mata function **mywork()** has the following parts:

- Lines 60–65 parse the arguments.
- Lines 67–69 declare vectors, matrices, and scalars that are local to
**mywork()**. - Lines 71–90 compute the results.
- Lines 92–95 copy the computed results to Stata, using the names that were passed as arguments.

I now discuss the ado-code in some detail, focusing on only the aspects that are new to **mypoisson2.ado**.

The subroutine **getcinfo** encapsulates the computations performed in examples 3, 4, and 5 in Programming an estimation command in Stata: Handling factor variables in optimize(). **getcinfo** uses **_rmcoll** to identify which covariates must be omitted, stores the names of the covariates to be omitted in the local macro **cnames**, then uses **_ms_omit_info** to create a vector containing a 1 for omitted variables and a 0 otherwise. **getcinfo** puts **cnames** into **r(cnames)** and the vector identifying the omitted variables into **r(mo)**.

Lines 14–15 store the information put into **r()** by **getcinfo** in the local macro **cnames** and the Stata vector whose name is contained in the local macro **mo**. Lines 23–25 use **cnames** to put row names on the vector of point estimates and row and column names on the estimated variance–covariance matrix of the estimator (VCE). Line 18 passes the vector to **mywork()**.

I now discuss the Mata code in some detail, again focusing on only the new aspects. Line 79 gets the constraint matrix **Ct** needed to handle any omitted variables from **makeCt()**. Lines 98–121 define **makeCt()**, which encapsulates the computations that form **Cm** in example 6 in Programming an estimation command in Stata: Handling factor variables in optimize(). Line 86 uses **optimize_init_constraints()** to put **Ct** in the **optimize()** object. **Ct** contains a matrix with zero rows when there are no constraints, and putting a constraint matrix with zero rows into the **optimize()** object tells **optimize()** that there are no constraints.

The output in examples 1 and 2 confirms that **mypoisson2** produces the same results as **poisson** when a full set of indicator variables is included in a model with a constant term.

**Example 1: mypoisson2 results**

. clear all . use accident3 . mypoisson2 accidents cvalue ibn.kids traffic, noconstant Iteration 0: f(p) = -843.66874 Iteration 1: f(p) = -573.50561 Iteration 2: f(p) = -545.86215 Iteration 3: f(p) = -545.11765 Iteration 4: f(p) = -545.10899 Iteration 5: f(p) = -545.10898 ------------------------------------------------------------------------------ | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- cvalue | -.6582923 .0703823 -9.35 0.000 -.7962391 -.5203456 | kids | 0 | .7157575 .282144 2.54 0.011 .1627653 1.26875 1 | -.9465934 .3111915 -3.04 0.002 -1.556518 -.3366693 2 | -.8589336 .3097583 -2.77 0.006 -1.466049 -.2518184 3 | -2.518175 .4366261 -5.77 0.000 -3.373947 -1.662404 | traffic | .1383977 .0307285 4.50 0.000 .078171 .1986243 ------------------------------------------------------------------------------

**Example 2: poisson results**

. poisson accidents cvalue ibn.kids traffic, noconstant Iteration 0: log likelihood = -1250.3959 Iteration 1: log likelihood = -553.73534 Iteration 2: log likelihood = -545.14915 Iteration 3: log likelihood = -545.10902 Iteration 4: log likelihood = -545.10898 Poisson regression Number of obs = 505 Wald chi2(6) = 285.69 Log likelihood = -545.10898 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ accidents | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- cvalue | -.6582924 .0703822 -9.35 0.000 -.796239 -.5203457 | kids | 0 | .7157576 .2821434 2.54 0.011 .1627666 1.268749 1 | -.9465933 .311191 -3.04 0.002 -1.556516 -.3366701 2 | -.8589334 .3097578 -2.77 0.006 -1.466048 -.2518192 3 | -2.51817 .436625 -5.77 0.000 -3.37394 -1.662401 | traffic | .1383977 .0307284 4.50 0.000 .0781711 .1986242 ------------------------------------------------------------------------------

**Done and undone**

I discussed **mypoisson2**, which handles factor variables and uses Mata to compute Poisson regression results. In my next post, I add robust and cluster–robust estimators of the VCE.