Programming an estimation command in Stata: Handling factor variables in a poisson command using Mata
mypoisson2.ado handles factor variables and computes its Poisson regression results in Mata. I discuss the code for mypoisson2.ado, which I obtained by adding the method for handling factor variables discussed in Programming an estimation command in Stata: Handling factor variables in optimize() to mypoisson1.ado, discussed in Programming an estimation command in Stata: A poisson command using Mata.
This is the twenty-first post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.
A Poisson command with Mata computations
mypoisson2 computes Poisson regression results in Mata. The syntax of the mypoisson2 command is
mypoisson2 depvar indepvars [if] [in] [, noconstant]
where indepvars can contain factor variables or time-series variables.
In the remainder of this post, I discuss the code for mypoisson2.ado. I recommend that you click on the filename to download the code. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.
*! version 2.0.0 07Feb2016 program define mypoisson2, eclass sortpreserve version 14 syntax varlist(numeric ts fv min=2) [if] [in] [, noCONStant ] marksample touse gettoken depvar indepvars : varlist _fv_check_depvar `depvar' tempname b mo V N rank getcinfo `indepvars' , `constant' local cnames "`r(cnames)'" matrix `mo' = r(mo) mata: mywork("`depvar'", "`cnames'", "`touse'", "`constant'", /// "`b'", "`V'", "`N'", "`rank'", "`mo'") if "`constant'" == "" { local cnames "`cnames' _cons" } matrix colnames `b' = `cnames' matrix colnames `V' = `cnames' matrix rownames `V' = `cnames' ereturn post `b' `V', esample(`touse') buildfvinfo ereturn scalar N = `N' ereturn scalar rank = `rank' ereturn local cmd "mypoisson1" ereturn display end program getcinfo, rclass syntax varlist(ts fv), [ noCONStant ] _rmcoll `varlist' , `constant' expand local cnames `r(varlist)' local p : word count `cnames' if "`constant'" == "" { local p = `p' + 1 local cons _cons } tempname b mo matrix `b' = J(1, `p', 0) matrix colnames `b' = `cnames' `cons' _ms_omit_info `b' matrix `mo' = r(omit) return local cnames "`cnames'" return matrix mo = `mo' end mata: void mywork( string scalar depvar, string scalar indepvars, string scalar touse, string scalar constant, string scalar bname, string scalar Vname, string scalar nname, string scalar rname, string scalar mo) { real vector y, b real matrix X, V, Ct real scalar n, p, rank y = st_data(., depvar, touse) n = rows(y) X = st_data(., indepvars, touse) if (constant == "") { X = X,J(n, 1, 1) } p = cols(X) Ct = makeCt(mo) S = optimize_init() optimize_init_argument(S, 1, y) optimize_init_argument(S, 2, X) optimize_init_evaluator(S, &plleval2()) optimize_init_params(S, J(1, p, .01)) optimize_init_constraints(S, Ct) b = optimize(S) V = optimize_result_V_oim(S) rank = p - diag0cnt(invsym(V)) st_matrix(bname, b) st_matrix(Vname, V) st_numscalar(nname, n) st_numscalar(rname, rank) } real matrix makeCt(string scalar mo) { real vector mo_v real scalar ko, j, p mo_v = st_matrix(mo) p = cols(mo_v) ko = sum(mo_v) if (ko>0) { Ct = J(0, p, .) for(j=1; j<=p; j++) { if (mo_v[j]==1) { Ct = Ct \ e(j, p) } } Ct = Ct, J(ko, 1, 0) } else { Ct = J(0,p+1,.) } return(Ct) } void plleval2(real scalar todo, real vector b, /// real vector y, real matrix X, /// val, grad, hess) { real vector xb xb = X*b' val = sum(-exp(xb) + y:*xb - lnfactorial(y)) } end
As with programs that I have previously discussed, there is an ado part and Mata part. Lines 2–56 are the ado part; they define mypoisson2 and the subroutine getcinfo. Lines 58–133 are the Mata part; they define the Mata work function mywork() used in mypoisson2, the makeCt function used in mywork(), and the evaluator function plleval2() used in mywork().
The ado-command mypoisson2 has the following parts:
- Lines 5–11 parse what the user typed, identify the sample, and create temporary names for Stata objects used in the computations or returned by our Mata work function.
- Lines 13–15 use the subroutine getcinfo to get information about the user-specified covariates and then store this information in the local cnames and a Stata matrix.
- Lines 17–18 call the Mata work function.
- Lines 20–30 post the results returned by the Mata work function to e().
- Line 32 displays the results.
The Mata function mywork() has the following parts:
- Lines 60–65 parse the arguments.
- Lines 67–69 declare vectors, matrices, and scalars that are local to mywork().
- Lines 71–90 compute the results.
- Lines 92–95 copy the computed results to Stata, using the names that were passed as arguments.
I now discuss the ado-code in some detail, focusing on only the aspects that are new to mypoisson2.ado.
The subroutine getcinfo encapsulates the computations performed in examples 3, 4, and 5 in Programming an estimation command in Stata: Handling factor variables in optimize(). getcinfo uses _rmcoll to identify which covariates must be omitted, stores the names of the covariates to be omitted in the local macro cnames, then uses _ms_omit_info to create a vector containing a 1 for omitted variables and a 0 otherwise. getcinfo puts cnames into r(cnames) and the vector identifying the omitted variables into r(mo).
Lines 14–15 store the information put into r() by getcinfo in the local macro cnames and the Stata vector whose name is contained in the local macro mo. Lines 23–25 use cnames to put row names on the vector of point estimates and row and column names on the estimated variance–covariance matrix of the estimator (VCE). Line 18 passes the vector to mywork().
I now discuss the Mata code in some detail, again focusing on only the new aspects. Line 79 gets the constraint matrix Ct needed to handle any omitted variables from makeCt(). Lines 98–121 define makeCt(), which encapsulates the computations that form Cm in example 6 in Programming an estimation command in Stata: Handling factor variables in optimize(). Line 86 uses optimize_init_constraints() to put Ct in the optimize() object. Ct contains a matrix with zero rows when there are no constraints, and putting a constraint matrix with zero rows into the optimize() object tells optimize() that there are no constraints.
The output in examples 1 and 2 confirms that mypoisson2 produces the same results as poisson when a full set of indicator variables is included in a model with a constant term.
Example 1: mypoisson2 results
. clear all . use accident3 . mypoisson2 accidents cvalue ibn.kids traffic, noconstant Iteration 0: f(p) = -843.66874 Iteration 1: f(p) = -573.50561 Iteration 2: f(p) = -545.86215 Iteration 3: f(p) = -545.11765 Iteration 4: f(p) = -545.10899 Iteration 5: f(p) = -545.10898 ------------------------------------------------------------------------------ | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- cvalue | -.6582923 .0703823 -9.35 0.000 -.7962391 -.5203456 | kids | 0 | .7157575 .282144 2.54 0.011 .1627653 1.26875 1 | -.9465934 .3111915 -3.04 0.002 -1.556518 -.3366693 2 | -.8589336 .3097583 -2.77 0.006 -1.466049 -.2518184 3 | -2.518175 .4366261 -5.77 0.000 -3.373947 -1.662404 | traffic | .1383977 .0307285 4.50 0.000 .078171 .1986243 ------------------------------------------------------------------------------
Example 2: poisson results
. poisson accidents cvalue ibn.kids traffic, noconstant Iteration 0: log likelihood = -1250.3959 Iteration 1: log likelihood = -553.73534 Iteration 2: log likelihood = -545.14915 Iteration 3: log likelihood = -545.10902 Iteration 4: log likelihood = -545.10898 Poisson regression Number of obs = 505 Wald chi2(6) = 285.69 Log likelihood = -545.10898 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ accidents | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- cvalue | -.6582924 .0703822 -9.35 0.000 -.796239 -.5203457 | kids | 0 | .7157576 .2821434 2.54 0.011 .1627666 1.268749 1 | -.9465933 .311191 -3.04 0.002 -1.556516 -.3366701 2 | -.8589334 .3097578 -2.77 0.006 -1.466048 -.2518192 3 | -2.51817 .436625 -5.77 0.000 -3.37394 -1.662401 | traffic | .1383977 .0307284 4.50 0.000 .0781711 .1986242 ------------------------------------------------------------------------------
Done and undone
I discussed mypoisson2, which handles factor variables and uses Mata to compute Poisson regression results. In my next post, I add robust and cluster–robust estimators of the VCE.