Programming an estimation command in Stata: Allowing for robust or cluster–robust standard errors in a poisson command using Mata
mypoisson3.ado adds options for a robust or a cluster–robust estimator of the variance–covariance of the estimator (VCE) to mypoisson2.ado, which I discussed in Programming an estimation command in Stata: Handling factor variables in a poisson command using Mata. mypoisson3.ado parses the vce() option using the techniques I discussed in Programming an estimation command in Stata: Adding robust and cluster–robust VCEs to our Mata based OLS command. Below, I show how to use optimize() to compute the robust or cluster–robust VCE.
I only discuss what is new in the code for mypoisson3.ado, assuming that you are familiar with mypoisson2.ado.
This is the twenty-second post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.
A poisson command with options for a robust or a cluster–robust VCE
mypoisson3 computes Poisson-regression results in Mata. The syntax of the mypoisson3 command is
mypoisson3 depvar indepvars [if] [in] [, vce(robust | cluster clustervar) noconstant]
where indepvars can contain factor variables or time-series variables.
In the remainder of this post, I discuss the code for mypoisson3.ado. I recommend that you click on the filename to download the code. To avoid scrolling, view the code in the Do-file Editor, or your favorite text editor, to see the line numbers.
*! version 3.0.0 21Feb2016 program define mypoisson3, eclass sortpreserve version 14 syntax varlist(numeric ts fv min=2) [if] [in] [, noCONStant vce(string) ] marksample touse _vce_parse `touse' , optlist(Robust) argoptlist(CLuster) : , vce(`vce') local vce "`r(vce)'" local clustervar "`r(cluster)'" if "`vce'" == "robust" | "`vce'" == "cluster" { local vcetype "Robust" } if "`clustervar'" != "" { capture confirm numeric variable `clustervar' if _rc { display in red "invalid vce() option" display in red "cluster variable {bf:`clustervar'} is " /// "string variable instead of a numeric variable" exit(198) } sort `clustervar' } gettoken depvar indepvars : varlist _fv_check_depvar `depvar' tempname b mo V N rank getcinfo `indepvars' , `constant' local cnames "`r(cnames)'" matrix `mo' = r(mo) mata: mywork("`depvar'", "`cnames'", "`touse'", "`constant'", /// "`b'", "`V'", "`N'", "`rank'", "`mo'", "`vce'", "`clustervar'") if "`constant'" == "" { local cnames "`cnames' _cons" } matrix colnames `b' = `cnames' matrix colnames `V' = `cnames' matrix rownames `V' = `cnames' ereturn post `b' `V', esample(`touse') buildfvinfo ereturn scalar N = `N' ereturn scalar rank = `rank' ereturn local vce "`vce'" ereturn local vcetype "`vcetype'" ereturn local clustvar "`clustervar'" ereturn local cmd "mypoisson3" ereturn display end program getcinfo, rclass syntax varlist(ts fv), [ noCONStant ] _rmcoll `varlist' , `constant' expand local cnames `r(varlist)' local p : word count `cnames' if "`constant'" == "" { local p = `p' + 1 local cons _cons } tempname b mo matrix `b' = J(1, `p', 0) matrix colnames `b' = `cnames' `cons' _ms_omit_info `b' matrix `mo' = r(omit) return local cnames "`cnames'" return matrix mo = `mo' end mata: void mywork( string scalar depvar, string scalar indepvars, string scalar touse, string scalar constant, string scalar bname, string scalar Vname, string scalar nname, string scalar rname, string scalar mo, string scalar vcetype, string scalar clustervar) { real vector y, b real matrix X, V, Ct real scalar n, p, rank y = st_data(., depvar, touse) n = rows(y) X = st_data(., indepvars, touse) if (constant == "") { X = X,J(n, 1, 1) } p = cols(X) Ct = makeCt(mo) S = optimize_init() optimize_init_argument(S, 1, y) optimize_init_argument(S, 2, X) optimize_init_evaluator(S, &plleval3()) optimize_init_evaluatortype(S, "gf0") optimize_init_params(S, J(1, p, .01)) optimize_init_constraints(S, Ct) b = optimize(S) if (vcetype == "robust") { V = optimize_result_V_robust(S) } else if (vcetype == "cluster") { cvar = st_data(., clustervar, touse) optimize_init_cluster(S, cvar) V = optimize_result_V_robust(S) } else { // vcetype must IID V = optimize_result_V_oim(S) } rank = p - diag0cnt(invsym(V)) st_matrix(bname, b) st_matrix(Vname, V) st_numscalar(nname, n) st_numscalar(rname, rank) } real matrix makeCt(string scalar mo) { real vector mo_v real scalar ko, j, p mo_v = st_matrix(mo) p = cols(mo_v) ko = sum(mo_v) if (ko>0) { Ct = J(0, p, .) for(j=1; j<=p; j++) { if (mo_v[j]==1) { Ct = Ct \ e(j, p) } } Ct = Ct, J(ko, 1, 0) } else { Ct = J(0,p+1,.) } return(Ct) } void plleval3(real scalar todo, real vector b, /// real vector y, real matrix X, /// val, grad, hess) { real vector xb xb = X*b' val = (-exp(xb) + y:*xb - lnfactorial(y)) } end
Only a few lines of mypoisson3.ado differ from their counterparts in mypoisson2.ado, and I put these changes into four groups.
- Line 5 allows vce() on the syntax command, and lines 8–23 parse this option.
I discussed the techniques used in these changes in Programming an estimation com-
mand in Stata: Adding robust and cluster–robust VCEs to our Mata based OLS command, when I used them in myregress12.ado. These lines- put the specified VCE in the local macro vce;
- put a label for the specified VCE in the local macro vcetype;
- put the name of a specified cluster variable in the local macro clustervar, and
- handle any errors when the user misspecifies the vce() option.
- Line 35 passes the contents of the local macros vce and clustervar to the Mata work function mywork().
- Lines 47–49 store the local macros vce, vcetype, and clustvar in e() results.
- Line 84 parses the new arguments vcetype and clustervar. The string scalar vcetype contains the type of VCE to be estimated, and the string scalar clustervar contains the name of the Stata variable containing the clusters, if specified.
- Lines 112–122 use the contents of vcetype to return an OIM, a robust, or a cluster–robust estimator of the VCE.
The contents of vcetype determine which optimize() function is called to compute the estimated VCE. If vcetype contains robust, line 113 uses optimize_result_V_robust() to compute a robust estimator of the VCE. If vcetype contains cluster, lines 116 and 117 put a copy of the Stata cluster variable in the optimize object, and then line 118 uses optimize_result_V_robust() to compute a cluster–robust estimator of the VCE. Finally, if vcetype is empty, line 121 uses optimize_result_V_oim() to compute the default correct-specification estimator of the VCE.
The output in examples 1 and 2 confirms that mypoisson3 produces the same results as poisson when the option vce(cluster id) is specified.
Example 1: mypoisson3 results
. clear all . use accident3 . mypoisson3 accidents cvalue i.kids traffic, vce(cluster id) Iteration 0: f(p) = -847.19028 Iteration 1: f(p) = -573.7331 Iteration 2: f(p) = -545.76673 Iteration 3: f(p) = -545.11357 Iteration 4: f(p) = -545.10898 Iteration 5: f(p) = -545.10898 (Std. Err. adjusted for clustering on id) ------------------------------------------------------------------------------ | Robust | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- cvalue | -.6582924 .1128794 -5.83 0.000 -.8795319 -.4370529 | kids | 1 | -1.662351 .4309205 -3.86 0.000 -2.50694 -.8177623 2 | -1.574691 .4164515 -3.78 0.000 -2.390921 -.7584611 3 | -3.233933 .4685643 -6.90 0.000 -4.152302 -2.315564 | traffic | .1383976 .0876168 1.58 0.114 -.0333282 .3101235 _cons | .7157579 .5970943 1.20 0.231 -.4545254 1.886041 ------------------------------------------------------------------------------
Example 2: poisson results
. poisson accidents cvalue i.kids traffic, vce(cluster id) Iteration 0: log pseudolikelihood = -546.35782 Iteration 1: log pseudolikelihood = -545.11016 Iteration 2: log pseudolikelihood = -545.10898 Iteration 3: log pseudolikelihood = -545.10898 Poisson regression Number of obs = 505 Wald chi2(5) = 118.06 Prob > chi2 = 0.0000 Log pseudolikelihood = -545.10898 Pseudo R2 = 0.2491 (Std. Err. adjusted for 285 clusters in id) ------------------------------------------------------------------------------ | Robust accidents | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- cvalue | -.6582924 .1128793 -5.83 0.000 -.8795317 -.437053 | kids | 1 | -1.662351 .4309205 -3.86 0.000 -2.50694 -.8177622 2 | -1.574691 .4164515 -3.78 0.000 -2.390921 -.758461 3 | -3.233932 .4685642 -6.90 0.000 -4.152301 -2.315563 | traffic | .1383977 .0876167 1.58 0.114 -.0333279 .3101232 _cons | .7157576 .597093 1.20 0.231 -.4545232 1.886038 ------------------------------------------------------------------------------
Done and undone
I discussed mypoisson3, which has options for a robust or a cluster–robust estimator of the variance–covariance of the estimator. In my next post, I discuss how to have the evaluator function compute the derivatives to speed up the optimization.