## Programming an estimation command in Stata: Making predict work

I make **predict** work after **mypoisson5** by writing an ado-command that computes the predictions and by having **mypoisson5** store the name of this new ado-command in **e(predict)**. The ado-command that computes predictions using the parameter estimates computed by ado-command **mytest** should be named **mytest_p**, by convention. In the next section, I discuss **mypoisson5_p**, which computes predictions after **mypoisson5**. In section Storing the name of the prediction command in e(predict), I show that storing the name **mypoisson5_p** in **e(predict)** requires only a one-line change to **mypoisson4.ado**, which I discussed in Programming an estimation command in Stata: Adding analytical derivatives to a poisson command using Mata.

This is the twenty-fourth post in the series **Programming an estimation command in Stata**. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

**An ado-command that computes predictions**

The syntax of **mypoisson5_p** is

**mypoisson5_p** [*type*] *newvarname* [*if*] [*in*] [**, n xb**]

**mypoisson5_p** computes the expected number of counts when option **n** is specified, and it computes the linear predictions when option **xb** is specified. **n** is the default option, if neither **xb** or **n** is specified by the user. Despite the syntax diagram, the user may not specify both **xb** and **n**.

Now consider the code for this command in code block 1.

**mypoisson5_p.ado**

*! version 1.0.0 10Mar2016 program define mypoisson5_p version 14 syntax newvarname [if] [in] , [ xb n ] marksample touse, novarlist local nopts : word count `xb' `n' if `nopts' >1 { display "{err}only one statistic may be specified" exit 498 } if `nopts' == 0 { local n n display "expected counts" } if "`xb'" != "" { _predict `typlist' `varlist' if `touse' , xb } else { tempvar xbv quietly _predict double `xbv' if `touse' , xb generate `typlist' `varlist' = exp(`xbv') if `touse' } end

Line 5 uses **syntax** to parse the syntax specified in the syntax diagram above. Line 5 specifies that **mypoisson5_p** requires the name of a new variable, that it allows an **if** or **in** condition, and that it accepts the options **xb** and **n**. **syntax newvarname** specifies that the user must specify a name for a variable that is not in the dataset in memory. **syntax** stores the name of the new variable in the local macro **varlist**. If the user specifies a variable type in addition to the variable name, the type will be stored in the local macro **typlist**. For example, if the user specified

. mypoisson5_p double yhat

the local macro **varlist** would contain “yhat” and the local macro **typlist** would contain “double”. If the user does not specify a type, the local macro **typlist** contains nothing.

Line 7 uses **marksample** to create a sample-identification variable whose name will be in the local macro **touse**. Unlike the examples in Programming an estimation command in Stata: Allowing for sample restrictions and factor variables, I specified the option **novarlist** on **marksample** so that **marksample** will use only the user-specified **if** or **in** restrictions to create the sample-identification variable and not use the nonexistent observations in the new variable.

The options **xb** and **n** specify which statistic to compute. The syntax command on line 5 allows users to specify

- the
**xb**option, - the
**n**option, - both the
**xb**option and the**n**option, or - neither the
**xb**option nor the**n**option.

In case (1), the local macro **xb** will contain “xb” and the local macro **n** will contain nothing. In case (2), the local macro **xb** will contain nothing and the local macro **n** will contain “n”. In case (3), the local macro **xb** will contain “xb” and the local macro **n** will contain “n”. In case (4), the local macro **xb** will contain nothing and the local macro **n** will contain nothing.

The syntax diagram and its discussion imply that cases (1), (2), and (4) are valid, but that case (3) would be an error. Line 9 puts the number of options specified by the user in the local macro **nopts**. The rest of the code uses **nopts**, **xb**, and **n** to handle cases (1)–(4).

Lines 10–13 handle case (3) by exiting with a polite error message when **nopts** contains 2.

Lines 15–18 handle case (4) by putting “n” in the local macro **n** when **nopts** contains 0.

At this point, we have handled cases (3) and (4), and we use **xb** and **n** to handle cases (1) and (2), because either **xb** is not empty and **n** is empty, or **xb** is empty and **n** is not empty.

Lines 20–22 handle case (1) by using **_predict** to compute the **xb** predictions when the local macro **xb** is not empty. Note that the predictions are computed at the precision specified by the user.

Lines 23–27 handle case (2) by using **_predict** to compute **xb** in a temporary variable that is subsequently used to compute **n**. Note that the temporary variable for **xb** is always computed in double precision and that the **n** is computed at the precision specified by the user.

**Storing the name of the prediction command in e(predict)**

To compute the **xb** statistic, users type

. predict double yhat, xb

instead of typing

. mypoisson5_p double yhat, xb

This syntax works because the **predict** command uses the ado-command whose name is stored in **e(predict)**. On line 50 of **mypoisson5** in code block 2, I store “mypoisson5_p” in **e(predict)**. This addition is the only difference between **mypoisson5.ado** in code block 2 and **mypoisson4.ado** in code block 5 in Programming an estimation command in Stata: Adding analytical derivatives to a poisson command using Mata.

**Code block 2: mypoisson5.ado**

*! version 5.0.0 10Mar2016 program define mypoisson5, eclass sortpreserve version 14 syntax varlist(numeric ts fv min=2) [if] [in] [, noCONStant vce(string) ] marksample touse _vce_parse `touse' , optlist(Robust) argoptlist(CLuster) : , vce(`vce') local vce "`r(vce)'" local clustervar "`r(cluster)'" if "`vce'" == "robust" | "`vce'" == "cluster" { local vcetype "Robust" } if "`clustervar'" != "" { capture confirm numeric variable `clustervar' if _rc { display in red "invalid vce() option" display in red "cluster variable {bf:`clustervar'} is " /// "string variable instead of a numeric variable" exit(198) } sort `clustervar' } gettoken depvar indepvars : varlist _fv_check_depvar `depvar' tempname b mo V N rank getcinfo `indepvars' , `constant' local cnames "`r(cnames)'" matrix `mo' = r(mo) mata: mywork("`depvar'", "`cnames'", "`touse'", "`constant'", /// "`b'", "`V'", "`N'", "`rank'", "`mo'", "`vce'", "`clustervar'") if "`constant'" == "" { local cnames "`cnames' _cons" } matrix colnames `b' = `cnames' matrix colnames `V' = `cnames' matrix rownames `V' = `cnames' ereturn post `b' `V', esample(`touse') buildfvinfo ereturn scalar N = `N' ereturn scalar rank = `rank' ereturn local vce "`vce'" ereturn local vcetype "`vcetype'" ereturn local clustvar "`clustervar'" ereturn local predict "mypoisson5_p" ereturn local cmd "mypoisson5" ereturn display end program getcinfo, rclass syntax varlist(ts fv), [ noCONStant ] _rmcoll `varlist' , `constant' expand local cnames `r(varlist)' local p : word count `cnames' if "`constant'" == "" { local p = `p' + 1 local cons _cons } tempname b mo matrix `b' = J(1, `p', 0) matrix colnames `b' = `cnames' `cons' _ms_omit_info `b' matrix `mo' = r(omit) return local cnames "`cnames'" return matrix mo = `mo' end mata: void mywork( string scalar depvar, string scalar indepvars, string scalar touse, string scalar constant, string scalar bname, string scalar Vname, string scalar nname, string scalar rname, string scalar mo, string scalar vcetype, string scalar clustervar) { real vector y, b real matrix X, V, Ct real scalar n, p, rank y = st_data(., depvar, touse) n = rows(y) X = st_data(., indepvars, touse) if (constant == "") { X = X,J(n, 1, 1) } p = cols(X) Ct = makeCt(mo) S = optimize_init() optimize_init_argument(S, 1, y) optimize_init_argument(S, 2, X) optimize_init_evaluator(S, &plleval3()) optimize_init_evaluatortype(S, "gf2") optimize_init_params(S, J(1, p, .01)) optimize_init_constraints(S, Ct) b = optimize(S) if (vcetype == "robust") { V = optimize_result_V_robust(S) } else if (vcetype == "cluster") { cvar = st_data(., clustervar, touse) optimize_init_cluster(S, cvar) V = optimize_result_V_robust(S) } else { // vcetype must IID V = optimize_result_V_oim(S) } rank = p - diag0cnt(invsym(V)) st_matrix(bname, b) st_matrix(Vname, V) st_numscalar(nname, n) st_numscalar(rname, rank) } real matrix makeCt(string scalar mo) { real vector mo_v real scalar ko, j, p mo_v = st_matrix(mo) p = cols(mo_v) ko = sum(mo_v) if (ko>0) { Ct = J(0, p, .) for(j=1; j<=p; j++) { if (mo_v[j]==1) { Ct = Ct \ e(j, p) } } Ct = Ct, J(ko, 1, 0) } else { Ct = J(0,p+1,.) } return(Ct) } void plleval3(real scalar todo, real vector b, /// real vector y, real matrix X, /// val, grad, hess) { real vector xb, mu xb = X*b' mu = exp(xb) val = (-mu + y:*xb - lnfactorial(y)) if (todo>=1) { grad = (y - mu):*X } if (todo==2) { hess = -quadcross(X, mu, X) } } end

Example 1 illustrates that our implementation works by comparing the predictions obtained after **mypoisson5** with those obtained after **poisson**.

**Example 1: predict after mypoisson5**

. clear all . use accident3 . quietly poisson accidents cvalue kids traffic . predict double n1 (option n assumed; predicted number of events) . quietly mypoisson5 accidents cvalue kids traffic . predict double n2 expected counts . list n1 n2 in 1/5 +-----------------------+ | n1 n2 | |-----------------------| 1. | .15572052 .15572052 | 2. | .47362502 .47362483 | 3. | .46432954 .46432946 | 4. | .84841301 .84841286 | 5. | .40848207 .40848209 | +-----------------------+

**Done and undone**

I made **predict** work after **mypoisson5** by writing an ado-command that computes the prediction and by storing the name of this ado-command in **e(predict)**.

In my next post, I discuss how to check that a working command is still working, a topic known as certification.

Pingback: The Stata Blog » Programming an estimation command in Stata: A map to posted entries()