Home > Programming > Programming an estimation command in Stata: Making predict work

Programming an estimation command in Stata: Making predict work

I make predict work after mypoisson5 by writing an ado-command that computes the predictions and by having mypoisson5 store the name of this new ado-command in e(predict). The ado-command that computes predictions using the parameter estimates computed by ado-command mytest should be named mytest_p, by convention. In the next section, I discuss mypoisson5_p, which computes predictions after mypoisson5. In section Storing the name of the prediction command in e(predict), I show that storing the name mypoisson5_p in e(predict) requires only a one-line change to mypoisson4.ado, which I discussed in Programming an estimation command in Stata: Adding analytical derivatives to a poisson command using Mata.

This is the twenty-fourth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

An ado-command that computes predictions

The syntax of mypoisson5_p is

mypoisson5_p [type] newvarname [if] [in] [, n xb]

mypoisson5_p computes the expected number of counts when option n is specified, and it computes the linear predictions when option xb is specified. n is the default option, if neither xb or n is specified by the user. Despite the syntax diagram, the user may not specify both xb and n.

Now consider the code for this command in code block 1.

mypoisson5_p.ado

*! version 1.0.0  10Mar2016
program define mypoisson5_p
    version 14

    syntax newvarname [if] [in] , [ xb n ]

    marksample touse, novarlist

    local nopts : word count `xb' `n'
    if `nopts' >1 {
        display "{err}only one statistic may be specified"
        exit 498
    }

    if `nopts' == 0 {
        local n n
        display "expected counts"
    }

    if "`xb'" != "" {
        _predict `typlist' `varlist' if `touse' , xb
    }
    else {
        tempvar xbv
        quietly _predict double `xbv' if `touse' , xb
        generate `typlist' `varlist' = exp(`xbv') if `touse'
    }
end

Line 5 uses syntax to parse the syntax specified in the syntax diagram above. Line 5 specifies that mypoisson5_p requires the name of a new variable, that it allows an if or in condition, and that it accepts the options xb and n. syntax newvarname specifies that the user must specify a name for a variable that is not in the dataset in memory. syntax stores the name of the new variable in the local macro varlist. If the user specifies a variable type in addition to the variable name, the type will be stored in the local macro typlist. For example, if the user specified

. mypoisson5_p double yhat

the local macro varlist would contain “yhat” and the local macro typlist would contain “double”. If the user does not specify a type, the local macro typlist contains nothing.

Line 7 uses marksample to create a sample-identification variable whose name will be in the local macro touse. Unlike the examples in Programming an estimation command in Stata: Allowing for sample restrictions and factor variables, I specified the option novarlist on marksample so that marksample will use only the user-specified if or in restrictions to create the sample-identification variable and not use the nonexistent observations in the new variable.

The options xb and n specify which statistic to compute. The syntax command on line 5 allows users to specify

  1. the xb option,
  2. the n option,
  3. both the xb option and the n option, or
  4. neither the xb option nor the n option.

In case (1), the local macro xb will contain “xb” and the local macro n will contain nothing. In case (2), the local macro xb will contain nothing and the local macro n will contain “n”. In case (3), the local macro xb will contain “xb” and the local macro n will contain “n”. In case (4), the local macro xb will contain nothing and the local macro n will contain nothing.

The syntax diagram and its discussion imply that cases (1), (2), and (4) are valid, but that case (3) would be an error. Line 9 puts the number of options specified by the user in the local macro nopts. The rest of the code uses nopts, xb, and n to handle cases (1)–(4).

Lines 10–13 handle case (3) by exiting with a polite error message when nopts contains 2.

Lines 15–18 handle case (4) by putting “n” in the local macro n when nopts contains 0.

At this point, we have handled cases (3) and (4), and we use xb and n to handle cases (1) and (2), because either xb is not empty and n is empty, or xb is empty and n is not empty.

Lines 20–22 handle case (1) by using _predict to compute the xb predictions when the local macro xb is not empty. Note that the predictions are computed at the precision specified by the user.

Lines 23–27 handle case (2) by using _predict to compute xb in a temporary variable that is subsequently used to compute n. Note that the temporary variable for xb is always computed in double precision and that the n is computed at the precision specified by the user.

Storing the name of the prediction command in e(predict)

To compute the xb statistic, users type

. predict double yhat, xb

instead of typing

. mypoisson5_p double yhat, xb

This syntax works because the predict command uses the ado-command whose name is stored in e(predict). On line 50 of mypoisson5 in code block 2, I store “mypoisson5_p” in e(predict). This addition is the only difference between mypoisson5.ado in code block 2 and mypoisson4.ado in code block 5 in Programming an estimation command in Stata: Adding analytical derivatives to a poisson command using Mata.

Code block 2: mypoisson5.ado

*! version 5.0.0  10Mar2016
program define mypoisson5, eclass sortpreserve
    version 14

    syntax varlist(numeric ts fv min=2) [if] [in] [, noCONStant vce(string) ]
    marksample touse

    _vce_parse `touse' , optlist(Robust) argoptlist(CLuster) : , vce(`vce')
    local vce        "`r(vce)'"
    local clustervar "`r(cluster)'"
    if "`vce'" == "robust" | "`vce'" == "cluster" {
        local vcetype "Robust"
    }
    if "`clustervar'" != "" {
        capture confirm numeric variable `clustervar'
        if _rc {
            display in red "invalid vce() option"
            display in red "cluster variable {bf:`clustervar'} is " ///
                "string variable instead of a numeric variable"
            exit(198)
        }
        sort `clustervar'
    }

    gettoken depvar indepvars : varlist
    _fv_check_depvar `depvar'

    tempname b mo V N rank

    getcinfo `indepvars' , `constant'
    local  cnames "`r(cnames)'"
    matrix `mo' = r(mo)

    mata: mywork("`depvar'", "`cnames'", "`touse'", "`constant'", ///
       "`b'", "`V'", "`N'", "`rank'", "`mo'", "`vce'", "`clustervar'")

    if "`constant'" == "" {
        local cnames "`cnames' _cons"
    }
    matrix colnames `b' = `cnames'
    matrix colnames `V' = `cnames'
    matrix rownames `V' = `cnames'

    ereturn post `b' `V', esample(`touse') buildfvinfo
    ereturn scalar N       = `N'
    ereturn scalar rank    = `rank'
    ereturn local  vce      "`vce'"
    ereturn local  vcetype  "`vcetype'"
    ereturn local  clustvar "`clustervar'"
    ereturn local  predict "mypoisson5_p"
    ereturn local  cmd     "mypoisson5"

    ereturn display

end

program getcinfo, rclass
    syntax varlist(ts fv), [ noCONStant ]

    _rmcoll `varlist' , `constant' expand
    local cnames `r(varlist)'
    local p : word count `cnames'
    if "`constant'" == "" {
        local p = `p' + 1
        local cons _cons
    }

    tempname b mo

    matrix `b' = J(1, `p', 0)
    matrix colnames `b' = `cnames' `cons'
    _ms_omit_info `b'
    matrix `mo' = r(omit)

    return local  cnames "`cnames'"
    return matrix mo = `mo'
end

mata:

void mywork( string scalar depvar,  string scalar indepvars,
             string scalar touse,   string scalar constant,
             string scalar bname,   string scalar Vname,
             string scalar nname,   string scalar rname,
             string scalar mo,
             string scalar vcetype, string scalar clustervar)
{

    real vector y, b
    real matrix X, V, Ct
    real scalar n, p, rank

    y = st_data(., depvar, touse)
    n = rows(y)
    X = st_data(., indepvars, touse)
    if (constant == "") {
        X = X,J(n, 1, 1)
    }
    p = cols(X)

    Ct = makeCt(mo)

    S  = optimize_init()
    optimize_init_argument(S, 1, y)
    optimize_init_argument(S, 2, X)
    optimize_init_evaluator(S, &plleval3())
    optimize_init_evaluatortype(S, "gf2")
    optimize_init_params(S, J(1, p, .01))
    optimize_init_constraints(S, Ct)

    b    = optimize(S)

    if (vcetype == "robust") {
        V    = optimize_result_V_robust(S)
    }
    else if (vcetype == "cluster") {
        cvar = st_data(., clustervar, touse)
        optimize_init_cluster(S, cvar)
        V    = optimize_result_V_robust(S)
    }
    else {                 // vcetype must IID
        V    = optimize_result_V_oim(S)
    }
    rank = p - diag0cnt(invsym(V))

    st_matrix(bname, b)
    st_matrix(Vname, V)
    st_numscalar(nname, n)
    st_numscalar(rname, rank)
}

real matrix makeCt(string scalar mo)
{
    real vector mo_v
    real scalar ko, j, p

    mo_v = st_matrix(mo)
    p    = cols(mo_v)
    ko   = sum(mo_v)
    if (ko>0) {
        Ct   = J(0, p, .)
        for(j=1; j<=p; j++) {
            if (mo_v[j]==1) {
                Ct  = Ct \ e(j, p)
            }
        }
        Ct = Ct, J(ko, 1, 0)
    }
    else {
        Ct = J(0,p+1,.)
    }

    return(Ct)

}

void plleval3(real scalar todo, real vector b,     ///
              real vector y,    real matrix X,     ///
              val, grad, hess)
{
    real vector  xb, mu

    xb  = X*b'
    mu  = exp(xb)
    val = (-mu + y:*xb - lnfactorial(y))

    if (todo>=1) {
        grad = (y - mu):*X
    }
    if (todo==2) {
        hess = -quadcross(X, mu, X)
    }
}

end

Example 1 illustrates that our implementation works by comparing the predictions obtained after mypoisson5 with those obtained after poisson.

Example 1: predict after mypoisson5

. clear all

. use accident3

. quietly poisson accidents cvalue kids traffic

. predict double n1
(option n assumed; predicted number of events)

. quietly mypoisson5 accidents cvalue kids traffic

. predict double n2
expected counts

. list n1 n2 in 1/5

     +-----------------------+
     |        n1          n2 |
     |-----------------------|
  1. | .15572052   .15572052 |
  2. | .47362502   .47362483 |
  3. | .46432954   .46432946 |
  4. | .84841301   .84841286 |
  5. | .40848207   .40848209 |
     +-----------------------+

Done and undone

I made predict work after mypoisson5 by writing an ado-command that computes the prediction and by storing the name of this ado-command in e(predict).

In my next post, I discuss how to check that a working command is still working, a topic known as certification.