Home > Programming > Programming an estimation command in Stata: Handling factor variables in a poisson command using Mata

Programming an estimation command in Stata: Handling factor variables in a poisson command using Mata

mypoisson2.ado handles factor variables and computes its Poisson regression results in Mata. I discuss the code for mypoisson2.ado, which I obtained by adding the method for handling factor variables discussed in Programming an estimation command in Stata: Handling factor variables in optimize() to mypoisson1.ado, discussed in Programming an estimation command in Stata: A poisson command using Mata.

This is the twenty-first post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

A Poisson command with Mata computations

mypoisson2 computes Poisson regression results in Mata. The syntax of the mypoisson2 command is

mypoisson2 depvar indepvars [if] [in] [, noconstant]

where indepvars can contain factor variables or time-series variables.

In the remainder of this post, I discuss the code for mypoisson2.ado. I recommend that you click on the filename to download the code. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.

Code block 1: mypoisson2.ado

*! version 2.0.0  07Feb2016
program define mypoisson2, eclass sortpreserve
    version 14

    syntax varlist(numeric ts fv min=2) [if] [in] [, noCONStant ]
    marksample touse

    gettoken depvar indepvars : varlist
    _fv_check_depvar `depvar'

    tempname b mo V N rank

    getcinfo `indepvars' , `constant'
    local  cnames "`r(cnames)'"
    matrix `mo' = r(mo)

    mata: mywork("`depvar'", "`cnames'", "`touse'", "`constant'", ///
       "`b'", "`V'", "`N'", "`rank'", "`mo'")

    if "`constant'" == "" {
        local cnames "`cnames' _cons"
    }
    matrix colnames `b' = `cnames'
    matrix colnames `V' = `cnames'
    matrix rownames `V' = `cnames'

    ereturn post `b' `V', esample(`touse') buildfvinfo
    ereturn scalar N       = `N'
    ereturn scalar rank    = `rank'
    ereturn local  cmd     "mypoisson1"

    ereturn display

end

program getcinfo, rclass
    syntax varlist(ts fv), [ noCONStant ]

    _rmcoll `varlist' , `constant' expand
    local cnames `r(varlist)'
    local p : word count `cnames'
    if "`constant'" == "" {
        local p = `p' + 1
        local cons _cons
    }

    tempname b mo

    matrix `b' = J(1, `p', 0)
    matrix colnames `b' = `cnames' `cons'
    _ms_omit_info `b'
    matrix `mo' = r(omit)

    return local  cnames "`cnames'"
    return matrix mo = `mo'
end

mata:

void mywork( string scalar depvar,  string scalar indepvars,
             string scalar touse,   string scalar constant,
             string scalar bname,   string scalar Vname,
             string scalar nname,   string scalar rname,
             string scalar mo)
{

    real vector y, b
    real matrix X, V, Ct
    real scalar n, p, rank

    y = st_data(., depvar, touse)
    n = rows(y)
    X = st_data(., indepvars, touse)
    if (constant == "") {
        X = X,J(n, 1, 1)
    }
    p = cols(X)

    Ct = makeCt(mo)

    S  = optimize_init()
    optimize_init_argument(S, 1, y)
    optimize_init_argument(S, 2, X)
    optimize_init_evaluator(S, &plleval2())
    optimize_init_params(S, J(1, p, .01))
    optimize_init_constraints(S, Ct)

    b    = optimize(S)
    V    = optimize_result_V_oim(S)
    rank = p - diag0cnt(invsym(V))

    st_matrix(bname, b)
    st_matrix(Vname, V)
    st_numscalar(nname, n)
    st_numscalar(rname, rank)
}

real matrix makeCt(string scalar mo)
{
    real vector mo_v
    real scalar ko, j, p

    mo_v = st_matrix(mo)
    p    = cols(mo_v)
    ko   = sum(mo_v)
    if (ko>0) {
        Ct   = J(0, p, .)
        for(j=1; j<=p; j++) {
            if (mo_v[j]==1) {
                Ct  = Ct \ e(j, p)
            }
        }
        Ct = Ct, J(ko, 1, 0)
    }
    else {
        Ct = J(0,p+1,.)
    }

    return(Ct)

}

void plleval2(real scalar todo, real vector b,     ///
              real vector y,    real matrix X,     ///
              val, grad, hess)
{
    real vector  xb

    xb = X*b'
   val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}

end

As with programs that I have previously discussed, there is an ado part and Mata part. Lines 2–56 are the ado part; they define mypoisson2 and the subroutine getcinfo. Lines 58–133 are the Mata part; they define the Mata work function mywork() used in mypoisson2, the makeCt function used in mywork(), and the evaluator function plleval2() used in mywork().

The ado-command mypoisson2 has the following parts:

  1. Lines 5–11 parse what the user typed, identify the sample, and create temporary names for Stata objects used in the computations or returned by our Mata work function.
  2. Lines 13–15 use the subroutine getcinfo to get information about the user-specified covariates and then store this information in the local cnames and a Stata matrix.
  3. Lines 17–18 call the Mata work function.
  4. Lines 20–30 post the results returned by the Mata work function to e().
  5. Line 32 displays the results.

The Mata function mywork() has the following parts:

  1. Lines 60–65 parse the arguments.
  2. Lines 67–69 declare vectors, matrices, and scalars that are local to mywork().
  3. Lines 71–90 compute the results.
  4. Lines 92–95 copy the computed results to Stata, using the names that were passed as arguments.

I now discuss the ado-code in some detail, focusing on only the aspects that are new to mypoisson2.ado.

The subroutine getcinfo encapsulates the computations performed in examples 3, 4, and 5 in Programming an estimation command in Stata: Handling factor variables in optimize(). getcinfo uses _rmcoll to identify which covariates must be omitted, stores the names of the covariates to be omitted in the local macro cnames, then uses _ms_omit_info to create a vector containing a 1 for omitted variables and a 0 otherwise. getcinfo puts cnames into r(cnames) and the vector identifying the omitted variables into r(mo).

Lines 14–15 store the information put into r() by getcinfo in the local macro cnames and the Stata vector whose name is contained in the local macro mo. Lines 23–25 use cnames to put row names on the vector of point estimates and row and column names on the estimated variance–covariance matrix of the estimator (VCE). Line 18 passes the vector to mywork().

I now discuss the Mata code in some detail, again focusing on only the new aspects. Line 79 gets the constraint matrix Ct needed to handle any omitted variables from makeCt(). Lines 98–121 define makeCt(), which encapsulates the computations that form Cm in example 6 in Programming an estimation command in Stata: Handling factor variables in optimize(). Line 86 uses optimize_init_constraints() to put Ct in the optimize() object. Ct contains a matrix with zero rows when there are no constraints, and putting a constraint matrix with zero rows into the optimize() object tells optimize() that there are no constraints.

The output in examples 1 and 2 confirms that mypoisson2 produces the same results as poisson when a full set of indicator variables is included in a model with a constant term.

Example 1: mypoisson2 results

. clear all

. use accident3

. mypoisson2 accidents cvalue ibn.kids traffic, noconstant
Iteration 0:   f(p) = -843.66874
Iteration 1:   f(p) = -573.50561
Iteration 2:   f(p) = -545.86215
Iteration 3:   f(p) = -545.11765
Iteration 4:   f(p) = -545.10899
Iteration 5:   f(p) = -545.10898
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6582923   .0703823    -9.35   0.000    -.7962391   -.5203456
             |
        kids |
          0  |   .7157575    .282144     2.54   0.011     .1627653     1.26875
          1  |  -.9465934   .3111915    -3.04   0.002    -1.556518   -.3366693
          2  |  -.8589336   .3097583    -2.77   0.006    -1.466049   -.2518184
          3  |  -2.518175   .4366261    -5.77   0.000    -3.373947   -1.662404
             |
     traffic |   .1383977   .0307285     4.50   0.000      .078171    .1986243
------------------------------------------------------------------------------

Example 2: poisson results

. poisson accidents cvalue ibn.kids traffic, noconstant

Iteration 0:   log likelihood = -1250.3959
Iteration 1:   log likelihood = -553.73534
Iteration 2:   log likelihood = -545.14915
Iteration 3:   log likelihood = -545.10902
Iteration 4:   log likelihood = -545.10898

Poisson regression                              Number of obs     =        505
                                                Wald chi2(6)      =     285.69
Log likelihood = -545.10898                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6582924   .0703822    -9.35   0.000     -.796239   -.5203457
             |
        kids |
          0  |   .7157576   .2821434     2.54   0.011     .1627666    1.268749
          1  |  -.9465933    .311191    -3.04   0.002    -1.556516   -.3366701
          2  |  -.8589334   .3097578    -2.77   0.006    -1.466048   -.2518192
          3  |   -2.51817    .436625    -5.77   0.000     -3.37394   -1.662401
             |
     traffic |   .1383977   .0307284     4.50   0.000     .0781711    .1986242
------------------------------------------------------------------------------

Done and undone

I discussed mypoisson2, which handles factor variables and uses Mata to compute Poisson regression results. In my next post, I add robust and cluster–robust estimators of the VCE.