Home > Programming > Programming an estimation command in Stata: A poisson command using Mata

Programming an estimation command in Stata: A poisson command using Mata

\(
\newcommand{\xb}{{\bf x}}
\newcommand{\betab}{\boldsymbol{\beta}}\)I discuss mypoisson1, which computes Poisson-regression results in Mata. The code in mypoisson1.ado is remarkably similar to the code in myregress11.ado, which computes ordinary least-squares (OLS) results in Mata, as I discussed in Programming an estimation command in Stata: An OLS command using Mata.

I build on previous posts. I use the structure of Stata programs that use Mata work functions that I discussed previously in Programming an estimation command in Stata: A first ado-command using Mata and Programming an estimation command in Stata: An OLS command using Mata. You should be familiar with Poisson regression and using optimize(), which I discussed in Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters.

This is the nineteenth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

A poisson command with Mata computations

The Stata command mypoisson1 computes the results in Mata. The syntax of the mypoisson1 command is

mypoisson1 depvar indepvars [if] [in] [, noconstant]

where indepvars can contain time-series variables. mypoisson1 does not allow for factor variables because they complicate the program. I discuss these complications, and present solutions, in my next post.

In the remainder of this post, I discuss the code for mypoisson1.ado. I recommend that you click on the file name to download the code. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.

Code block 1: mypoisson1.ado

*! version 1.0.0  31Jan2016
program define mypoisson1, eclass sortpreserve
    version 14.1

    syntax varlist(numeric ts min=2) [if] [in] [, noCONStant ]
    marksample touse

    gettoken depvar indepvars : varlist

    _rmcoll `indepvars', `constant' forcedrop
    local indepvars  "`r(varlist)'"

    tempname b V N rank

    mata: mywork("`depvar'", "`indepvars'", "`touse'", "`constant'", ///
       "`b'", "`V'", "`N'", "`rank'")

    if "`constant'" == "" {
        local cnames "`indepvars' _cons"
    }
    else {
        local cnames "`indepvars'"
    }
    matrix colnames `b' = `cnames'
    matrix colnames `V' = `cnames'
    matrix rownames `V' = `cnames'

    ereturn post `b' `V', esample(`touse') buildfvinfo
    ereturn scalar N       = `N'
    ereturn scalar rank    = `rank'
    ereturn local  cmd     "mypoisson1"

    ereturn display

end

mata:

void mywork( string scalar depvar,  string scalar indepvars,
             string scalar touse,   string scalar constant,
             string scalar bname,   string scalar Vname,
             string scalar nname,   string scalar rname)
{

    real vector y, b, mo_v, cv
    real matrix X, V, Cm
    real scalar n, p, rank, ko

    y = st_data(., depvar, touse)
    n = rows(y)
    X = st_data(., indepvars, touse)
    if (constant == "") {
        X = X,J(n, 1, 1)
    }
    p = cols(X)

    S  = optimize_init()
    optimize_init_argument(S, 1, y)
    optimize_init_argument(S, 2, X)
    optimize_init_evaluator(S, &plleval2())
    optimize_init_params(S, J(1, p, .01))

    b    = optimize(S)
    V    = optimize_result_V_oim(S)
    rank = p - diag0cnt(invsym(V))

    st_matrix(bname, b)
    st_matrix(Vname, V)
    st_numscalar(nname, n)
    st_numscalar(rname, rank)
}

void plleval2(real scalar todo, real vector b,     ///
              real vector y,    real matrix X,     ///
              val, grad, hess)
{
    real vector  xb

    xb = X*b'
   val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}

end

mypoisson1.ado has the structure of Stata programs that compute their results in Mata, which I discussed in Programming an estimation command in Stata: A first ado-command using Mata and Programming an estimation command in Stata: An OLS command using Mata. Lines 1–35 define the ado-command mypoisson1. Lines 37–83 define the Mata work function mywork() used in mypoisson1 and the evaluator function plleval2() used in mywork().

The ado-command mypoisson1 has four parts:

  1. Lines 5–13 parse what the user typed, identify the sample, drop collinear variables from the list of independent variables, and create temporary names for Stata objects returned by our Mata work function.
  2. Lines 15–16 call the Mata work function.
  3. Lines 18–31 post the results returned by the Mata work function to e().
  4. Line 33 displays the results.

The Mata work function mywork() has four parts.

  1. Lines 39–42 parse the arguments.
  2. Lines 45–47 declare vectors, matrices, and scalars that are local to mywork().
  3. Lines 49–65 compute the results.
  4. Lines 67–70 copy the computed results to Stata, using the names that were passed in arguments.

Now, I discuss the ado-code in some detail. Lines 2–35 are almost the same as lines 2–35 of myregress11, which I discussed in Programming an estimation command in Stata: An OLS command using Mata. That myregress11 handles factor variables but mypoisson1 does not causes most of the differences. That myregress11 stores the residual degrees of freedom but mypoisson1 does not causes two minor differences.

myregress11 uses the matrix inverter to handle cases of collinear independent variables. In the Poisson-regression case discussed here, collinear independent variables define an unconstrained problem without a unique solution, so optimize() cannot converge. mypoisson1 drops collinear independent variables to avoid this problem. I discuss a better solution in my next post.

Lines 10–11 use _rmcoll to drop the collinear independent variables and store the list of linearly independent variables in the local macro indepvars.

Now, I discuss the Mata work function mywork() in some detail. The mywork() defined on lines 39–71 is remarkably similar to the mywork() function defined on lines 39–72 of myregress11.ado, discussed in Programming an estimation command in Stata: An OLS command using Mata. In mypoisson1.ado, lines 57–65 use optimize() to compute the results. In myregress11.ado, lines 58–64 use matrix computations to compute the results.

Lines 73–81 define the evaluator function plleval2(), which is used by optimize() to compute the results, as I discussed in Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters.

Examples 1 and 2 illustrate that mypoisson1 produces the same results as poisson.

Example 1: mypoisson1
(Uses accident3.dta)

. clear all

. use accident3

. mypoisson1 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66874
Iteration 2:   f(p) = -555.81708
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943554   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082077
       _cons |   .5743543   .2839519     2.02   0.043     .0178187     1.13089
------------------------------------------------------------------------------

Example 2: poisson

 poisson accidents cvalue kids traffic

Iteration 0:   log likelihood = -555.86605
Iteration 1:   log likelihood =  -555.8154
Iteration 2:   log likelihood = -555.81538

Poisson regression                              Number of obs     =        505
                                                LR chi2(3)        =     340.20
                                                Prob > chi2       =     0.0000
Log likelihood = -555.81538                     Pseudo R2         =     0.2343

------------------------------------------------------------------------------
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506594
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |    .574354   .2839515     2.02   0.043     .0178193    1.130889
------------------------------------------------------------------------------

Done and undone

I discussed mypoisson1, which computes Poisson-regression results in Mata. I highlighted how similar the code in mypoisson1.ado is to the code in myregress11.ado.

I also discussed that mypoisson1 drops collinear independent variables and noted that mypoisson1 does not handle factor variables. In my next post, I discuss a solution to the problems caused by collinear independent variables and discuss a command that handles factor variables.