
Programming an estimation command in Stata: A poisson command using Mata

\(
\newcommand{\xb}{{\bf x}}
\newcommand{\betab}{\boldsymbol{\beta}}\)I discuss mypoisson1, which computes Poisson-regression results in Mata. The code in mypoisson1.ado is remarkably similar to the code in myregress11.ado, which computes ordinary least-squares (OLS) results in Mata, as I discussed in Programming an estimation command in Stata: An OLS command using Mata.

I build on previous posts. I use the structure of Stata programs that use Mata work functions that I discussed previously in Programming an estimation command in Stata: A first ado-command using Mata and Programming an estimation command in Stata: An OLS command using Mata. You should be familiar with Poisson regression and using optimize(), which I discussed in Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters.

A poisson command with Mata computations

The Stata command mypoisson1 computes the results in Mata. The syntax of the mypoisson1 command is

mypoisson1 depvar indepvars [if] [in] [, noconstant]

where indepvars can contain time-series variables. mypoisson1 does not allow for factor variables because they complicate the program. I discuss these complications, and present solutions, in my next post.

In the remainder of this post, I discuss the code for mypoisson1.ado. I recommend that you click on the file name to download the code. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.

Code block 1: mypoisson1.ado

*! version 1.0.0  31Jan2016
program define mypoisson1, eclass sortpreserve
    version 14.1

    syntax varlist(numeric ts min=2) [if] [in] [, noCONStant ]
    marksample touse

    gettoken depvar indepvars : varlist

    _rmcoll `indepvars' if `touse', `constant' forcedrop
    local indepvars  "`r(varlist)'"

    tempname b V N rank

    mata: mywork("`depvar'", "`indepvars'", "`touse'", "`constant'", ///
       "`b'", "`V'", "`N'", "`rank'")

    if "`constant'" == "" {
        local cnames "`indepvars' _cons"
    }
    else {
        local cnames "`indepvars'"
    }
    matrix colnames `b' = `cnames'
    matrix colnames `V' = `cnames'
    matrix rownames `V' = `cnames'

    ereturn post `b' `V', esample(`touse') buildfvinfo
    ereturn scalar N       = `N'
    ereturn scalar rank    = `rank'
    ereturn local  cmd     "mypoisson1"

    ereturn display

end

mata:

void mywork( string scalar depvar,  string scalar indepvars,
             string scalar touse,   string scalar constant,
             string scalar bname,   string scalar Vname,
             string scalar nname,   string scalar rname)
{

    real vector y, b, mo_v, cv
    real matrix X, V, Cm
    real scalar n, p, rank, ko

    y = st_data(., depvar, touse)
    n = rows(y)
    X = st_data(., indepvars, touse)
    if (constant == "") {
        X = X,J(n, 1, 1)
    }
    p = cols(X)

    S  = optimize_init()
    optimize_init_argument(S, 1, y)
    optimize_init_argument(S, 2, X)
    optimize_init_evaluator(S, &plleval2())
    optimize_init_params(S, J(1, p, .01))

    b    = optimize(S)
    V    = optimize_result_V_oim(S)
    rank = p - diag0cnt(invsym(V))

    st_matrix(bname, b)
    st_matrix(Vname, V)
    st_numscalar(nname, n)
    st_numscalar(rname, rank)
}

void plleval2(real scalar todo, real vector b,     ///
              real vector y,    real matrix X,     ///
              val, grad, hess)
{
    real vector  xb

    xb = X*b'
   val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}

end

mypoisson1.ado has the structure of Stata programs that compute their results in Mata, which I discussed in Programming an estimation command in Stata: A first ado-command using Mata and Programming an estimation command in Stata: An OLS command using Mata. Lines 1–35 define the ado-command mypoisson1. Lines 37–83 define the Mata work function mywork() used in mypoisson1 and the evaluator function plleval2() used in mywork().

The ado-command mypoisson1 has four parts:

  1. Lines 5–13 parse what the user typed, identify the sample, drop collinear variables from the list of independent variables, and create temporary names for Stata objects returned by our Mata work function.
  2. Lines 15–16 call the Mata work function.
  3. Lines 18–31 post the results returned by the Mata work function to e().
  4. Line 33 displays the results.

The Mata work function mywork() has four parts.

  1. Lines 39–42 parse the arguments.
  2. Lines 45–47 declare vectors, matrices, and scalars that are local to mywork().
  3. Lines 49–65 compute the results.
  4. Lines 67–70 copy the computed results to Stata, using the names that were passed in arguments.

Now, I discuss the ado-code in some detail. Lines 2–35 are almost the same as lines 2–35 of myregress11, which I discussed in Programming an estimation command in Stata: An OLS command using Mata. Most of the differences arise because myregress11 handles factor variables but mypoisson1 does not. Two minor differences arise because myregress11 stores the residual degrees of freedom but mypoisson1 does not.

myregress11 uses the matrix inverter to handle cases of collinear independent variables. In the Poisson-regression case discussed here, collinear independent variables yield a log-likelihood function without a unique maximum, so optimize() cannot converge. mypoisson1 drops collinear independent variables to avoid this problem. I discuss a better solution in my next post.

Lines 10–11 use _rmcoll to drop the collinear independent variables and store the list of linearly independent variables in the local macro indepvars.
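To see what _rmcoll does, here is a small sketch with made-up variables (the variable names are hypothetical, chosen only for illustration): when one variable is perfectly collinear with another, the forcedrop option drops it silently, and r(varlist) contains the reduced list.

```
clear
set obs 10
set seed 12345
generate x1 = runiform()
generate x2 = 2*x1            // perfectly collinear with x1
generate x3 = runiform()
_rmcoll x1 x2 x3, forcedrop   // drops the collinear variable from the list
display "`r(varlist)'"        // contains x1 and x3, but not x2
```

mypoisson1 stores r(varlist) back into the local macro indepvars, so the Mata work function only ever sees linearly independent covariates.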

Now, I discuss the Mata work function mywork() in some detail. The mywork() defined on lines 39–71 is remarkably similar to the mywork() function defined on lines 39–72 of myregress11.ado, discussed in Programming an estimation command in Stata: An OLS command using Mata. In mypoisson1.ado, lines 57–65 use optimize() to compute the results. In myregress11.ado, lines 58–64 use matrix computations to compute the results.

Lines 73–81 define the evaluator function plleval2(), which is used by optimize() to compute the results, as I discussed in Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters.

Examples 1 and 2 illustrate that mypoisson1 produces the same results as poisson.

Example 1: mypoisson1
(Uses accident3.dta)

. clear all

. use accident3

. mypoisson1 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66874
Iteration 2:   f(p) = -555.81708
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943554   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082077
       _cons |   .5743543   .2839519     2.02   0.043     .0178187     1.13089
------------------------------------------------------------------------------

Example 2: poisson

. poisson accidents cvalue kids traffic

Iteration 0:   log likelihood = -555.86605
Iteration 1:   log likelihood =  -555.8154
Iteration 2:   log likelihood = -555.81538

Poisson regression                              Number of obs     =        505
                                                LR chi2(3)        =     340.20
                                                Prob > chi2       =     0.0000
Log likelihood = -555.81538                     Pseudo R2         =     0.2343

------------------------------------------------------------------------------
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506594
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |    .574354   .2839515     2.02   0.043     .0178193    1.130889
------------------------------------------------------------------------------

Done and undone

I discussed mypoisson1, which computes Poisson-regression results in Mata. I highlighted how similar the code in mypoisson1.ado is to the code in myregress11.ado.

I also discussed that mypoisson1 drops collinear independent variables and noted that mypoisson1 does not handle factor variables. In my next post, I discuss a solution to the problems caused by collinear independent variables and discuss a command that handles factor variables.

Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters

\(
\newcommand{\xb}{{\bf x}}
\newcommand{\betab}{\boldsymbol{\beta}}\)I show how to use optimize() in Mata to maximize a Poisson log-likelihood function and to obtain estimators of the variance–covariance of the estimator (VCE) based on independent and identically distributed (IID) observations or on robust methods.

Using optimize()

There are many optional choices that one may make when solving a nonlinear optimization problem, but there are very few that one must make. The optimize*() functions in Mata handle this problem by making a set of default choices for you, requiring that you specify a few things, and allowing you to change any of the default choices.

When I use optimize() to solve a nonlinear optimization problem, I do four steps.

  1. I create an optimize() object

      
    : S = optimize_init()

    which contains all the default choices

  2. I use some of the optimize_init_*(S) functions to put information about my optimization problem into S.
  3. I use

      
    : betahat = optimize(S)

    to perform the optimization.

  4. I use some of the optimize_result_*(S) functions to get the results, which optimize(S) stored in S.

Consider maximizing the log-likelihood function of a Poisson model. The contribution of the \(i\)th observation to the log-likelihood is

\[
f_i(\betab) = y_i\xb_i\betab' - \exp(\xb_i\betab') - \ln( y_i !)
\]

where \(y_i\) is the dependent variable, \(\xb_i\) is the vector of covariates, and \(\betab\) is the row vector of parameters that we select to maximize the log-likelihood function given by \(F(\betab) =\sum_i f_i(\betab)\). I could drop \(\ln(y_i!)\), because it does not depend on the parameters. I include it to make the value of the log-likelihood function the same as that reported by Stata. Stata includes these terms so that the values of the log-likelihood functions are comparable across models.
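The evaluators below leave the derivative arguments unused, so optimize() computes numerical derivatives, but it is worth recording the analytic ones; differentiating \(F(\betab)\) gives the standard results (not derived in this post)

\[
\frac{\partial F}{\partial \betab} = \sum_i \left\{ y_i - \exp(\xb_i\betab')\right\}\xb_i
\qquad \mbox{and} \qquad
\frac{\partial^2 F}{\partial \betab' \partial \betab} = -\sum_i \exp(\xb_i\betab')\,\xb_i'\xb_i
\]

Because each term \(\exp(\xb_i\betab')\,\xb_i'\xb_i\) is positive semidefinite, the Hessian is negative semidefinite everywhere, which is why the Poisson log likelihood is concave and a local maximum found by optimize() is also a global maximum.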

Code block 1 copies the data from Stata to Mata and computes the Poisson log-likelihood function at the vector of parameter values b, which has been set to the arbitrary starting values of .01 for each parameter.

Code block 1: Computing the Poisson log-likelihood in Mata

clear all
use accident3
mata:
     y = st_data(., "accidents")
     X = st_data(., "cvalue kids traffic")
     X = X,J(rows(X), 1, 1)
     b = J(1, cols(X), .01)
    xb = X*b'
    f  = sum(-exp(xb) + y:*xb - lnfactorial(y))
end

The Mata function plleval() in code block 2 puts the value of the Poisson log-likelihood function at the vector of parameter values b into val.

Code block 2: An evaluator function for the Poisson log-likelihood

mata:
void plleval(real scalar todo, real vector b, val, grad, hess)
{
    real vector  y, xb
    real matrix  X

     y = st_data(., "accidents")
     X = st_data(., "cvalue kids traffic")
     X = X,J(rows(X), 1, 1)
    xb = X*b'
   val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}
end

plleval() has the default syntax of an evaluator function that optimize() can call. Evaluator functions have a default syntax so that optimize() can call them, which it must do to find the maximum. After describing the default syntax, I will show how to use evaluators with extra arguments.

plleval() is void; it returns nothing. The real scalar todo allows optimize() to tell the evaluator function what it must compute. The real vector b is the current value of the parameter vector. val is not typed because, no matter what it contains on input, it will contain the value of the objective function on output. grad is not typed because it will optionally contain the vector of first derivatives of the objective function at the current value of b on output. hess is not typed because it will optionally contain the matrix of second derivatives of the objective function at the current value of b on output. As plleval() illustrates, the evaluator function must put the value of the objective function into the third argument, but it need not compute either the vector of first derivatives or the matrix of second derivatives.

In example 1, I use optimize() to maximize the Poisson log-likelihood function computed in plleval().

Example 1: Using optimize() to estimate Poisson parameters

(Uses accident3.dta)

. clear all

. use accident3

. mata:
------------------------------------------------- mata (type end to exit) ------
: void plleval(real scalar todo, real vector b, val, grad, hess)
> {
>     real vector  y, xb
>     real matrix  X
> 
>      y = st_data(., "accidents")
>      X = st_data(., "cvalue kids traffic")
>      X = X,J(rows(X), 1, 1)
>     xb = X*b'
>    val = sum(-exp(xb) + y:*xb - lnfactorial(y))
> }
note: argument todo unused
note: argument grad unused
note: argument hess unused

: 
: S  = optimize_init()

: optimize_init_evaluator(S, &plleval())

: optimize_init_params(S, J(1, 4, .01))

: bh = optimize(S)
Iteration 0:   f(p) = -851.18669  
Iteration 1:   f(p) = -556.66874  
Iteration 2:   f(p) = -555.81708  
Iteration 3:   f(p) = -555.81538  
Iteration 4:   f(p) = -555.81538  

: bh
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871399   -1.009017051    .1467114648    .5743542793  |
    +-------------------------------------------------------------+

: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871399   -1.009017051    .1467114648    .5743542793  |
    +-------------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_oim(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .0706483931   .0807960852   .0313761961   .2839519366  |
    +---------------------------------------------------------+

: end
--------------------------------------------------------------------------------

After defining plleval(), I use optimize_init() to create the optimize() object S. I must put information about how to call plleval() and the vector of starting values into S. Typing

optimize_init_evaluator(S, &plleval())

puts the address of the evaluator function plleval() into S; preceding the function name with an ampersand (&) yields its address. optimize() requires the function address instead of the function name because having the address speeds up finding the function. Typing

optimize_init_params(S, J(1, 4, .01))

puts the vector of starting values, J(1, 4, .01), into S.

Typing

bh = optimize(S)

causes optimize() to solve the optimization problem described in S, and it causes optimize() to put the vector of optimal parameters in bh. optimize() produces the default iteration log, because we did not change the default specification in S.

When optimize() has completed, the results are in S. For example, I display the bh returned by optimize() and use optimize_result_params(S) to display the result stored in S. I further illustrate by displaying the standard errors; optimize_result_V_oim() retrieves the observed-information-matrix (OIM) estimator of the variance–covariance of the estimator (VCE). Many other results are stored in S; type help mf optimize and look at the optimize_result*() functions for details.
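For example, a few more of the optimize_result_*() functions one could type after optimize(S) completes (a sketch; see help mf optimize for the full list):

```
: optimize_result_value(S)        // maximized value of the objective function
: optimize_result_iterations(S)   // number of iterations performed
: optimize_result_converged(S)    // 1 if the optimizer converged, 0 otherwise
```

Each of these reads a result that optimize(S) stored in S, so they can be called in any order after the optimization has completed.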

Comparing the results in examples 1 and 2 shows that they are correct.

Example 2: Results from poisson

. poisson accidents cvalue kids traffic

Iteration 0:   log likelihood = -555.86605
Iteration 1:   log likelihood =  -555.8154
Iteration 2:   log likelihood = -555.81538

Poisson regression                              Number of obs     =        505
                                                LR chi2(3)        =     340.20
                                                Prob > chi2       =     0.0000
Log likelihood = -555.81538                     Pseudo R2         =     0.2343

------------------------------------------------------------------------------
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506594
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |    .574354   .2839515     2.02   0.043     .0178193    1.130889
------------------------------------------------------------------------------

plleval() is slow because it copies the data from Stata to Mata every time optimize() calls it. I would much rather pass the data to the evaluator function, but this requires putting information about the syntax of the new evaluator function in S. For example, I would like to use the evaluator function plleval2(). In example 3, I use optimize_init_argument() to put information into S about the extra arguments accepted by the new evaluator function plleval2().

Code block 3: Passing data to the Poisson evaluator function

mata:
void plleval2(real scalar todo, real vector b,     ///
              real vector y,    real matrix X,     ///
              val, grad, hess)
{
    real vector  xb

    xb = X*b'
   val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}
end

Line 3 declares the extra arguments, the real vector y and the real matrix X. The extra arguments come between the always-present inputs, the real scalar todo and the real vector b, and the always-present outputs: val, grad, and hess.

Example 3 uses optimize() to maximize the Poisson objective function coded in plleval2().

Example 3: Using optional arguments to pass data

. mata:
------------------------------------------------- mata (type end to exit) ------
: void plleval2(real scalar todo, real vector b,     ///
>               real vector y,    real matrix X,     ///
>               val, grad, hess)
> {
>     real vector  xb
>
>     xb = X*b'
>    val = sum(-exp(xb) + y:*xb - lnfactorial(y))
> }
note: argument todo unused
note: argument grad unused
note: argument hess unused

:
: y = st_data(., "accidents")

: X = st_data(., "cvalue kids traffic")

: X = X,J(rows(X), 1, 1)

:
: S  = optimize_init()

: optimize_init_argument(S, 1, y)

: optimize_init_argument(S, 2, X)

: optimize_init_evaluator(S, &plleval2())

: optimize_init_params(S, J(1, 4, .01))

:
: bh = optimize(S)
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66874
Iteration 2:   f(p) = -555.81708
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538

: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871399   -1.009017051    .1467114648    .5743542793  |
    +-------------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_oim(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .0706483931   .0807960852   .0313761961   .2839519366  |
    +---------------------------------------------------------+

: end
--------------------------------------------------------------------------------

After defining plleval2(), I copy the data from Stata to Mata, and I use optimize_init() to put the default choices into the optimize() object S. When I typed

optimize_init_argument(S, 1, y)

I put information into S specifying that optimize() should pass y as the first extra argument to the evaluator function. When I typed

optimize_init_argument(S, 2, X)

I put information into S specifying that optimize() should pass X as the second extra argument to the evaluator function.

Analogous to example 1, typing

optimize_init_evaluator(S, &plleval2())

puts the address of plleval2() into S, and typing

optimize_init_params(S, J(1, 4, .01))

puts the vector of starting values, J(1, 4, .01), in S.

The results are the same as those in example 1.

Vector of observation-level contributions and robust VCE estimation

Robust estimators for the VCE of an estimator use the structure of observation-level contributions; see Wooldridge (2010, chapters 12 and 13) or Cameron and Trivedi (2005, chapter 5). When the evaluator function gives optimize() a vector of observation-level contributions, instead of a scalar sum, optimize() can use this structure to compute robust or cluster-robust estimators of the VCE.
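Concretely, with the vector of contributions available, optimize() can evaluate the observation-level scores \(\hat{{\bf s}}_i = \partial f_i/\partial\betab\) at the estimates and form the usual sandwich estimator (a standard result; see the references above)

\[
\widehat{V}_{\rm robust}
= \widehat{V}\left(\sum_i \hat{{\bf s}}_i'\hat{{\bf s}}_i\right)\widehat{V}
\]

where \(\widehat{V}\) is the OIM estimator of the VCE. A scalar sum discards the \(\hat{{\bf s}}_i\), which is why the robust estimator requires an evaluator that returns the vector of contributions.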

Consider plleval3(), which puts the vector of observation-level contributions into val.

Code block 4: A vector of observation-level contributions

mata:
void plleval3(real scalar todo, real vector b,     ///
              real vector y,    real matrix X,     ///
              val, grad, hess)
{
    real vector  xb

    xb = X*b'
   val = -exp(xb) + y:*xb - lnfactorial(y)
}
end

To use plleval3(), I must put information in the optimize() object stating that the evaluator function computes a vector of observation-level contributions. In example 4, I use optimize_init_evaluatortype() to put this information into the optimize() object S.

Example 4: Robust VCE estimation

. mata:
------------------------------------------------- mata (type end to exit) ------
: void plleval3(real scalar todo, real vector b,     ///
>               real vector y,    real matrix X,     ///
>               val, grad, hess)
> {
>     real vector  xb
>
>     xb = X*b'
>    val = -exp(xb) + y:*xb - lnfactorial(y)
> }
note: argument todo unused
note: argument grad unused
note: argument hess unused

:
:
: y = st_data(., "accidents")

: X = st_data(., "cvalue kids traffic")

: X = X,J(rows(X), 1, 1)

:
: S  = optimize_init()

: optimize_init_argument(S, 1, y)

: optimize_init_argument(S, 2, X)

: optimize_init_evaluator(S, &plleval3())

: optimize_init_evaluatortype(S, "gf0")

: optimize_init_params(S, J(1, 4, .01))

:
: bh = optimize(S)
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66874
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538

: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871527   -1.009017051    .1467114658    .5743542978  |
    +-------------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_oim(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .0706483832   .0807960809    .031376176   .2839517337  |
    +---------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_robust(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .1096020124    .188666044    .092431746   .6045057623  |
    +---------------------------------------------------------+

: end
--------------------------------------------------------------------------------

After defining plleval3(), I copy the data, create the optimize() object S, put the specifications for the extra arguments y and X in S, and put the address of plleval3() into S. Typing

optimize_init_evaluatortype(S, "gf0")

puts in S the information that the evaluator function returns a vector of observation-level contributions and that it computes no derivatives; that is, the evaluator function is of type "gf0". Given the vector structure, I can type

optimize_result_V_robust(S)

to compute a robust estimator of the VCE.

sqrt(diagonal(optimize_result_V_robust(S)))'

returns the robust standard errors, which are the same as those reported by poisson in example 5.

Example 5: Robust VCE estimation by poisson

. poisson accidents cvalue kids traffic, vce(robust)

Iteration 0:   log pseudolikelihood = -555.86605
Iteration 1:   log pseudolikelihood =  -555.8154
Iteration 2:   log pseudolikelihood = -555.81538

Poisson regression                              Number of obs     =        505
                                                Wald chi2(3)      =      99.76
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -555.81538               Pseudo R2         =     0.2343

------------------------------------------------------------------------------
             |               Robust
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .1096019    -5.98   0.000    -.8707029   -.4410712
        kids |  -1.009017    .188666    -5.35   0.000    -1.378795   -.6392382
     traffic |   .1467115   .0924316     1.59   0.112    -.0344512    .3278741
       _cons |    .574354   .6045047     0.95   0.342    -.6104535    1.759162
------------------------------------------------------------------------------

Done and undone

I showed how to use optimize() to maximize a Poisson log-likelihood function. I also showed how to obtain a robust estimator of the VCE by coding the evaluator function to compute a vector of observation-level contributions. In my next post, I show how to write a Stata command that uses Mata to estimate the parameters of a Poisson regression model.

References

Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.

Programming an estimation command in Stata: A review of nonlinear optimization using Mata

\(\newcommand{\betab}{\boldsymbol{\beta}}
\newcommand{\xb}{{\bf x}}
\newcommand{\yb}{{\bf y}}
\newcommand{\gb}{{\bf g}}
\newcommand{\Hb}{{\bf H}}
\newcommand{\thetab}{\boldsymbol{\theta}}
\newcommand{\Xb}{{\bf X}}
\)I review the theory behind nonlinear optimization and get more practice in Mata programming by implementing an optimizer in Mata. In real problems, I recommend using the optimize() function or moptimize() function instead of the one I describe here. In subsequent posts, I will discuss optimize() and moptimize(). This post will help you develop your Mata programming skills and will improve your understanding of how optimize() and moptimize() work.

A quick review of nonlinear optimization

We want to maximize a real-valued function \(Q(\thetab)\), where \(\thetab\) is a \(p\times 1\) vector of parameters. Minimization is done by maximizing \(-Q(\thetab)\). We require that \(Q(\thetab)\) be twice continuously differentiable, so that we can use a second-order Taylor series to approximate \(Q(\thetab)\) in a neighborhood of the point \(\thetab_s\),

\[
Q(\thetab) \approx Q(\thetab_s) + \gb_s'(\thetab -\thetab_s)
+ \frac{1}{2} (\thetab -\thetab_s)'\Hb_s (\thetab -\thetab_s)
\tag{1}
\]

where \(\gb_s\) is the \(p\times 1\) vector of first derivatives of \(Q(\thetab)\) evaluated at \(\thetab_s\) and \(\Hb_s\) is the \(p\times p\) matrix of second derivatives of \(Q(\thetab)\) evaluated at \(\thetab_s\), known as the Hessian matrix.

Nonlinear maximization algorithms start with a vector of initial values and produce a sequence of updated values that converge to the parameter vector that maximizes the objective function. The algorithms I discuss here can only find local maxima. The function in figure 1 has a local maximum at .2 and another at 1.5. The global maximum is at .2.

Figure 1: Local maxima

Each update is produced by finding the \(\thetab\) that maximizes the approximation on the right-hand side of equation (1) and letting it be \(\thetab_{s+1}\). To find the \(\thetab\) that maximizes the approximation, we set to \({\bf 0}\) the derivative of the right-hand side of equation (1) with respect to \(\thetab\),

\[
\gb_s + \Hb_s (\thetab -\thetab_s) = {\bf 0}
\tag{2}
\]

Replacing \(\thetab\) with \(\thetab_{s+1}\) and solving yields the update rule for \(\thetab_{s+1}\).

\[
\thetab_{s+1} = \thetab_s - \Hb_s^{-1} \gb_s
\tag{3}
\]

Note that the update is uniquely defined only if the Hessian matrix \(\Hb_s\) is full rank. To ensure that we have a local maximum, we will require that the Hessian be negative definite at the optimum, which also implies that the symmetric Hessian is full rank.

The update rule in equation (3) does not guarantee that \(Q(\thetab_{s+1})>Q(\thetab_s)\). We want to accept only those \(\thetab_{s+1}\) that do produce such an increase, so in practice, we use

\[
\thetab_{s+1} = \thetab_s - \lambda \Hb_s^{-1} \gb_s
\tag{4}
\]

where \(\lambda\) is the step size. In the algorithm presented here, we start with \(\lambda\) equal to \(1\) and, if necessary, decrease \(\lambda\) until we find a value that yields an increase.

The previous sentence is vague. I clarify it by writing an algorithm in Mata. Suppose that real scalar Q( real vector theta ) is a Mata function that returns the value of the objective function at a value of the parameter vector theta. For the moment, suppose that g_s is the vector of derivatives at the current theta, denoted by theta_s, and that Hi_s is the inverse of the Hessian matrix at theta_s. These definitions allow us to define the update function

Code block 1: Candidate rule for parameter vector

real vector tupdate(                 ///
	real scalar lambda,          ///
	real vector theta_s,         ///
	real vector g_s,             ///
	real matrix Hi_s)
{
	return (theta_s - lambda*Hi_s*g_s)
}

For specified values of lambda, theta_s, g_s, and Hi_s, tupdate() returns a candidate value for theta_s1. But we only accept candidate values of theta_s1 that yield an increase, so instead of using tupdate() to get an update, we would use GetUpdate().

Code block 2: Update function for parameter vector

real vector GetUpdate(            ///
    real vector theta_s,          ///
    real vector g_s,              ///
    real matrix Hi_s)
{
    lambda   = 1
    theta_s1 = tupdate(lambda, theta_s, g_s, Hi_s)
    while ( Q(theta_s1) < Q(theta_s) ) {
        lambda   = lambda/2
        theta_s1 = tupdate(lambda, theta_s, g_s, Hi_s)
    }
    return(theta_s1)
}

GetUpdate() starts by getting a candidate value for theta_s1 when lambda = 1. GetUpdate() returns this candidate theta_s1 if it produces an increase in Q(). Otherwise, GetUpdate() divides lambda by 2 and gets another candidate theta_s1 until it finds a candidate that produces an increase in Q(). GetUpdate() returns the first candidate that produces an increase in Q().

While these functions clarify the ambiguities in the original vague statement, GetUpdate() makes the unwise assumption that there is always a lambda for which the candidate theta_s1 produces an increase in Q(). The version of GetUpdate() in code block 3 does not make this assumption; it exits with an error if lambda becomes too small, that is, less than \(10^{-11}\).

Code block 3: A better update function for parameter vector

real vector GetUpdate(            ///
    real vector theta_s,          ///
    real vector g_s,              ///
    real matrix Hi_s)
{
    lambda = 1
    theta_s1 = tupdate(lambda, theta_s, g_s, Hi_s)
    while ( Q(theta_s1) < Q(theta_s) ) {
        lambda   = lambda/2
        if (lambda < 1e-11) {
            printf("{red}Cannot find parameters that produce an increase.\n")
            exit(error(3360))
        }
        theta_s1 = tupdate(lambda, theta_s, g_s, Hi_s)
    }
    return(theta_s1)
}

An outline of our algorithm for nonlinear optimization is the following:

  1. Select initial values for the parameter vector.
  2. If the current parameters set the vector of derivatives of Q() to zero, go to (3); otherwise go to (2a).
    a. Use GetUpdate() to get new parameter values.
    b. Calculate g_s and Hi_s at the parameter values from (2a).
    c. Go to (2).
  3. Display results.

Code block 4 contains a Mata-style pseudocode version of this algorithm.

Code block 4: Pseudocode for Newton–Raphson algorithm

theta_s  =  J(p, 1, .01)
GetDerives(theta_s, g_s, Hi_s)
gz = g_s'*Hi_s*g_s
while (abs(gz) > 1e-13) {
	theta_s = GetUpdate(theta_s, g_s, Hi_s)
	GetDerives(theta_s, g_s, Hi_s)
	gz      = g_s'*Hi_s*g_s
	printf("gz is now %8.7g\n", gz)
}
printf("Converged value of theta is\n")
theta_s

The first line puts the vector of starting values, a \(p\times 1\) vector with each element equal to .01, in theta_s. The call to GetDerives() puts the vector of first derivatives into g_s and the inverse of the Hessian matrix into Hi_s. In GetDerives(), I use cholinv() to calculate Hi_s. cholinv() returns missing values if the matrix is not positive definite. By calculating Hi_s = -1*cholinv(-H_s), I ensure that Hi_s contains missing values when the Hessian is not negative definite and of full rank.

The line gz = g_s'*Hi_s*g_s calculates how different the vector of first derivatives is from 0. Instead of using a sum of squares, obtainable by g_s'*g_s, I weight the first derivatives by the inverse of the Hessian matrix, which puts the \(p\) first derivatives on a similar scale and ensures that the Hessian matrix is negative definite at convergence. (If the Hessian matrix is not negative definite, GetDerives() will put a matrix of missing values into Hi_s, which causes gz to be missing, and a missing value exceeds any tolerance.)
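A small numeric illustration of why the weighted criterion is preferable: in this hypothetical Python sketch (the quadratic objective and the scale factor are illustrative assumptions of mine), rescaling the parameter changes the squared derivative by a factor of a million but leaves g'*Hi*g unchanged.

```python
def derivs(theta, s):
    # Q(theta) = -(s*theta - 1)^2 / 2: a concave quadratic with its
    # maximum at theta = 1/s; s rescales the parameter
    g = -s * (s * theta - 1)   # first derivative
    H = -s * s                 # second derivative
    return g, H

for s in (1.0, 1000.0):
    g, H = derivs(0.0, s)      # evaluate away from the optimum
    print(s, g * g, g * (1.0 / H) * g)   # g'g depends on s; g'Hi g does not
```

For both scales, the weighted criterion equals \(-1\), while the raw squared derivative grows with the square of the scale factor.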

To flesh out the details, we need a specific problem. Consider maximizing the log-likelihood function of a Poisson model, which has a simple functional form. The contribution of each observation to the log-likelihood is

\[
f_i(\betab) = y_i\xb_i\betab - \exp(\xb_i\betab) - \ln(y_i!)
\]

where \(y_i\) is the dependent variable, \({\bf x}_i\) is the vector of covariates, and \(\betab\) is the vector of parameters that we select to maximize the log-likelihood function given by \(F(\betab) =\sum_i f_i(\betab)\). I could drop ln(y_i!), because it does not depend on the parameters. I include it to make the value of the log-likelihood function the same as that reported by Stata. Stata includes these terms so that log-likelihood-function values are comparable across models.
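Before looking at the Mata implementation, here is a hypothetical Python sketch of the same contribution; pll_py() and the tiny dataset are illustrative, and lgamma(y+1) stands in for \(\ln(y!)\). It shows that dropping the \(\ln(y_i!)\) terms shifts the log-likelihood by the same constant at every b, so the maximizer is unchanged.

```python
import math

def pll_py(y, X, b, keep_lnfact=True):
    # sum_i  y_i*(x_i b) - exp(x_i b) - ln(y_i!)   (pure-Python sketch)
    total = 0.0
    for yi, xi in zip(y, X):
        xb = sum(x * bj for x, bj in zip(xi, b))
        total += yi * xb - math.exp(xb)
        if keep_lnfact:
            total -= math.lgamma(yi + 1)   # lgamma(y+1) = ln(y!)
    return total

y = [1, 0, 3, 2]                  # made-up counts
X = [[1.0], [1.0], [1.0], [1.0]]  # constant-only model
# the ln(y_i!) terms shift the value by the same constant at every b
for b in ([0.5], [-2.0]):
    print(pll_py(y, X, b) - pll_py(y, X, b, keep_lnfact=False))
```

Both printed differences are the same constant, \(-\sum_i \ln(y_i!)\), regardless of b.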

The pll() function in code block 5 computes the Poisson log-likelihood function from the vector of observations on the dependent variable y, the matrix of observations on the covariates X, and the vector of parameter values b.

Code block 5: A function for the Poisson log-likelihood function

// Compute Poisson log-likelihood 
mata:
real scalar pll(real vector y, real matrix X, real vector b)
{
    real vector  xb

    xb = X*b
    return(sum(-exp(xb) + y:*xb - lnfactorial(y)))
}
end

The vector of first derivatives is

\[
\frac{\partial F(\betab)}{\partial \betab}
= \sum_{i=1}^N (y_i - \exp(\xb_i\betab))\xb_i
\]

which I can compute in Mata as quadcolsum((y-exp(X*b)):*X), and the Hessian matrix is

\[
\label{eq:H}
\sum_{i=1}^N\frac{\partial^2~f(\xb_i,\betab)}{\partial\betab
\partial\betab^\prime}
= - \sum_{i=1}^N\exp(\xb_i\betab)\xb_i^\prime \xb_i
\tag{5}
\]

which I can compute in Mata as -quadcross(X, exp(X*b), X).
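A quick way to check these derivative formulas is to compare them with numerical differentiation of the log-likelihood. The following Python sketch does that with central finite differences; the data, parameter values, and helper names are illustrative assumptions of mine, not part of the Mata code.

```python
import math

def pll_py(y, X, b):
    # Poisson log-likelihood with ln(y_i!) dropped; the dropped term
    # does not affect derivatives with respect to b
    total = 0.0
    for yi, xi in zip(y, X):
        xb = sum(x * bj for x, bj in zip(xi, b))
        total += yi * xb - math.exp(xb)
    return total

def grad_py(y, X, b):
    # analytic gradient: sum_i (y_i - exp(x_i b)) * x_i'
    g = [0.0] * len(b)
    for yi, xi in zip(y, X):
        r = yi - math.exp(sum(x * bj for x, bj in zip(xi, b)))
        for j in range(len(b)):
            g[j] += r * xi[j]
    return g

y = [2, 1, 0, 3, 1]                                     # made-up counts
X = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
b = [0.1, -0.2]
h = 1e-6
for j in range(len(b)):
    bp = list(b); bp[j] += h
    bm = list(b); bm[j] -= h
    numeric = (pll_py(y, X, bp) - pll_py(y, X, bm)) / (2 * h)
    print(abs(numeric - grad_py(y, X, b)[j]) < 1e-4)
```

Both comparisons come out True for this example, which is good evidence that the analytic gradient matches the log-likelihood.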

Here is some code that implements this Newton–Raphson (NR) algorithm for the Poisson-regression problem.

Code block 6: pnr1.do
(Uses accident3.dta)

// Newton-Raphson for Poisson log-likelihood
clear all
use accident3

mata:
real scalar pll(real vector y, real matrix X, real vector b)
{
    real vector  xb
    xb = X*b
    return(sum(-exp(xb) + y:*xb - lnfactorial(y)))
}

void GetDerives(real vector y, real matrix X, real vector theta, g, Hi)
{
	real vector exb

	exb = exp(X*theta)
	g   =  (quadcolsum((y - exb):*X))'
	Hi  = quadcross(X, exb, X)
	Hi = -1*cholinv(Hi)
}

real vector tupdate(                 ///
	real scalar lambda,          ///
	real vector theta_s,          ///
	real vector g_s,              ///
	real matrix Hi_s)
{
	return (theta_s - lambda*Hi_s*g_s)
}

real vector GetUpdate(            ///
    real vector y,                ///
    real matrix X,                ///
    real vector theta_s,          ///
    real vector g_s,              ///
    real matrix Hi_s)
{
    real scalar lambda 
    real vector theta_s1

    lambda = 1
    theta_s1 = tupdate(lambda, theta_s, g_s, Hi_s)
    while ( pll(y, X, theta_s1) <= pll(y, X, theta_s) ) {
        lambda   = lambda/2
        if (lambda < 1e-11) {
            printf("{red}Cannot find parameters that produce an increase.\n")
            exit(error(3360))
        }
        theta_s1 = tupdate(lambda, theta_s, g_s, Hi_s)
    }
    return(theta_s1)
}


y = st_data(., "accidents")
X = st_data(., "cvalue kids traffic")
X = X,J(rows(X), 1, 1)

b  =  J(cols(X), 1, .01)
GetDerives(y, X, b, g=., Hi=.)
gz = .
while (abs(gz) > 1e-11) {
	bs1 = GetUpdate(y, X, b, g, Hi)
	b   = bs1
	GetDerives(y, X, b, g, Hi)
	gz = g'*Hi*g
	printf("gz is now %8.7g\n", gz)
}
printf("Converged value of beta is\n")
b

end

Line 3 reads in the downloadable accident3.dta dataset before dropping down to Mata. I use variables from this dataset on lines 56 and 57.

Lines 6–11 define pll(), which returns the value of the Poisson log-likelihood function, given the vector of observations on the dependent variable y, the matrix of covariate observations X, and the current parameters b.

Lines 13–21 put the vector of first derivatives in g and the inverse of the Hessian matrix in Hi. Equation 5 specifies a matrix that is negative definite, as long as the covariates are not linearly dependent. As discussed above, cholinv() returns a matrix of missing values if its argument is not positive definite, so I apply cholinv() to the positive-definite matrix computed on line 19 and multiply the result by \(-1\) on line 20.

Lines 23–30 implement the tupdate() function previously discussed.

Lines 32–53 implement the GetUpdate() function previously discussed, with the caveats that this version handles the data and uses pll() to compute the value of the objective function.

Lines 56–58 get the data from Stata and join a column of ones to X for the constant term.

Lines 60–71 implement the NR algorithm discussed above for this Poisson regression problem.
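For readers without Stata at hand, the same algorithm can be traced in pure Python. The sketch below is an illustrative reimplementation, not the downloadable do-file: the made-up dataset, starting values, and tolerance are my assumptions, and restricting attention to a two-parameter model lets me invert the Hessian in closed form without a matrix library.

```python
import math

def pll(y, X, b):
    # Poisson log-likelihood with ln(y_i!) dropped (constant in b)
    return sum(yi * (xi[0]*b[0] + xi[1]*b[1])
               - math.exp(xi[0]*b[0] + xi[1]*b[1])
               for yi, xi in zip(y, X))

def get_derives(y, X, b):
    # gradient g and inverse Hessian Hi for the two-parameter model
    g = [0.0, 0.0]
    h00 = h01 = h11 = 0.0
    for yi, xi in zip(y, X):
        mu = math.exp(xi[0]*b[0] + xi[1]*b[1])
        g[0] += (yi - mu) * xi[0]
        g[1] += (yi - mu) * xi[1]
        h00 -= mu * xi[0] * xi[0]
        h01 -= mu * xi[0] * xi[1]
        h11 -= mu * xi[1] * xi[1]
    det = h00 * h11 - h01 * h01
    Hi = [[h11/det, -h01/det], [-h01/det, h00/det]]  # closed-form 2x2 inverse
    return g, Hi

def newton_step(b, lam, g, Hi):
    return [b[0] - lam * (Hi[0][0]*g[0] + Hi[0][1]*g[1]),
            b[1] - lam * (Hi[1][0]*g[0] + Hi[1][1]*g[1])]

def get_update(y, X, b, g, Hi):
    # halve lambda until the candidate increases the log-likelihood
    lam = 1.0
    b1 = newton_step(b, lam, g, Hi)
    while pll(y, X, b1) <= pll(y, X, b):
        lam /= 2
        if lam < 1e-11:
            raise RuntimeError("no increase found")
        b1 = newton_step(b, lam, g, Hi)
    return b1

def gzval(g, Hi):
    return (g[0]*(Hi[0][0]*g[0] + Hi[0][1]*g[1])
            + g[1]*(Hi[1][0]*g[0] + Hi[1][1]*g[1]))

y = [1, 0, 2, 1, 3, 2, 4, 3, 5, 4]           # made-up counts
X = [[float(i), 1.0] for i in range(10)]     # one covariate plus a constant
b = [0.01, 0.01]
g, Hi = get_derives(y, X, b)
gz = gzval(g, Hi)
while abs(gz) > 1e-11:
    b = get_update(y, X, b, g, Hi)
    g, Hi = get_derives(y, X, b)
    gz = gzval(g, Hi)
print(b)   # converged estimates; the gradient g is numerically zero here
```

Because the Poisson log-likelihood is globally concave when the covariates are not linearly dependent, the step-halving NR iteration converges from these crude starting values.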

Running pnr1.do produces

Example 1: NR algorithm for Poisson

. do pnr1

. // Newton-Raphson for Poisson log-likelihood
. clear all

. use accident3

. 
. mata:

[Output Omitted]

: b  =  J(cols(X), 1, .01)

: GetDerives(y, X, b, g=., Hi=.)

: gz = .

: while (abs(gz) > 1e-11) {
>         bs1 = GetUpdate(y, X, b, g, Hi)
>         b   = bs1
>         GetDerives(y, X, b, g, Hi)
>         gz = g'*Hi*g
>         printf("gz is now %8.7g\n", gz)
> }
gz is now -119.201
gz is now -26.6231
gz is now -2.02142
gz is now -.016214
gz is now -1.3e-06
gz is now -8.3e-15

: printf("Converged value of beta is\n")
Converged value of beta is

: b
                  1
    +----------------+
  1 |  -.6558870685  |
  2 |  -1.009016966  |
  3 |   .1467114652  |
  4 |   .5743541223  |
    +----------------+

: 
: end
--------------------------------------------------------------------------------
. 
end of do-file

The point estimates in example 1 are the same as those produced by poisson.

Example 2: poisson results

. poisson accidents cvalue kids traffic

Iteration 0:   log likelihood = -555.86605  
Iteration 1:   log likelihood =  -555.8154  
Iteration 2:   log likelihood = -555.81538  

Poisson regression                              Number of obs     =        505
                                                LR chi2(3)        =     340.20
                                                Prob > chi2       =     0.0000
Log likelihood = -555.81538                     Pseudo R2         =     0.2343

------------------------------------------------------------------------------
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506594
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |    .574354   .2839515     2.02   0.043     .0178193    1.130889
------------------------------------------------------------------------------

Done and undone

I implemented a simple nonlinear optimizer to practice Mata programming and to review the theory behind nonlinear optimization. In future posts, I implement a command for Poisson regression that uses the optimizer in optimize().

Programming an estimation command in Stata: Adding robust and cluster-robust VCEs to our Mata based OLS command

I show how to use the undocumented command _vce_parse to parse the options for robust or cluster-robust estimators of the variance-covariance of the estimator (VCE). I then discuss myregress12.ado, which performs its computations in Mata and computes VCE estimators based on independently and identically distributed (IID) observations, robust methods, or cluster-robust methods.

myregress12.ado performs ordinary least-squares (OLS) regression, and it extends myregress11.ado, which I discussed in Programming an estimation command in Stata: An OLS command using Mata. To get the most out of this post, you should be familiar with Programming an estimation command in Stata: Using a subroutine to parse a complex option and Programming an estimation command in Stata: Computing OLS objects in Mata.

Parsing the vce() option

I used ado-subroutines to simplify the parsing of the options vce(robust) and vce(cluster cvarname) in myregress10.ado; see Programming an estimation command in Stata: Using a subroutine to parse a complex option. Part of the point was to illustrate how to write ado-subroutines and the programming tricks that I used in these subroutines.

Here I use the undocumented command _vce_parse to simplify the parsing. There are many undocumented commands designed to help Stata programmers. They are undocumented in that they are tersely documented in the system help but not documented in the manuals. In addition, the syntax or behavior of these commands may change over Stata releases, although this rarely happens.

_vce_parse helps Stata programmers parse the vce() option. To see how it works, consider the problem of parsing the syntax of myregress12.

myregress12 depvar [indepvars] [if] [in] [, vce(robust | cluster clustervar) noconstant]

where indepvars can contain factor variables or time-series variables.

I can use the syntax command to put whatever the user specifies in the option vce() into the local macro vce, but I still must (1) check that what was specified makes sense and (2) create local macros that the code can use to do the right thing. Examples 1–7 create the local macro vce, simulating what syntax would do, and then use _vce_parse to perform tasks (1) and (2).

I begin with the case in which the user specified vce(robust); here the local macro vce would contain the word robust.

Example 1: parsing vce(robust)

. clear all

. sysuse auto
(1978 Automobile Data)

. local vce "robust"

. _vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
             r(robust) : "robust"
             r(vceopt) : "vce(robust)"
                r(vce) : "robust"

The command

_vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

has two pieces. The piece before the colon (:) specifies the rules; the piece after the colon specifies what the user typed. Each piece can have a Stata object followed by some options; note the commas before optlist(Robust) and before vce(`vce'). In the case at hand, the second piece contains only what the user specified, vce(robust), and the first piece contains only the options optlist(Robust) and argoptlist(CLuster). The option optlist(Robust) specifies that the vce() option in the second piece may contain the option robust and that its minimal abbreviation is r. Note how the word Robust in optlist(Robust) mimics how syntax specifies minimum abbreviations. The option argoptlist(CLuster) specifies that the vce() option in the second piece may contain cluster clustervar, that the minimum abbreviation of cluster is cl, and that it will put the argument clustervar into a local macro.

After the command,

_vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

I use return list to show what _vce_parse stored in r(). Because local macro vce contains “robust”, _vce_parse

  1. puts the word robust in the local macro r(robust);
  2. puts what the user typed, vce(robust), in the local macro r(vceopt); and
  3. puts the type of VCE, robust, in the local macro r(vce).

Examples 2 and 3 illustrate that _vce_parse stores the same values in these local macros when the user specifies vce(rob) or vce(r), which are valid abbreviations for vce(robust).

Example 2: parsing vce(rob)

. local vce "rob"

. _vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
             r(robust) : "robust"
             r(vceopt) : "vce(robust)"
                r(vce) : "robust"

Example 3: parsing vce(r)

. local vce "r"

. _vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
             r(robust) : "robust"
             r(vceopt) : "vce(robust)"
                r(vce) : "robust"

Now, consider parsing the option vce(cluster clustervar). Because the cluster variable clustervar may contain missing values, _vce_parse may need to update a sample-identification variable before it stores the name of the cluster variable in a local macro. In example 4, I use the command

_vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vce(`vce')

to handle the case when the user specifies vce(cluster rep78). The results from the tabulate and summarize commands illustrate that _vce_parse updates the sample-identification variable mytouse to account for the missing observations in rep78.

Example 4: parsing vce(cluster rep78)

. generate byte mytouse = 1

. tabulate mytouse

    mytouse |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         74      100.00      100.00
------------+-----------------------------------
      Total |         74      100.00

. summarize rep78

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       rep78 |         69    3.405797    .9899323          1          5

. local vce "cluster rep78"

. _vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
             r(robust) : "robust"
            r(cluster) : "rep78"
             r(vceopt) : "vce(cluster rep78)"
            r(vceargs) : "rep78"
                r(vce) : "cluster"

. tabulate mytouse

    mytouse |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          5        6.76        6.76
          1 |         69       93.24      100.00
------------+-----------------------------------
      Total |         74      100.00

I use return list to show what _vce_parse stored in r(). Because local macro vce contains cluster rep78, _vce_parse

  1. puts the word robust in the local macro r(robust);
  2. puts the name of the cluster variable, rep78, in the local macro r(cluster);
  3. puts what the user typed, vce(cluster rep78), in the local macro r(vceopt);
  4. puts the argument to the cluster option, rep78, in the local macro r(vceargs); and
  5. puts the type of VCE, cluster, in the local macro r(vce).

Examples 5 and 6 illustrate that _vce_parse stores the same values in these local macros when the user specifies vce(clus rep78) or vce(cl rep78), which are valid abbreviations for vce(cluster rep78).

Example 5: parsing vce(clus rep78)

. local vce "clus rep78"

. _vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
             r(robust) : "robust"
            r(cluster) : "rep78"
             r(vceopt) : "vce(cluster rep78)"
            r(vceargs) : "rep78"
                r(vce) : "cluster"

Example 6: parsing vce(cl rep78)

. local vce "cl rep78"

. _vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
             r(robust) : "robust"
            r(cluster) : "rep78"
             r(vceopt) : "vce(cluster rep78)"
            r(vceargs) : "rep78"
                r(vce) : "cluster"

Having illustrated how _vce_parse handles valid specifications, I show in example 7 that it also produces a standard error message when the user specifies something invalid.

Example 7: parsing vce(silly)

. local vce "silly"

. capture noisily _vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vc
> e(`vce')
vcetype 'silly' not allowed

. return list

_vce_parse can parse other types of vce() options; to see them type help _vce_parse.

Also, remember to type undocumented when you are looking for a programmer’s tool.

The code for myregress12

Here is the code for myregress12.ado, which uses _vce_parse. I describe how it works below.

I recommend that you click on the file name to download the code for my myregress12.ado. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.

Code block 1: myregress12.ado

*! version 12.0.0  16Jan2016
program define myregress12, eclass sortpreserve
    version 14.1

    syntax varlist(numeric ts fv) [if] [in] [, noCONStant vce(string) ]
    marksample touse

    _vce_parse `touse' , optlist(Robust) argoptlist(CLuster) : , vce(`vce')
    local vce        "`r(vce)'"
    local clustervar "`r(cluster)'"
    if "`vce'" == "robust" | "`vce'" == "cluster" {
        local vcetype "Robust"
    }
    if "`clustervar'" != "" {
        capture confirm numeric variable `clustervar'
        if _rc {
            display in red "invalid vce() option"
            display in red "cluster variable {bf:`clustervar'} is " ///
                "string variable instead of a numeric variable"
            exit(198)
        }
        sort `clustervar'
    }

    gettoken depvar indepvars : varlist
    _fv_check_depvar `depvar'

    fvexpand `indepvars' 
    local cnames `r(varlist)'

    tempname b V N rank df_r

    mata: mywork("`depvar'", "`cnames'", "`touse'", "`constant'",    ///
       "`vce'", "`clustervar'",                                  /// 
       "`b'", "`V'", "`N'", "`rank'", "`df_r'") 

    if "`constant'" == "" {
        local cnames `cnames' _cons
    }

    matrix colnames `b' = `cnames'
    matrix colnames `V' = `cnames'
    matrix rownames `V' = `cnames'

    ereturn post `b' `V', esample(`touse') buildfvinfo
    ereturn scalar N        = `N'
    ereturn scalar rank     = `rank'
    ereturn scalar df_r     = `df_r'
    ereturn local  vce      "`vce'"
    ereturn local  vcetype  "`vcetype'"
    ereturn local  clustvar "`clustervar'"
    ereturn local  cmd      "myregress12"

    ereturn display

end

mata:

void mywork( string scalar depvar,  string scalar indepvars, 
             string scalar touse,   string scalar constant,  
             string scalar vcetype, string scalar clustervar,
             string scalar bname,   string scalar Vname,     
             string scalar nname,   string scalar rname,     
             string scalar dfrname) 
{

    real vector    y, b, e, e2, cvar, ei 
    real matrix    X, XpXi, M, info, xi 
    real scalar    n, p, k, nc, i, dfr

    y    = st_data(., depvar, touse)
    X    = st_data(., indepvars, touse)
    n    = rows(X)

    if (constant == "") {
        X    = X,J(n,1,1)
    }

    XpXi = quadcross(X, X)
    XpXi = invsym(XpXi)
    b    = XpXi*quadcross(X, y)
    e    = y - X*b
    e2   = e:^2
    p    = cols(X)
    k    = p - diag0cnt(XpXi)
    if (vcetype == "robust") {
        M    = quadcross(X, e2, X)
        dfr  = n - k
        V    = (n/dfr)*XpXi*M*XpXi
    }
    else if (vcetype == "cluster") {
        cvar = st_data(., clustervar, touse)
        info = panelsetup(cvar, 1)
        nc   = rows(info)
        M    = J(k, k, 0)
        dfr  = nc - 1
        for(i=1; i<=nc; i++) {
            xi = panelsubmatrix(X,i,info)
            ei = panelsubmatrix(e,i,info)
            M  = M + xi'*(ei*ei')*xi
        }
        V    = ((n-1)/(n-k))*(nc/(nc-1))*XpXi*M*XpXi
    }
    else {                 // vcetype must be IID
        dfr  = n - k
        V    = (quadsum(e2)/dfr)*XpXi
    }

    st_matrix(bname, b')
    st_matrix(Vname, V)
    st_numscalar(nname, n)
    st_numscalar(rname, k)
    st_numscalar(dfrname, dfr)

}

end

Let’s break this 118-line program into familiar pieces. Lines 2-56 define the ado-command, and lines 58-118 define the Mata work function that is used by the ado-command. Despite the addition of details to handle the parsing and computation of a robust or cluster-robust VCE, the structures of the ado-command and of the Mata work function are the same as they were in myregress11.ado; see Programming an estimation command in Stata: An OLS command using Mata.

The ado-command has four parts.

  1. Lines 5-31 parse what the user typed, identify the sample, and create temporary names for the results returned by our Mata work function.
  2. Lines 33-35 call the Mata work function.
  3. Lines 37-52 post the results returned by the Mata work function to e().
  4. Line 54 displays the results.

The Mata function mywork() also has four parts.

  1. Lines 60-65 parse the arguments.
  2. Lines 68-70 declare vectors, matrices, and scalars that are local to mywork().
  3. Lines 80-108 compute the results.
  4. Lines 110-114 copy the computed results to Stata, using the names that were passed in the arguments.

Now, I address the details of the ado-code, skipping the details that myregress12.ado shares with myregress11.ado, which I covered in Programming an estimation command in Stata: An OLS command using Mata. Line 5 allows the user to specify the vce() option, and line 8 uses _vce_parse to parse what the user specifies. Lines 9 and 10 put the type of VCE found by _vce_parse in the local macro vce and the name of the cluster variable, if specified, in the local macro clustervar. Lines 11–13 put Robust in the local macro vcetype if the specified vce is either robust or cluster. If there is a cluster variable, lines 14–23 check that it is numeric and use it to sort the data.

Line 34 passes the new arguments for the type of VCE and the name of the cluster variable to the Mata work function mywork().

Lines 49–51 store the type of VCE, the output label for the VCE type, and the name of the cluster variable in e(), respectively.

Now, I address the details of the Mata work function mywork(), discussing only what is new relative to the mywork() in myregress11.ado. Line 62 declares the new arguments. The string scalar vcetype is empty, or it contains “robust”, or it contains “cluster”. The string scalar clustervar is either empty or contains the name of the cluster variable.

Lines 68–70 declare the local-to-the-function vectors cvar and ei and the local-to-the-function matrices M, info, and xi that are needed now but not previously.

Lines 87, 91–92, 104–105, and 108 specify if-else blocks to compute the correct VCE. Lines 88–90 compute a robust estimator of the VCE if vcetype contains “robust”. Lines 93–103 compute a cluster-robust estimator of the VCE if vcetype contains “cluster”. Lines 106–107 compute an IID-based estimator of the VCE if vcetype contains neither “robust” nor “cluster”.
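The robust and cluster-robust computations in mywork() share a sandwich structure, V = c*XpXi*M*XpXi, and differ only in the middle matrix M and the scale factor c. The Python sketch below (a two-column X, made-up residuals, and hypothetical function names, all my illustrative assumptions) computes the two versions of M and shows that, with one observation per cluster, the cluster-robust M collapses to the robust M:

```python
def meat_robust(X, e):
    # M = sum_i e_i^2 * x_i' x_i  for a two-column X
    M = [[0.0, 0.0], [0.0, 0.0]]
    for xi, ei in zip(X, e):
        for a in range(2):
            for b in range(2):
                M[a][b] += ei * ei * xi[a] * xi[b]
    return M

def meat_cluster(X, e, cvar):
    # M = sum_c (X_c' e_c)(X_c' e_c)'  summing over clusters c
    sums = {}
    for xi, ei, ci in zip(X, e, cvar):
        s = sums.setdefault(ci, [0.0, 0.0])
        s[0] += xi[0] * ei
        s[1] += xi[1] * ei
    M = [[0.0, 0.0], [0.0, 0.0]]
    for s in sums.values():
        for a in range(2):
            for b in range(2):
                M[a][b] += s[a] * s[b]
    return M

X = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]  # covariate and constant
e = [0.5, -1.0, 0.25, 0.25]                           # made-up residuals
print(meat_cluster(X, e, [1, 2, 3, 4]))  # singleton clusters
print(meat_robust(X, e))                 # identical to the matrix above
```

With genuine clusters (more than one observation per cluster), the cross products within each cluster make the two matrices differ, which is exactly what the loop over panelsubmatrix() captures in mywork().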

Done and undone

I introduced the undocumented command _vce_parse and discussed the code for myregress12.ado, which uses Mata to compute OLS point estimates and an IID-based VCE, a robust VCE, or a cluster-robust VCE.

The structure of the code is the same as the one that I used in myregress11.ado and in mymean8.ado, which I discussed in Programming an estimation command in Stata: An OLS command using Mata and in Programming an estimation command in Stata: A first ado-command using Mata. That the structure remains the same makes it easier to handle the details that arise in more complicated problems.

Programming an estimation command in Stata: A map to posted entries

I have posted 15 entries to the Stata blog about programming an estimation command in Stata. They are best read in order. The comprehensive list below allows you to read them from first to last, at your own pace.

  1. Programming estimators in Stata: Why you should

    To help you write Stata commands that people want to use, I illustrate how Stata syntax is predictable and give an overview of the estimation-postestimation structure that you will want to emulate in your programs.

  2. Programming an estimation command in Stata: Where to store your stuff

    I discuss the difference between scripts and commands, and I introduce some essential programming concepts and constructions that I use to write the scripts and commands.

  3. Programming an estimation command in Stata: Global macros versus local macros

    I discuss a pair of examples that illustrate the differences between global macros and local macros.

  4. Programming an estimation command in Stata: A first ado-command

    I discuss the code for a simple estimation command to focus on the details of how to implement an estimation command. The command that I discuss estimates the mean by the sample average. I begin by reviewing the formulas and a do-file that implements them. I subsequently introduce ado-file programming and discuss two versions of the command. Along the way, I illustrate some of the postestimation features that work after the command.

  5. Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects

    I present the formulas for computing the ordinary least-squares (OLS) estimator, and I discuss some do-file implementations of them. I discuss the formulas and the computation of independence-based standard errors, robust standard errors, and cluster-robust standard errors. I introduce the Stata matrix commands and matrix functions that I use in ado-commands that I discuss in upcoming posts.

  6. Programming an estimation command in Stata: A first command for OLS

    I show how to write a Stata estimation command that implements the OLS estimator by explaining the code.

  7. Programming an estimation command in Stata: A better OLS command

    I use the syntax command to improve the command that implements the OLS estimator that I discussed in Programming an estimation command in Stata: A first command for OLS. I show how to require that all variables be numeric variables and how to make the command accept time-series operated variables.

  8. Programming an estimation command in Stata: Allowing for sample restrictions and factor variables

    I modify the OLS command discussed in Programming an estimation command in Stata: A better OLS command to allow for sample restrictions, to handle missing values, to allow for factor variables, and to deal with perfectly collinear variables.

  9. Programming an estimation command in Stata: Allowing for options

    I make three improvements to the command that implements the OLS estimator that I discussed in Programming an estimation command in Stata: Allowing for sample restrictions and factor variables. First, I allow the user to request a robust estimator of the variance-covariance of the estimator. Second, I allow the user to suppress the constant term. Third, I store the residual degrees of freedom in e(df_r) so that test will use the t or F distribution instead of the normal or chi-squared distribution to compute the p-value of Wald tests.

  10. Programming an estimation command in Stata: Using a subroutine to parse a complex option

    I make two improvements to the command that implements the OLS estimator that I discussed in Programming an estimation command in Stata: Allowing for options. First, I add an option for a cluster-robust estimator of the variance -covariance of the estimator (VCE). Second, I make the command accept the modern syntax for either a robust or a cluster-robust estimator of the VCE. In the process, I use subroutines in my ado-program to facilitate the parsing, and I discuss some advanced parsing tricks.

  11. Programming an estimation command in Stata: Mata 101

    I introduce Mata, the matrix programming language that is part of Stata.

  12. Programming an estimation command in Stata: Mata functions

    I show how to write a function in Mata, the matrix programming language that is part of Stata.

  13. Programming an estimation command in Stata: A first ado-command using Mata

    I discuss a sequence of ado-commands that use Mata to estimate the mean of a variable. The commands illustrate a general structure for Stata/Mata programs.

  14. Programming an estimation command in Stata: Computing OLS objects in Mata

    I present the formulas for computing the OLS estimator and show how to compute them in Mata. This post is a Mata version of Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects. I discuss the formulas and the computation of independence-based standard errors, robust standard errors, and cluster-robust standard errors.

  15. Programming an estimation command in Stata: An OLS command using Mata

    I discuss a command that computes OLS results in Mata, paying special attention to the structure of Stata programs that use Mata work functions.

  16. Programming an estimation command in Stata: Adding robust and cluster-robust VCEs to our Mata based OLS command

    I show how to use the undocumented command _vce_parse to parse the options for robust or cluster-robust estimators of the VCE. I then discuss myregress12.ado, which performs its computations in Mata and computes an IID-based, a robust, or a cluster-robust estimator of the VCE.

  17. Programming an estimation command in Stata: A review of nonlinear optimization using Mata

    I review the theory behind nonlinear optimization and get some practice in Mata programming by implementing an optimizer in Mata. This post is designed to help you develop your Mata programming skills and to improve your understanding of how the Mata optimization suites optimize() and moptimize() work.

  18. Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters

    I show how to use optimize() in Mata to maximize a Poisson log-likelihood function and to obtain estimators of the VCE based on IID observations or on robust methods.

  19. Programming an estimation command in Stata: A poisson command using Mata

    I discuss mypoisson1, which computes Poisson-regression results in Mata. The code in mypoisson1.ado is remarkably similar to the code in myregress11.ado, which computes OLS results in Mata, as I discussed in Programming an estimation command in Stata: An OLS command using Mata.

Programming an estimation command in Stata: An OLS command using Mata

I discuss a command that computes ordinary least-squares (OLS) results in Mata, paying special attention to the structure of Stata programs that use Mata work functions.

This command builds on several previous posts; at a minimum, you should be familiar with Programming an estimation command in Stata: A first ado-command using Mata and Programming an estimation command in Stata: Computing OLS objects in Mata.

An OLS command with Mata computations

The Stata command myregress11 computes the results in Mata. The syntax of the myregress11 command is

myregress11 depvar [indepvars] [if] [in] [, noconstant]

where indepvars can contain factor variables or time-series variables.

In the remainder of this post, I discuss the code for myregress11.ado. I recommend that you click on the file name to download the code. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.

I do not discuss the formulas for the OLS objects. See Programming an estimation command in Stata: Computing OLS objects in Mata for the formulas and Mata implementations.

Code block 1: myregress11.ado

*! version 11.0.0  11Jan2016
program define myregress11, eclass sortpreserve
    version 14

    syntax varlist(numeric ts fv) [if] [in] [, noCONStant ]
    marksample touse

    gettoken depvar indepvars : varlist
    _fv_check_depvar `depvar'

    fvexpand `indepvars' 
    local cnames `r(varlist)'

    tempname b V N rank df_r

    mata: mywork("`depvar'", "`cnames'", "`touse'", "`constant'", ///
       "`b'", "`V'", "`N'", "`rank'", "`df_r'") 

    if "`constant'" == "" {
    	local cnames `cnames' _cons
    }

    matrix colnames `b' = `cnames'
    matrix colnames `V' = `cnames'
    matrix rownames `V' = `cnames'

    ereturn post `b' `V', esample(`touse') buildfvinfo
    ereturn scalar N       = `N'
    ereturn scalar rank    = `rank'
    ereturn scalar df_r    = `df_r'
    ereturn local  cmd     "myregress11"

    ereturn display

end

mata:

void mywork( string scalar depvar,  string scalar indepvars, 
             string scalar touse,   string scalar constant,  
	     string scalar bname,   string scalar Vname,     
	     string scalar nname,   string scalar rname,     
	     string scalar dfrname) 
{

    real vector y, b, e, e2
    real matrix X, XpXi, V
    real scalar n, k

    y    = st_data(., depvar, touse)
    X    = st_data(., indepvars, touse)
    n    = rows(X)

    if (constant == "") {
        X    = X,J(n,1,1)
    }

    XpXi = quadcross(X, X)
    XpXi = invsym(XpXi)
    b    = XpXi*quadcross(X, y)
    e    = y - X*b
    e2   = e:^2
    k    = cols(X) - diag0cnt(XpXi)
    V    = (quadsum(e2)/(n-k))*XpXi

    st_matrix(bname, b')
    st_matrix(Vname, V)
    st_numscalar(nname, n)
    st_numscalar(rname, k)
    st_numscalar(dfrname, n-k)

}

end

Let’s break this 74-line program into familiar pieces to make it easier to understand. Lines 2–35 define the ado-command, and lines 37–74 define the Mata work function that is used by the ado-command. Although there are more details, I used this structure in mymean8.ado, which I discussed in Programming an estimation command in Stata: A first ado-command using Mata.

The ado-command has four parts.

  1. Lines 5–14 parse what the user typed, identify the sample, and create temporary names for the results returned by our Mata work function.
  2. Lines 16–17 call the Mata work function.
  3. Lines 19–31 post the results returned by the Mata work function to e().
  4. Line 33 displays the results.

The Mata function mywork() also has four parts.

  1. Lines 39–43 parse the arguments.
  2. Lines 46–48 declare vectors, matrices, and scalars that are local to mywork().
  3. Lines 54–64 compute the results.
  4. Lines 66–70 copy the computed results to Stata, using the names that were passed in arguments.

Now let’s discuss the ado-code in some detail. Line 5 uses the syntax command to put the names of the variables specified by the user into the local macro varlist, to parse an optional if restriction, to parse an optional in restriction, and to parse the noconstant option. The variables specified by the user must be numeric, may contain time-series variables, and may contain factor variables. For more detailed discussions of similar syntax usages, see Programming an estimation command in Stata: Allowing for sample restrictions and factor variables, Programming an estimation command in Stata: Allowing for options, and Programming an estimation command in Stata: Using a subroutine to parse a complex option.

Line 6 creates a temporary sample-identification variable and stores its name in the local macro touse. I discussed this usage in detail in the section surrounding code block 1 in Programming an estimation command in Stata: Allowing for sample restrictions and factor variables.

Line 8 uses gettoken to store the name of the dependent variable in the local macro depvar and the names of the independent variables in the local macro indepvars. Line 9 uses _fv_check_depvar to check that the name of the dependent variable is not a factor variable.

Line 11 uses fvexpand to expand the factor variables in indepvars. Line 12 puts the expanded names stored in r(varlist) by fvexpand in the local macro cnames. A single factor variable can imply more than one coefficient. fvexpand finds the canonical names for these coefficients and returns them in r(varlist). I have not used fvexpand until now because the Stata commands that I used to compute the results automatically created the coefficient names. Mata functions are designed for speed, so I must create the coefficient names when I use them.

Example 1 illustrates how one factor variable can imply more than one coefficient.

Example 1: fvexpand

. sysuse auto
(1978 Automobile Data)

. tabulate rep78

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00

. fvexpand i.rep78

. return list

macros:
              r(fvops) : "true"
            r(varlist) : "1b.rep78 2.rep78 3.rep78 4.rep78 5.rep78"

. summarize 2.rep78

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     2.rep78 |         69     .115942    .3225009          0          1

The tabulate results show that there are five levels in rep78. fvexpand finds the levels and creates a list of the names of the implied indicator variables 1b.rep78, 2.rep78, 3.rep78, 4.rep78, and 5.rep78. Comparing the results from summarize 2.rep78 and tabulate rep78 illustrates this notation. The b in 1b.rep78 identifies level 1 as the base category to be omitted when there is a constant in the model. Type help fvvarlist for more details.

Line 14 creates the temporary names for the results. For example, it stores a safe, temporary name in the local macro b that can be used for the matrix storing the point estimates. I discussed this usage in the section Using temporary names for global objects in Programming an estimation command in Stata: A first ado-command using Mata.

Lines 16 and 17 call the Mata function mywork(), which uses the information contained in the local macros depvar, cnames, touse, and constant to compute the results that are returned in the Stata objects whose names are stored in the local macros b, V, N, rank, and df_r.

Line 20 appends _cons to the local macro cnames, if the user specified the option noconstant.

Lines 23–25 put row names on the vector of point estimates and row and column names on the matrix containing the estimated variance-covariance of the estimator (VCE).

Lines 27–31 post the results to e().

Line 33 displays a standard Stata output table, using the results in e(b), e(V), and e(df_r).

Note that the local macro b created on line 14 contains a temporary name that is passed to mywork() on line 17 and that the Stata matrix whose name is contained in the local macro b is used on lines 23 and 27. mywork() puts the vector of point estimates in the Stata matrix whose name is stored in the local macro b. Also note that the local macro V created on line 14 contains a temporary name that is passed to mywork() on line 17 and that the Stata matrix whose name is contained in the local macro V is used on lines 24, 25, and 27. mywork() puts the estimated VCE in the Stata matrix whose name is stored in the local macro V.

To see how this works, let’s discuss the mywork() function in detail. Lines 39–43 declare that mywork() returns nothing (it is void) and that it accepts nine arguments, each of which is a string scalar. The first four arguments are inputs; depvar contains the name of the dependent variable, indepvars contains the names of the independent variables, touse contains the name of the sample-identification variable, and constant contains either noconstant or is empty. The values of these arguments are used on lines 50, 51, and 54 to create the vector y and the matrix X.

The last five arguments contain names used to write the results back to Stata. mywork() writes the results back to Stata using the passed-in temporary names. For example, line 17 shows that the Mata string scalar bname contains the temporary name stored in the local macro b. Line 66 copies the results stored in the transpose of the Mata vector b to a Stata matrix whose name is stored in the Mata string scalar bname. (Line 60 shows that the vector b contains the OLS point estimates.) Lines 23 and 27 then use this Stata vector whose name is stored in the local macro b. Similarly, line 17 shows that the Mata string scalar Vname contains the temporary name stored in the local macro V. Line 67 copies the results stored in the Mata matrix V to a Stata matrix whose name is stored in the Mata string scalar Vname. (Line 64 shows that V contains the estimated VCE.) Lines 24, 25, and 27 then use this Stata matrix whose name is stored in the local macro V. The arguments nname, rname, and dfrname are used analogously to return the results for the number of observations, the rank of the VCE, and the degrees of freedom of the residuals.

Lines 50–64 compute the point estimates and the VCE. Except for line 54, I discussed these computations in Programming an estimation command in Stata: A first ado-command using Mata. Line 54 causes a column of 1s to be joined to the covariate matrix X when the string scalar constant is empty. Lines 5 and 16 imply that the Mata string scalar constant contains noconstant when the user specifies the noconstant option and that it is empty otherwise.

Done and undone

I discussed the code for myregress11.ado, which uses Mata to compute OLS point estimates and a VCE that assumes independent and identically distributed observations. The structure of the code is the same as the one that I used in mymean7.ado and mymean8.ado, discussed in Programming an estimation command in Stata: A first ado-command using Mata, although there are more details in the OLS program.

Key to this structure is that the Mata work function accepts two types of arguments: the names of Stata objects that are inputs and temporary names that are used to write the results back to Stata from Mata.

In the next post, I extend myregress11.ado to allow for robust or cluster-robust estimators of the VCE.

probit or logit: ladies and gentlemen, pick your weapon

We often use probit and logit models to analyze binary outcomes. A case can be made that the logit model is easier to interpret than the probit model, but Stata’s margins command makes any estimator easy to interpret. Ultimately, estimates from both models produce similar results, and using one or the other is a matter of habit or preference.

I show that the estimates from a probit and logit model are similar for the computation of a set of effects that are of interest to researchers. I focus on the effects of changes in the covariates on the probability of a positive outcome for continuous and discrete covariates. I evaluate these effects on average and at the mean value of the covariates. In other words, I study the average marginal effects (AME), the average treatment effects (ATE), the marginal effects at the mean values of the covariates (MEM), and the treatment effects at the mean values of the covariates (TEM).

First, I present the results. Second, I discuss the code used for the simulations.

Results

In Table 1, I present the results of a simulation with 4,000 replications when the true data generating process (DGP) satisfies the assumptions of a probit model. I show the average of the AME and the ATE estimates and the 5% rejection rate of the true null hypothesis that arise after probit and logit estimation. I also provide an approximate true value of the AME and ATE. I obtain the approximate true values by computing the ATE and AME, at the true values of the coefficients, using a sample of 20 million observations. I will provide more details on the simulation in a later section.

Table 1: Average Marginal and Treatment Effects: True DGP Probit

Simulation Results for N=10,000 and 4,000 Replications

Statistic            Approximate True Value   Probit    Logit
AME of x1                    -.1536           -.1537   -.1537
5% Rejection Rate                              .050     .052
ATE of x2                     .1418            .1417    .1417
5% Rejection Rate                              .050     .049

For the MEM and TEM, we have the following:

Table 2: Marginal and Treatment Effects at Mean Values: True DGP Probit

Simulation Results for N=10,000 and 4,000 Replications

Statistic            Approximate True Value   Probit    Logit
MEM of x1                    -.1672           -.1673   -.1665
5% Rejection Rate                              .056     .060
TEM of x2                     .1499            .1498    .1471
5% Rejection Rate                              .053     .058

The logit estimates are close to the true values and have rejection rates that are close to 5%. Fitting the parameters of our model using logit when the true DGP satisfies the assumptions of a probit model does not lead us astray.

If the true DGP satisfies the assumptions of the logit model, the conclusions are the same. I present the results in the next two tables.

Table 3: Average Marginal and Treatment Effects: True DGP Logit

Simulation Results for N=10,000 and 4,000 Replications

Statistic            Approximate True Value   Probit    Logit
AME of x1                    -.1090           -.1088   -.1089
5% Rejection Rate                              .052     .052
ATE of x2                     .1046            .1044    .1045
5% Rejection Rate                              .053     .051

Table 4: Marginal and Treatment Effects at Mean Values: True DGP Logit

Simulation Results for N=10,000 and 4,000 Replications

Statistic            Approximate True Value   Probit    Logit
MEM of x1                    -.1146           -.1138   -.1146
5% Rejection Rate                              .050     .051
TEM of x2                     .1086            .1081    .1085
5% Rejection Rate                              .058     .058

Why?

Maximum likelihood estimators find the parameters that maximize the likelihood that our data fit the distributional assumptions that we make. The likelihood chosen is an approximation to the true likelihood, and it is a helpful approximation if the true likelihood and our approximation are close to each other. Viewing likelihood-based models as useful approximations, instead of as models of a true likelihood, is the basis of quasilikelihood theory. For more details, see White (1996) and Wooldridge (2010).

It is assumed that the unobservable random variable in the probit model and logit model comes from a standard normal and logistic distribution, respectively. The cumulative distribution functions (CDFs) in these two cases are close to each other, especially around the mean. Therefore, estimators under these two sets of assumptions produce similar results. To illustrate these arguments, we can plot the two CDFs and their differences as follows:

Graph 1: Normal and Logistic CDFs and their Difference

The difference between the CDFs approaches zero as you get closer to the mean, from the right or from the left, and it is always smaller than .15.
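The size of this gap is easy to verify numerically. The short Python script below is my own illustration, not part of the posts' Stata code; it evaluates both CDFs on a grid and records the largest absolute difference between them.

```python
import math

def norm_cdf(x):
    # Standard normal CDF written in terms of the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logit_cdf(x):
    # Standard logistic CDF
    return 1.0 / (1.0 + math.exp(-x))

# Evaluate both CDFs on a fine grid and record the largest gap
grid = [i / 100.0 for i in range(-600, 601)]
max_diff = max(abs(norm_cdf(x) - logit_cdf(x)) for x in grid)

# The two CDFs agree exactly at the mean, where both equal .5
diff_at_zero = abs(norm_cdf(0.0) - logit_cdf(0.0))
```

The maximum gap comes out to about .117, attained near \(x=\pm 1.3\), which is consistent with the .15 bound stated above.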

Simulation design

Below is the code I used to generate the data for my simulations. In the first part, lines 4 to 12, I generate outcome variables that satisfy the assumptions of the probit model, y1, and the logit model, y2. In the second part, lines 13 to 16, I compute the marginal effects for the logit and probit models. I have a continuous and a discrete covariate. For the discrete covariate, the marginal effect is a treatment effect. In the third part, lines 17 to 25, I compute the marginal effects evaluated at the means. I will use these estimates later to compute approximations to the true values of the effects.

program define mkdata
        syntax, [n(integer 1000)]
        clear
        quietly set obs `n'
        // 1. Generating data from probit, logit, and misspecified 
        generate x1    = rnormal()
        generate x2    = rbeta(2,4)>.5
        generate e1    = rnormal()
        generate u     = runiform()
        generate e2    = ln(u) -ln(1-u)
        generate xb    = .5*(1 -x1 + x2)
        generate y1    =  xb + e1 > 0
        generate y2    =  xb + e2 > 0
        // 2. Computing probit & logit marginal and treatment effects 
        generate m1    = normalden(xb)*(-.5)
        generate m2    = normal(1 -.5*x1 ) - normal(.5 -.5*x1)
        generate m1l   = exp(xb)*(-.5)/(1+exp(xb))^2
        generate m2l   = exp(1 -.5*x1)/(1+ exp(1 -.5*x1 )) - ///
                         exp(.5 -.5*x1)/(1+ exp(.5 -.5*x1 ))
        // 3. Computing probit & logit marginal and treatment effects at means
        quietly mean x1 x2 
        matrix A         = r(table)
        scalar a         = .5 -.5*A[1,1] + .5*A[1,2]
        scalar b1        =  1 -.5*A[1,1]
        scalar b0        = .5 -.5*A[1,1]
        generate mean1   = normalden(a)*(-.5)
        generate mean2   = normal(b1) - normal(b0)
        generate mean1l  = exp(a)*(-.5)/(1+exp(a))^2
        generate mean2l  = exp(b1)/(1+ exp(b1)) - exp(b0)/(1+ exp(b0))
end
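The line generate e2 = ln(u) - ln(1-u) is the inverse-CDF (quantile) transform for the logistic distribution: if u is uniform on (0,1), then ln(u) − ln(1−u) is a standard logistic draw, which is what the logit DGP requires. Here is a quick cross-check of that transform in Python (my own illustration; the seed and sample size are arbitrary):

```python
import math
import random

random.seed(12345)
n = 200_000

draws = []
for _ in range(n):
    u = random.random()
    while u == 0.0:          # guard against log(0); essentially never triggers
        u = random.random()
    # Inverse-CDF transform: uniform on (0,1) -> standard logistic
    draws.append(math.log(u) - math.log(1.0 - u))

mean = sum(draws) / n
var = sum((d - mean) ** 2 for d in draws) / n
share_negative = sum(d < 0.0 for d in draws) / n

# A standard logistic has mean 0, median 0, and variance pi^2/3 (about 3.29)
```

The sample mean, variance, and share of negative draws land on the standard-logistic values, confirming that the transform does what the mkdata comment claims.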

I approximate the true marginal effects using a sample of 20 million observations. This is a reasonable strategy in this case. For example, take the average marginal effect for a continuous covariate, \(x_{k}\), in the case of the probit model:

\[\begin{equation*}
\frac{1}{N}\sum_{i=1}^N \phi\left(x_{i}\boldsymbol{\beta}\right)\beta_{k}
\end{equation*}\]

The expression above is an approximation to \(E\left(\phi\left(x_{i}\boldsymbol{\beta}\right)\beta_{k}\right)\). To obtain this expected value, we would need to integrate over the distribution of all the covariates. This is not practical and would limit my choice of covariates. Instead, I draw a sample of 20 million observations, compute \(\frac{1}{N}\sum_{i=1}^N \phi\left(x_{i}\boldsymbol{\beta}\right)\beta_{k}\), and take it to be the true value. I follow the same logic for the other marginal effects.
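The same large-sample-average idea can be illustrated outside Stata. The Python sketch below is my own illustration using the DGP from mkdata (coefficient −.5 on x1, and x2 an indicator for a Beta(2,4) draw exceeding .5); averaging \(\phi(x_i\boldsymbol{\beta})\beta_1\) over a large simulated sample should land close to the approximate true AME of −.1536 reported in Table 1.

```python
import math
import random

random.seed(12345)
n = 400_000
total = 0.0
for _ in range(n):
    x1 = random.gauss(0.0, 1.0)
    x2 = 1.0 if random.betavariate(2, 4) > 0.5 else 0.0
    xb = 0.5 * (1.0 - x1 + x2)            # the index from the mkdata DGP
    phi = math.exp(-0.5 * xb * xb) / math.sqrt(2.0 * math.pi)
    total += phi * (-0.5)                 # AME contribution for x1 (beta_x1 = -.5)

ame_x1 = total / n                        # should be close to -.1536
```

With 400,000 draws the Monte Carlo error is already far smaller than the fourth decimal reported in the tables; the blog's 20 million observations tighten it further.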

Below is the code I use to compute the approximate true marginal effects. I draw the 20 million observations, then I compute the averages that I am going to use in my simulation, and I create locals for each approximate true value.

. mkdata, n(20000000)

. local values "m1 m2 m1l m2l mean1 mean2 mean1l mean2l"

. local means  "mx1 mx2 mx1l mx2l meanx1 meanx2 meanx1l meanx2l"

. local n : word count `values'

. forvalues i= 1/`n' {
  2.         local a: word `i' of `values'
  3.         local b: word `i' of `means'
  4.         sum `a', meanonly
  5.         local `b' = r(mean)
  6. }

Now I am ready to run all the simulations that I used to produce the results in the previous section. The code that I used for the simulations for the ATE and the AME when the true DGP is a probit is given by

. postfile mprobit y1p y1p_r y1l y1l_r y2p y2p_r y2l y2l_r ///
>                 using simsmprobit, replace 

. forvalues i=1/4000 {
  2.         quietly {
  3.                 mkdata, n(10000)
  4.                 probit y1 x1 i.x2, vce(robust)
  5.                 margins, dydx(*) post 
  6.                 local y1p = _b[x1]
  7.                 test _b[x1] = `mx1'
  8.                 local y1p_r   = (r(p)<.05) 
  9.                 local y2p = _b[1.x2]
 10.                 test _b[1.x2] = `mx2'
 11.                 local y2p_r   = (r(p)<.05) 
 12.                 logit  y1 x1 i.x2, vce(robust)
 13.                 margins, dydx(*) post 
 14.                 local y1l = _b[x1]
 15.                 test _b[x1] = `mx1'
 16.                 local y1l_r   = (r(p)<.05) 
 17.                 local y2l = _b[1.x2]
 18.                 test _b[1.x2] = `mx2'
 19.                 local y2l_r   = (r(p)<.05) 
 20.                 post mprobit (`y1p') (`y1p_r') (`y1l') (`y1l_r') ///
>                            (`y2p') (`y2p_r') (`y2l') (`y2l_r')
 21.         }
 22. }
. use simsmprobit
. summarize

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         y1p |      4,000   -.1536812    .0038952  -.1697037  -.1396532
       y1p_r |      4,000         .05    .2179722          0          1
         y1l |      4,000   -.1536778    .0039179  -.1692524  -.1396366
       y1l_r |      4,000      .05175    .2215496          0          1
         y2p |      4,000     .141708    .0097155   .1111133   .1800973
-------------+---------------------------------------------------------
       y2p_r |      4,000       .0495    .2169367          0          1
         y2l |      4,000    .1416983    .0097459   .1102069   .1789895
       y2l_r |      4,000        .049     .215895          0          1

For the results in the case of the MEM and the TEM when the true DGP is a probit, I use margins with the option atmeans. The other cases are similar. I use robust standard errors for all computations to account for the fact that my likelihood model is an approximation to the true likelihood, and I use the option vce(unconditional) to account for the fact that I am using two-step M-estimation. See Wooldridge (2010) for more details on two-step M-estimation.

You can download the code used to produce the results by clicking this link: pvsl.do

Concluding Remarks

I provided simulation evidence that illustrates that the differences between using estimates of effects after probit or logit is negligible. The reason lies in the theory of quasilikelihood and, specifically, in that the cumulative distribution functions of the probit and logit models are similar, especially around the mean.

References

White, H. 1996. Estimation, Inference, and Specification Analysis. Cambridge: Cambridge University Press.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.

Programming an estimation command in Stata: Computing OLS objects in Mata

\(\newcommand{\epsilonb}{\boldsymbol{\epsilon}}
\newcommand{\ebi}{\boldsymbol{\epsilon}_i}
\newcommand{\Sigmab}{\boldsymbol{\Sigma}}
\newcommand{\betab}{\boldsymbol{\beta}}
\newcommand{\eb}{{\bf e}}
\newcommand{\xb}{{\bf x}}
\newcommand{\xbit}{{\bf x}_{it}}
\newcommand{\xbi}{{\bf x}_{i}}
\newcommand{\zb}{{\bf z}}
\newcommand{\zbi}{{\bf z}_i}
\newcommand{\wb}{{\bf w}}
\newcommand{\yb}{{\bf y}}
\newcommand{\ub}{{\bf u}}
\newcommand{\Xb}{{\bf X}}
\newcommand{\Mb}{{\bf M}}
\newcommand{\Xtb}{\tilde{\bf X}}
\newcommand{\Wb}{{\bf W}}
\newcommand{\Vb}{{\bf V}}\)I present the formulas for computing the ordinary least-squares (OLS) estimator and show how to compute them in Mata. This post is a Mata version of Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects. I discuss the formulas and the computation of independence-based standard errors, robust standard errors, and cluster-robust standard errors.

OLS formulas

Recall that the OLS point estimates are given by

\[
\widehat{\betab} =
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\left(
\sum_{i=1}^N \xb_i'y_i
\right)
\]

where \(\xb_i\) is the \(1\times k\) vector of independent variables, \(y_i\) is the dependent variable for each of the \(N\) sample observations, and the model for \(y_i\) is

\[
y_i = \xb_i\betab + \epsilon_i
\]

If the \(\epsilon_i\) are independently and identically distributed (IID), we estimate the variance-covariance matrix of the estimator (VCE) by

\[
\widehat{\Vb} = \widehat{s}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]

where \(\widehat{s} = 1/(N-k)\sum_{i=1}^N e_i^2\) and \(e_i=y_i-\xb_i\widehat{\betab}\). See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2015) for introductions to OLS.
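These formulas can be checked numerically before turning to Mata. The Python script below is my own illustration with simulated data (the true coefficients and sample size are arbitrary); it hand-rolls a small Gauss-Jordan solver so it has no dependencies, forms \(\widehat{\betab}\) from the normal equations, and builds the IID VCE exactly as in the formulas above.

```python
import random

def solve(a, rhs):
    # Gauss-Jordan solution of the k x k linear system a z = rhs
    k = len(a)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(k):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][k] for i in range(k)]

random.seed(1)
n, k = 500, 3
beta = [2.0, -3.0, 5.0]        # arbitrary true values; the last is the constant
X = [[random.gauss(0, 1), random.gauss(0, 1), 1.0] for _ in range(n)]
y = [sum(b * x for b, x in zip(beta, row)) + random.gauss(0, 1) for row in X]

# Point estimates: (sum_i x_i'x_i)^{-1} (sum_i x_i'y_i)
XpX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
Xpy = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
bhat = solve(XpX, Xpy)

# IID VCE: s_hat * (X'X)^{-1}, with s_hat the residual variance estimate
e = [yi - sum(b * x for b, x in zip(bhat, row)) for row, yi in zip(X, y)]
s_hat = sum(ei ** 2 for ei in e) / (n - k)
XpXi_cols = [solve(XpX, [1.0 if r == j else 0.0 for r in range(k)])
             for j in range(k)]                 # columns of (X'X)^{-1}
V = [[s_hat * XpXi_cols[j][i] for j in range(k)] for i in range(k)]
se = [V[i][i] ** 0.5 for i in range(k)]
```

The point estimates recover the true coefficients up to sampling error, and the standard errors are roughly \(1/\sqrt{n}\), as the formulas predict for unit-variance errors.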

Mata implementation

I compute the OLS point estimates in Mata in example 1.

Example 1: Computing OLS point estimates in Mata

. sysuse auto
(1978 Automobile Data)

. mata:
------------------------------------------------- mata (type end to exit) ------
: y    = st_data(., "price")

: X    = st_data(., "mpg trunk")

: n    = rows(X)

: X    = X,J(n,1,1)

: XpX  = quadcross(X, X)

: XpXi = invsym(XpX)

: b    = XpXi*quadcross(X, y)

: end
--------------------------------------------------------------------------------

I used st_data() to put a copy of the observations on price into the Mata vector y and to put a copy of the observations on mpg and trunk into the Mata matrix X. I used rows(X) to put the number of observations into n. After adding a column of ones onto X for the constant term, I used quadcross() to calculate \(\Xb'\Xb\) in quad precision. After using invsym() to calculate the inverse of the symmetric matrix XpX, storing the result in XpXi, I calculated the point estimates from the OLS formula.

In example 1, I computed the OLS point estimates after forming the cross products. As discussed in Lange (2010, chapter 7), I could compute more accurate estimates using a QR decomposition; type help mf_qrd for details about computing QR decompositions in Mata. By computing the cross products in quad precision, I obtained point estimates that are almost as accurate as those obtainable from a QR decomposition in double precision, but that is a topic for another post.

Here are the point estimates I computed in Mata and comparable results from regress.

Example 2: Results from Mata and regress

. mata: b'
                  1              2              3
    +----------------------------------------------+
  1 |  -220.1648801    43.55851009    10254.94983  |
    +----------------------------------------------+

. regress price mpg trunk

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     10.14
       Model |   141126459         2  70563229.4   Prob > F        =    0.0001
    Residual |   493938937        71  6956886.44   R-squared       =    0.2222
-------------+----------------------------------   Adj R-squared   =    0.2003
       Total |   635065396        73  8699525.97   Root MSE        =    2637.6

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   65.59262    -3.36   0.001    -350.9529    -89.3769
       trunk |   43.55851   88.71884     0.49   0.625    -133.3418    220.4589
       _cons |   10254.95   2349.084     4.37   0.000      5571.01    14938.89
------------------------------------------------------------------------------

Given the OLS point estimates, I can now compute the IID estimator of the VCE.

Example 3: Computing the IID VCE

. mata:
------------------------------------------------- mata (type end to exit) ------
: e    = y - X*b

: e2   = e:^2

: k    = cols(X)

: V    = (quadsum(e2)/(n-k))*XpXi

: sqrt(diagonal(V))'
                 1             2             3
    +-------------------------------------------+
  1 |  65.59262431   88.71884015    2349.08381  |
    +-------------------------------------------+

: end
--------------------------------------------------------------------------------

I put the residuals into the Mata vector e, which I subsequently element-wise squared. I used cols(X) to put the number of covariates into k. I used quadsum() to compute the sum of the squared residuals in quad precision when computing V, an IID estimator for the VCE. The standard errors displayed by sqrt(diagonal(V)) are the same as the ones displayed by regress in example 2.

Robust standard errors

The frequently used robust estimator of the VCE is given by

\[
\widehat{V}_{robust}=\frac{N}{N-k}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\Mb
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]

where
\[\Mb=\sum_{i=1}^N \widehat{e}_i^2\xb_i'\xb_i\]

See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2015) for derivations and discussions.
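Note that \(\Mb\) can be formed either by looping over observations or as the weighted cross product \(\Xb' \mathrm{diag}(\widehat{e}^2) \Xb\); the latter is what quadcross(X, e2, X) computes in one call. The Python sketch below is my own illustration (arbitrary simulated data standing in for the regressors and squared residuals) that the two forms agree:

```python
import random

random.seed(2)
n, k = 200, 3
X = [[random.gauss(0, 1) for _ in range(k - 1)] + [1.0] for _ in range(n)]
e2 = [random.gauss(0, 1) ** 2 for _ in range(n)]   # stand-in squared residuals

# Loop form: M = sum_i e_i^2 * x_i' x_i
M_loop = [[0.0] * k for _ in range(k)]
for row, w in zip(X, e2):
    for i in range(k):
        for j in range(k):
            M_loop[i][j] += w * row[i] * row[j]

# Weighted cross-product form, the analogue of quadcross(X, e2, X):
# M = X' diag(e2) X
M_cross = [[sum(w * row[i] * row[j] for row, w in zip(X, e2))
            for j in range(k)] for i in range(k)]
```

Both forms produce the same symmetric k-by-k matrix; the cross-product form is what makes the Mata one-liner in the next example possible.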

Example 4 implements this estimator in Mata.

Example 4: A robust VCE

. mata:
------------------------------------------------- mata (type end to exit) ------
: M    = quadcross(X, e2, X)

: V    = (n/(n-k))*XpXi*M*XpXi

: sqrt(diagonal(V))'
                 1             2             3
    +-------------------------------------------+
  1 |  72.45387946   71.45370224   2430.640607  |
    +-------------------------------------------+

: end
--------------------------------------------------------------------------------

Using quadcross(X, e2, X) to compute M is more accurate and faster than looping over the observations. The accuracy comes from the quad precision offered by quadcross(). The speed comes from performing the loops in compiled C code instead of compiled Mata code. Mata is fast but C is faster, because C imposes much more structure and because C is compiled using much more platform-specific information than Mata.

quadcross() is also faster because it has been parallelized, like many Mata functions. For example, a call to quadcross() from Stata/MP with 2 processors will run about twice as fast as a call to quadcross() from Stata/SE when there are many rows in X. A detailed discussion of the performance increases offered by Mata in Stata/MP is a subject for another post.

I now verify that my computations match those reported by regress.

Example 5: Comparing computations of robust VCE

. regress price mpg trunk, vce(robust)

Linear regression                               Number of obs     =         74
                                                F(2, 71)          =      11.59
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2222
                                                Root MSE          =     2637.6

------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   72.45388    -3.04   0.003    -364.6338   -75.69595
       trunk |   43.55851    71.4537     0.61   0.544    -98.91613    186.0331
       _cons |   10254.95   2430.641     4.22   0.000      5408.39    15101.51
------------------------------------------------------------------------------

Cluster-robust standard errors

The cluster-robust estimator of the VCE is frequently used when the data have a group structure, also known as a panel structure or as a longitudinal structure. This VCE accounts for the within-group correlation of the errors, and it is given by

\[
\widehat{V}_{cluster}=\frac{N-1}{N-k}\frac{g}{g-1}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\Mb_c
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]

where
\[
\Mb_c=\sum_{j=1}^g
\Xb_j'
(\widehat{\eb}_j \widehat{\eb}_j')
\Xb_j
\]

\(\Xb_j\) is the \(n_j\times k\) matrix of observations on \(\xb_i\) in group \(j\), \(\widehat{\eb}_j\) is the \(n_j\times 1\) vector of residuals in group \(j\), and \(g\) is the number of groups. See Cameron and Trivedi (2005), Wooldridge (2010), and [R] regress for derivations and discussions.
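A useful sanity check on \(\Mb_c\): when every observation is its own cluster, \(\widehat{\eb}_j\widehat{\eb}_j'\) collapses to the scalar \(\widehat{e}_i^2\), so \(\Mb_c\) equals the \(\Mb\) from the robust VCE (and with \(g=N\) the small-sample factors match too, since \((N-1)/(N-k) \times N/(N-1) = N/(N-k)\)). The Python sketch below is my own illustration of the first fact on arbitrary simulated data:

```python
import random

random.seed(3)
n, k = 60, 2
X = [[random.gauss(0, 1), 1.0] for _ in range(n)]
e = [random.gauss(0, 1) for _ in range(n)]       # stand-in residuals

def cluster_meat(groups):
    # M_c = sum over groups j of X_j' (e_j e_j') X_j, computed via the
    # k-vector score s_j = X_j' e_j, so each group's term is s_j s_j'
    M = [[0.0] * k for _ in range(k)]
    for idx in groups:
        s = [sum(e[i] * X[i][c] for i in idx) for c in range(k)]
        for a in range(k):
            for b in range(k):
                M[a][b] += s[a] * s[b]
    return M

M_singleton = cluster_meat([[i] for i in range(n)])     # one obs per cluster
M_robust = [[sum(e[i] ** 2 * X[i][a] * X[i][b] for i in range(n))
             for b in range(k)] for a in range(k)]
```

The score form \(s_j s_j'\) is also how one would accumulate \(\Mb_c\) group by group in Mata, which is why the data must be sorted by the group variable first.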

Computing \(\Mb_c\) requires sorting the data by group. I use rep78, with the missing values replaced by 6, as the group variable in my example. In example 6, I sort the dataset in Stata, put a copy of the observations on the modified rep78 into the column vector id, and recompute the OLS objects that I need. I could have sorted the dataset in Mata, but I usually sort it in Stata, so that is what I illustrated. Type help mf_sort for sorting in Mata. In a real program, I would not need to recompute everything. I do here because I did not want to discuss the group variable or sorting the dataset until I discussed cluster-robust standard errors.

Example 6: Setup for computing M

. replace rep78=6 if missing(rep78)
(5 real changes made)

. sort rep78

. mata:
------------------------------------------------- mata (type end to exit) ------
: id   = st_data(., "rep78")

: y    = st_data(., "price")

: X    = st_data(., "mpg trunk")

: n    = rows(X)

: X    = X,J(n,1,1)

: k    = cols(X)

: XpX  = quadcross(X, X)

: XpXi = invsym(XpX)

: b    = XpXi*quadcross(X, y)

: e    = y - X*b

: end
--------------------------------------------------------------------------------

The Mata function panelsetup(Q,p) returns a matrix describing the group structure of the data when Q is sorted by the group variable in column p. I illustrate this function in example 7.

Example 7: panelsetup()

. list rep78 if rep78<3, sepby(rep78)

     +-------+
     | rep78 |
     |-------|
  1. |     1 |
  2. |     1 |
     |-------|
  3. |     2 |
  4. |     2 |
  5. |     2 |
  6. |     2 |
  7. |     2 |
  8. |     2 |
  9. |     2 |
 10. |     2 |
     +-------+

. mata:
------------------------------------------------- mata (type end to exit) ------
: info = panelsetup(id, 1)

: info
        1    2
    +-----------+
  1 |   1    2  |
  2 |   3   10  |
  3 |  11   40  |
  4 |  41   58  |
  5 |  59   69  |
  6 |  70   74  |
    +-----------+

: end
--------------------------------------------------------------------------------

I begin by listing the group variable, rep78, in Stata for the first two groups. I then use panelsetup() to create info, which has one row for each group, with the first column containing the first row of that group and the second column containing the last row of that group. I display info to illustrate what it contains. The first row of info specifies that the first group starts in row 1 and ends in row 2, which matches the results produced by list. The second row of info specifies that the second group starts in row 3 and ends in row 10, which also matches the results produced by list.
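
The logic behind info is easy to mirror outside of Mata. The following Python sketch (an illustration only; panel_setup is a hypothetical helper, not part of Mata or Stata) computes the same first-row/last-row pairs from a sorted group variable:

```python
# Sketch of what panelsetup() computes: given a column of sorted group
# identifiers, return one (first, last) pair of 1-indexed row numbers
# per group.
def panel_setup(ids):
    info = []
    start = 0
    for i in range(1, len(ids) + 1):
        # close out a group at the end of the data or when the id changes
        if i == len(ids) or ids[i] != ids[start]:
            info.append((start + 1, i))   # 1-indexed first and last rows
            start = i
    return info

# The first two groups from example 7: group 1 occupies rows 1-2,
# group 2 occupies rows 3-10.
ids = [1, 1] + [2] * 8
print(panel_setup(ids))   # -> [(1, 2), (3, 10)]
```

Applied to the listing above, it returns [(1, 2), (3, 10)], matching the first two rows of info.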

Having created info, I can use it and panelsubmatrix() to compute \(\Mb_c\).

Example 8: A cluster-robust VCE

. mata:
------------------------------------------------- mata (type end to exit) ------
: nc   = rows(info)

: M    = J(k, k, 0)

: for(i=1; i<=nc; i++) {
>     xi = panelsubmatrix(X,i,info)
>     ei = panelsubmatrix(e,i,info)
>     M  = M + xi'*(ei*ei')*xi
> }

: V    = ((n-1)/(n-k))*(nc/(nc-1))*XpXi*M*XpXi

: sqrt(diagonal(V))'
                 1             2             3
    +-------------------------------------------+
  1 |  93.28127184   58.89644366   2448.547376  |
    +-------------------------------------------+

: end
--------------------------------------------------------------------------------

After storing the number of groups in nc, I created an initial M as a k \(\times\) k matrix of zeros. For each group, I used panelsubmatrix() to extract that group's covariates from X and its residuals from e, and I added the group's contribution to M. After looping over the groups, I computed V and displayed the standard errors.
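
As a cross-check on the algebra, the same sandwich computation can be sketched in NumPy on synthetic data (an illustration only; the names X, e, and groups are mine, and none of this replaces the Mata code in example 8):

```python
# NumPy sketch of the cluster-robust ("sandwich") VCE on made-up data.
import numpy as np

rng = np.random.default_rng(12345)
n, k, g = 30, 3, 5
X = np.column_stack([rng.normal(size=(n, k - 1)), np.ones(n)])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
groups = np.repeat(np.arange(g), n // g)   # data already sorted by group

XpXi = np.linalg.inv(X.T @ X)
b = XpXi @ X.T @ y
e = y - X @ b

# M = sum over groups j of X_j' e_j e_j' X_j, accumulated one cluster at a time
M = np.zeros((k, k))
for j in range(g):
    rows = groups == j
    Xj, ej = X[rows], e[rows]
    M += Xj.T @ np.outer(ej, ej) @ Xj

V = ((n - 1) / (n - k)) * (g / (g - 1)) * XpXi @ M @ XpXi
se = np.sqrt(np.diag(V))
```

Because \(\Xb_j'\widehat{\eb}_j\widehat{\eb}_j'\Xb_j = (\Xb_j'\widehat{\eb}_j)(\Xb_j'\widehat{\eb}_j)'\), each cluster contributes a rank-one term, which is why M can be accumulated cluster by cluster.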

I now verify that my computations match those reported by regress.

Example 9: Comparing computations of cluster-robust VCE

. regress price mpg trunk, vce(cluster rep78)

Linear regression                               Number of obs     =         74
                                                F(2, 5)           =       9.54
                                                Prob > F          =     0.0196
                                                R-squared         =     0.2222
                                                Root MSE          =     2637.6

                                  (Std. Err. adjusted for 6 clusters in rep78)
------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   93.28127    -2.36   0.065     -459.952    19.62226
       trunk |   43.55851   58.89644     0.74   0.493    -107.8396    194.9566
       _cons |   10254.95   2448.547     4.19   0.009     3960.758    16549.14
------------------------------------------------------------------------------

Done and undone

I reviewed the formulas that underlie the OLS estimator and showed how to compute them in Mata. In the next two posts, I write an ado-command that implements these formulas.

References

Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and applications. Cambridge: Cambridge University Press.

Lange, K. 2010. Numerical Analysis for Statisticians. 2nd ed. New York: Springer.

Stock, J. H., and M. W. Watson. 2010. Introduction to Econometrics. 3rd ed. Boston, MA: Addison-Wesley.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.

Wooldridge, J. M. 2015. Introductory Econometrics: A Modern Approach. 6th ed. Cincinnati, Ohio: South-Western.

Programming an estimation command in Stata: A first ado-command using Mata

I discuss a sequence of ado-commands that use Mata to estimate the mean of a variable. The commands illustrate a general structure for Stata/Mata programs. This post builds on Programming an estimation command in Stata: Mata 101, Programming an estimation command in Stata: Mata functions, and Programming an estimation command in Stata: A first ado command.

Using Mata in ado-programs

I begin by reviewing the structure in mymean5.ado, which I discussed in Programming an estimation command in Stata: A first ado command.

Code block 1: mymean5.ado

*! version 5.0.0 20Oct2015
program define mymean5, eclass
	version 14

	syntax varlist(max=1)

	tempvar e2 
	tempname b V
	quietly summarize `varlist'
	local sum            = r(sum)
	local N              = r(N)
	matrix `b'           = (1/`N')*`sum'
	generate double `e2' = (`varlist' - `b'[1,1])^2
	quietly summarize `e2'
	matrix `V'           = (1/((`N')*(`N'-1)))*r(sum)
	matrix colnames `b'  = `varlist'
	matrix colnames `V'  = `varlist'
	matrix rownames `V'  = `varlist'
	ereturn post `b' `V'
	ereturn display
end

The syntax command on line 5 stores the name of the variable for which the command estimates the mean. The tempvar and tempname commands on lines 7 and 8 put temporary names into the local macros e2, b, and V. Lines 9–15 compute the point estimates and the estimated variance-covariance of the estimator (VCE), using the temporary names for objects, so as not to overwrite user-created objects. Lines 16–18 put the column name on the point estimate and row and column names on the estimated VCE. Line 19 posts the point estimate and the estimated VCE to e(b) and e(V), respectively. Line 20 produces a standard output table from the information stored in e(b) and e(V).
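
The computations on lines 9–15 are just the sample mean and its estimated variance. A minimal Python sketch (made-up data, not the auto dataset) shows the two formulas:

```python
# Sketch of the mean and VCE computed by mymean5.ado:
# b = (1/N) * sum(x_i), and V = sum((x_i - b)^2) / (N * (N - 1)),
# i.e., the sample variance divided by N.
import statistics

x = [4099, 4749, 3799, 4816, 7827]   # made-up data, not the auto dataset
N = len(x)
b = sum(x) / N
V = sum((xi - b) ** 2 for xi in x) / (N * (N - 1))

# Equivalent to the textbook formula Var(x)/N
assert abs(V - statistics.variance(x) / N) < 1e-9
```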

By the end of this post, I will have a command that replaces the Stata computations on lines 9–15 with Mata computations. To illustrate the structure of Stata-Mata programming, I start off only computing the point estimate in mymean6.

Code block 2: mymean6.ado

*! version 6.0.0 05Dec2015
program define mymean6, eclass
	version 14

	syntax varlist(max=1)

	mata: x  = st_data(., "`varlist'")
	mata: w  = mean(x)
	mata: st_matrix("Q", w)

	display "The point estimates are in Q"
	matrix list Q

end

Line 7 executes a one-line call to Mata; in this construction, Stata drops down to Mata, executes the Mata expression, and pops back up to Stata. Popping down to Mata and back up to Stata takes almost no time, but I would like to avoid doing it three times. (Lines 8 and 9 are also one-line calls to Mata.)

Line 7 puts a copy of all the observations on the variable for which the command estimates the mean in the Mata column vector named x, which is stored in global Mata memory. Line 8 stores the mean of the column vector in the 1\(\times\)1 matrix named w, which is also stored in global Mata memory. Line 9 copies the Mata matrix w to the Stata matrix named Q. Lines 11 and 12 display the results stored in Stata.

I illustrate what mymean6 produces in example 1.

Example 1: mymean6 uses global Mata memory

. sysuse auto
(1978 Automobile Data)

. mymean6 price
The point estimates are in Q

symmetric Q[1,1]
           c1
r1  6165.2568

. matrix dir
            Q[1,1]

. mata: mata describe

      # bytes   type                        name and extent
-------------------------------------------------------------------------------
            8   real scalar                 w
          592   real colvector              x[74]
-------------------------------------------------------------------------------

I use matrix dir to illustrate that Q is a Stata matrix, and I use mata describe to illustrate that x and w are objects in global Mata memory. Using fixed names for an object in Stata memory or in global Mata memory should be avoided, because you can overwrite users’ data.

mymean7 does not put anything in global Mata memory; all computations are done using objects that are local to the Mata function mymean_work(). mymean7 uses temporary names for objects stored in Stata memory.

Code block 3: mymean7.ado

*! version 7.0.0 07Dec2015
program define mymean7, eclass
	version 14

	syntax varlist(max=1)

	tempname b
	mata: mymean_work("`varlist'", "`b'")

	display "b is "
	matrix list `b'
end

mata:
void mymean_work(string scalar vname, string scalar mname)
{
	real vector    x
	real scalar    w
	
	x  = st_data(., vname)
	w  = mean(x)
	st_matrix(mname, w)
}
end

There are two parts to mymean7.ado: an ado-program and a Mata function. The ado-program is defined on lines 2–12. The Mata function mymean_work() is defined on lines 14–24. The Mata function mymean_work() is local to the ado-program mymean7.

Line 8 uses a one-line call to Mata to execute mymean_work(). mymean_work() returns nothing to global Mata memory, and we pass in two arguments. The first argument is a string scalar containing the name of the variable for which the function should compute the estimate. The second argument is a string scalar containing the temporary name stored in the local macro b. This temporary name will be the name of a Stata matrix that stores the point estimate computed in mymean_work().

Line 15 declares the function mymean_work(). Function declarations specify what the function returns, the name of the function, and the arguments that the function accepts; see Programming an estimation command in Stata: Mata functions for a quick introduction.

The word void on line 15 specifies that the function does not return an argument; in other words, it returns nothing. What precedes the ( is the function name; thus, mymean_work() is the name of the function. The words string scalar vname specify that the first argument of mymean_work() is a string scalar that is known as vname inside mymean_work(). The comma separates the first argument from the second argument. The words string scalar mname specify that the second argument of mymean_work() is a string scalar that is known as mname inside mymean_work(). ) closes the function declaration.

Lines 17-22 define mymean_work() because they are enclosed between the curly braces on lines 16 and 23. Lines 17 and 18 declare the real vector x and real scalar w, which are local to mymean_work(). Lines 20 and 21 compute the estimate. Line 22 copies the estimate stored in the scalar w, which is local to the Mata function mymean_work(), to the Stata matrix whose name is stored in the string scalar mname, which contains the temporary name contained in the local macro b that was passed to the function on line 8.

The structure used in mymean7 ensures three important features.

  1. It does not use global Mata memory.
  2. It uses temporary names for global Stata objects.
  3. It leaves nothing behind in Mata memory or Stata memory.

Examples 2 and 3 combine to illustrate feature (3); example 2 clears Stata and Mata memory, and example 3 shows that mymean7 leaves nothing in the previously cleared memory.

Example 2: Removing objects from Stata and Mata memory

. clear all

. matrix dir

. mata: mata describe

      # bytes   type                        name and extent
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------

In detail, I use clear all to drop all objects from Stata and Mata memory, use matrix dir to illustrate that no matrices were left in Stata memory, and use mata describe to illustrate that no objects were left in Mata memory.

Example 3: mymean7 leaves nothing in Stata or Mata memory

. sysuse auto
(1978 Automobile Data)

. mymean7 price
b is 

symmetric __000000[1,1]
           c1
r1  6165.2568

. matrix dir

. mata: mata describe

      # bytes   type                        name and extent
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------

In example 3, I use mymean7 to estimate the mean, and use matrix dir and mata describe to illustrate that mymean7 did not leave Stata matrices or Mata objects in memory. The output also illustrates that the temporary name __000000 was used for the Stata matrix that held the result before the ado-program terminated.

While it is good that mymean7 leaves nothing in global Stata or Mata memory, it is bad that mymean7 does not leave the estimate behind somewhere, like in e().

mymean8 stores the results in e() and has the features of mymean5, but computes its results in Mata.

Code block 4: mymean8.ado

*! version 8.0.0 07Dec2015
program define mymean8, eclass
	version 14

	syntax varlist(max=1)

	tempname b V
	mata: mymean_work("`varlist'", "`b'", "`V'")
	matrix colnames `b'  = `varlist'
	matrix colnames `V'  = `varlist'
	matrix rownames `V'  = `varlist'
	ereturn post `b' `V'
	ereturn display
end

mata:
void mymean_work(                  ///
          string scalar vname,     ///
	  string scalar mname,     ///
	  string scalar vcename)
{
	real vector    x, e2
	real scalar    w, n, v
	
	x  = st_data(., vname)
	n  = rows(x)
	w  = mean(x)
	e2 = (x :- w):^2
	v   = (1/(n*(n-1)))*sum(e2)
	st_matrix(mname,   w)
	st_matrix(vcename, v)
}
end

Line 8 is a one-line call to mymean_work(), which now has three arguments: the name of the variable whose mean is to be estimated, a temporary name for the Stata matrix that will hold the point estimate, and a temporary name for the Stata matrix that will hold the estimated VCE. The declaration mymean_work() on lines 17-20 has been adjusted accordingly; each of the three arguments is a string scalar. Lines 22 and 23 declare objects local to mymean_work(). Lines 25-29 compute the mean and the estimated VCE. Lines 30 and 31 copy these results to Stata matrices, under the temporary names in the second and third arguments.

There is a logic to the order of the arguments in mymean_work(): the first argument is the name of an input, and the second and third arguments are temporary names for the outputs.

Returning to the ado-code, we see that lines 9–11 put row or column names on the point estimate or the estimated VCE. Line 12 posts the results to e(), which are displayed by line 13.

Example 4 illustrates that mymean8 produces the same point estimate and standard error as produced by mean.

Example 4: Comparing mymean8 and mean

. mymean8 price
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |   6165.257   342.8719    17.98   0.000      5493.24    6837.273
------------------------------------------------------------------------------

. mean price

Mean estimation                   Number of obs   =         74

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       price |   6165.257   342.8719      5481.914      6848.6
--------------------------------------------------------------

The confidence intervals produced by mymean8 differ from those produced by mean because mymean8 uses a normal distribution while mean uses a \(t\) distribution. The mymean_work() in mymean9 uses a fourth argument to remove this difference.
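
The normal-based interval reported by mymean8 is quick to verify by hand. Here is a Python sketch using the point estimate and standard error from example 4 (stdlib only; the numbers are copied from the output above):

```python
# Back-of-the-envelope check that mymean8's interval is the normal-based one:
# b +/- z_{.975} * se, with b and se taken from example 4.
from statistics import NormalDist

b, se = 6165.257, 342.8719
z = NormalDist().inv_cdf(0.975)       # ~1.96
lo, hi = b - z * se, b + z * se
print(round(lo, 2), round(hi, 2))     # -> 5493.24 6837.27
```

This reproduces the [5493.24, 6837.273] interval from mymean8; the wider interval from mean uses a \(t\) quantile with 73 degrees of freedom instead of z.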

Code block 5: mymean9.ado

*! version 9.0.0 07Dec2015
program define mymean9, eclass
	version 14

	syntax varlist(max=1)

	tempname b V dfr
	mata: mymean_work("`varlist'", "`b'", "`V'", "`dfr'")
	matrix colnames `b'  = `varlist'
	matrix colnames `V'  = `varlist'
	matrix rownames `V'  = `varlist'
	ereturn post `b' `V'
	ereturn scalar df_r  = `dfr'
	ereturn display
end

mata:
void mymean_work(                  ///
          string scalar vname,     ///
	  string scalar mname,     ///
	  string scalar vcename,   ///
	  string scalar dfrname)
{
	real vector    x, e2
	real scalar    w, n, v
	
	x  = st_data(., vname)
	n  = rows(x)
	w  = mean(x)
	e2 = (x :- w):^2
	v   = (1/(n*(n-1)))*sum(e2)
	st_matrix(mname,   w)
	st_matrix(vcename, v)
	st_numscalar(dfrname, n-1)
}
end

On line 8, mymean_work() accepts four arguments. The fourth argument is new; it contains the temporary name that is used for the Stata scalar that holds the residual degrees of freedom. Line 34 copies the value of the expression n-1 to the Stata numeric scalar whose name is stored in the string scalar dfrname; this Stata scalar now contains the residual degrees of freedom. Line 13 stores the residual degrees of freedom in e(df_r), which causes ereturn display to use a \(t\) distribution instead of a normal distribution.

Example 5: mymean9 uses a \(t\) distribution

. mymean9 price
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |   6165.257   342.8719    17.98   0.000     5481.914      6848.6
------------------------------------------------------------------------------

mymean9 has five basic parts.

  1. It parses the user input.
  2. It uses a one-line call to a Mata work routine to compute results and to store these results in Stata matrices whose temporary names are passed to the Mata work routine.
  3. It puts on the column and row names and stores the results in e().
  4. It displays the results.
  5. It defines the Mata work routine after the end that terminates the definition of the ado-program.

This structure can accommodate any estimator whose results we can store in e(). The details of each part become increasingly complicated, but the structure remains the same. In future posts, I discuss Stata/Mata programs with this structure that implement the ordinary least-squares (OLS) estimator and the Poisson quasi-maximum-likelihood estimator.

Done and undone

I discussed a sequence of ado-commands that use Mata to estimate the mean of a variable. The commands illustrated a general structure for Stata/Mata programs.

In the next post, I show some Mata computations that produce the point estimates, an IID VCE, a robust VCE, and a cluster-robust VCE for the OLS estimator.

Programming an estimation command in Stata: Mata functions

I show how to write a function in Mata, the matrix programming language that is part of Stata. This post uses concepts introduced in Programming an estimation command in Stata: Mata 101.

Mata functions

Commands do work in Stata. Functions do work in Mata. Commands operate on Stata objects, like variables, and users specify options to alter the behavior. Mata functions accept arguments, operate on the arguments, and may return a result or alter the value of an argument to contain a result.

Consider myadd() defined below.

Code block 1: myadd()

mata:
function myadd(X, Y)
{
    A = X + Y
    return(A)
}
end

myadd() accepts two arguments, X and Y, puts the sum of X and Y into A, and returns A. For example,

Example 1: Defining and using a function

. mata:
------------------------------------------------- mata (type end to exit) ------
: function myadd(X, Y)
> {
>     A = X + Y
>     return(A)
> }

: C = J(3, 3, 4)

: D = I(3)

: W = myadd(C,D)

: W
[symmetric]
       1   2   3
    +-------------+
  1 |  5          |
  2 |  4   5      |
  3 |  4   4   5  |
    +-------------+

: end
--------------------------------------------------------------------------------

After defining myadd(), I create the matrices C and D, and I pass C and D into myadd(), which returns their sum. Mata assigns the returned sum to W, which I display. Note that inside the function myadd(), C and D are respectively known as X and Y.

The A created in myadd() can only be accessed inside myadd(), and it does not conflict with an A defined outside myadd(); that is, A is local to the function myadd(). I now illustrate that A is local to myadd().

Example 2: A is local to myadd()

. mata:
------------------------------------------------- mata (type end to exit) ------
: A = J(3, 3, 4)

: A
[symmetric]
       1   2   3
    +-------------+
  1 |  4          |
  2 |  4   4      |
  3 |  4   4   4  |
    +-------------+

: W = myadd(A,D)

: A
[symmetric]
       1   2   3
    +-------------+
  1 |  4          |
  2 |  4   4      |
  3 |  4   4   4  |
    +-------------+

: end
--------------------------------------------------------------------------------

Having illustrated that the A defined inside myadd() is local to myadd(), I should point out that the C and D matrices I defined in example 1 are in global Mata memory. As in ado-programs, we do not want to use fixed names in global Mata memory in our programs so that we do not destroy the users’ data. Fortunately, this problem is easily solved by writing Mata functions that write their results out to Stata and do not return results. I will provide detailed discussions of this solution in the commands that I develop in subsequent posts.

When a Mata function changes the value of an argument inside the function, that changes the value of that argument outside the function; in other words, arguments are passed by address. Mata functions can compute more than one result by storing these results in arguments. For example, sumproduct() returns both the sum and the element-wise product of two matrices.

Code block 2: sumproduct()

function sumproduct(X, Y, S, P)
{
	S = X +  Y
	P = X :* Y
	return
}

sumproduct() returns the sum of the arguments X and Y in the argument S and the element-wise product in P.

Example 3: Returning results in arguments

. mata:
------------------------------------------------- mata (type end to exit) ------
: function sumproduct(X, Y, S, P)
> {
>         S = X +  Y
>         P = X :* Y
>         return
> }

: A = I(3)

: B = rowshape(1::9, 3)

: A
[symmetric]
       1   2   3
    +-------------+
  1 |  1          |
  2 |  0   1      |
  3 |  0   0   1  |
    +-------------+

: B
       1   2   3
    +-------------+
  1 |  1   2   3  |
  2 |  4   5   6  |
  3 |  7   8   9  |
    +-------------+

: W=.

: W
  .

: Z=.

: Z
  .

: sumproduct(A, B, W, Z)

: W
        1    2    3
    +----------------+
  1 |   2    2    3  |
  2 |   4    6    6  |
  3 |   7    8   10  |
    +----------------+

: Z
[symmetric]
       1   2   3
    +-------------+
  1 |  1          |
  2 |  0   5      |
  3 |  0   0   9  |
    +-------------+

: end
--------------------------------------------------------------------------------

After defining sumproduct(), I use I() to create A and use rowshape() to create B. I then create W and Z; each is a scalar missing value. I must create W and Z before I pass them as arguments; otherwise, I would be referencing arguments that do not exist. After calling sumproduct(), I display W and Z to illustrate that they now contain the sum and element-wise product of X and Y.

In myadd() and sumproduct(), I did not specify what type of thing each argument must be, nor did I specify what type of thing each function would return. In other words, I used implicit declarations. Implicit declarations are easier to type than explicit declarations, but they make error messages and code less informative. I highly recommend explicitly declaring returns, arguments, and local variables to make your code and error messages more readable.

myadd2() is a version of myadd() that explicitly declares the type of thing returned, the type of thing that each argument must be, and the type that each local-to-the-function thing must be.

Code block 3: myadd2(): Explicit declarations

mata:
numeric matrix myadd2(numeric matrix X, numeric matrix Y)
{
    numeric matrix A

    A = X + Y
    return(A)
}
end

myadd2() returns a numeric matrix that it constructs by adding the numeric matrix X to the numeric matrix Y. The local-to-the-function object A is also a numeric matrix. A numeric matrix is either a real matrix or a complex matrix; it cannot be a string matrix.

Explicitly declaring returns, arguments, and local variables makes the code more informative. I immediately see that myadd2() does not work with string matrices, but this property is buried in the code for myadd().

I cannot sufficiently stress the importance of writing easy-to-read code. Reading other people’s code is an essential part of programming. It is instructional, and it allows you to adopt the solutions that others have found or implemented. If you are new to programming, you may not yet realize that after a few months, reading your own code is like reading someone else’s code. Even if you never give your code to anyone else, it is essential that you write easy-to-read code so that you can read it at a later date.

Explicit declarations also make some error messages easier to track down. In examples 4 and 5, I pass a string matrix to myadd() and to myadd2(), respectively.

Example 4: Passing a string matrix to myadd()

. mata:
------------------------------------------------- mata (type end to exit) ------
: B = I(3)

: C = J(3,3,"hello")

: myadd(B,C)
                 myadd():  3250  type mismatch
                 :     -  function returned error
(0 lines skipped)
--------------------------------------------------------------------------------
r(3250);

Example 5: Passing a string matrix to myadd2()

. mata:
------------------------------------------------- mata (type end to exit) ------
: B = I(3)

: C = J(3,3,"hello")

: numeric matrix myadd2(numeric matrix X, numeric matrix Y)
> {
>     numeric matrix A
> 
>     A = X + Y
>     return(A)
> }

: myadd2(B,C)
                myadd2():  3251  C[3,3] found where real or complex required
                 :     -  function returned error
(0 lines skipped)
--------------------------------------------------------------------------------
r(3251);

end of do-file

The error message in example 4 indicates that somewhere in myadd(), an operator or a function could not perform something on two objects because their types were not compatible. Do not be deluded by the simplicity of myadd(). Tracking down a type mismatch in real code can be difficult.

In contrast, the error message in example 5 says that the matrix C we passed to myadd2() is neither a real nor a complex matrix like the argument of myadd2() requires. Looking at the code and the error message immediately informs me that the problem is that I passed a string matrix to a function that requires a numeric matrix.

Explicit declarations are so highly recommended that Mata has a setting to require it, as illustrated below.

Example 6: Turning on matastrict

. mata: mata set matastrict on

Setting matastrict to on causes the Mata compiler to require that return and local variables be explicitly declared. By default, matastrict is set to off, in which case return and local variables may be implicitly declared.

When matastrict is set to on, arguments are not required to be explicitly declared because some arguments hold results whose input and output types could differ. Consider makereal() defined and used in example 7.

Example 7: Changing an argument's type

. mata:
------------------------------------------------- mata (type end to exit) ------
: void makereal(A)
> {
>         A = substr(A, 11,1) 
>         A = strtoreal(A)
> }

: A = J(2,2, "Number is 4")

: A
[symmetric]
                 1             2
    +-----------------------------+
  1 |  Number is 4                |
  2 |  Number is 4   Number is 4  |
    +-----------------------------+

: makereal(A)

: A + I(2)
[symmetric]
       1   2
    +---------+
  1 |  5      |
  2 |  4   5  |
    +---------+

: end
--------------------------------------------------------------------------------

The declaration of makereal() specifies that makereal() returns nothing because void comes before the name of the function. Even though matastrict is set to on, I did not declare what type of thing A must be. The two executable lines of makereal() clarify that A must be a string on input and that A will be real on output, which I subsequently illustrate.

I use the feature that arguments may be implicitly declared to make my code easier to read. Many of the Mata functions that I write replace arguments with results. I explicitly declare arguments that are inputs, and I implicitly declare arguments that contain outputs. Consider sumproduct2().

Code block 4: sumproduct2(): Explicit declarations of inputs but not outputs

void sumproduct2(real matrix X, real matrix Y, S, P)
{
	S = X +  Y
	P = X :* Y
	return
}

sumproduct2() returns nothing because void comes before the function name. The first argument X is a real matrix, the second argument Y is a real matrix, the third argument S is implicitly declared, and the fourth argument P is implicitly declared. My coding convention of explicitly declaring inputs and implicitly declaring outputs immediately informs me that X and Y are inputs and that S and P are outputs, as example 8 illustrates.

Example 8: Explicitly declaring inputs but not outputs

. mata:
------------------------------------------------- mata (type end to exit) ------
: void sumproduct2(real matrix X, real matrix Y, S, P)
> {
>         S = X +  Y
>         P = X :* Y
>         return
> }

: A = I(2)

: B = rowshape(1::4, 2)

: C = .

: D = .

: sumproduct2(A, B, C, D)

: C
       1   2
    +---------+
  1 |  2   2  |
  2 |  3   5  |
    +---------+

: D
[symmetric]
       1   2
    +---------+
  1 |  1      |
  2 |  0   4  |
    +---------+

: end
--------------------------------------------------------------------------------

Done and undone

I showed how to write a function in Mata and discussed declarations in some detail. Type help m2_declarations for many more details.

In my next post, I use Mata functions to perform the computations for a simple estimation command.