
## Programming an estimation command in Stata: A poisson command using Mata


I build on previous posts. I use the structure of Stata programs that use Mata work functions that I discussed previously in Programming an estimation command in Stata: A first ado-command using Mata and Programming an estimation command in Stata: An OLS command using Mata. You should be familiar with Poisson regression and using optimize(), which I discussed in Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters.

A poisson command with Mata computations

The Stata command mypoisson1 computes the results in Mata. The syntax of the mypoisson1 command is

mypoisson1 depvar indepvars [if] [in] [, noconstant]

where indepvars can contain time-series variables. mypoisson1 does not allow for factor variables because they complicate the program. I discuss these complications, and present solutions, in my next post.

In the remainder of this post, I discuss the code for mypoisson1.ado. I recommend that you click on the file name to download the code. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.

*! version 1.0.0  31Jan2016
program define mypoisson1, eclass sortpreserve
    version 14.1

    syntax varlist(numeric ts min=2) [if] [in] [, noCONStant ]
    marksample touse

    gettoken depvar indepvars : varlist

    _rmcoll `indepvars', `constant' forcedrop
    local indepvars  "`r(varlist)'"

    tempname b V N rank

    mata: mywork("`depvar'", "`indepvars'", "`touse'", "`constant'", ///
        "`b'", "`V'", "`N'", "`rank'")

    if "`constant'" == "" {
        local cnames "`indepvars' _cons"
    }
    else {
        local cnames "`indepvars'"
    }
    matrix colnames `b' = `cnames'
    matrix colnames `V' = `cnames'
    matrix rownames `V' = `cnames'

    ereturn post `b' `V', esample(`touse') buildfvinfo
    ereturn scalar N       = `N'
    ereturn scalar rank    = `rank'
    ereturn local  cmd     "mypoisson1"

    ereturn display

end

mata:

void mywork( string scalar depvar,  string scalar indepvars,
             string scalar touse,   string scalar constant,
             string scalar bname,   string scalar Vname,
             string scalar nname,   string scalar rname)
{

    real vector y, b
    real matrix X, V
    real scalar n, p, rank

    y = st_data(., depvar, touse)
    n = rows(y)
    X = st_data(., indepvars, touse)
    if (constant == "") {
        X = X,J(n, 1, 1)
    }
    p = cols(X)

    S  = optimize_init()
    optimize_init_argument(S, 1, y)
    optimize_init_argument(S, 2, X)
    optimize_init_evaluator(S, &plleval2())
    optimize_init_params(S, J(1, p, .01))

    b    = optimize(S)
    V    = optimize_result_V_oim(S)
    rank = p - diag0cnt(invsym(V))

    st_matrix(bname, b)
    st_matrix(Vname, V)
    st_numscalar(nname, n)
    st_numscalar(rname, rank)
}

void plleval2(real scalar todo, real vector b,     ///
              real vector y,    real matrix X,     ///
              val, grad, hess)
{
    real vector  xb

    xb  = X*b'
    val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}

end


mypoisson1.ado has the structure of Stata programs that compute their results in Mata, which I discussed in Programming an estimation command in Stata: A first ado-command using Mata and Programming an estimation command in Stata: An OLS command using Mata. Lines 1–35 define the ado-command mypoisson1. Lines 37–83 define the Mata work function mywork() used in mypoisson1 and the evaluator function plleval2() used in mywork().

The ado-command mypoisson1 has four parts:

1. Lines 5–13 parse what the user typed, identify the sample, drop collinear variables from the list of independent variables, and create temporary names for Stata objects returned by our Mata work function.
2. Lines 15–16 call the Mata work function.
3. Lines 18–31 post the results returned by the Mata work function to e().
4. Line 33 displays the results.

The Mata work function mywork() has four parts.

1. Lines 39–42 parse the arguments.
2. Lines 45–47 declare vectors, matrices, and scalars that are local to mywork().
3. Lines 49–65 compute the results.
4. Lines 67–70 copy the computed results to Stata, using the names that were passed in arguments.

Now, I discuss the ado-code in some detail. Lines 2–35 are almost the same as lines 2–35 of myregress11, which I discussed in Programming an estimation command in Stata: An OLS command using Mata. Most of the differences arise because myregress11 handles factor variables and mypoisson1 does not. Two minor differences arise because myregress11 stores the residual degrees of freedom and mypoisson1 does not.

myregress11 uses the matrix inverter to handle cases of collinear independent variables. In the Poisson-regression case discussed here, collinear independent variables define an unconstrained problem without a unique solution, so optimize() cannot converge. mypoisson1 drops collinear independent variables to avoid this problem. I discuss a better solution in my next post.
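The singular-Hessian problem is easy to verify numerically. With an exactly collinear pair of covariates, the Poisson Hessian $$-\sum_i \exp({\bf x}_i\betab){\bf x}_i'{\bf x}_i$$ is singular at every parameter value, so the Newton step that optimize() relies on is undefined. A minimal sketch in Python (not Stata; the data and parameter values are made up for illustration):

```python
import math

# Made-up data: the second covariate is an exact copy of the first.
x1 = [0.5, 1.0, 1.5, 2.0]
X  = [[v, v] for v in x1]       # exactly collinear columns
b  = [0.1, 0.2]                 # arbitrary parameter values

# Poisson Hessian: H = -sum_i exp(x_i b) x_i' x_i, a 2 x 2 matrix here
H = [[0.0, 0.0], [0.0, 0.0]]
for xi in X:
    w = math.exp(xi[0]*b[0] + xi[1]*b[1])
    for r in range(2):
        for c in range(2):
            H[r][c] -= w*xi[r]*xi[c]

det = H[0][0]*H[1][1] - H[0][1]*H[1][0]
print(det)                      # 0: the Newton step -H^{-1}g does not exist
```

Dropping one of the two columns, as _rmcoll does, restores a nonsingular Hessian.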

Lines 10–11 use _rmcoll to drop the collinear independent variables and store the list of linearly independent variables in the local macro indepvars.

Now, I discuss the Mata work function mywork() in some detail. The mywork() defined on lines 39–71 is remarkably similar to the mywork() function defined on lines 39–72 of myregress11.ado, discussed in Programming an estimation command in Stata: An OLS command using Mata. In mypoisson1.ado, lines 57–65 use optimize() to compute the results. In myregress11.ado, lines 58–64 use matrix computations to compute the results.

Lines 73–81 define the evaluator function plleval2(), which is used by optimize() to compute the results, as I discussed in Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters.

Examples 1 and 2 illustrate that mypoisson1 produces the same results as poisson.

Example 1: mypoisson1
(Uses accident3.dta)

. clear all

. use accident3

. mypoisson1 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66874
Iteration 2:   f(p) = -555.81708
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943554   -.5174188
kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082077
_cons |   .5743543   .2839519     2.02   0.043     .0178187     1.13089
------------------------------------------------------------------------------


Example 2: poisson

. poisson accidents cvalue kids traffic

Iteration 0:   log likelihood = -555.86605
Iteration 1:   log likelihood =  -555.8154
Iteration 2:   log likelihood = -555.81538

Poisson regression                              Number of obs     =        505
LR chi2(3)        =     340.20
Prob > chi2       =     0.0000
Log likelihood = -555.81538                     Pseudo R2         =     0.2343

------------------------------------------------------------------------------
accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506594
traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
_cons |    .574354   .2839515     2.02   0.043     .0178193    1.130889
------------------------------------------------------------------------------


Done and undone

I discussed mypoisson1, which computes Poisson-regression results in Mata. I highlighted how similar the code in mypoisson1.ado is to the code in myregress11.ado.

I also discussed that mypoisson1 drops collinear independent variables and noted that mypoisson1 does not handle factor variables. In my next post, I discuss a solution to the problems caused by collinear independent variables and discuss a command that handles factor variables.

Categories: Programming

## Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters


Using optimize()

There are many optional choices that one may make when solving a nonlinear optimization problem, but there are very few that one must make. The optimize*() functions in Mata handle this problem by making a set of default choices for you, requiring that you specify a few things, and allowing you to change any of the default choices.

When I use optimize() to solve a nonlinear optimization problem, I do four steps.

1. I create an optimize() object

: S = optimize_init()

which contains all the default choices

2. I use some of the optimize_init_*(S) functions to put information about my optimization problem into S.
3. I use

: betahat = optimize(S)

to perform the optimization.

4. I use some of the optimize_result_*(S) functions to get the results, which optimize(S) stored in S.

Consider maximizing the log-likelihood function of a Poisson model. The contribution of the $$i$$th observation to the log-likelihood is

$f_i(\betab) = y_i{\bf x}_i\betab' - \exp({\bf x}_i\betab') - \ln( y_i !)$

where $$y_i$$ is the dependent variable, $${\bf x}_i$$ is the vector of covariates, and $$\betab$$ is the row vector of parameters that we select to maximize the log-likelihood function given by $$F(\betab) =\sum_i f_i(\betab)$$. I could drop $$\ln(y_i!)$$, because it does not depend on the parameters. I include it to make the value of the log-likelihood function the same as that reported by Stata. Stata includes these terms so that the values of the log-likelihood functions are comparable across models.
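As a quick check of the formula before coding it in Mata, the sum of the $$f_i(\betab)$$ terms can be computed directly in any language; here is a sketch in Python on made-up data, using lgamma(y+1) for $$\ln(y_i!)$$:

```python
import math

# Made-up data: outcomes y, one covariate x plus a constant, and an
# arbitrary row vector of parameters b = (slope, constant).
y = [1, 2, 0, 3]
x = [0.5, 1.0, 1.5, 2.0]
b = [0.2, 0.1]

def poisson_ll(b):
    """Sum of f_i(b) = y_i*xb_i - exp(xb_i) - ln(y_i!)."""
    ll = 0.0
    for yi, xi in zip(y, x):
        xb = b[0]*xi + b[1]
        ll += yi*xb - math.exp(xb) - math.lgamma(yi + 1)
    return ll

print(round(poisson_ll(b), 6))   # -5.896714
```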

Code block 1 copies the data from Stata to Mata and computes the Poisson log-likelihood function at the vector of parameter values b, which has been set to the arbitrary starting values of .01 for each parameter.

Code block 1: Computing the Poisson log-likelihood function

clear all
use accident3
mata:
y = st_data(., "accidents")
X = st_data(., "cvalue kids traffic")
X = X,J(rows(X), 1, 1)
b = J(1, cols(X), .01)
xb = X*b'
f  = sum(-exp(xb) + y:*xb - lnfactorial(y))
end


The Mata function plleval() in code block 2 puts the value of the Poisson log-likelihood function at the vector of parameter values b into val.

Code block 2: An evaluator function for the Poisson log-likelihood

mata:
void plleval(real scalar todo, real vector b, val, grad, hess)
{
real vector  y, xb
real matrix  X

y = st_data(., "accidents")
X = st_data(., "cvalue kids traffic")
X = X,J(rows(X), 1, 1)
xb = X*b'
val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}
end


plleval() has the default syntax of an evaluator function that optimize() can call. Evaluator functions have a default syntax so that optimize() can call them, which it must do to find the maximum. After describing the default syntax, I will show how to use evaluators with extra arguments.

plleval() is void; it returns nothing. The real scalar todo allows optimize() to tell the evaluator function what it must compute. The real vector b is the current value of the parameter vector. val is not typed because no matter what it contains on input, it will contain the value of the objective function on output. grad is not typed because it will optionally contain the vector of first derivatives of the objective function at the current value of b on output. hess is not typed because it will optionally contain the matrix of second derivatives of the objective function at the current value of b on output. As plleval() illustrates, the evaluator function must put the value of the objective function into the third argument, but it need not compute either the vector of first derivatives or the matrix of second derivatives.

In example 1, I use optimize() to maximize the Poisson log-likelihood function computed in plleval().

Example 1: Using optimize() to estimate Poisson parameters

(Uses accident3.dta)

. clear all

. use accident3

. mata:
------------------------------------------------- mata (type end to exit) ------
: void plleval(real scalar todo, real vector b, val, grad, hess)
> {
>     real vector  y, xb
>     real matrix  X
>
>      y = st_data(., "accidents")
>      X = st_data(., "cvalue kids traffic")
>      X = X,J(rows(X), 1, 1)
>     xb = X*b'
>    val = sum(-exp(xb) + y:*xb - lnfactorial(y))
> }
note: argument todo unused
note: argument hess unused

:
: S  = optimize_init()

: optimize_init_evaluator(S, &plleval())

: optimize_init_params(S, J(1, 4, .01))

: bh = optimize(S)
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66874
Iteration 2:   f(p) = -555.81708
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538

: bh
1              2              3              4
+-------------------------------------------------------------+
1 |  -.6558871399   -1.009017051    .1467114648    .5743542793  |
+-------------------------------------------------------------+

: optimize_result_params(S)
1              2              3              4
+-------------------------------------------------------------+
1 |  -.6558871399   -1.009017051    .1467114648    .5743542793  |
+-------------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_oim(S)))'
1             2             3             4
+---------------------------------------------------------+
1 |  .0706483931   .0807960852   .0313761961   .2839519366  |
+---------------------------------------------------------+

: end
--------------------------------------------------------------------------------


After defining plleval(), I use optimize_init() to create the optimize() object S. I must put information about how to call plleval() and the vector of starting values into S. Typing

optimize_init_evaluator(S, &plleval())

puts the address of the evaluator function plleval() into S. By preceding the name of the evaluator function plleval() with an ampersand (&), I put the address of the evaluator function into S. optimize() requires that you put the function address instead of the function name because having the address speeds up finding the function. Typing

optimize_init_params(S, J(1, 4, .01))

puts the vector of starting values, J(1, 4, .01), into S.

Typing

bh = optimize(S)

causes optimize() to solve the optimization problem described in S, and it causes optimize() to put the vector of optimal parameters in bh. optimize() produces the default iteration log, because we did not change the default specification in S.

When optimize() has completed, the results are in S. For example, I display the bh returned by optimize() and use optimize_result_params(S) to display the result stored in S. I further illustrate by displaying the standard errors; optimize_result_V_oim() retrieves the observed-information-matrix (OIM) estimator of the variance–covariance matrix of the estimator (VCE). Many other results are stored in S; type help mf optimize and look at the optimize_result_*() functions for details.

Comparing the results in examples 1 and 2 shows that they are correct.

Example 2: Results from poisson

. poisson accidents cvalue kids traffic

Iteration 0:   log likelihood = -555.86605
Iteration 1:   log likelihood =  -555.8154
Iteration 2:   log likelihood = -555.81538

Poisson regression                              Number of obs     =        505
LR chi2(3)        =     340.20
Prob > chi2       =     0.0000
Log likelihood = -555.81538                     Pseudo R2         =     0.2343

------------------------------------------------------------------------------
accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506594
traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
_cons |    .574354   .2839515     2.02   0.043     .0178193    1.130889
------------------------------------------------------------------------------


plleval() is slow because it copies the data from Stata to Mata every time optimize() calls it. I would much rather pass the data to the evaluator function, but this requires putting information about the syntax of the new evaluator function in S. For example, I would like to use the evaluator function plleval2(). In example 3, I use optimize_init_argument() to put information into S about the extra arguments accepted by the new evaluator function plleval2().

Code block 3: Passing data to the Poisson evaluator function

mata:
void plleval2(real scalar todo, real vector b,     ///
              real vector y,    real matrix X,     ///
              val, grad, hess)
{
    real vector  xb

    xb  = X*b'
    val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}
end


plleval2() declares two extra arguments: the real vector y and the real matrix X. The extra arguments come between the always-present inputs (the real scalar todo and the real vector b) and the always-present outputs (val, grad, and hess).

Example 3 uses optimize() to maximize the Poisson objective function coded in plleval2().

Example 3: Using optional arguments to pass data

. mata:
------------------------------------------------- mata (type end to exit) ------
: void plleval2(real scalar todo, real vector b,     ///
>               real vector y,    real matrix X,     ///
>               val, grad, hess)
> {
>     real vector  xb
>
>     xb = X*b'
>    val = sum(-exp(xb) + y:*xb - lnfactorial(y))
> }
note: argument todo unused
note: argument hess unused

:
: y = st_data(., "accidents")

: X = st_data(., "cvalue kids traffic")

: X = X,J(rows(X), 1, 1)

:
: S  = optimize_init()

: optimize_init_argument(S, 1, y)

: optimize_init_argument(S, 2, X)

: optimize_init_evaluator(S, &plleval2())

: optimize_init_params(S, J(1, 4, .01))

:
: bh = optimize(S)
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66874
Iteration 2:   f(p) = -555.81708
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538

: optimize_result_params(S)
1              2              3              4
+-------------------------------------------------------------+
1 |  -.6558871399   -1.009017051    .1467114648    .5743542793  |
+-------------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_oim(S)))'
1             2             3             4
+---------------------------------------------------------+
1 |  .0706483931   .0807960852   .0313761961   .2839519366  |
+---------------------------------------------------------+

: end
--------------------------------------------------------------------------------


After defining plleval2(), I copy the data from Stata to Mata, and I use optimize_init() to put the default choices into the optimize() object S. When I typed

optimize_init_argument(S, 1, y)

I put information into S specifying that optimize() should pass y as the first extra argument to the evaluator function. When I typed

optimize_init_argument(S, 2, X)

I put information into S specifying that optimize() should pass X as the second extra argument to the evaluator function.

Analogous to example 1, typing

optimize_init_evaluator(S, &plleval2())

puts the address of plleval2() into S, and typing

optimize_init_params(S, J(1, 4, .01))

puts the vector of starting values, J(1, 4, .01), in S.

The results are the same as those in example 1.

Vector of observation-level contributions and robust VCE estimation

Robust estimators for the VCE of an estimator use the structure of observation-level contributions; see Wooldridge (2010, chapters 12 and 13) or Cameron and Trivedi (2005, chapter 5). When the evaluator function gives optimize() a vector of observation-level contributions, instead of a scalar summation, optimize() can use this structure to compute robust or cluster–robust estimators of the VCE.
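What optimize() does with that vector can be sketched outside Stata. For an intercept-only Poisson model, the observation-level score contributions are $$s_i = y_i - \exp(b)$$, the robust estimator is the sandwich $$H^{-1}(\sum_i s_i^2)H^{-1}$$, and the OIM variance is $$(-H)^{-1}$$. A minimal Python illustration with made-up counts:

```python
import math

# Made-up counts; for an intercept-only Poisson model the MLE is b = ln(mean(y)).
y = [0, 1, 1, 2, 5, 3]
n = len(y)
b = math.log(sum(y)/n)

mu     = math.exp(b)                  # fitted mean; equals mean(y) at the MLE
scores = [yi - mu for yi in y]        # observation-level score contributions
H      = -n*mu                        # Hessian of the log likelihood

v_oim = 1.0/(-H)                      # observed-information variance
meat  = sum(s*s for s in scores)      # sum_i s_i^2
v_rob = (1.0/H)*meat*(1.0/H)          # sandwich: H^{-1} (sum_i s_i^2) H^{-1}

print(v_oim, v_rob)
```

The two variances agree only when the sample variance of y matches its mean; for overdispersed data the robust standard errors are larger, as in example 5 below.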

Consider plleval3(), which puts the vector of observation-level contributions into val.

Code block 4: A vector of observation-level contributions

mata:
void plleval3(real scalar todo, real vector b,     ///
              real vector y,    real matrix X,     ///
              val, grad, hess)
{
    real vector  xb

    xb  = X*b'
    val = -exp(xb) + y:*xb - lnfactorial(y)
}
end


To use plleval3(), I must put information in the optimize() object stating that the evaluator function computes a vector of observation-level contributions. In example 4, I use optimize_init_evaluatortype() to put this information into the optimize() object S.

Example 4: Robust VCE estimation

. mata:
------------------------------------------------- mata (type end to exit) ------
: void plleval3(real scalar todo, real vector b,     ///
>               real vector y,    real matrix X,     ///
>               val, grad, hess)
> {
>     real vector  xb
>
>     xb = X*b'
>    val = -exp(xb) + y:*xb - lnfactorial(y)
> }
note: argument todo unused
note: argument hess unused

:
:
: y = st_data(., "accidents")

: X = st_data(., "cvalue kids traffic")

: X = X,J(rows(X), 1, 1)

:
: S  = optimize_init()

: optimize_init_argument(S, 1, y)

: optimize_init_argument(S, 2, X)

: optimize_init_evaluator(S, &plleval3())

: optimize_init_evaluatortype(S, "gf0")

: optimize_init_params(S, J(1, 4, .01))

:
: bh = optimize(S)
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66874
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538

: optimize_result_params(S)
1              2              3              4
+-------------------------------------------------------------+
1 |  -.6558871527   -1.009017051    .1467114658    .5743542978  |
+-------------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_oim(S)))'
1             2             3             4
+---------------------------------------------------------+
1 |  .0706483832   .0807960809    .031376176   .2839517337  |
+---------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_robust(S)))'
1             2             3             4
+---------------------------------------------------------+
1 |  .1096020124    .188666044    .092431746   .6045057623  |
+---------------------------------------------------------+

: end
--------------------------------------------------------------------------------


After defining plleval3(), I copy the data, create the optimize() object S, put the specifications for the extra arguments y and X in S, and put the address of plleval3() into S. Typing

optimize_init_evaluatortype(S, "gf0")

puts in S the information that the evaluator function returns a vector of observation-level contributions and that it computes zero derivatives; that is, the evaluator function is of type "gf0". Given the vector structure, I can type

optimize_result_V_robust(S)

to compute a robust estimator of the VCE.

sqrt(diagonal(optimize_result_V_robust(S)))'

returns the robust standard errors, which are the same as those reported by poisson in example 5.

Example 5: Robust VCE estimation by poisson

. poisson accidents cvalue kids traffic, vce(robust)

Iteration 0:   log pseudolikelihood = -555.86605
Iteration 1:   log pseudolikelihood =  -555.8154
Iteration 2:   log pseudolikelihood = -555.81538

Poisson regression                              Number of obs     =        505
Wald chi2(3)      =      99.76
Prob > chi2       =     0.0000
Log pseudolikelihood = -555.81538               Pseudo R2         =     0.2343

------------------------------------------------------------------------------
|               Robust
accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cvalue |  -.6558871   .1096019    -5.98   0.000    -.8707029   -.4410712
kids |  -1.009017    .188666    -5.35   0.000    -1.378795   -.6392382
traffic |   .1467115   .0924316     1.59   0.112    -.0344512    .3278741
_cons |    .574354   .6045047     0.95   0.342    -.6104535    1.759162
------------------------------------------------------------------------------


Done and undone

I showed how to use optimize() to maximize a Poisson log-likelihood function. I also showed how to obtain a robust estimator of the VCE by coding the evaluator function to compute a vector of observation-level contributions. In my next post, I show how to write a Stata command that uses Mata to estimate the parameters of a Poisson regression model.

References

Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.


## Programming an estimation command in Stata: A review of nonlinear optimization using Mata


A quick review of nonlinear optimization

We want to maximize a real-valued function $$Q(\thetab)$$, where $$\thetab$$ is a $$p\times 1$$ vector of parameters. Minimization is done by maximizing $$-Q(\thetab)$$. We require that $$Q(\thetab)$$ be twice continuously differentiable, so that we can use a second-order Taylor series to approximate $$Q(\thetab)$$ in a neighborhood of the point $$\thetab_s$$,

$Q(\thetab) \approx Q(\thetab_s) + \gb_s'(\thetab -\thetab_s) + \frac{1}{2} (\thetab -\thetab_s)'\Hb_s (\thetab -\thetab_s) \tag{1}$

where $$\gb_s$$ is the $$p\times 1$$ vector of first derivatives of $$Q(\thetab)$$ evaluated at $$\thetab_s$$ and $$\Hb_s$$ is the $$p\times p$$ matrix of second derivatives of $$Q(\thetab)$$ evaluated at $$\thetab_s$$, known as the Hessian matrix.

Nonlinear maximization algorithms start with a vector of initial values and produce a sequence of updated values that converge to the parameter vector that maximizes the objective function. The algorithms I discuss here can only find local maxima. The function in figure 1 has a local maximum at .2 and another at 1.5. The global maximum is at .2.

Figure 1: Local maxima

Each update is produced by finding the $$\thetab$$ that maximizes the approximation on the right-hand side of equation (1) and letting it be $$\thetab_{s+1}$$. To find the $$\thetab$$ that maximizes the approximation, we set to $${\bf 0}$$ the derivative of the right-hand side of equation (1) with respect to $$\thetab$$,

$\gb_s + \Hb_s (\thetab -\thetab_s) = {\bf 0} \tag{2}$

Replacing $$\thetab$$ with $$\thetab_{s+1}$$ and solving yields the update rule for $$\thetab_{s+1}$$.

$\thetab_{s+1} = \thetab_s - \Hb_s^{-1} \gb_s \tag{3}$

Note that the update is uniquely defined only if the Hessian matrix $$\Hb_s$$ is full rank. To ensure that we have a local maximum, we will require that the Hessian be negative definite at the optimum, which also implies that the symmetric Hessian is full rank.

The update rule in equation (3) does not guarantee that $$Q(\thetab_{s+1})>Q(\thetab_s)$$. We want to accept only those $$\thetab_{s+1}$$ that do produce such an increase, so in practice, we use

$\thetab_{s+1} = \thetab_s - \lambda \Hb_s^{-1} \gb_s \tag{4}$

where $$\lambda$$ is the step size. In the algorithm presented here, we start with $$\lambda$$ equal to $$1$$ and, if necessary, decrease $$\lambda$$ until we find a value that yields an increase.

The previous sentence is vague. I clarify it by writing an algorithm in Mata. Suppose that real scalar Q( real vector theta ) is a Mata function that returns the value of the objective function at a value of the parameter vector theta. For the moment, suppose that g_s is the vector of derivatives at the current theta, denoted by theta_s, and that Hi_s is the inverse of the Hessian matrix at theta_s. These definitions allow us to define the update function

Code block 1: Candidate rule for parameter vector

real vector tupdate(                 ///
        real scalar lambda,          ///
        real vector theta_s,         ///
        real vector g_s,             ///
        real matrix Hi_s)
{
        return(theta_s - lambda*Hi_s*g_s)
}


For specified values of lambda, theta_s, g_s, and Hi_s, tupdate() returns a candidate value for theta_s1. But we only accept candidate values of theta_s1 that yield an increase, so instead of using tupdate() to get an update, we would use GetUpdate().

Code block 2: Update function for parameter vector

real vector GetUpdate(               ///
        real vector theta_s,         ///
        real vector g_s,             ///
        real matrix Hi_s)
{
        real scalar lambda
        real vector theta_s1

        lambda   = 1
        theta_s1 = tupdate(lambda, theta_s, g_s, Hi_s)
        while ( Q(theta_s1) <= Q(theta_s) ) {
                lambda   = lambda/2
                theta_s1 = tupdate(lambda, theta_s, g_s, Hi_s)
        }
        return(theta_s1)
}


GetUpdate() starts by getting a candidate value for theta_s1 when lambda = 1. GetUpdate() returns this candidate theta_s1 if it produces an increase in Q(). Otherwise, GetUpdate() divides lambda by 2 and gets another candidate theta_s1 until it finds a candidate that produces an increase in Q(). GetUpdate() returns the first candidate that produces an increase in Q().
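The logic of GetUpdate() can be exercised on a toy problem. The sketch below is Python rather than Mata, with the made-up objective $$Q(\theta) = \ln(\theta) - \theta$$, which has its maximum at $$\theta = 1$$; starting from $$\theta = 1.9$$, the full Newton step overshoots, and the step is halved once before an increase is found:

```python
import math

def Q(t): return math.log(t) - t      # made-up objective with maximum at t = 1
def g(t): return 1.0/t - 1.0          # first derivative of Q
def H(t): return -1.0/t**2            # second derivative of Q

theta = 1.9                           # starting value chosen to force a halving
halvings = 0
while abs(g(theta)) > 1e-10:          # stop when the gradient is numerically zero
    lam  = 1.0
    cand = theta - lam*g(theta)/H(theta)
    while Q(cand) <= Q(theta):        # accept only candidates that increase Q
        lam /= 2.0
        halvings += 1
        cand = theta - lam*g(theta)/H(theta)
    theta = cand

print(theta, halvings)                # theta has converged to the maximum at 1
```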

While these functions clarify the ambiguities in the original vague statement, GetUpdate() makes the unwise assumption that there is always a lambda for which the candidate theta_s1 produces an increase in Q(). The version of GetUpdate() in code block 3 does not make this assumption; it exits with an error if lambda becomes too small (less than $$10^{-11}$$).

Code block 3: A better update function for parameter vector

real vector GetUpdate(               ///
        real vector theta_s,         ///
        real vector g_s,             ///
        real matrix Hi_s)
{
        real scalar lambda
        real vector theta_s1

        lambda   = 1
        theta_s1 = tupdate(lambda, theta_s, g_s, Hi_s)
        while ( Q(theta_s1) <= Q(theta_s) ) {
                lambda = lambda/2
                if (lambda < 1e-11) {
                        printf("{red}Cannot find parameters that produce an increase.\n")
                        exit(error(3360))
                }
                theta_s1 = tupdate(lambda, theta_s, g_s, Hi_s)
        }
        return(theta_s1)
}


An outline of our algorithm for nonlinear optimization is the following:

1. Select initial values for the parameter vector.
2. If the current parameter values set the vector of derivatives of Q() to zero, go to (3); otherwise,
   a. use GetUpdate() to get new parameter values;
   b. calculate g_s and Hi_s at the parameter values from (a);
   c. go to (2).
3. Display results.

Code block 4 contains a Mata version of this algorithm.

Code block 4: Pseudocode for Newton–Raphson algorithm

theta_s = J(p, 1, .01)
GetDerives(theta_s, g_s, Hi_s)
gz      = g_s'*Hi_s*g_s
while (abs(gz) > 1e-13) {
        theta_s = GetUpdate(theta_s, g_s, Hi_s)
        GetDerives(theta_s, g_s, Hi_s)
        gz      = g_s'*Hi_s*g_s
        printf("gz is now %8.7g\n", gz)
}
printf("Converged value of theta is\n")
theta_s


The first line puts the vector of starting values, a $$p\times 1$$ vector with each element equal to .01, in theta_s. The second line uses GetDerives() to put the vector of first derivatives into g_s and the inverse of the Hessian matrix into Hi_s. In GetDerives(), I use cholinv() to calculate Hi_s. cholinv() returns missing values if the matrix is not positive definite. By calculating Hi_s = -1*cholinv(-H_s), I ensure that Hi_s contains missing values when the Hessian is not negative definite and full rank.

The third line calculates how different the vector of first derivatives is from 0. Instead of using a sum of squares, obtainable by g_s'*g_s, I weight the first derivatives by the inverse of the Hessian matrix, which puts the $$p$$ first derivatives on a similar scale and ensures that the Hessian matrix is negative definite at convergence. (If the Hessian matrix is not negative definite, GetDerives() will put a matrix of missing values into Hi_s, which makes gz missing, and a missing gz exceeds the tolerance.)
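The cholinv() safeguard deserves a short illustration. A Cholesky factorization exists only for positive-definite matrices, so attempting to factor $$-\Hb_s$$ is a cheap test that the Hessian is negative definite. The sketch below mimics the idea in Python for a 2 x 2 matrix (it is not Stata's cholinv(); the matrices are made up):

```python
import math

def chol2(a):
    """Cholesky factor of a symmetric 2 x 2 matrix, or None if not positive definite."""
    if a[0][0] <= 0:
        return None
    l11 = math.sqrt(a[0][0])
    l21 = a[1][0]/l11
    d   = a[1][1] - l21*l21
    if d <= 0:
        return None
    return [[l11, 0.0], [l21, math.sqrt(d)]]

# Negative-definite Hessian: -H is positive definite, so the factorization succeeds.
negH  = [[4.0, 1.0], [1.0, 2.0]]
# Indefinite "Hessian": -H is not positive definite, so the factorization fails,
# which plays the role of cholinv() returning missing values.
negHb = [[4.0, 3.0], [3.0, 1.0]]

print(chol2(negH) is not None)    # True
print(chol2(negHb) is not None)   # False
```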

To flesh out the details, we need a specific problem. Consider maximizing the log-likelihood function of a Poisson model, which has a simple functional form. The contribution of each observation to the log-likelihood is

$f_i(\betab) = y_i{\bf x}_i\betab - \exp({\bf x}_i\betab) - \ln( y_i !)$

where $$y_i$$ is the dependent variable, $${\bf x}_i$$ is the vector of covariates, and $$\betab$$ is the vector of parameters that we select to maximize the log-likelihood function given by $$F(\betab) =\sum_i f_i(\betab)$$. I could drop $$\ln(y_i!)$$, because it does not depend on the parameters. I include it to make the value of the log-likelihood function the same as that reported by Stata. Stata includes these terms so that log-likelihood-function values are comparable across models.

The pll() function in code block 5 computes the Poisson log-likelihood function from the vector of observations on the dependent variable y, the matrix of observations on the covariates X, and the vector of parameter values b.

Code block 5: A function for the Poisson log-likelihood function

// Compute Poisson log-likelihood
mata:
real scalar pll(real vector y, real matrix X, real vector b)
{
        real vector  xb

        xb = X*b
        return(sum(-exp(xb) + y:*xb - lnfactorial(y)))
}
end


The vector of first derivatives is

$\frac{\partial F(\betab)}{\partial \betab} = \sum_{i=1}^N (y_i - \exp(\xb_i\betab))\xb_i$

which I can compute in Mata as quadcolsum((y-exp(X*b)):*X), and the Hessian matrix is

$\label{eq:H} \sum_{i=1}^N\frac{\partial^2~f(\xb_i,\betab)}{\partial\betab \partial\betab^\prime} = - \sum_{i=1}^N\exp(\xb_i\betab)\xb_i^\prime \xb_i \tag{5}$

which I can compute in Mata as -quadcross(X, exp(X*b), X).
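If you want to verify these derivative formulas numerically, the following NumPy sketch checks the analytic gradient against central finite differences; the simulated data stand in for accident3.dta, and all names and values here are illustrative.

```python
import numpy as np
from math import lgamma

# Simulated Poisson data: one covariate plus a constant
rng = np.random.default_rng(12345)
n = 200
X = np.column_stack([rng.normal(size=n), np.ones(n)])
y = rng.poisson(np.exp(X @ np.array([0.5, -1.0]))).astype(float)
lnfact = np.array([lgamma(v + 1.0) for v in y])   # ln(y!)

def pll(b):                        # Poisson log likelihood
    xb = X @ b
    return np.sum(y * xb - np.exp(xb) - lnfact)

def grad(b):                       # analogue of quadcolsum((y - exp(X*b)):*X)
    return (y - np.exp(X @ b)) @ X

def hess(b):                       # analogue of -quadcross(X, exp(X*b), X)
    return -(X * np.exp(X @ b)[:, None]).T @ X

# Central finite differences should agree with the analytic gradient
b0 = np.array([0.1, 0.2])
eps = 1e-6
fd = np.array([(pll(b0 + eps * e) - pll(b0 - eps * e)) / (2 * eps)
               for e in np.eye(2)])
```

The Hessian here is negative definite at any parameter value, which is what makes the NR iterations for the Poisson model so well behaved.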

Here is some code that implements this Newton–Raphson (NR) algorithm for the Poisson regression problem.

Code block 6: pnr1.do
(Uses accident3.dta)

// Newton-Raphson for Poisson log-likelihood
clear all
use accident3

mata:
real scalar pll(real vector y, real matrix X, real vector b)
{
        real vector  xb
        xb = X*b
        return(sum(-exp(xb) + y:*xb - lnfactorial(y)))
}

void GetDerives(real vector y, real matrix X, real vector theta, g, Hi)
{
        real vector  exb

        exb = exp(X*theta)
        g   = quadcolsum((y - exb):*X)'
        Hi  = quadcross(X, exb, X)
        Hi  = -1*cholinv(Hi)
}

real vector tupdate(                 ///
        real scalar lambda,          ///
        real vector theta_s,         ///
        real vector g_s,             ///
        real matrix Hi_s)
{
        return (theta_s - lambda*Hi_s*g_s)
}

real vector GetUpdate(               ///
        real vector y,               ///
        real matrix X,               ///
        real vector theta_s,         ///
        real vector g_s,             ///
        real matrix Hi_s)
{
        real scalar lambda
        real vector theta_s1

        lambda = 1
        theta_s1 = tupdate(lambda, theta_s, g_s, Hi_s)
        while ( pll(y, X, theta_s1) <= pll(y, X, theta_s) ) {
                lambda   = lambda/2
                if (lambda < 1e-11) {
                        printf("{red}Cannot find parameters that produce an increase.\n")
                        exit(error(3360))
                }
                theta_s1 = tupdate(lambda, theta_s, g_s, Hi_s)
        }
        return(theta_s1)
}

y = st_data(., "accidents")
X = st_data(., "cvalue kids traffic")
X = X,J(rows(X), 1, 1)

b  =  J(cols(X), 1, .01)
GetDerives(y, X, b, g=., Hi=.)
gz = .
while (abs(gz) > 1e-11) {
        bs1 = GetUpdate(y, X, b, g, Hi)
        b   = bs1
        GetDerives(y, X, b, g, Hi)
        gz = g'*Hi*g
        printf("gz is now %8.7g\n", gz)
}
printf("Converged value of beta is\n")
b

end


Line 3 reads in the downloadable accident3.dta dataset before dropping down to Mata. I use variables from this dataset on lines 56 and 57.

Lines 6–11 define pll(), which returns the value of the Poisson log-likelihood function, given the vector of observations on the dependent variable y, the matrix of covariate observations X, and the current parameters b.

Lines 13–21 put the vector of first derivatives in g and the inverse of the Hessian matrix in Hi. Equation 5 specifies a matrix that is negative definite, as long as the covariates are not linearly dependent. As discussed above, cholinv() returns a matrix of missing values if the matrix is not positive definite. I multiply the right-hand side on line 20 by $$-1$$ instead of on line 19.

Lines 23–30 implement the tupdate() function previously discussed.

Lines 32–53 implement the GetUpdate() function previously discussed, with the caveats that this version handles the data and uses pll() to compute the value of the objective function.

Lines 56–58 get the data from Stata and row-join a vector to X for the constant term.

Lines 60–71 implement the NR algorithm discussed above for this Poisson regression problem.
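The same logic can be sketched outside of Mata. The NumPy translation below uses simulated data in place of accident3.dta, and the helper names get_derives() and get_update() are mine; this is a sketch of the algorithm, not a drop-in replacement for the Mata code.

```python
import numpy as np
from math import lgamma

# Simulated data: one covariate plus a constant; true values are illustrative
rng = np.random.default_rng(67890)
n = 500
X = np.column_stack([rng.normal(size=n), np.ones(n)])
y = rng.poisson(np.exp(X @ np.array([0.5, -1.0]))).astype(float)
lnfact = np.array([lgamma(v + 1.0) for v in y])

def pll(b):
    xb = X @ b
    return np.sum(y * xb - np.exp(xb) - lnfact)

def get_derives(b):
    """Gradient and inverse Hessian, mirroring GetDerives()."""
    exb = np.exp(X @ b)
    g = (y - exb) @ X
    Hi = -np.linalg.inv((X * exb[:, None]).T @ X)   # -1*cholinv(-H)
    return g, Hi

def get_update(b, g, Hi):
    """Step-halving update, mirroring tupdate()/GetUpdate()."""
    lam = 1.0
    b1 = b - lam * Hi @ g
    while pll(b1) <= pll(b):
        lam /= 2.0
        if lam < 1e-11:
            raise RuntimeError("cannot find parameters that produce an increase")
        b1 = b - lam * Hi @ g
    return b1

b = np.full(2, 0.01)                # starting values, as in J(cols(X), 1, .01)
g, Hi = get_derives(b)
gz = np.inf                         # plays the role of Mata's missing value
while abs(gz) > 1e-11:
    b = get_update(b, g, Hi)
    g, Hi = get_derives(b)
    gz = g @ Hi @ g
```

At convergence, gz is a tiny negative number, just as in the Mata output below.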

Running pnr1.do produces

Example 1: NR algorithm for Poisson

. do pnr1

. // Newton-Raphson for Poisson log-likelihood
. clear all

. use accident3

.
. mata:

[Output Omitted]

: b  =  J(cols(X), 1, .01)

: GetDerives(y, X, b, g=., Hi=.)

: gz = .

: while (abs(gz) > 1e-11) {
>         bs1 = GetUpdate(y, X, b, g, Hi)
>         b   = bs1
>         GetDerives(y, X, b, g, Hi)
>         gz = g'*Hi*g
>         printf("gz is now %8.7g\n", gz)
> }
gz is now -119.201
gz is now -26.6231
gz is now -2.02142
gz is now -.016214
gz is now -1.3e-06
gz is now -8.3e-15

: printf("Converged value of beta is\n")
Converged value of beta is

: b
1
+----------------+
1 |  -.6558870685  |
2 |  -1.009016966  |
3 |   .1467114652  |
4 |   .5743541223  |
+----------------+

:
: end
--------------------------------------------------------------------------------
.
end of do-file


The point estimates in example 1 are equivalent to those produced by poisson.

Example 2: poisson results

. poisson accidents cvalue kids traffic

Iteration 0:   log likelihood = -555.86605
Iteration 1:   log likelihood =  -555.8154
Iteration 2:   log likelihood = -555.81538

Poisson regression                              Number of obs     =        505
LR chi2(3)        =     340.20
Prob > chi2       =     0.0000
Log likelihood = -555.81538                     Pseudo R2         =     0.2343

------------------------------------------------------------------------------
accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506594
traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
_cons |    .574354   .2839515     2.02   0.043     .0178193    1.130889
------------------------------------------------------------------------------


Done and undone

I implemented a simple nonlinear optimizer to practice Mata programming and to review the theory behind nonlinear optimization. In future posts, I implement a command for Poisson regression that uses the optimizer in optimize().

Categories: Programming Tags:

## Programming an estimation command in Stata: Adding robust and cluster-robust VCEs to our Mata based OLS command

I show how to use the undocumented command _vce_parse to parse the options for robust or cluster-robust estimators of the variance-covariance of the estimator (VCE). I then discuss myregress12.ado, which performs its computations in Mata and computes VCE estimators based on independently and identically distributed (IID) observations, robust methods, or cluster-robust methods.

myregress12.ado performs ordinary least-squares (OLS) regression, and it extends myregress11.ado, which I discussed in Programming an estimation command in Stata: An OLS command using Mata. To get the most out of this post, you should be familiar with Programming an estimation command in Stata: Using a subroutine to parse a complex option and Programming an estimation command in Stata: Computing OLS objects in Mata.

Parsing the vce() option

I used ado-subroutines to simplify the parsing of the options vce(robust) and vce(cluster cvarname) in myregress10.ado; see Programming an estimation command in Stata: Using a subroutine to parse a complex option. Part of the point was to illustrate how to write ado-subroutines and the programming tricks that I used in these subroutines.

Here I use the undocumented command _vce_parse to simplify the parsing. There are many undocumented commands designed to help Stata programmers. They are undocumented in that they are tersely documented in the system help but not documented in the manuals. In addition, the syntax or behavior of these commands may change over Stata releases, although this rarely happens.

_vce_parse helps Stata programmers parse the vce() option. To see how it works, consider the problem of parsing the syntax of myregress12.

myregress12 depvar [indepvars] [if] [in] [, vce(robust | cluster clustervar) noconstant]

where indepvars can contain factor variables or time-series variables.

I can use the syntax command to put whatever the user specifies in the option vce() into the local macro vce, but I still must (1) check that what was specified makes sense and (2) create local macros that the code can use to do the right thing. Examples 1–7 create the local macro vce, simulating what syntax would do, and then use _vce_parse to perform tasks (1) and (2).

I begin with the case in which the user specified vce(robust); here the local macro vce would contain the word robust.

Example 1: parsing vce(robust)

. clear all

. sysuse auto
(1978 Automobile Data)

. local vce "robust"

. _vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
r(robust) : "robust"
r(vceopt) : "vce(robust)"
r(vce) : "robust"


The command

_vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

has two pieces. The piece before the colon (:) specifies the rules; the piece after the colon specifies what the user typed. Each piece can have a Stata object followed by some options; note the commas before optlist(Robust) and before vce(`vce'). In the case at hand, the second piece contains only what the user specified, vce(robust), and the first piece contains only the options optlist(Robust) and argoptlist(CLuster). The option optlist(Robust) specifies that the vce() option in the second piece may contain the option robust and that its minimal abbreviation is r. Note how the word Robust in optlist(Robust) mimics how syntax specifies minimum abbreviations. The option argoptlist(CLuster) specifies that the vce() option in the second piece may contain cluster clustervar, that the minimum abbreviation of cluster is cl, and that it will put the argument clustervar into a local macro.

After the command,

_vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

I use return list to show what _vce_parse stored in r(). Because local macro vce contains “robust”, _vce_parse

1. puts the word robust in the local macro r(robust);
2. puts what the user typed, vce(robust), in the local macro r(vceopt); and
3. puts the type of VCE, robust, in the local macro r(vce).

Examples 2 and 3 illustrate that _vce_parse stores the same values in these local macros when the user specifies vce(rob) or vce(r), which are valid abbreviations for vce(robust).

Example 2: parsing vce(rob)

. local vce "rob"

. _vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
r(robust) : "robust"
r(vceopt) : "vce(robust)"
r(vce) : "robust"


Example 3: parsing vce(r)

. local vce "r"

. _vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
r(robust) : "robust"
r(vceopt) : "vce(robust)"
r(vce) : "robust"


Now, consider parsing the option vce(cluster clustervar). Because the cluster variable clustervar may contain missing values, _vce_parse may need to update a sample-identification variable before it stores the name of the cluster variable in a local macro. In example 4, I use the command

_vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vce(`vce')

to handle the case when the user specifies vce(cluster rep78). The results from the tabulate and summarize commands illustrate that _vce_parse updates the sample-identification variable mytouse to account for the missing observations in rep78.

Example 4: parsing vce(cluster rep78)

. generate byte mytouse = 1

. tabulate mytouse

mytouse |      Freq.     Percent        Cum.
------------+-----------------------------------
1 |         74      100.00      100.00
------------+-----------------------------------
Total |         74      100.00

. summarize rep78

Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
rep78 |         69    3.405797    .9899323          1          5

. local vce "cluster rep78"

. _vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
r(robust) : "robust"
r(cluster) : "rep78"
r(vceopt) : "vce(cluster rep78)"
r(vceargs) : "rep78"
r(vce) : "cluster"

. tabulate mytouse

mytouse |      Freq.     Percent        Cum.
------------+-----------------------------------
0 |          5        6.76        6.76
1 |         69       93.24      100.00
------------+-----------------------------------
Total |         74      100.00


I use return list to show what _vce_parse stored in r(). Because local macro vce contains cluster rep78, _vce_parse

1. puts the word robust in the local macro r(robust);
2. puts the name of the cluster variable, rep78, in the local macro r(cluster);
3. puts what the user typed, vce(cluster rep78), in the local macro r(vceopt);
4. puts the argument to the cluster option, rep78, in the local macro r(vceargs); and
5. puts the type of VCE, cluster, in the local macro r(vce).

Examples 5 and 6 illustrate that _vce_parse stores the same values in these local macros when the user specifies vce(clus rep78) or vce(cl rep78), which are valid abbreviations for vce(cluster rep78).

Example 5: parsing vce(clus rep78)

. local vce "clus rep78"

. _vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
r(robust) : "robust"
r(cluster) : "rep78"
r(vceopt) : "vce(cluster rep78)"
r(vceargs) : "rep78"
r(vce) : "cluster"


Example 6: parsing vce(cl rep78)

. local vce "cl rep78"

. _vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
r(robust) : "robust"
r(cluster) : "rep78"
r(vceopt) : "vce(cluster rep78)"
r(vceargs) : "rep78"
r(vce) : "cluster"


Having illustrated how _vce_parse handles the cases when the user specifies something valid, I show in example 7 that it also produces a standard error message when the user specifies something invalid.

Example 7: parsing vce(silly)

. local vce "silly"

. capture noisily _vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vc
> e(`vce')
vcetype 'silly' not allowed

. return list


_vce_parse can parse other types of vce() options; to see them type help _vce_parse.

Also, remember to type help undocumented when you are looking for a programmer's tool.

The code for myregress12

Here is the code for myregress12.ado, which uses _vce_parse. I describe how it works below.

I recommend that you click on the file name to download the code for my myregress12.ado. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.

*! version 12.0.0  16Jan2016
program define myregress12, eclass sortpreserve
        version 14.1

        syntax varlist(numeric ts fv) [if] [in] [, noCONStant vce(string) ]
        marksample touse

        _vce_parse `touse' , optlist(Robust) argoptlist(CLuster) : , vce(`vce')
        local vce        "`r(vce)'"
        local clustervar "`r(cluster)'"
        if "`vce'" == "robust" | "`vce'" == "cluster" {
                local vcetype "Robust"
        }
        if "`clustervar'" != "" {
                capture confirm numeric variable `clustervar'
                if _rc {
                        display in red "invalid vce() option"
                        display in red "cluster variable {bf:`clustervar'} is " ///
                                "string variable instead of a numeric variable"
                        exit 198
                }
                sort `clustervar'
        }

        gettoken depvar indepvars : varlist
        _fv_check_depvar `depvar'

        fvexpand `indepvars'
        local cnames `r(varlist)'

        tempname b V N rank df_r

        mata: mywork("`depvar'", "`cnames'", "`touse'", "`constant'",    ///
                "`vce'", "`clustervar'",                                 ///
                "`b'", "`V'", "`N'", "`rank'", "`df_r'")

        if "`constant'" == "" {
                local cnames `cnames' _cons
        }

        matrix colnames `b' = `cnames'
        matrix colnames `V' = `cnames'
        matrix rownames `V' = `cnames'

        ereturn post `b' `V', esample(`touse') buildfvinfo
        ereturn scalar N        = `N'
        ereturn scalar rank     = `rank'
        ereturn scalar df_r     = `df_r'
        ereturn local  vce      "`vce'"
        ereturn local  vcetype  "`vcetype'"
        ereturn local  clustvar "`clustervar'"
        ereturn local  cmd      "myregress12"

        ereturn display

end

mata:

void mywork( string scalar depvar,  string scalar indepvars,
             string scalar touse,   string scalar constant,
             string scalar vcetype, string scalar clustervar,
             string scalar bname,   string scalar Vname,
             string scalar nname,   string scalar rname,
             string scalar dfrname)
{

        real vector    y, b, e, e2, cvar, ei
        real matrix    X, XpXi, V, M, info, xi
        real scalar    n, p, k, nc, i, dfr

        y    = st_data(., depvar, touse)
        X    = st_data(., indepvars, touse)
        n    = rows(X)

        if (constant == "") {
                X    = X,J(n,1,1)
        }

        XpXi = quadcross(X, X)
        XpXi = invsym(XpXi)
        b    = XpXi*quadcross(X, y)
        e    = y - X*b
        e2   = e:^2
        p    = cols(X)
        k    = p - diag0cnt(XpXi)
        if (vcetype == "robust") {
                M    = quadcross(X, e2, X)
                dfr  = n - k
                V    = (n/dfr)*XpXi*M*XpXi
        }
        else if (vcetype == "cluster") {
                cvar = st_data(., clustervar, touse)
                info = panelsetup(cvar, 1)
                nc   = rows(info)
                M    = J(k, k, 0)
                dfr  = nc - 1
                for(i=1; i<=nc; i++) {
                        xi = panelsubmatrix(X, i, info)
                        ei = panelsubmatrix(e, i, info)
                        M  = M + xi'*(ei*ei')*xi
                }
                V    = ((n-1)/(n-k))*(nc/(nc-1))*XpXi*M*XpXi
        }
        else {                 // vcetype must be IID
                dfr  = n - k
                V    = (quadsum(e2)/dfr)*XpXi
        }

        st_matrix(bname, b')
        st_matrix(Vname, V)
        st_numscalar(nname, n)
        st_numscalar(rname, k)
        st_numscalar(dfrname, dfr)

}

end


Let’s break this 118-line program into familiar pieces. Lines 2-56 define the ado-command, and lines 58-118 define the Mata work function that is used by the ado-command. Despite the addition of details to handle the parsing and computation of a robust or cluster-robust VCE, the structures of the ado-command and of the Mata work function are the same as they were in myregress11.ado; see Programming an estimation command in Stata: An OLS command using Mata.

1. Lines 5-31 parse what the user typed, identify the sample, and create temporary names for the results returned by our Mata work function.
2. Lines 33-35 call the Mata work function.
3. Lines 37-52 post the results returned by the Mata work function to e().
4. Line 54 displays the results.

The Mata function mywork() also has four parts.

1. Lines 60-65 parse the arguments.
2. Lines 68-70 declare vectors, matrices, and scalars that are local to mywork().
3. Lines 80-108 compute the results.
4. Lines 110-114 copy the computed results to Stata, using the names that were passed in the arguments.

Now, I address the details of the ado-code, although I do not repeat the details of myregress12.ado that I already covered when describing myregress11.ado in Programming an estimation command in Stata: An OLS command using Mata. Line 5 allows the user to specify the vce() option, and line 8 uses _vce_parse to parse what the user specifies. Lines 9 and 10 put the type of VCE found by _vce_parse in the local macro vce and the name of the cluster variable, if specified, in the local macro clustervar. Lines 11–13 put Robust in the local macro vcetype if the specified vce is either robust or cluster. If there is a cluster variable, lines 14–23 check that it is numeric and use it to sort the data.

Line 34 passes the new arguments for the type of VCE and the name of the cluster variable to the Mata work function mywork().

Lines 49–51 store the type of VCE, the output label for the VCE type, and the name of the cluster variable in e(), respectively.

Now, I address the details of the Mata work function mywork(), discussing only what has been added since the mywork() in myregress11.ado. Line 62 declares the new arguments. The string scalar vcetype is empty, or it contains “robust”, or it contains “cluster”. The string scalar clustervar is either empty or contains the name of the cluster variable.

Lines 68–70 declare the local-to-the-function vectors cvar and ei and the local-to-the-function matrices M, info, and xi that are needed now but not previously.

Lines 87, 91–92, 104–105, and 108 specify if-else blocks to compute the correct VCE. Lines 88–90 compute a robust estimator of the VCE if vcetype contains “robust”. Lines 93–103 compute a cluster-robust estimator of the VCE if vcetype contains “cluster”. Lines 106–107 compute an IID-based estimator of the VCE if vcetype contains neither “robust” nor “cluster”.
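The two sandwich formulas can be cross-checked outside of Mata. The NumPy sketch below uses simulated data, and the helper name cluster_vce() is mine, not part of myregress12.ado; it exploits the fact that when every observation is its own cluster, the cluster-robust formula collapses to the robust one, because ((n-1)/(n-k))*(n/(n-1)) = n/(n-k).

```python
import numpy as np

# Simulated regression data; formulas mirror the VCE branches of mywork()
rng = np.random.default_rng(7)
n = 300
X = np.column_stack([rng.normal(size=n), np.ones(n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
k = X.shape[1]

XpXi = np.linalg.inv(X.T @ X)
b = XpXi @ X.T @ y
e = y - X @ b

# Robust: M = X' diag(e^2) X with the small-sample factor n/(n-k)
M_r = (X * (e**2)[:, None]).T @ X
V_robust = (n / (n - k)) * XpXi @ M_r @ XpXi

def cluster_vce(cvar):
    """Cluster-robust VCE, summing x_g' e_g e_g' x_g within each cluster."""
    M = np.zeros((k, k))
    for c in np.unique(cvar):
        Xg, eg = X[cvar == c], e[cvar == c]
        M += Xg.T @ np.outer(eg, eg) @ Xg
    nc = len(np.unique(cvar))
    return ((n - 1) / (n - k)) * (nc / (nc - 1)) * XpXi @ M @ XpXi

# With every observation in its own cluster, the two estimators coincide
V_singleton = cluster_vce(np.arange(n))
```

This singleton-cluster check is a useful sanity test when implementing the cluster loop.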

Done and undone

I introduced the undocumented command _vce_parse and discussed the code for myregress12.ado, which uses Mata to compute OLS point estimates and an IID-based VCE, a robust VCE, or a cluster-robust VCE.

The structure of the code is the same as the one that I used in myregress11.ado and in mymean8.ado, which I discussed in Programming an estimation command in Stata: An OLS command using Mata and in Programming an estimation command in Stata: A first ado-command using Mata. That the structure remains the same makes it easier to handle the details that arise in more complicated problems.


## Programming an estimation command in Stata: A map to posted entries

I have posted 19 entries to the Stata blog about programming an estimation command in Stata. They are best read in order. The comprehensive list below allows you to read them from first to last, at your own pace.

1. Programming estimators in Stata: Why you should

To help you write Stata commands that people want to use, I illustrate how Stata syntax is predictable and give an overview of the estimation-postestimation structure that you will want to emulate in your programs.

2. Programming an estimation command in Stata: Where to store your stuff

I discuss the difference between scripts and commands, and I introduce some essential programming concepts and constructions that I use to write the scripts and commands.

3. Programming an estimation command in Stata: Global macros versus local macros

I discuss a pair of examples that illustrate the differences between global macros and local macros.

4. Programming an estimation command in Stata: A first ado-command

I discuss the code for a simple estimation command to focus on the details of how to implement an estimation command. The command that I discuss estimates the mean by the sample average. I begin by reviewing the formulas and a do-file that implements them. I subsequently introduce ado-file programming and discuss two versions of the command. Along the way, I illustrate some of the postestimation features that work after the command.

5. Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects

I present the formulas for computing the ordinary least-squares (OLS) estimator, and I discuss some do-file implementations of them. I discuss the formulas and the computation of independence-based standard errors, robust standard errors, and cluster-robust standard errors. I introduce the Stata matrix commands and matrix functions that I use in ado-commands that I discuss in upcoming posts.

6. Programming an estimation command in Stata: A first command for OLS

I show how to write a Stata estimation command that implements the OLS estimator by explaining the code.

7. Programming an estimation command in Stata: A better OLS command

I use the syntax command to improve the command that implements the OLS estimator that I discussed in Programming an estimation command in Stata: A first command for OLS. I show how to require that all variables be numeric variables and how to make the command accept time-series operated variables.

8. Programming an estimation command in Stata: Allowing for sample restrictions and factor variables

I modify the OLS command discussed in Programming an estimation command in Stata: A better OLS command to allow for sample restrictions, to handle missing values, to allow for factor variables, and to deal with perfectly collinear variables.

9. Programming an estimation command in Stata: Allowing for options

I make three improvements to the command that implements the OLS estimator that I discussed in Programming an estimation command in Stata: Allowing for sample restrictions and factor variables. First, I allow the user to request a robust estimator of the variance-covariance of the estimator. Second, I allow the user to suppress the constant term. Third, I store the residual degrees of freedom in e(df_r) so that test will use the t or F distribution instead of the normal or chi-squared distribution to compute the p-value of Wald tests.

10. Programming an estimation command in Stata: Using a subroutine to parse a complex option

I make two improvements to the command that implements the OLS estimator that I discussed in Programming an estimation command in Stata: Allowing for options. First, I add an option for a cluster-robust estimator of the variance-covariance of the estimator (VCE). Second, I make the command accept the modern syntax for either a robust or a cluster-robust estimator of the VCE. In the process, I use subroutines in my ado-program to facilitate the parsing, and I discuss some advanced parsing tricks.

11. Programming an estimation command in Stata: Mata 101

I introduce Mata, the matrix programming language that is part of Stata.

12. Programming an estimation command in Stata: Mata functions

I show how to write a function in Mata, the matrix programming language that is part of Stata.

13. Programming an estimation command in Stata: A first ado-command using Mata

I discuss a sequence of ado-commands that use Mata to estimate the mean of a variable. The commands illustrate a general structure for Stata/Mata programs.

14. Programming an estimation command in Stata: Computing OLS objects in Mata

I present the formulas for computing the OLS estimator and show how to compute them in Mata. This post is a Mata version of Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects. I discuss the formulas and the computation of independence-based standard errors, robust standard errors, and cluster-robust standard errors.

15. Programming an estimation command in Stata: An OLS command using Mata

I discuss a command that computes OLS results in Mata, paying special attention to the structure of Stata programs that use Mata work functions.

16. Programming an estimation command in Stata: Adding robust and cluster-robust VCEs to our Mata based OLS command

I show how to use the undocumented command _vce_parse to parse the options for robust or cluster-robust estimators of the VCE. I then discuss myregress12.ado, which performs its computations in Mata and computes an IID-based, a robust, or a cluster-robust estimator of the VCE.

17. Programming an estimation command in Stata: A review of nonlinear optimization using Mata

I review the theory behind nonlinear optimization and get some practice in Mata programming by implementing an optimizer in Mata. This post is designed to help you develop your Mata programming skills and to improve your understanding of how the Mata optimization suites optimize() and moptimize() work.

18. Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters

I show how to use optimize() in Mata to maximize a Poisson log-likelihood function and to obtain estimators of the VCE based on IID observations or on robust methods.

19. Programming an estimation command in Stata: A poisson command using Mata

I discuss mypoisson1, which computes Poisson-regression results in Mata. The code in mypoisson1.ado is remarkably similar to the code in myregress11.ado, which computes OLS results in Mata, as I discussed in Programming an estimation command in Stata: An OLS command using Mata.


## Programming an estimation command in Stata: An OLS command using Mata

I discuss a command that computes ordinary least-squares (OLS) results in Mata, paying special attention to the structure of Stata programs that use Mata work functions.

This command builds on several previous posts; at a minimum, you should be familiar with Programming an estimation command in Stata: A first ado-command using Mata and Programming an estimation command in Stata: Computing OLS objects in Mata.

An OLS command with Mata computations

The Stata command myregress11 computes the results in Mata. The syntax of the myregress11 command is

myregress11 depvar [indepvars] [if] [in] [, noconstant]

where indepvars can contain factor variables or time-series variables.

In the remainder of this post, I discuss the code for myregress11.ado. I recommend that you click on the file name to download the code. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.

I do not discuss the formulas for the OLS objects. See Programming an estimation command in Stata: Computing OLS objects in Mata for the formulas and Mata implementations.

*! version 11.0.0  11Jan2016
program define myregress11, eclass sortpreserve
        version 14

        syntax varlist(numeric ts fv) [if] [in] [, noCONStant ]
        marksample touse

        gettoken depvar indepvars : varlist
        _fv_check_depvar `depvar'

        fvexpand `indepvars'
        local cnames `r(varlist)'

        tempname b V N rank df_r

        mata: mywork("`depvar'", "`cnames'", "`touse'", "`constant'", ///
                "`b'", "`V'", "`N'", "`rank'", "`df_r'")

        if "`constant'" == "" {
                local cnames `cnames' _cons
        }

        matrix colnames `b' = `cnames'
        matrix colnames `V' = `cnames'
        matrix rownames `V' = `cnames'

        ereturn post `b' `V', esample(`touse') buildfvinfo
        ereturn scalar N       = `N'
        ereturn scalar rank    = `rank'
        ereturn scalar df_r    = `df_r'
        ereturn local  cmd     "myregress11"

        ereturn display

end

mata:

void mywork( string scalar depvar,  string scalar indepvars,
             string scalar touse,   string scalar constant,
             string scalar bname,   string scalar Vname,
             string scalar nname,   string scalar rname,
             string scalar dfrname)
{

        real vector y, b, e, e2
        real matrix X, XpXi, V
        real scalar n, k

        y    = st_data(., depvar, touse)
        X    = st_data(., indepvars, touse)
        n    = rows(X)

        if (constant == "") {
                X    = X,J(n,1,1)
        }

        XpXi = quadcross(X, X)
        XpXi = invsym(XpXi)
        b    = XpXi*quadcross(X, y)
        e    = y - X*b
        e2   = e:^2
        k    = cols(X) - diag0cnt(XpXi)
        V    = (quadsum(e2)/(n-k))*XpXi

        st_matrix(bname, b')
        st_matrix(Vname, V)
        st_numscalar(nname, n)
        st_numscalar(rname, k)
        st_numscalar(dfrname, n-k)

}

end


Let’s break this 74-line program into familiar pieces to make it easier to understand. Lines 2–35 define the ado-command, and lines 37–74 define the Mata work function that is used by the ado-command. Although there are more details, I used this structure in mymean8.ado, which I discussed in Programming an estimation command in Stata: A first ado-command using Mata.

1. Lines 5–14 parse what the user typed, identify the sample, and create temporary names for the results returned by our Mata work function.
2. Lines 16-17 call the Mata work function.
3. Lines 19–31 post the results returned by the Mata work function to e().
4. Line 33 displays the results.

The Mata function mywork() also has four parts.

1. Lines 39–43 parse the arguments.
2. Lines 46–48 declare vectors, matrices, and scalars that are local to mywork().
3. Lines 54–64 compute the results.
4. Lines 66–70 copy the computed results to Stata, using the names that were passed in arguments.
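For readers who want to see the algebra outside of Mata, here is a minimal NumPy sketch of the same computations under IID errors; np.linalg.inv stands in for invsym(), the data are simulated, and because the simulated X has full rank there is no diag0cnt() adjustment to make.

```python
import numpy as np

# Simulated data; mirrors the computations in mywork() under IID errors
rng = np.random.default_rng(3)
n = 100
X = np.column_stack([rng.normal(size=n), np.ones(n)])  # constant appended, as in X,J(n,1,1)
y = X @ np.array([0.5, 1.5]) + rng.normal(size=n)

XpXi = np.linalg.inv(X.T @ X)       # stand-in for invsym(quadcross(X, X))
b = XpXi @ X.T @ y                  # point estimates
e = y - X @ b                       # residuals
k = X.shape[1]                      # rank (full rank here)
V = (np.sum(e**2) / (n - k)) * XpXi # IID VCE
df_r = n - k                        # residual degrees of freedom, e(df_r)
```

The point estimates agree with a library least-squares solver, which is a quick way to validate a hand-rolled implementation.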

Now let’s discuss the ado-code in some detail. Line 5 uses the syntax command to put the names of the variables specified by the user into the local macro varlist, to parse an optional if restriction, to parse an optional in restriction, and to parse the noconstant option. The variables specified by the user must be numeric, may contain time-series variables, and may contain factor variables. For more detailed discussions of similar syntax usages, see Programming an estimation command in Stata: Allowing for sample restrictions and factor variables, Programming an estimation command in Stata: Allowing for options, and Programming an estimation command in Stata: Using a subroutine to parse a complex option.

Line 6 creates a temporary sample-identification variable and stores its name in the local macro touse. I discussed this usage in detail in the section surrounding code block 1 in Programming an estimation command in Stata: Allowing for sample restrictions and factor variables.

Line 8 uses gettoken to store the name of the dependent variable in the local macro depvar and the names of the independent variables in the local macro indepvars. Line 9 uses _fv_check_depvar to check that the name of the dependent variable is not a factor variable.

Line 11 uses fvexpand to expand the factor variables in indepvars. Line 12 puts the expanded names stored in r(varlist) by fvexpand in the local macro cnames. A single factor variable can imply more than one coefficient. fvexpand finds the canonical names for these coefficients and returns them in r(varlist). I have not used fvexpand until now because the Stata commands that I used to compute the results automatically created the coefficient names. Mata functions are designed for speed, so I must create the coefficient names when I use them.

Example 1 illustrates how one factor variable can imply more than one coefficient.

Example 1: fvexpand

. sysuse auto
(1978 Automobile Data)

. tabulate rep78

Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
1 |          2        2.90        2.90
2 |          8       11.59       14.49
3 |         30       43.48       57.97
4 |         18       26.09       84.06
5 |         11       15.94      100.00
------------+-----------------------------------
Total |         69      100.00

. fvexpand i.rep78

. return list

macros:
r(fvops) : "true"
r(varlist) : "1b.rep78 2.rep78 3.rep78 4.rep78 5.rep78"

. summarize 2.rep78

Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
2.rep78 |         69     .115942    .3225009          0          1


The tabulate results show that there are five levels in rep78. fvexpand finds the levels and creates a list of the names of the implied indicator variables 1b.rep78, 2.rep78, 3.rep78, 4.rep78, and 5.rep78. Comparing the results from summarize 2.rep78 with those from tabulate rep78 illustrates this notation. The b in 1b.rep78 identifies level 1 as the base category, which is omitted when there is a constant in the model. Type help fvvarlist for more details.
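
The naming rule that fvexpand applies is mechanical enough to sketch outside Stata. The Python helper below (fv_expand is hypothetical, not part of Stata or Mata) builds the same coefficient names from a list of levels, tagging the base level with b:

```python
def fv_expand(varname, levels, base=None):
    """Mimic the names fvexpand returns for i.varname: one indicator
    name per level, with the base level tagged by a 'b' suffix.
    Assumes the default base is the smallest level."""
    if base is None:
        base = min(levels)
    names = []
    for lev in sorted(levels):
        tag = "{}b".format(lev) if lev == base else str(lev)
        names.append("{}.{}".format(tag, varname))
    return names

print(fv_expand("rep78", [1, 2, 3, 4, 5]))
```

This reproduces the list stored in r(varlist) in example 1.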

Line 14 creates the temporary names for the results. For example, it stores a safe, temporary name in the local macro b that can be used for the matrix storing the point estimates. I discussed this usage in the section Using temporary names for global objects in Programming an estimation command in Stata: A first ado-command using Mata.

Lines 16 and 17 call the Mata function mywork(), which uses the information contained in the local macros depvar, cnames, touse, and constant to compute the results that are returned in the Stata objects whose names are stored in the local macros b, V, N, rank, and df_r.

Line 20 appends _cons to the local macro cnames when the user did not specify the noconstant option.

Lines 23–25 put column names on the vector of point estimates and row and column names on the matrix containing the estimated variance-covariance of the estimator (VCE).

Lines 27–31 post the results to e().

Line 33 displays a standard Stata output table, using the results in e(b), e(V), and e(df_r).

Note that the local macro b created on line 14 contains a temporary name that is passed to mywork() on line 17 and that the Stata matrix whose name is contained in the local macro b is used on lines 23 and 27. mywork() puts the vector of point estimates in the Stata matrix whose name is stored in the local macro b. Also note that the local macro V created on line 14 contains a temporary name that is passed to mywork() on line 17 and that the Stata matrix whose name is contained in the local macro V is used on lines 24, 25, and 27. mywork() puts the estimated VCE in the Stata matrix whose name is stored in the local macro V.

To see how this works, let’s discuss the mywork() function in detail. Lines 39–43 declare that mywork() returns nothing (it is void) and that it accepts nine arguments, each of which is a string scalar. The first four arguments are inputs: depvar contains the name of the dependent variable, indepvars contains the names of the independent variables, touse contains the name of the sample-identification variable, and constant contains either noconstant or is empty. The values of these arguments are used on lines 50, 51, and 54 to create the vector y and the matrix X.

The last five arguments contain names used to write the results back to Stata. mywork() writes the results back to Stata using the passed-in temporary names. For example, line 17 shows that the Mata string scalar bname contains the temporary name stored in the local macro b. Line 66 copies the results stored in the transpose of the Mata vector b to a Stata matrix whose name is stored in the Mata string scalar bname. (Line 60 shows that the vector b contains the OLS point estimates.) Lines 23 and 27 then use this Stata vector whose name is stored in the local macro b. Similarly, line 17 shows that the Mata string scalar Vname contains the temporary name stored in the local macro V. Line 67 copies the results stored in the Mata matrix V to a Stata matrix whose name is stored in the Mata string scalar Vname. (Line 64 shows that V contains the estimated VCE.) Lines 24, 25, and 27 then use this Stata matrix whose name is stored in the local macro V. The arguments nname, rname, and dfrname are used analogously to return the results for the number of observations, the rank of the VCE, and the degrees of freedom of the residuals.

Lines 50–64 compute the point estimates and the VCE. Except for line 54, I discussed these computations in Programming an estimation command in Stata: A first ado-command using Mata. Line 54 causes a column of 1s to be joined to the covariate matrix X when the string scalar constant is empty. Lines 5 and 16 imply that the Mata string scalar constant contains noconstant when the user specifies the noconstant option and that it is empty otherwise.

Done and undone

I discussed the code for myregress11.ado, which uses Mata to compute OLS point estimates and a VCE that assumes independent and identically distributed observations. The structure of the code is the same as the one that I used in mymean7.ado and mymean8.ado, discussed in Programming an estimation command in Stata: A first ado-command using Mata, although there are more details in the OLS program.

Key to this structure is that the Mata work function accepts two types of arguments: the names of Stata objects that are inputs and temporary names that are used to write the results back to Stata from Mata.
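
The two kinds of arguments can be illustrated outside Stata. In this Python analogy (not Stata code), the dict store stands in for Stata's global matrix/scalar memory, and the names __000000 and __000001 mimic what tempname would return; the work function returns nothing and writes its results under the caller-chosen names:

```python
def mywork(y, store, bname, nname):
    """Compute results and write them back under caller-chosen names.
    The first argument is an input; the last two are (temporary) names
    under which results are written into shared storage."""
    store[bname] = sum(y) / len(y)  # point estimate
    store[nname] = len(y)           # number of observations

store = {}
# the caller picks safe temporary names and passes them in
mywork([2.0, 4.0, 6.0], store, "__000000", "__000001")
```

Because the caller chose the names, it knows where to find the results afterward, and no user-created object can be overwritten.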

In the next post, I extend myregress11.ado to allow for robust or cluster-robust estimators of the VCE.

Categories: Programming Tags:

## Programming an estimation command in Stata: Computing OLS objects in Mata


OLS formulas

Recall that the OLS point estimates are given by

$\widehat{\betab} = \left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1} \left( \sum_{i=1}^N \xb_i'y_i \right)$

where $$\xb_i$$ is the $$1\times k$$ vector of independent variables, $$y_i$$ is the dependent variable for each of the $$N$$ sample observations, and the model for $$y_i$$ is

$y_i = \xb_i\betab + \epsilon_i$

If the $$\epsilon_i$$ are independently and identically distributed (IID), we estimate the variance-covariance matrix of the estimator (VCE) by

$\widehat{\Vb} = \widehat{s} \left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}$

where $$\widehat{s} = 1/(N-k)\sum_{i=1}^N e_i^2$$ and $$e_i=y_i-\xb_i\widehat{\betab}$$. See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2015) for introductions to OLS.
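
As a language-neutral check of the point-estimate formula, here is a minimal pure-Python sketch that forms $$\widehat{\betab} = (\Xb'\Xb)^{-1}\Xb'\yb$$ for a two-column X (one slope plus a constant). The 2 x 2 inverse is written out in closed form, so this is illustrative rather than numerically robust:

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inv2(M):
    # closed-form inverse of a 2 x 2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def ols2(X, y):
    """betahat = (X'X)^{-1} (X'y) for X with exactly two columns."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = matmul(Xt, [[v] for v in y])
    b = matmul(inv2(XtX), Xty)
    return [row[0] for row in b]

# y = 2*x + 1 exactly; the second column of X is the constant term
b = ols2([[1, 1], [2, 1], [3, 1]], [3, 5, 7])
```

On exact data the estimator recovers the slope 2 and intercept 1.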

Mata implementation

I compute the OLS point estimates in Mata in example 1.

Example 1: Computing OLS point estimates in Mata

. sysuse auto
(1978 Automobile Data)

. mata:
------------------------------------------------- mata (type end to exit) ------
: y    = st_data(., "price")

: X    = st_data(., "mpg trunk")

: n    = rows(X)

: X    = X,J(n,1,1)

: XpX  = quadcross(X, X)

: XpXi = invsym(XpX)

: b    = XpXi*quadcross(X, y)

: end
--------------------------------------------------------------------------------


I used st_data() to put a copy of the observations on price into the Mata vector y and to put a copy of the observations on mpg and trunk into the Mata matrix X. I used rows(X) to put the number of observations into n. After adding a column of ones onto X for the constant term, I used quadcross() to calculate $$\Xb'\Xb$$ in quad precision. After using invsym() to calculate XpXi, the inverse of the symmetric matrix XpX, I calculated the point estimates from the OLS formula.

In example 1, I computed the OLS point estimates after forming the cross products. As discussed in Lange (2010, chapter 7), I could compute more accurate estimates using a QR decomposition; type help mf_qrd for details about computing QR decompositions in Mata. By computing the cross products in quad precision, I obtained point estimates that are almost as accurate as those obtainable from a QR decomposition in double precision, but that is a topic for another post.

Here are the point estimates I computed in Mata and comparable results from regress.

Example 2: Results from Mata and regress

. mata: b'
1              2              3
+----------------------------------------------+
1 |  -220.1648801    43.55851009    10254.94983  |
+----------------------------------------------+

. regress price mpg trunk

Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     10.14
Model |   141126459         2  70563229.4   Prob > F        =    0.0001
Residual |   493938937        71  6956886.44   R-squared       =    0.2222
Total |   635065396        73  8699525.97   Root MSE        =    2637.6

------------------------------------------------------------------------------
price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
mpg |  -220.1649   65.59262    -3.36   0.001    -350.9529    -89.3769
trunk |   43.55851   88.71884     0.49   0.625    -133.3418    220.4589
_cons |   10254.95   2349.084     4.37   0.000      5571.01    14938.89
------------------------------------------------------------------------------


Given the OLS point estimates, I can now compute the IID estimator of the VCE.

Example 3: Computing the IID VCE

. mata:
------------------------------------------------- mata (type end to exit) ------
: e    = y - X*b

: e2   = e:^2

: k    = cols(X)

: V    = (quadsum(e2)/(n-k))*XpXi

: sqrt(diagonal(V))'
1             2             3
+-------------------------------------------+
1 |  65.59262431   88.71884015    2349.08381  |
+-------------------------------------------+

: end
--------------------------------------------------------------------------------


I put the residuals into the Mata vector e and then squared them element-wise to create e2. I used cols(X) to put the number of covariates into k. I used quadsum() to compute the sum of the squared residuals in quad precision when computing V, an IID estimator of the VCE. The standard errors displayed by sqrt(diagonal(V)) are the same as those displayed by regress in example 2.

Robust standard errors

The frequently used robust estimator of the VCE is given by

$\widehat{V}_{robust}=\frac{N}{N-k} \left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1} \Mb \left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}$

where
$\Mb=\sum_{i=1}^N \widehat{e}_i^2\xb_i'\xb_i$

See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2015) for derivations and discussions.

Example 4 implements this estimator in Mata.

Example 4: A robust VCE

. mata:
------------------------------------------------- mata (type end to exit) ------
: M    = quadcross(X, e2, X)

: V    = (n/(n-k))*XpXi*M*XpXi

: sqrt(diagonal(V))'
1             2             3
+-------------------------------------------+
1 |  72.45387946   71.45370224   2430.640607  |
+-------------------------------------------+

: end
--------------------------------------------------------------------------------


Using quadcross(X, e2, X) to compute M is more accurate and faster than looping over the observations. The accuracy comes from the quad precision offered by quadcross(). The speed comes from performing the loops in compiled C code instead of compiled Mata code. Mata is fast but C is faster, because C imposes much more structure and because C is compiled using much more platform-specific information than Mata.
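
The accumulation that quadcross(X, e2, X) performs can be written out explicitly. This pure-Python sketch accumulates $$\Mb = \sum_i w_i \xb_i'\xb_i$$ row by row; it lacks the quad-precision accumulation and compiled-C speed, which is exactly why one prefers quadcross() in practice:

```python
def weighted_cross(X, w):
    """M = X' diag(w) X, accumulated observation by observation:
    sum_i w_i * x_i' x_i (the 'meat' of the robust VCE when w = e2)."""
    k = len(X[0])
    M = [[0.0] * k for _ in range(k)]
    for xi, wi in zip(X, w):
        for r in range(k):
            for c in range(k):
                M[r][c] += wi * xi[r] * xi[c]
    return M

M = weighted_cross([[1, 2], [3, 4]], [1, 2])
```

For the two rows above, M is 1*[[1,2],[2,4]] plus 2*[[9,12],[12,16]].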

quadcross() is also faster because it has been parallelized, like many Mata functions. For example, a call to quadcross() from Stata/MP with 2 processors will run about twice as fast as a call to quadcross() from Stata/SE when there are many rows in X. A detailed discussion of the performance increases offered by Mata in Stata/MP is a subject for another post.

I now verify that my computations match those reported by regress.

Example 5: Comparing computations of robust VCE

. regress price mpg trunk, vce(robust)

Linear regression                               Number of obs     =         74
F(2, 71)          =      11.59
Prob > F          =     0.0000
R-squared         =     0.2222
Root MSE          =     2637.6

------------------------------------------------------------------------------
|               Robust
price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
mpg |  -220.1649   72.45388    -3.04   0.003    -364.6338   -75.69595
trunk |   43.55851    71.4537     0.61   0.544    -98.91613    186.0331
_cons |   10254.95   2430.641     4.22   0.000      5408.39    15101.51
------------------------------------------------------------------------------


Cluster-robust standard errors

The cluster-robust estimator of the VCE is frequently used when the data have a group structure, also known as a panel structure or as a longitudinal structure. This VCE accounts for the within-group correlation of the errors, and it is given by

$\widehat{V}_{cluster}=\frac{N-1}{N-k}\frac{g}{g-1} \left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1} \Mb_c \left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}$

where
$\Mb_c=\sum_{j=1}^g \Xb_j' (\widehat{\eb}_j \widehat{\eb}_j') \Xb_j$

$$\Xb_j$$ is the $$n_j\times k$$ matrix of observations on $$\xb_i$$ in group $$j$$, $$\widehat{\eb}_j$$ is the $$n_j\times 1$$ vector of residuals in group $$j$$, and $$g$$ is the number of groups. See Cameron and Trivedi (2005), Wooldridge (2010), and [R] regress for derivations and discussions.

Computing $$\Mb_c$$ requires sorting the data by group. I use rep78, with the missing values replaced by 6, as the group variable in my example. In example 6, I sort the dataset in Stata, put a copy of the observations on the modified rep78 into the column vector id, and recompute the OLS objects that I need. I could have sorted the dataset in Mata, but I usually sort it in Stata, so that is what I illustrated. Type help mf_sort for sorting in Mata. In a real program, I would not need to recompute everything. I do here because I did not want to discuss the group variable or sorting the dataset until I discussed cluster-robust standard errors.

Example 6: Setup for computing M

. replace rep78=6 if missing(rep78)

. sort rep78

. mata:
------------------------------------------------- mata (type end to exit) ------
: id   = st_data(., "rep78")

: y    = st_data(., "price")

: X    = st_data(., "mpg trunk")

: n    = rows(X)

: X    = X,J(n,1,1)

: k    = cols(X)

: XpX  = quadcross(X, X)

: XpXi = invsym(XpX)

: b    = XpXi*quadcross(X, y)

: e    = y - X*b

: end
--------------------------------------------------------------------------------


The Mata function panelsetup(Q,p) returns a matrix describing the group structure of the data when Q is sorted by the group variable in column p. I illustrate this function in example 7.

Example 7: panelsetup()

. list rep78 if rep78<3, sepby(rep78)

+-------+
| rep78 |
|-------|
1. |     1 |
2. |     1 |
|-------|
3. |     2 |
4. |     2 |
5. |     2 |
6. |     2 |
7. |     2 |
8. |     2 |
9. |     2 |
10. |     2 |
+-------+

. mata:
------------------------------------------------- mata (type end to exit) ------
: info = panelsetup(id, 1)

: info
1    2
+-----------+
1 |   1    2  |
2 |   3   10  |
3 |  11   40  |
4 |  41   58  |
5 |  59   69  |
6 |  70   74  |
+-----------+

: end
--------------------------------------------------------------------------------


I begin by listing out the group variable, rep78, in Stata for the first two groups. I then use panelsetup() to create info, which has one row for each group, with the first column containing the first row of that group and the second column containing the last row of that group. I display info to illustrate what it contains. The first row of info specifies that the first group starts in row 1 and ends in row 2, which matches the results produced by list. The second row of info specifies that the second group starts in row 3 and ends in row 10, which also matches the results produced by list.
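
What panelsetup() computes is a single pass over the sorted group variable. This pure-Python sketch returns the same (first row, last row) pairs, with 1-based row numbers as in Mata:

```python
def panel_setup(ids):
    """Return one (first_row, last_row) pair per group, with 1-based
    row numbers, assuming ids is already sorted by group -- a sketch
    of what Mata's panelsetup() computes."""
    info = []
    start = 1
    for i in range(1, len(ids) + 1):
        if i == len(ids) or ids[i] != ids[i - 1]:
            info.append((start, i))
            start = i + 1
    return info

# mirrors the first groups of sorted rep78: two 1s, eight 2s, then a 3
info = panel_setup([1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3])
```

The first two pairs, (1, 2) and (3, 10), match the first two rows of info in example 7.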

Having created info, I can use it and panelsubmatrix() to compute $$\Mb_c$$.

Example 8: A cluster-robust VCE

. mata:
------------------------------------------------- mata (type end to exit) ------
: nc   = rows(info)

: M    = J(k, k, 0)

: for(i=1; i<=nc; i++) {
>     xi = panelsubmatrix(X,i,info)
>     ei = panelsubmatrix(e,i,info)
>     M  = M + xi'*(ei*ei')*xi
> }

: V    = ((n-1)/(n-k))*(nc/(nc-1))*XpXi*M*XpXi

: sqrt(diagonal(V))'
1             2             3
+-------------------------------------------+
1 |  93.28127184   58.89644366   2448.547376  |
+-------------------------------------------+

: end
--------------------------------------------------------------------------------


After storing the number of groups in nc, I created an initial M as a k $$\times$$ k matrix of zeros. For each group, I used panelsubmatrix() to extract the covariates for that group from X, used panelsubmatrix() to extract the residuals for that group from e, and added that group's contribution into M. After looping over the groups, I computed V and displayed the standard errors.
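
The group loop can be sketched in pure Python. Note the algebraic shortcut it uses: because $$\Xb_j'(\widehat{\eb}_j \widehat{\eb}_j')\Xb_j = (\Xb_j'\widehat{\eb}_j)(\Xb_j'\widehat{\eb}_j)'$$, each group's contribution is just the outer product of the k $$\times$$ 1 score $$\Xb_j'\widehat{\eb}_j$$ with itself:

```python
def cluster_meat(X, e, info):
    """M_c = sum_j Xj'(ej ej')Xj, accumulated one group at a time.
    info holds 1-based (first, last) rows per group, in the format
    panelsetup() returns."""
    k = len(X[0])
    M = [[0.0] * k for _ in range(k)]
    for first, last in info:
        rows = range(first - 1, last)  # convert to 0-based indices
        # group score s_j = Xj' ej, a k-vector
        s = [sum(X[i][r] * e[i] for i in rows) for r in range(k)]
        for r in range(k):
            for c in range(k):
                M[r][c] += s[r] * s[c]
    return M

M = cluster_meat([[1, 0], [0, 1], [1, 1]], [1.0, 2.0, 3.0], [(1, 2), (3, 3)])
```

The Mata loop in example 8 forms xi'*(ei*ei')*xi directly; either form yields the same $$\Mb_c$$.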

I now verify that my computations match those reported by regress.

Example 9: Comparing computations of cluster-robust VCE

. regress price mpg trunk, vce(cluster rep78)

Linear regression                               Number of obs     =         74
F(2, 5)           =       9.54
Prob > F          =     0.0196
R-squared         =     0.2222
Root MSE          =     2637.6

(Std. Err. adjusted for 6 clusters in rep78)
------------------------------------------------------------------------------
|               Robust
price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
mpg |  -220.1649   93.28127    -2.36   0.065     -459.952    19.62226
trunk |   43.55851   58.89644     0.74   0.493    -107.8396    194.9566
_cons |   10254.95   2448.547     4.19   0.009     3960.758    16549.14
------------------------------------------------------------------------------


Done and undone

I reviewed the formulas that underlie the OLS estimator and showed how to compute them in Mata. In the next two posts, I write an ado-command that implements these formulas.

References

Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and applications. Cambridge: Cambridge University Press.

Lange, K. 2010. Numerical Analysis for Statisticians. 2nd ed. New York: Springer.

Stock, J. H., and M. W. Watson. 2010. Introduction to Econometrics. 3rd ed. Boston, MA: Addison-Wesley.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.

Wooldridge, J. M. 2015. Introductory Econometrics: A Modern Approach. 6th ed. Cincinnati, Ohio: South-Western.

Categories: Programming Tags:

## Programming an estimation command in Stata: A first ado-command using Mata

I discuss a sequence of ado-commands that use Mata to estimate the mean of a variable. The commands illustrate a general structure for Stata/Mata programs. This post builds on Programming an estimation command in Stata: Mata 101, Programming an estimation command in Stata: Mata functions, and Programming an estimation command in Stata: A first ado command.

I begin by reviewing the structure in mymean5.ado, which I discussed in Programming an estimation command in Stata: A first ado command.

*! version 5.0.0 20Oct2015
program define mymean5, eclass
    version 14

    syntax varlist(max=1)

    tempvar e2
    tempname b V
    quietly summarize `varlist'
    local sum            = r(sum)
    local N              = r(N)
    matrix `b'           = (1/`N')*`sum'
    generate double `e2' = (`varlist' - `b'[1,1])^2
    quietly summarize `e2'
    matrix `V'           = (1/((`N')*(`N'-1)))*r(sum)
    matrix colnames `b'  = `varlist'
    matrix colnames `V'  = `varlist'
    matrix rownames `V'  = `varlist'
    ereturn post `b' `V'
    ereturn display
end


The syntax command on line 5 stores the name of the variable for which the command estimates the mean in the local macro varlist. The tempvar and tempname commands on lines 7 and 8 put temporary names into the local macros e2, b, and V. Lines 9–15 compute the point estimates and the estimated variance-covariance of the estimator (VCE), using the temporary names for objects, so as not to overwrite user-created objects. Lines 16–18 put the column name on the point estimate and row and column names on the estimated VCE. Line 19 posts the point estimate and the estimated VCE to e(b) and e(V), respectively. Line 20 produces a standard output table from the information stored in e(b) and e(V).
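
The arithmetic on lines 9–15 is just the sample mean and the IID variance of the mean. As a quick sanity check, the same two numbers in pure Python:

```python
def mean_and_vce(y):
    """bhat = ybar and Vhat = (1/(N*(N-1))) * sum_i (y_i - ybar)^2,
    the two quantities mymean5 stores in its b and V matrices."""
    n = len(y)
    b = sum(y) / n
    v = sum((yi - b) ** 2 for yi in y) / (n * (n - 1))
    return b, v

b, v = mean_and_vce([1.0, 2.0, 3.0])
```

For these three observations, b is 2 and v is (1 + 0 + 1)/(3*2) = 1/3.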

By the end of this post, I will have a command that replaces the Stata computations on lines 9–15 with Mata computations. To illustrate the structure of Stata/Mata programming, I start off only computing the point estimate in mymean6.

*! version 6.0.0 05Dec2015
program define mymean6, eclass
    version 14

    syntax varlist(max=1)

    mata: x  = st_data(., "`varlist'")
    mata: w  = mean(x)
    mata: st_matrix("Q", w)

    display "The point estimates are in Q"
    matrix list Q

end


Line 7 executes a one-line call to Mata; in this construction, Stata drops down to Mata, executes the Mata expression, and pops back up to Stata. Dropping down to Mata and popping back up to Stata takes almost no time, but I would like to avoid doing it three times. (Lines 8 and 9 are also one-line calls to Mata.)

Line 7 puts a copy of all the observations on the variable for which the command estimates the mean in the Mata column vector named x, which is stored in global Mata memory. Line 8 stores the mean of the column vector in the 1$$\times$$1 matrix named w, which is also stored in global Mata memory. Line 9 copies the Mata matrix w to the Stata matrix named Q. Lines 11 and 12 display the results stored in Stata.

I illustrate what mymean6 produces in example 1.

Example 1: mymean6 uses global Mata memory

. sysuse auto
(1978 Automobile Data)

. mymean6 price
The point estimates are in Q

symmetric Q[1,1]
c1
r1  6165.2568

. matrix dir
Q[1,1]

. mata: mata describe

# bytes   type                        name and extent
-------------------------------------------------------------------------------
8   real scalar                 w
592   real colvector              x[74]
-------------------------------------------------------------------------------


I use matrix dir to illustrate that Q is a Stata matrix, and I use mata describe to illustrate that x and w are objects in global Mata memory. Using fixed names for an object in Stata memory or in global Mata memory should be avoided, because you can overwrite users’ data.
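The guarantee that tempname provides can be sketched abstractly: generate a name that cannot collide with anything already defined. In this hypothetical Python helper (Stata's actual naming scheme is an implementation detail), names of the form __000000 are tried in order until a free one is found:

```python
import itertools

def tempname(existing):
    """Return a name of the form __000000 that does not collide with
    any name in `existing` -- a sketch of the tempname guarantee."""
    for i in itertools.count():
        name = "__{:06d}".format(i)
        if name not in existing:
            return name

name = tempname({"__000000", "__000001"})
```

Because the returned name is guaranteed unused, a program can store results under it without risk of overwriting a user's objects.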

mymean7 does not put anything in global Mata memory; all computations are done using objects that are local to the Mata function mymean_work(). mymean7 uses temporary names for objects stored in Stata memory.

*! version 7.0.0 07Dec2015
program define mymean7, eclass
    version 14

    syntax varlist(max=1)

    tempname b
    mata: mymean_work("`varlist'", "`b'")

    display "b is "
    matrix list `b'
end

mata:
void mymean_work(string scalar vname, string scalar mname)
{
    real vector    x
    real scalar    w

    x  = st_data(., vname)
    w  = mean(x)
    st_matrix(mname, w)
}
end


There are two parts to mymean7.ado: an ado-program and a Mata function. The ado-program is defined on lines 2–12. The Mata function mymean_work() is defined on lines 14–24. The Mata function mymean_work() is local to the ado-program mymean7.

Line 8 uses a one-line call to Mata to execute mymean_work(). mymean_work() does not return anything to global Mata memory, and we are passing in two arguments. The first argument is a string scalar containing the name of the variable for which the function should compute the estimate. The second argument is a string scalar containing the temporary name stored in the local macro b. This temporary name will be the name of a Stata matrix that stores the point estimate computed in mymean_work().

Line 15 declares the function mymean_work(). Function declarations specify what the function returns, the name of the function, and the arguments that the function accepts; see Programming an estimation command in Stata: Mata functions for a quick introduction.

The word void on line 15 specifies that the function does not return an argument; in other words, it returns nothing. What precedes the ( is the function name; thus, mymean_work() is the name of the function. The words string scalar vname specify that the first argument of mymean_work() is a string scalar that is known as vname inside mymean_work(). The comma separates the first argument from the second argument. The words string scalar mname specify that the second argument of mymean_work() is a string scalar that is known as mname inside mymean_work(). ) closes the function declaration.

Lines 17–22 define mymean_work() because they are enclosed between the curly braces on lines 16 and 23. Lines 17 and 18 declare the real vector x and real scalar w, which are local to mymean_work(). Lines 20 and 21 compute the estimate. Line 22 copies the estimate stored in the scalar w, which is local to the Mata function mymean_work(), to the Stata matrix whose name is stored in the string scalar mname, which holds the temporary name from the local macro b that was passed to the function on line 8.

The structure used in mymean7 ensures three important features.

1. It does not use global Mata memory.
2. It uses temporary names for global Stata objects.
3. It leaves nothing behind in Mata memory or Stata memory.

Examples 2 and 3 combine to illustrate feature (3); example 2 clears Stata and Mata memory, and example 3 shows that mymean7 leaves nothing in the previously cleared memory.

Example 2: Removing objects from Stata and Mata memory

. clear all

. matrix dir

. mata: mata describe

# bytes   type                        name and extent
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------


In detail, I use clear all to drop all objects from Stata and Mata memory, use matrix dir to illustrate that no matrices were left in Stata memory, and use mata describe to illustrate that no objects were left in Mata memory.

Example 3: mymean7 leaves nothing in Stata or Mata memory

. sysuse auto
(1978 Automobile Data)

. mymean7 price
b is

symmetric __000000[1,1]
c1
r1  6165.2568

. matrix dir

. mata: mata describe

# bytes   type                        name and extent
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------


In example 3, I use mymean7 to estimate the mean, and use matrix dir and mata describe to illustrate that mymean7 did not leave Stata matrices or Mata objects in memory. The output also illustrates that the temporary name __000000 was used for the Stata matrix that held the result before the ado-program terminated.

While it is good that mymean7 leaves nothing in global Stata or Mata memory, it is bad that mymean7 does not leave the estimate behind somewhere, like in e().

mymean8 stores the results in e() and has the features of mymean5, but computes its results in Mata.

*! version 8.0.0 07Dec2015
program define mymean8, eclass
    version 14

    syntax varlist(max=1)

    tempname b V
    mata: mymean_work("`varlist'", "`b'", "`V'")
    matrix colnames `b'  = `varlist'
    matrix colnames `V'  = `varlist'
    matrix rownames `V'  = `varlist'
    ereturn post `b' `V'
    ereturn display
end

mata:
void mymean_work(                  ///
        string scalar vname,     ///
        string scalar mname,     ///
        string scalar vcename)
{
    real vector    x, e2
    real scalar    w, n, v

    x  = st_data(., vname)
    n  = rows(x)
    w  = mean(x)
    e2 = (x :- w):^2
    v  = (1/(n*(n-1)))*sum(e2)
    st_matrix(mname,   w)
    st_matrix(vcename, v)
}
end


Line 8 is a one-line call to mymean_work(), which now has three arguments: the name of the variable whose mean is to be estimated, a temporary name for the Stata matrix that will hold the point estimate, and a temporary name for the Stata matrix that will hold the estimated VCE. The declaration of mymean_work() on lines 17–20 has been adjusted accordingly; each of the three arguments is a string scalar. Lines 22 and 23 declare objects local to mymean_work(). Lines 25–29 compute the mean and the estimated VCE. Lines 30 and 31 copy these results to Stata matrices, under the temporary names passed in the second and third arguments.

There is a logic to the order of the arguments in mymean_work(): the first argument is the name of an input; the second and third are temporary names for the outputs.

Returning to the ado-code, we see that lines 9–11 put row or column names on the point estimate or the estimated VCE. Line 12 posts the results to e(), which are displayed by line 13.

Example 4 illustrates that mymean8 produces the same point estimate and standard error as produced by mean.

Example 4: Comparing mymean8 and mean

. mymean8 price
------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
price |   6165.257   342.8719    17.98   0.000      5493.24    6837.273
------------------------------------------------------------------------------

. mean price

Mean estimation                   Number of obs   =         74

--------------------------------------------------------------
|       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
price |   6165.257   342.8719      5481.914      6848.6
--------------------------------------------------------------


The confidence intervals produced by mymean8 differ from those produced by mean because mymean8 uses a normal distribution while mean uses a $$t$$ distribution. The mymean_work() in mymean9 uses a fourth argument to remove this difference.

*! version 9.0.0 07Dec2015
program define mymean9, eclass
    version 14

    syntax varlist(max=1)

    tempname b V dfr
    mata: mymean_work("`varlist'", "`b'", "`V'", "`dfr'")
    matrix colnames `b'  = `varlist'
    matrix colnames `V'  = `varlist'
    matrix rownames `V'  = `varlist'
    ereturn post `b' `V'
    ereturn scalar df_r  = `dfr'
    ereturn display
end

mata:
void mymean_work(                  ///
        string scalar vname,     ///
        string scalar mname,     ///
        string scalar vcename,   ///
        string scalar dfrname)
{
    real vector    x, e2
    real scalar    w, n, v

    x  = st_data(., vname)
    n  = rows(x)
    w  = mean(x)
    e2 = (x :- w):^2
    v  = (1/(n*(n-1)))*sum(e2)
    st_matrix(mname,   w)
    st_matrix(vcename, v)
    st_numscalar(dfrname, n-1)
}
end


On line 8, mymean_work() accepts four arguments. The fourth argument is new; it contains the temporary name that is used for the Stata scalar that holds the residual degrees of freedom. Line 34 copies the value of the expression n-1 to the Stata numeric scalar whose name is stored in the string scalar dfrname; this Stata scalar now contains the residual degrees of freedom. Line 13 stores the residual degrees of freedom in e(df_r), which causes ereturn display to use a $$t$$ distribution instead of a normal distribution.

Example 5: mymean9 uses a $$t$$ distribution

. mymean9 price
------------------------------------------------------------------------------
|      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
price |   6165.257   342.8719    17.98   0.000     5481.914      6848.6
------------------------------------------------------------------------------


mymean9 has five basic parts.

1. It parses the user input.
2. It uses a one-line call to a Mata work routine to compute results and to store these results in Stata matrices whose temporary names are passed to the Mata work routine.
3. It puts on the column and row names and stores the results in e().
4. It displays the results.
5. It defines the Mata work routine after the end that terminates the definition of the ado-program.

This structure can accommodate any estimator whose results we can store in e(). The details of each part become increasingly complicated, but the structure remains the same. In future posts, I discuss Stata/Mata programs with this structure that implement the ordinary least-squares (OLS) estimator and the Poisson quasi-maximum-likelihood estimator.

Done and undone

I discussed a sequence of ado-commands that use Mata to estimate the mean of a variable. The commands illustrated a general structure for Stata/Mata programs.

In the next post, I show some Mata computations that produce the point estimates, an IID VCE, a robust VCE, and a cluster-robust VCE for the OLS estimator.

Categories: Programming Tags:

## Programming an estimation command in Stata: Mata functions

I show how to write a function in Mata, the matrix programming language that is part of Stata. This post uses concepts introduced in Programming an estimation command in Stata: Mata 101.

Mata functions

Commands do work in Stata. Functions do work in Mata. Commands operate on Stata objects, like variables, and users specify options to alter the behavior. Mata functions accept arguments, operate on the arguments, and may return a result or alter the value of an argument to contain a result.

Code block 1: myadd()

mata:
function myadd(X, Y)
{
A = X + Y
return(A)
}
end


myadd() accepts two arguments, X and Y, puts the sum of X and Y into A, and returns A. For example,

Example 1: Defining and using a function

. mata:
------------------------------------------------- mata (type end to exit) ------
: function myadd(X, Y)
> {
>     A = X + Y
>     return(A)
> }

: C = J(3, 3, 4)

: D = I(3)

: W = myadd(C, D)

: W
[symmetric]
1   2   3
+-------------+
1 |  5          |
2 |  4   5      |
3 |  4   4   5  |
+-------------+

: end
--------------------------------------------------------------------------------


After defining myadd(), I create the matrices C and D, and I pass C and D into myadd(), which returns their sum. Mata assigns the returned sum to W, which I display. Note that inside the function myadd(), C and D are respectively known as X and Y.

The A created in myadd() can only be accessed inside myadd(), and it does not conflict with an A defined outside myadd(); that is, A is local to the function myadd(). I now illustrate that A is local to myadd().

Example 2: A is local to myadd()

. mata:
------------------------------------------------- mata (type end to exit) ------
: A = J(3, 3, 4)

: A
[symmetric]
1   2   3
+-------------+
1 |  4          |
2 |  4   4      |
3 |  4   4   4  |
+-------------+

: W = myadd(A, I(3))

: A
[symmetric]
1   2   3
+-------------+
1 |  4          |
2 |  4   4      |
3 |  4   4   4  |
+-------------+

: end
--------------------------------------------------------------------------------


Having illustrated that the A defined inside myadd() is local to myadd(), I should point out that the C and D matrices I defined in example 1 are in global Mata memory. As in ado-programs, we do not want to use fixed names in global Mata memory in our programs so that we do not destroy the users’ data. Fortunately, this problem is easily solved by writing Mata functions that write their results out to Stata and do not return results. I will provide detailed discussions of this solution in the commands that I develop in subsequent posts.
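A minimal sketch of this pattern (the local name r is mine): Stata creates a temporary name, and a one-line call to Mata writes the result to that name, so nothing with a fixed name is left in global Mata memory.

tempname r
mata: st_matrix("`r'", J(2, 2, 0))
matrix list `r'

Inside an ado-program, the temporary matrix `r' is dropped automatically when the program exits.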

When a Mata function changes the value of an argument inside the function, that change is visible outside the function; in other words, arguments are passed by address. Mata functions can therefore compute more than one result by storing these results in arguments. For example, sumproduct() computes both the sum and the element-wise product of two matrices and returns them through its arguments.

Code block 2: sumproduct()

function sumproduct(X, Y, S, P)
{
S = X +  Y
P = X :* Y
return
}


sumproduct() returns the sum of the arguments X and Y in the argument S and the element-wise product in P.

Example 3: Returning results in arguments

. mata:
------------------------------------------------- mata (type end to exit) ------
: function sumproduct(X, Y, S, P)
> {
>         S = X +  Y
>         P = X :* Y
>         return
> }

: A = I(3)

: B = rowshape(1::9, 3)

: A
[symmetric]
1   2   3
+-------------+
1 |  1          |
2 |  0   1      |
3 |  0   0   1  |
+-------------+

: B
1   2   3
+-------------+
1 |  1   2   3  |
2 |  4   5   6  |
3 |  7   8   9  |
+-------------+

: W=.

: W
.

: Z=.

: Z
.

: sumproduct(A, B, W, Z)

: W
1    2    3
+----------------+
1 |   2    2    3  |
2 |   4    6    6  |
3 |   7    8   10  |
+----------------+

: Z
[symmetric]
1   2   3
+-------------+
1 |  1          |
2 |  0   5      |
3 |  0   0   9  |
+-------------+

: end
--------------------------------------------------------------------------------


After defining sumproduct(), I use I() to create A and use rowshape() to create B. I then create W and Z; each is a scalar missing value. I must create W and Z before I pass them as arguments; otherwise, I would be referencing arguments that do not exist. After calling sumproduct(), I display W and Z to illustrate that they now contain the sum and the element-wise product of A and B.

In myadd() and sumproduct(), I did not specify what type of thing each argument must be, nor did I specify what type of thing each function would return. In other words, I used implicit declarations. Implicit declarations are easier to type than explicit declarations, but they make error messages and code less informative. I highly recommend explicitly declaring returns, arguments, and local variables to make your code and error messages more readable.

myadd2() is a version of myadd() that explicitly declares the type of thing returned, the type of thing that each argument must be, and the type that each local-to-the-function thing must be.

Code block 3: myadd2(): Explicit declarations

mata:
numeric matrix myadd2(numeric matrix X, numeric matrix Y)
{
numeric matrix A

A = X + Y
return(A)
}
end


myadd2() returns a numeric matrix that it constructs by adding the numeric matrix X to the numeric matrix Y. The local-to-the-function object A is also a numeric matrix. A numeric matrix is either a real matrix or a complex matrix; it cannot be a string matrix.

Explicitly declaring returns, arguments, and local variables makes the code more informative. I immediately see that myadd2() does not work with string matrices, but this property is buried in the code for myadd().

I cannot sufficiently stress the importance of writing easy-to-read code. Reading other people’s code is an essential part of programming. It is instructional, and it allows you to adopt the solutions that others have found or implemented. If you are new to programming, you may not yet realize that after a few months, reading your own code is like reading someone else’s code. Even if you never give your code to anyone else, it is essential that you write easy-to-read code so that you can read it at a later date.

Explicit declarations also make some error messages easier to track down. In examples 4 and 5, I pass a string matrix to myadd() and to myadd2(), respectively.

Example 4: Passing a string matrix to myadd()

. mata:
------------------------------------------------- mata (type end to exit) ------
: B = I(3)

: C = J(3,3,"hello")

: D = myadd(B, C)
                 myadd():  3250  type mismatch
                 <istmt>:     -  function returned error
(0 lines skipped)
--------------------------------------------------------------------------------
r(3250);


Example 5: Passing a string matrix to myadd2()

. mata:
------------------------------------------------- mata (type end to exit) ------
: B = I(3)

: C = J(3,3,"hello")

: numeric matrix myadd2(numeric matrix X, numeric matrix Y)
> {
>     numeric matrix A
>
>     A = X + Y
>     return(A)
> }

: D = myadd2(B, C)
                myadd2():  3251  C[3,3] found where real or complex required
                 <istmt>:     -  function returned error
(0 lines skipped)
--------------------------------------------------------------------------------
r(3251);

end of do-file


The error message in example 4 indicates that somewhere in myadd(), an operator or a function could not perform something on two objects because their types were not compatible. Do not be deluded by the simplicity of myadd(). Tracking down a type mismatch in real code can be difficult.

In contrast, the error message in example 5 says that the matrix C we passed to myadd2() is neither a real nor a complex matrix like the argument of myadd2() requires. Looking at the code and the error message immediately informs me that the problem is that I passed a string matrix to a function that requires a numeric matrix.

Explicit declarations are so highly recommended that Mata has a setting that requires them, as illustrated below.

Example 6: Turning on matastrict

. mata: mata set matastrict on


Setting matastrict to on causes the Mata compiler to require that return and local variables be explicitly declared. By default, matastrict is set to off, in which case return and local variables may be implicitly declared.
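For instance, with matastrict set to on, the following sketch of a function with an undeclared local variable fails to compile (the name badadd() is mine):

mata:
numeric matrix badadd(numeric matrix X, numeric matrix Y)
{
A = X + Y
return(A)
}
end

Because A is not declared, the compiler issues an error; adding the declaration numeric matrix A to the body, as in myadd2(), satisfies matastrict.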

When matastrict is set to on, arguments are not required to be explicitly declared because some arguments hold results whose input and output types could differ. Consider makereal() defined and used in example 7.

Example 7: Changing an argument's type

. mata:
------------------------------------------------- mata (type end to exit) ------
: void makereal(A)
> {
>         A = substr(A, 11,1)
>         A = strtoreal(A)
> }

: A = J(2,2, "Number is 4")

: A
[symmetric]
1             2
+-----------------------------+
1 |  Number is 4                |
2 |  Number is 4   Number is 4  |
+-----------------------------+

: makereal(A)

: A + I(2)
[symmetric]
1   2
+---------+
1 |  5      |
2 |  4   5  |
+---------+

: end
--------------------------------------------------------------------------------


The declaration of makereal() specifies that makereal() returns nothing because void comes before the name of the function. Even though matastrict is set to on, I did not declare what type of thing A must be. The two executable lines of makereal() clarify that A must be a string on input and that A will be real on output, which I subsequently illustrate.

I use the feature that arguments may be implicitly declared to make my code easier to read. Many of the Mata functions that I write replace arguments with results. I explicitly declare arguments that are inputs, and I implicitly declare arguments that contain outputs. Consider sumproduct2().

Code block 4: sumproduct2(): Explicit declarations of inputs but not outputs

void sumproduct2(real matrix X, real matrix Y, S, P)
{
S = X +  Y
P = X :* Y
return
}


sumproduct2() returns nothing because void comes before the function name. The first argument X is a real matrix, the second argument Y is a real matrix, the third argument S is implicitly declared, and the fourth argument P is implicitly declared. My coding convention, that inputs are explicitly declared and outputs are implicitly declared, immediately informs me that X and Y are inputs but that S and P are outputs. That X and Y are inputs and that S and P are outputs is illustrated in example 8.

Example 8: Explicitly declaring inputs but not outputs

. mata:
------------------------------------------------- mata (type end to exit) ------
: void sumproduct2(real matrix X, real matrix Y, S, P)
> {
>         S = X +  Y
>         P = X :* Y
>         return
> }

: A = I(2)

: B = rowshape(1::4, 2)

: C = .

: D = .

: sumproduct2(A, B, C, D)

: C
1   2
+---------+
1 |  2   2  |
2 |  3   5  |
+---------+

: D
[symmetric]
1   2
+---------+
1 |  1      |
2 |  0   4  |
+---------+

: end
--------------------------------------------------------------------------------


Done and undone

I showed how to write a function in Mata and discussed declarations in some detail. Type help m2_declarations for many more details.

In my next post, I use Mata functions to perform the computations for a simple estimation command.

Categories: Programming Tags:

## Programming an estimation command in Stata: Mata 101

I introduce Mata, the matrix programming language that is part of Stata.

Meeting Mata

Mata is a matrix programming language that is part of Stata. Mata code is fast because it is compiled to object code that runs on a virtual machine; type help m1_how for details.

The easiest way to learn Mata is to use it. I begin with an interactive session. (You might find it useful to type along.)

Example 1: A first interactive Mata session

. mata:
------------------------------------------------- mata (type end to exit) ------
: X = J(3, 4, 5)

: X
1   2   3   4
+-----------------+
1 |  5   5   5   5  |
2 |  5   5   5   5  |
3 |  5   5   5   5  |
+-----------------+

: w = (1::4)

: w
1
+-----+
1 |  1  |
2 |  2  |
3 |  3  |
4 |  4  |
+-----+

: v = X*w

: v
1
+------+
1 |  50  |
2 |  50  |
3 |  50  |
+------+

: v'
1    2    3
+----------------+
1 |  50   50   50  |
+----------------+

: end
--------------------------------------------------------------------------------


Typing mata: causes Stata to drop down to a Mata session. Typing end ends the Mata session, thereby popping back up to Stata. The dot prompt . is Stata asking for something to do. After you type mata:, the colon prompt : is the Mata compiler asking for something to do.

Typing X = J(3, 4, 5) at the colon prompt causes Mata to compile and execute this code. J(r, c, v) is the Mata function that creates an r$$\times$$c matrix, each of whose elements is v. The expression on the right-hand side of the assignment operator = is assigned to the symbol on the left-hand side.

Typing X by itself causes Mata to display what X contains, which is a 3$$\times$$4 matrix of 5s. Unassigned expressions display their results. Type help m2_exp for details about expressions.

Typing w = (1::4) causes Mata to use the column range operator to create the 4$$\times$$1 column vector that was assigned to w and displayed when I typed w by itself. Type help m2_op_range for details and a discussion of the row range operator.
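The row range operator works the same way but builds a row vector. For example, typing (the name r is mine)

r = (1..4)

at the colon prompt assigns the 1$$\times$$4 row vector (1, 2, 3, 4) to r.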

Typing v = X*w causes Mata to assign the matrix product of X times w to v, which I subsequently display. I then illustrate that ' is the transpose operator. Type help m2_exp, marker(remarks7) for a list of operators.

Again, typing end ends the Mata session.

In almost all the work I do, I extract submatrices from a matrix.

Example 2: Extracting submatrices from a matrix

. mata:
------------------------------------------------- mata (type end to exit) ------
: rseed(1234)

: W = runiform(4,4)

: W
1             2             3             4
+---------------------------------------------------------+
1 |  .9472316166   .0522233748   .9743182755   .9457483679  |
2 |  .1856478315   .9487333737   .8825376215   .9440776079  |
3 |  .0894258515   .7505444902   .9484983174   .1121626508  |
4 |  .4809064012   .9763447517   .1254975307   .7655025515  |
+---------------------------------------------------------+

: v = (2, 4)

: u = (1\ 3)

: v
1   2
+---------+
1 |  2   4  |
+---------+

: u
1
+-----+
1 |  1  |
2 |  3  |
+-----+

: W[u, v]
1             2
+-----------------------------+
1 |  .0522233748   .9457483679  |
2 |  .7505444902   .1121626508  |
+-----------------------------+

: W[| 1,1 \ 3,3 |]
1             2             3
+-------------------------------------------+
1 |  .9472316166   .0522233748   .9743182755  |
2 |  .1856478315   .9487333737   .8825376215  |
3 |  .0894258515   .7505444902   .9484983174  |
+-------------------------------------------+

: end
--------------------------------------------------------------------------------


I use rseed() to set the seed for the random-number generator and then use runiform(r,c) to create a 4$$\times$$4 matrix of uniform deviates, which I subsequently display.

Next, I use the row-join operator , to create the row vector v and I use the column-join operator \ to create the column vector u. Type help m2_op_join for details.

Typing W[u,v] extracts from W the rows specified in the vector u and the columns specified in the vector v.

I frequently extract rectangular blocks defined by a top-left element and a bottom-right element. I illustrate this syntax by typing

W[| 1,1 \ 3,3 |]

In detail, [| opens a range-subscript extraction, 1,1 is the address of the top-left element, \ separates the top-left element from the bottom-right element, 3,3 is the address of the bottom-right element, and |] closes a range-subscript extraction. Type help m2_subscripts for details.
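Range subscripts can also appear on the left-hand side of an assignment to replace a block of a matrix. For example, assuming the 4$$\times$$4 W above, typing

W[| 1,1 \ 2,2 |] = J(2, 2, 0)

replaces the top-left 2$$\times$$2 block of W with zeros.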

Ironically, when I am doing matrix programming, I frequently want the element-by-element operator instead of the matrix operator. Preface any matrix operator in Mata with a colon (:) to obtain the element-by-element equivalent.

Example 3: Element-wise operators

. mata:
------------------------------------------------- mata (type end to exit) ------
: W = W[| 2,1 \ 4,4 |]

: W
1             2             3             4
+---------------------------------------------------------+
1 |  .1856478315   .9487333737   .8825376215   .9440776079  |
2 |  .0894258515   .7505444902   .9484983174   .1121626508  |
3 |  .4809064012   .9763447517   .1254975307   .7655025515  |
+---------------------------------------------------------+

: v = .1*(4::6)

: v
1
+------+
1 |  .4  |
2 |  .5  |
3 |  .6  |
+------+

: v:*W
1             2             3             4
+---------------------------------------------------------+
1 |  .0742591326   .3794933495   .3530150486   .3776310432  |
2 |  .0447129257   .3752722451   .4742491587   .0560813254  |
3 |  .2885438407    .585806851   .0752985184   .4593015309  |
+---------------------------------------------------------+

: v'*W
1             2             3             4
+---------------------------------------------------------+
1 |  .4075158991   1.340572446   .9025627257   .8930138994  |
+---------------------------------------------------------+

: end
--------------------------------------------------------------------------------


I extract the bottom three rows of W, store this matrix in W, and display this new W. I then create a conformable column vector v, perform element-wise multiplication of v across the columns of W, and display the result. I cannot type v*W because the 3$$\times$$1 v is not conformable with the 3$$\times$$4 W. But I can, and do, type v'*W because the 1$$\times$$3 v' is conformable with the 3$$\times$$4 W.

Example 4 uses an element-wise logical operator.

Example 4: Element-wise logical operator

. mata:
------------------------------------------------- mata (type end to exit) ------
: W :< v
1   2   3   4
+-----------------+
1 |  1   0   0   0  |
2 |  1   0   0   1  |
3 |  1   0   1   0  |
+-----------------+

: end
--------------------------------------------------------------------------------


I display the result of comparing the element-wise conformable v with W. Type help m2_op_colon for details.

Stata data in Mata

The Mata function st_data() creates a Mata matrix containing a copy of the data from the Stata dataset in memory. The Mata function st_view() creates a Mata view of the data in the Stata dataset in memory. Views act like matrices, but there is a speed-space tradeoff. Copies are fast at the cost of using twice as much memory. Views are slower, but they use little extra memory.

Copying the data from Stata into Mata doubles the memory used, but the values are stored in Mata memory. Every time a Mata function asks for a value from a matrix, it finds it immediately. In contrast, a view of the data in Stata barely increases the memory used, but the values are in Stata memory. Every time a Mata function asks for a value from a view, it finds a sign telling it where in Stata to get the value.

Example 5: Data from Stata into Mata

. sysuse auto
(1978 Automobile Data)

. list mpg headroom trunk rep78 turn foreign in 1/3 , nolabel

+-------------------------------------------------+
| mpg   headroom   trunk   rep78   turn   foreign |
|-------------------------------------------------|
1. |  22        2.5      11       3     40         0 |
2. |  17        3.0      11       3     40         0 |
3. |  22        3.0      12       .     35         0 |
+-------------------------------------------------+

. mata:
------------------------------------------------- mata (type end to exit) ------
: Y = st_data(., "mpg headroom trunk")

: st_view(X=., ., "rep78 turn foreign")

: V = Y,X

: V[| 1,1 \ 3,6 |]
1     2     3     4     5     6
+-------------------------------------+
1 |   22   2.5    11     3    40     0  |
2 |   17     3    11     3    40     0  |
3 |   22     3    12     .    35     0  |
+-------------------------------------+

: X[3,1] = 7

: X[| 1,1 \ 3,3 |]
1    2    3
+----------------+
1 |   3   40    0  |
2 |   3   40    0  |
3 |   7   35    0  |
+----------------+

: end
--------------------------------------------------------------------------------

. list rep78 turn foreign in 1/3 , nolabel

+------------------------+
| rep78   turn   foreign |
|------------------------|
1. |     3     40         0 |
2. |     3     40         0 |
3. |     7     35         0 |
+------------------------+


After I list out the first three observations on six variables in the auto dataset, I drop down to Mata, use st_data() to put a copy of all the observations on mpg, headroom, and trunk into the Mata matrix Y, and use st_view() to create the Mata view X on to all the observations on rep78, turn, and foreign.

After row-joining Y and X to create V, I display the first 3 rows of V. Note that the third observation on rep78 is missing and that Mata matrices and views can contain missing values.

Changing the value of an element in a view changes the data in Stata. I illustrate this point by replacing the (3,1) element of the view X with 7, displaying the first three rows of the view, and listing out the first three observations on rep78, turn, and foreign.

Copying matrices between Mata and Stata

The Mata function st_matrix() puts a copy of a Stata matrix into a Mata matrix, or it puts a copy of a Mata matrix into a Stata matrix. In example 6, V = st_matrix("B") puts a copy of the Stata matrix B into the Mata matrix V.

Example 6: Creating a copy of a Stata matrix in a Mata vector

. matrix B = (1, 2\ 3, 4)

. matrix list B

B[2,2]
c1  c2
r1   1   2
r2   3   4

. mata:
------------------------------------------------- mata (type end to exit) ------
: V = st_matrix("B")

: V
1   2
+---------+
1 |  1   2  |
2 |  3   4  |
+---------+

: end
--------------------------------------------------------------------------------


In example 7, st_matrix("Z", W) puts a copy of the Mata matrix W into the Stata matrix Z.

Example 7: Creating a copy of a Mata matrix in a Stata vector

. mata:
------------------------------------------------- mata (type end to exit) ------
: W = (4..6\7..9)

: W
1   2   3
+-------------+
1 |  4   5   6  |
2 |  7   8   9  |
+-------------+

: st_matrix("Z", W)

: end
--------------------------------------------------------------------------------

. matrix list Z

Z[2,3]
c1  c2  c3
r1   4   5   6
r2   7   8   9


Strings

Mata matrices can be string matrices.

In my work, I frequently have a list of variables in a string scalar that is easier to work with as a string vector.

Turning a string scalar list into a string vector

. mata:
------------------------------------------------- mata (type end to exit) ------
: s1 = "price mpg trunk"

: s1
price mpg trunk

: s2 = tokens(s1)

: s2
1       2       3
+-------------------------+
1 |  price     mpg   trunk  |
+-------------------------+

: end
--------------------------------------------------------------------------------


I use tokens() to create the string vector s2 from the string scalar s1.
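The reverse operation is invtokens(), which joins a string vector back into a single space-separated string scalar. For example, typing (the name s3 is mine)

s3 = invtokens(s2)

creates a string scalar s3 containing "price mpg trunk", the same as s1.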

Flow of control

Mata has constructs for looping over a block of code enclosed between curly braces or only executing it if an expression is true.

I frequently use the for() construction to loop over a block of code.

Code block 1: for()

mata:
for(i=1; i<=3; i=i+1) {
i
}
end


In this example, I set i to the initial value of 1. The loop will continue as long as i is less than or equal to 3. Each time through the loop, the block of code enclosed between the curly braces is executed, and 1 is added to the current value of i. The code block displays the value of i. Example 9 illustrates these points.

Example 9: A for loop

. mata:
------------------------------------------------- mata (type end to exit) ------
: for(i=1; i<=3; i=i+1) {
>         i
> }
1
2
3

: end
--------------------------------------------------------------------------------


Sometimes, I want to execute a block of code as long as a condition is true, in which case I use a while loop, as in code block 2 and example 10.

Code block 2: A while loop

i = 7
while (i>5) {
i
i = i - 1
}


I set i to 7 and repeat the block of code between the curly braces while i is greater than 5. The block of code displays the current value of i, then subtracts 1 from i.

Example 10: A while loop

. mata:
------------------------------------------------- mata (type end to exit) ------
: i = 7

: while (i>5) {
>     i
>     i = i - 1
> }
7
6

: end
--------------------------------------------------------------------------------


The if construct only executes a code block if an expression is true. I usually use the if-else construct that executes one code block if an expression is true and another code block if the expression is false.

Example 11: An if-else construct

. mata:
-------------------------------------------- mata (type end to exit) ---
: for(i=2; i<10; i=i+5) {
>         i
>         if (i<3) {
>                 "i is less than 3"
>         }
>         else {
>                 "i is not less than 3"
>         }
> }
2
i is less than 3
7
i is not less than 3

: end
-------------------------------------------------------------------------


One-line calls to Mata

I frequently make one-line calls to Mata from Stata. A one-line call to Mata causes Stata to drop to Mata, compile and execute the line of Mata code, and pop back up to Stata.

Example 12: One-line calls to Mata

. mata: st_matrix("Q", I(3))

. matrix list Q

symmetric Q[3,3]
c1  c2  c3
r1   1
r2   0   1
r3   0   0   1


In example 12, I use the one-line call to Mata mata: st_matrix("Q", I(3)) to put a copy of the Mata matrix returned by the Mata expression I(3) into the Stata matrix Q. After the one-line call to Mata, I am back in Stata, so I use matrix list Q to show that the Stata matrix Q is the 3$$\times$$3 identity matrix.

Done and undone

I used an interactive session to introduce Mata, the matrix programming language that is part of Stata.

In the next post, I show how to define Mata functions.

Categories: Programming Tags: