Posts Tagged ‘Mata’

Programming estimators in Stata: Why you should

Distributing a Stata command that implements a statistical method will get that method used by lots of people. They will thank you. And, they will cite you!

This post is the first in the series #StataProgramming about programing an estimation command in Stata that uses Mata to do the numerical work. In the process of showing you how to program an estimation command in Stata, I will discuss do-file programming, ado-file programming, and Mata programming. When the series ends, you will be able to write Stata commands.

Stata users like its predictable syntax and its estimation-postestimation structure that facilitates hypothesis testing, specification tests, and parameter interpretation. To help you write Stata commands that people want to use, I illustrate how Stata syntax is predictable and give an overview of the estimation-postestimation structure that you will want to emulate in your programs.

Stata structure by example

I use and describe some simulated data about the number of traffic accidents observed on 948 people.

Example 1: Accident data

. use

. describe

Contains data from
  obs:           948                          
 vars:             6                          23 Sep 2015 13:04
 size:        22,752                          
              storage   display    value
variable name   type    format     label      variable label
kids            float   %9.0g                 number of children
cvalue          float   %9.0g                 car value index
tickets         float   %9.0g                 number of tickets in last 2 years
traffic         float   %9.0g                 local traffic index, larger=>worse
male            float   %9.0g                 1=>man, 0=>woman
accidents       float   %9.0g                 number of traffic in last 5 years
Sorted by: 

Stata’s predictable syntax

I estimate the parameters of a Poisson regression model for accidents as a function of traffic conditions (traffic), an indicator for being a male driver (male), and the number of tickets received in the last two years (tickets).

Example 2: A Poisson model for accidents

. poisson accidents traffic male tickets , vce(robust)

Iteration 0:   log pseudolikelihood = -377.98594  
Iteration 1:   log pseudolikelihood = -370.68001  
Iteration 2:   log pseudolikelihood = -370.66527  
Iteration 3:   log pseudolikelihood = -370.66527  

Poisson regression                              Number of obs     =        948
                                                Wald chi2(3)      =    1798.65
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -370.66527               Pseudo R2         =     0.8191

             |               Robust
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
     traffic |   .0764399   .0165119     4.63   0.000     .0440772    .1088027
        male |   3.228004   .1232081    26.20   0.000     2.986521    3.469488
     tickets |   1.366614   .0328218    41.64   0.000     1.302284    1.430943
       _cons |  -7.434478   .2413188   -30.81   0.000    -7.907454   -6.961502

I want to focus on the structure in this example so that you can use it to make your commands easier to use. In particular, I want to discuss the structure of the command syntax and to point out that the output is easy to read and interpret because it is a standard Stata output table. For estimators that table almost always reports estimates (often coefficients), standard errors, tests against zero and their $p$-values, and confidence intervals.

Stata syntax is predictable, which makes it easy to use. Stata users “speak Stata” and do not even notice the details. I highlight some of these details so that we can make the syntax of the commands we write predictable. Here are some of the standard syntax elements illustrated in example 2.

  1. The command has four syntactical elements;
    1. command name (poisson),
    2. list of variable names (accidents traffic male tickets),
    3. a comma,
    4. an option (vce(robust)).
  2. In the list of variable names, the name of the dependent variable is first and it is followed by the names of the independent variables.
  3. The job of the comma is to separate the command name and variable list from the option or options.

The output is also structured; it is composed of an iteration log, a header, and a standard output table.

Estimation-postestimation framework

As a Stata user, I could now use the estimation-postestimation framework. For example, I could perform a Wald test of the hypothesis that the coefficient on male is 3.

Example 3: A Wald test of a linear restriction

. test male = 3

 ( 1)  [accidents]male = 3

           chi2(  1) =    3.42
         Prob > chi2 =    0.0642

or I could perform a Wald test of the nonlinear hypothesis that the ratio of the coefficient on male to the ratio of the coefficient on tickets is 2.

Example 4: A Wald test of a nonlinear restriction

. testnl _b[male]/_b[tickets] = 2

  (1)  _b[male]/_b[tickets] = 2

               chi2(1) =       19.65
           Prob > chi2 =        0.0000

I could also predict the mean of accidents for each observation and summarize the results.

Example 5: Summarizing the predicted conditional means

. predict nhat
(option n assumed; predicted number of events)

. summarize nhat

    Variable |        Obs        Mean    Std. Dev.       Min        Max
        nhat |        948    .8512658    2.971087   .0006086    29.0763

Finally, I could use margins to estimate conditional or population-averaged parameters that are functions of the parameters in the original model. I use margins to estimate the average number of accidents that would be observed if each individual received 0 tickets, or 1 ticket, or 2 tickets, …, or 7 tickets. See [R] margins, Long and Freese (2006, sec. 4.4.2-4.4.3), and Cameron and Trivedi (2010, 10.5.6{10.6.9) for introductions to estimating functions of the the model parameters by margins.

Example 6: Estimating functions of model parameters

. margins, at(tickets=(0 1 2 3 4 5 6 7))

Predictive margins                              Number of obs     =        948
Model VCE    : Robust

Expression   : Predicted number of events, predict()

1._at        : tickets         =           0

2._at        : tickets         =           1

3._at        : tickets         =           2

4._at        : tickets         =           3

5._at        : tickets         =           4

6._at        : tickets         =           5

7._at        : tickets         =           6

8._at        : tickets         =           7

             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
         _at |
          1  |   .0097252   .0015387     6.32   0.000     .0067094     .012741
          2  |   .0381426   .0048762     7.82   0.000     .0285854    .0476998
          3  |   .1495971   .0148157    10.10   0.000      .120559    .1786353
          4  |   .5867272   .0432256    13.57   0.000     .5020066    .6714478
          5  |   2.301172   .1302033    17.67   0.000     2.045978    2.556366
          6  |   9.025308   .5049176    17.87   0.000     8.035688    10.01493
          7  |   35.39769   2.555679    13.85   0.000     30.38865    40.40673
          8  |   138.8315   13.49606    10.29   0.000     112.3797    165.2832

The glue

The estimation results stored in e() are the glue that holds together the estimation-postestimation framework. The poisson command stores lots of stuff in e(). I could use ereturn list to list all this stuff, but there are many stored objects that do not interest you yet.

Most of the estimation-postestimation features that I discussed were implemented using e(b), e(V), and e(predict), which are the vector of point estimates, the estimated VCE, and the name of the command that implements predict after poisson.

I will show how to store what you need in e() in the #StataProgramming series.

Structure of Stata commands

Here is an outline of the tasks performed by a Stata estimation command.

  1. Parse the input to the command.
  2. Compute results.
  3. Store results in e()
  4. Display output.

You need to write a predict command to complete the estimation-postestimation framework. After you have stored the estimation results and written the predict command, margins works.

I will explain each of these steps in the #StataProgramming series of posts.

Use this structure to your advantage. To make your command easy to use, design it to have the predictable syntax implemented in other commands and make it work in the estimation-postestimation framework. This task is far easier than it sounds. In fact, it is just plain easy. The Stata language steers you in this direction.

Done and undone

I will teach you how to program an estimation command in Stata in the #StataProgramming series. I will also show you how do the numerical work in Mata. I discussed the following points, in this first post.

  1. The predictable structure of Stata syntax makes Stata easy to use. You should emulate this structure, so that your commands are easy to use.
  2. The estimation-postestimation framework makes inference and advanced estimation simple. It is easy for you to make your command work with this framework.
  3. The estimation results stored in e(), and the predict command, are the glue that holds the estimation-postestimation framework together.

In the next post, I discuss do-file programming tools that I will subsequently use to parse the input to the command.


Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Revised ed. College Station, Texas: Stata Press.

Long, J. S., and J. Freese. 2014. Regression models for categorical dependent variables using Stata. 3rd ed. College Station, Texas: Stata Press.

Advanced Mata: Pointers

I’m still recycling my talk called “Mata, The Missing Manual” at user meetings, a talk designed to make Mata more approachable. One of the things I say late in the talk is, “Unless you already know what pointers are and know you need them, ignore them. You don’t need them.” And here I am writing about, of all things, pointers. Well, I exaggerated a little in my talk, but just a little.

Before you take my previous advice and stop reading, let me explain: Mata serves a number of purposes and one of them is as the primary langugage we at StataCorp use to implement new features in Stata. I’m not referring to mock ups, toys, and experiments, I’m talking about ready-to-ship code. Stata 12’s Structural Equation Modeling features are written in Mata, so is Multiple Imputation, so is Stata’s optimizer that is used by nearly all estimation commands, and so are most features. Mata has a side to it that is exceedingly serious and intended for use by serious developers, and every one of those features are available to users just as they are to StataCorp developers. This is one of the reasons there are so many user-written commands are available for Stata. Even if you don’t use the serious features, you benefit.

So every so often I need to take time out and address the concerns of these user/developers. I knew I needed to do that now when Kit Baum emailed a question to me that ended with “I’m stumped.” Kit is the author of An Introduction to Stata Programming which has done more to make Mata approachable to professional researchers than anything StataCorp has done, and Kit is not often stumped.

I have a certain reptutation about how I answer most questions. “Why do you want to do that?” I invariably reply, or worse, “You don’t want to do that!” and then go on to give the answer to the question I wished they had asked. When Kit asks a question, however, I just answer it. Kit asked a question about pointers by setting up an artificial example and I have no idea what his real motivation was, so I’m not even going to try to motivate the question for you. The question is interesting in and of itself anyway.

Here is Kit’s artificial example:

real function x2(real scalar x) return(x^2)
real function x3(real scalar x) return(x^3) 

void function tryit() 
        pointer(real scalar function) scalar fn
        string rowvector                     func
        real scalar                          i

        func = ("x2", "x3")
        for(i=1;i<=length(func);i++) {
                fn = &(func[i])

Kit is working with pointers, and not just pointers to variables, but pointers to functions. A pointer is the memory address, the address where the variable or function is stored. Real compilers translate names into memory addresses which is one of the reasons real compilers produce code that runs fast. Mata is a real compiler. Anyway, pointers are memory addresses, such as 58, 212,770, 427,339,488, except the values are usually written in hexadecimal rather than decimal. In the example, Kit has two functions, x2(x) and x3(x). Kit wants to create a vector of the function addresses and then call each of the functions in the vector. In the artificial example, he's calling each with an argument of 4.

The above code does not work:

: tryit()
         tryit():  3101  matrix found where function required
         <istmt>:     -  function returned error

The error message is from the Mata compiler and it's complaining about the line


but the real problem is earlier in the tryit() code.

One corrected version of tryit() would read,

void function tryit()
        pointer(real scalar function) scalar fn
        pointer(real scalar function) vector func     // <---
        real scalar                          i

        func = (&x2(), &x3())                         // <---
        for(i=1;i<=length(func);i++) {
                fn = func[i]                          // <---

If you make the three changes I marked, tryit() works:

: tryit()

I want to explain this code and alternative ways the code could have been fixed. It will be easier if we just work interactively, so let's start all over again:

: real scalar x2(x) return(x^2)

: real scalar x3(x) return(x^3)

: func = (&x2(), &x3())

Let's take a look at what is in func:

: func
                1            2
  1 |  0x19551ef8   0x19552048  |

Those are memory addresses. When we typed &x2() and &x3() in the line

: func = (&x2(), &x3())

functions x2() and x3() were not called. &x2() and &x3() instead evaluate to the addresses of the functions named x2() and x3(). I can demonstrate this:

: &x2()

0x19551ef8 is the memory address of where the function x2() is stored. 0x19551ef8 may not look like a number, but that is only because it is presented in base 16. 0x19551ef8 is in fact the number 425,008,888, and the compiled code for the function x2() starts at the 425,008,888th byte of memory and continues thereafter.

Let's assign to fn the value of the address of one of the functions, say x2(). I could do that by typing

: fn = func[1]

or by typing

: fn = &x2()

and either way, when I look at fn, it contains a memory address:

: fn

Let's now call the function whose address we have stored in fn:

: (*fn)(2)

When we call a function and want to pass 2 as an argument, we normally code f(2). In this case, we substitute (*fn) for f because we do not want to call the function named f(), we want to call the function whose address is stored in variable fn. The operator * usually means multiplication, but when * is used as a prefix, it means something different, in much the same way the minus operator - can be subtract or negate. The meaning of unary * is "the contents of". When we code *fn, we mean not the value 425,008,888 stored in fn, we mean the contents of the memory address 425,008,888, which happens to be the function x2().

We type (*fn)(2) and not *fn(2) because *fn(2) would be interpreted to mean *(fn(2)). If there were a function named fn(), that function would be called with argument 2, the result obtained, and then the star would take the contents of that memory address, assuming fn(2) returned a memory address. If it didn't, we'd get a type mismatch error.

The syntax can be confusing until you understand the reasoning behind it. Let's start with all new names. Consider something named X. Actually, there could be two different things named X and Mata would not be confused. There could be a variable named X and there could be a function named X(). To Mata, X and X() are different things, or said in the jargon, have different name spaces. In Mata, variables and functions can have the same names. Variables and functions having the same names in C is not allowed -- C has only one name space. So in C, you can type

fn = &x2

to obtain the address of variable x2 or function x2(), but in Mata, the above means the address of the variable x2, and if there is no such variable, that's an error. In Mata, to obtain the address of function x2(), you type

fn = &x2()

The syntax &x2() is a definitional nugget; there is no taking it apart to understand its logic. But we can take apart the logic of the programmer who defined the syntax. & means "address of" and &thing means to take the address of thing. If thing is a name -- &name -- that means to look up name in the variable space and return its address. If thing is name(), that means look up name in the function space and return its address. They way we formally write this grammar is

 &thing, where 

 thing  :=   name

There are three possibilities for thing; it's a name or it's a name followed by () or it's an expression. The last is not much used. &2 creates a literal 2 and then tells you the address where the 2 is stored, which might be 0x195525d8. &(2+3) creates 5 and then tells you where the 5 is stored.

But let's get back to Kit's problem. Kit coded,

func = ("x2", "x3")

and I said no, code instead

func = (&x2(), &x3())

You do not use strings to obtain pointers, you use the actual name prefixed by ampersand.

There's a subtle difference in what Kit was trying to code and what I did code, however. In what Kit tried to code, Kit was seeking "run-time binding". I, however, coded "compile-time binding". I'm about to explain the difference and show you how to achieve run-time binding, but before I do, let me tell you that

  1. You probably want compile-time binding.
  2. Compile-time binding is faster.
  3. Run-time binding is sometimes required, but when persons new to pointers think they need run-time binding, they usually do not.

Let me define compile-time and run-time binding:

  1. Binding refers to establishing addresses corresponding to names and names(). The names are said to be bound to the address.

  2. In compile-time binding, the addresses are established at the time the code is compiled.

    More correctly, compile-time binding does not really occur at the time the code is compiled, it occurs when the code is brought together for execution, an act called linking and which happens automatically in Mata. This is a fine and unimportant distiction, but I do not want you to think that all the functions have to be compiled at the same time or that the order in which they are compiled matters.

    In compile-time binding, if any functions are missing when the code is brought together for execution, and error message is issued.

  3. In run-time binding, the addresses are established at the time the code is executed (run), which happens after compilation, and after linking, and is an explicit act performed by you, the programmer.

To obtain the address of a variable or function at run-time, you use built-in function findexternal(). findexternal() takes one argument, a string scalar, containing the name of the object to be found. The function looks up that name and returns the address corresponding to it, or it returns NULL if the object cannot be found. NULL is the word used to mean invalid memory address and is in fact defined as equaling zero.

findexternal() can be used only with globals. The other variables that appear in your program might appear to have names, but those names are used solely by the compiler and, in the compiled code, these "stack-variables" or "local variables" are referred to by their addresses. The names play no other role and are not even preserved, so findexternal() cannot be used to obtain their addresses. There would be no reason you would want findexternal() to find their addresses because, in all such cases, the ampersand prefix is a perfect substitute.

Functions, however, are global, so we can look up functions. Watch:

: findexternal("x2()")

Compare that with

: &x2()

It's the same result, but they were produced differently. In the findexternal() case, the 0x19551ef8 result was produced after the code was compiled and assembled. The value was obtained, in fact, by execution of the findexternal() function.

In the &x2() case, the 0x19551ef8 result was obtained during the compile/assembly process. We can better understand the distinction if we look up a function that does not exist. I have no function named x4(). Let's obtain x4()'s address:

: findexternal("x4()")

: &x4()
         <istmt>:  3499  x4() not found

I may have no function named x4(), but that didn't bother findexternal(). It merely returned 0x0, another way of saying NULL.

In the &x4() case, the compiler issued an error. The compiler, faced with evaluating &x4(), could not, and so complained.

Anyway, here is how we could write tryit() with run-time binding using the findexternal() function:

void function tryit() 
        pointer(real scalar function) scalar fn
        pointer(real scalar function) vector func
        real scalar                          i

        func = (findexternal("x2()"), findexternal("x3()")

        for(i=1;i<=length(func);i++) {
                fn = func[i]

To obtain run-time rather than compile-time bindings, all I did was change the line

        func = (&x2(), &x3())

to be

        func = (findexternal("x2()"), findexternal("x3()")

Or we could write it this way:

void function tryit() 
        pointer(real scalar function) scalar fn
        string vector                        func
        real scalar                          i

        func = ("x2()", "x3()")

        for(i=1;i<=length(func);i++) {
                fn = findexternal(func[i])

In this variation, I put the names in a string vector just as Kit did originally. Then I changed the line that Kit wrote,

        fn = &(func[i])

to read

        fn = findexternal(func[i])

Either way you code it, when performing run-time binding, you the programmer should deal with what is to be done if the function is not found. The loop

for(i=1;i<=length(func);i++) {
        fn = findexternal(func[i])

would better read

for(i=1;i<=length(func);i++) {
        fn = findexternal(func[i])
        if (fn!=NULL) {
        else {

Unlike C, if you do not include the code for the not-found case, the program will not crash if the function is not found. Mata will give you an "invalid use of NULL pointer" error message and a traceback log.

If you were writing a program in which the user of your program was to pass to you a function you were to use, such as a likelihood function to be maximized, you could write your program with compile-time binding by coding,

function myopt(..., pointer(real scalar function) scalar f, ...)
        ... (*f)(...) ...

and the user would call you program my coding myopt(..., &myfunc(), ...), or you could use run-time binding by coding

function myopt(..., string scalar fname, ...)
        pointer(real scalar function) scalar f

        f = findexternal(fname)
        if (f==NULL) {
                errprintf("function %s() not found\n", fname)
        ... (*f)(...) ...

and the user would call your program by coding myopt(..., "myfunc()", ...).

In this case I could be convinced to prefer the run-time binding solution for professional code because, the error being tolerated by Mata, I can write code to give a better, more professional looking error message.

Categories: Mata Tags: , ,

Mata, the missing manual, available at SSC

I gave a 1.5 hour talk on Mata at the 2010 UK Stata Users Group Meeting in September. The slides are available in pdf form here. The talk was well received, which of course pleased me. If you’re interested in Mata, I predict you will find the slides useful even if you didn’t attend the meeting.

The problem with the Mata Reference Manual is that, even though it tells you all the details, it never tells you how to put it all together, nor does it motivate you. We developers at StataCorp love the manual for just that reason: It gets right to the details that are so easy to forget.

Anyway, in outline, the talk and slides work like this

  1. They start with the mechanics of including Mata code. It begins gently, at the end of Stata’s NetCourse 151, and ends up discussing big — really big — systems.
  2. Next is a section on appropriate and inappropriate use of Mata.
  3. That’s followed by Mata concepts, from basic to advanced.
  4. And the talk includes a section on debugging!

I was nervous about how the talk would be received before I gave it. It’s been on my to-do list to write a book on Mata, but I never really found a way to approach the subject. The problem is that it’s all so obvious to me that I tend to launch immediately into tedious details. I wrote drafts of a few chapters more than once, and even I didn’t want to reread them.

I don’t know why this overview approach didn’t occur to me earlier. My excuse is that it’s a strange (I claim novel) combination of basic and advanced material, but it seems to work. I titled the talk “missing manual” with the implied promise that I would write that book if the talk was well received. It was. Nowadays, I’m not promising when. Real Soon Now.

The materials for all the talks, not just mine, are available at the SSC(*) and on For the UK 2010, go to or For other User Group Meetings, it’s easiest to start at the Stata page Meeting Proceedings.

If you have questions on the material, the appropriate place to post them is Statalist. I’m a member and am likely to reply, and that way, others who might be interested get to see the exchange, too. Please use “Mata missing manual” as the subject so that it will be easy for nonmembers to search the Statalist Archives and find the thread.

Finally, my “Stata, the missing manual” talk has no connection with the fine Missing-Manual series, “the book that should have been in the box”, created by Pogue Press and O’Reilly Media, whose website is

* The SSC is the Statistical Software Components archive, often called the Boston College Archive, provided by The SSC has become the premier Stata download site for user-written software on the Internet and also archives proceedings of Stata Users Group meetings and conferences.

Categories: Mata Tags: , , , ,