Home > Programming > Programming an estimation command in Stata: A first ado-command

Programming an estimation command in Stata: A first ado-command

I discuss the code for a simple estimation command to focus on the details of how to implement an estimation command. The command that I discuss estimates the mean by the sample average. I begin by reviewing the formulas and a do-file that implements them. I subsequently introduce ado-file programming and discuss two versions of the command. Along the way, I illustrate some of the postestimation features that work after the command.

This is the fourth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

The formulas for our estimator

The formulas for the sample average and its estimated sampling variance, assuming an independently and identically distributed process, are

\[
\widehat{\mu} = 1/N \sum_{i=1}^N y_i
\]

\[
\widehat{Var}(\widehat{\mu}) = 1/[N(N-1)] \sum_{i=1}^N (y_i-\widehat{\mu})^2
\]

The code mean1.do performs these computations on price from the auto dataset.

Code block 1: mean1.do

// version 1.0.0 20Oct2015
version 14
sysuse auto
quietly summarize price
local sum          = r(sum)
local N            = r(N)
local mu           = (1/`N')*`sum'
generate double e2 = (price - `mu')^2
quietly summarize e2
local V            = (1/((`N')*(`N'-1)))*r(sum)
display "muhat = " `mu'
display "   V  = " `V'

mean1.do uses summarize to compute the summations. Lines 5–7 and line 11 store results stored by summarize in r() into local macros that are subsequently used to compute the formulas. I recommend that you use double, instead of the default float, to compute all variables used in your formulas because it is almost always worth taking up the extra memory to gain the extra precision offered by double over float. (Essentially, each variable takes up twice as much space, but you get calculations that are correct to about \(10^{-16}\) instead of \(10^{-8}\).)

These calculations yield

Example 1: Computing the average and its sampling variance

. do mean1

. // version 1.0.0 20Oct2015
. version 14

. sysuse auto
(1978 Automobile Data)

. quietly summarize price

. local sum          = r(sum)

. local N            = r(N)

. local mu           = (1/`N')*`sum'

. generate double e2 = (price - `mu')^2

. quietly summarize e2

. local V            = (1/((`N')*(`N'-1)))*r(sum)

. display "muhat = " `mu'
muhat = 6165.2568

. display "   V  = " `V'
   V  = 117561.16

. 
end of do-file

Now I verify that mean produces the same results.

Example 2: Results from mean

. mean price

Mean estimation                   Number of obs   =         74

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       price |   6165.257   342.8719      5481.914      6848.6
--------------------------------------------------------------

. matrix list e(b)

symmetric e(b)[1,1]
        price
y1  6165.2568

. matrix list e(V)

symmetric e(V)[1,1]
           price
price  117561.16

A first ado-file

The code in mymean1.ado performs the same calculations as mean1.do. (The file mymean1.ado is in my current working directory.)

Code block 2: mymean1.ado

*! version 1.0.0 20Oct2015
program define mymean1
	version 14

	quietly summarize price
	local sum          = r(sum)
	local N            = r(N)
	local mu           = (1/`N')*`sum'
	capture drop e2				// Drop e2 if it exists
	generate double e2 = (price - `mu')^2
	quietly summarize e2
	local V            = (1/((`N')*(`N'-1)))*r(sum)
	display "muhat = " `mu'
	display "   V  = " `V'
end

Line 1 of mymean1.ado specifies that file defines the command mymean1. The command name must be the same as the file name that precedes the suffix .ado. The mymean1 command performs the same computations as the do-file mean1.do.

Example 3: Results from mymean1

. mymean1
muhat = 6165.2568
   V  = 117561.16

A slightly better command

We want our command to be reusable; we want it to estimate the mean for any variable in memory, instead of only for price as performed by mymean1.ado. On line 5 of mymean2.ado, we use the syntax command to store the name of the variable specified by the user into the local macro varlist, which we use in the remainder of the computations.

Code block 3: mymean2.ado

*! version 2.0.0 20Oct2015
program define mymean2
	version 14

	syntax varlist
	display "The local macro varlist contains `varlist'"

	quietly summarize `varlist'
	local sum          = r(sum)
	local N            = r(N)
	local mu           = (1/`N')*`sum'
	capture drop e2				// Drop e2 if it exists
	generate double e2 = (`varlist' - `mu')^2
	quietly summarize e2
	local V            = (1/((`N')*(`N'-1)))*r(sum)
	display  "The average of `varlist' is " `mu'
	display  "The estimated variance of the average is " `V'
end

The extremely powerful syntax command puts the elements of Stata syntax specified by the user into local macros and throws errors when the user makes a mistake. I will discuss syntax in greater detail in subsequent posts.

I begin by illustrating how to replicate the previous results.

Example 4: Results from mymean2 price

. mymean2 price
The local macro varlist contains price
The average of price is 6165.2568
The estimated variance of the average is 117561.16

I now illustrate that it works for another variable.

Example 5: Results from mymean2 trunk

. mymean2 trunk
The local macro varlist contains trunk
The average of trunk is 13.756757
The estimated variance of the average is .24724576

. mean trunk

Mean estimation                   Number of obs   =         74

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       trunk |   13.75676   .4972381      12.76576    14.74775
--------------------------------------------------------------

. display "The variance of the estimator is " (_se[trunk])^2
The variance of the estimator is .24724576

Storing results in e()

mymean2.ado does not save the results that it displays. We fix this problem in mymean3.ado. Line 2 specifies the option e-class on program define to make mymean3 an e-class command. Line 18 uses ereturn post to move the matrix of point estimates (b) and the estimated variance-covariance of the estimator (VCE) into e(b) and e(V). The estimation-postestimation framework uses parameter names for display, hypothesis tests, and other features. In lines 15 and 16, we put those names into the column stripes of the vector of estimates and the estimated VCE. In line 17, we put those names into the row stripe of the estimated VCE.

Code block 4: mymean3.ado

*! version 3.0.0 20Oct2015
program define mymean3, eclass
	version 14

	syntax varlist

	quietly summarize `varlist'
	local sum          = r(sum)
	local N            = r(N)
	matrix b           = (1/`N')*`sum'
	capture drop e2				// Drop e2 if it exists
	generate double e2 = (`varlist' - b[1,1])^2
	quietly summarize e2
	matrix V           = (1/((`N')*(`N'-1)))*r(sum)
	matrix colnames b  = `varlist'
	matrix colnames V  = `varlist'
	matrix rownames V  = `varlist'
	ereturn post b V
	ereturn display
end

The ereturn display command on line 19 of mymean3.ado easily creates a standard output table using the results now stored in e(b) and e(V).

Example 6: Results from mymean3 trunk

. mymean3 trunk
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       trunk |   13.75676   .4972381    27.67   0.000     12.78219    14.73133
------------------------------------------------------------------------------

Estimation-postestimation

test, lincom, testnl, nlcom, and other Wald-based estimation-postestimation features work after mymean3 because all the required information is stored in e(b) and e(V).

To illustrate, I perform a Wald of the null hypothesis that the mean of trunk is \(11\).

Example 7: test works after mymean3

. test _b[trunk]==11

 ( 1)  trunk = 11

           chi2(  1) =   30.74
         Prob > chi2 =    0.0000

The results stored in e() are the glue that holds the estimation-postestimation framework together. We have only stored e(b) and e(V) so far, so not all the standard features are working yet. (But we will get there in the #StataProgramming series.)

Using temporary names for global objects

Stata variables and matrices are global, as discussed in my previous blog post. We need some safe names for global objects. These safe names should not be in use elsewhere, and they should be temporary in that we want Stata to drop the corresponding objects when the command finishes. The tempvar and tempname commands put safe names into local macros and then drop the corresponding objects when the ado-file or do-file finishes. We explicitly dropped e2, if it existed, in line 9 of code block 2, in line 12 of code block 3, and in line 11 of code block 4. We do not need such a line in code block, because we are using temporary variable names.

In line 7 of mymean4.ado, the tempvar command puts a safe name into the local macro e2. In line 8 of mymean4.ado, the tempname command puts safe names into the local macros b and V. I illustrate the format followed by these safe names by displaying them on lines 9–11. The output reveals that a leading pair of underscores is followed by numbers and capital letters. Line 15 illustrates the use of these safe names. Instead of creating the matrix b, we create the matrix whose name is stored in the local macro b. In line 8, the tempname command created the local macro b to hold a safe name.

Code block 5: mymean4.ado

*! version 4.0.0 20Oct2015
program define mymean4, eclass
	version 14

	syntax varlist

	tempvar e2 
	tempname b V
	display "The safe name in e2 is `e2'"
	display "The safe name in b is `b'"
	display "The safe name in V is `V'"
	quietly summarize `varlist'
	local sum            = r(sum)
	local N              = r(N)
	matrix `b'           = (1/`N')*`sum'
	generate double `e2' = (`varlist' - `b'[1,1])^2
	quietly summarize `e2'
	matrix `V'           = (1/((`N')*(`N'-1)))*r(sum)
	matrix colnames `b'  = `varlist'
	matrix colnames `V'  = `varlist'
	matrix rownames `V'  = `varlist'
	ereturn post `b' `V'
	ereturn display
end

This code produces the output

Example 8: Results from mymean4 trunk

. mymean4 trunk
The safe name in e2 is __000000
The safe name in b is __000001
The safe name in V is __000002
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       trunk |   13.75676   .4972381    27.67   0.000     12.78219    14.73133
------------------------------------------------------------------------------

Removing the lines that display the safe names contained in the local macros yields mymean5.ado.

Code block 6: mymean5.ado

*! version 5.0.0 20Oct2015
program define mymean5, eclass
	version 14

	syntax varlist

	tempvar e2 
	tempname b V
	quietly summarize `varlist'
	local sum            = r(sum)
	local N              = r(N)
	matrix `b'           = (1/`N')*`sum'
	generate double `e2' = (`varlist' - `b'[1,1])^2
	quietly summarize `e2'
	matrix `V'           = (1/((`N')*(`N'-1)))*r(sum)
	matrix colnames `b'  = `varlist'
	matrix colnames `V'  = `varlist'
	matrix rownames `V'  = `varlist'
	ereturn post `b' `V'
	ereturn display
end

This code produces the output

Example 9: Results from mymean5 trunk

. mymean5 trunk
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       trunk |   13.75676   .4972381    27.67   0.000     12.78219    14.73133
------------------------------------------------------------------------------

Done and undone

I illustrated some basic ado-file programming techniques by implementing a command that estimates the mean of variable. Even though we have a command that produces correct, easy-to-read output that has some estimation-postestimation features, we have only scratched the surface of what we usually want to do in an estimation command. I dig a little deeper in the next few posts by developing a command that performs ordinary least-squares estimation.