## Programming an estimation command in Stata: A first ado-command

I discuss the code for a simple estimation command to focus on the details of how to implement an estimation command. The command that I discuss estimates the mean by the sample average. I begin by reviewing the formulas and a do-file that implements them. I subsequently introduce ado-file programming and discuss two versions of the command. Along the way, I illustrate some of the postestimation features that work after the command.

This is the fourth post in the series **Programming an estimation command in Stata**. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

**The formulas for our estimator**

The formulas for the sample average and its estimated sampling variance, assuming an independently and identically distributed process, are

\[

\widehat{\mu} = 1/N \sum_{i=1}^N y_i

\]

\[

\widehat{Var}(\widehat{\mu}) = 1/[N(N-1)] \sum_{i=1}^N (y_i-\widehat{\mu})^2

\]

The code **mean1.do** performs these computations on **price** from the auto dataset.

**Code block 1: mean1.do**

// version 1.0.0 20Oct2015 version 14 sysuse auto quietly summarize price local sum = r(sum) local N = r(N) local mu = (1/`N')*`sum' generate double e2 = (price - `mu')^2 quietly summarize e2 local V = (1/((`N')*(`N'-1)))*r(sum) display "muhat = " `mu' display " V = " `V'

**mean1.do** uses **summarize** to compute the summations. Lines 5–7 and line 11 store results stored by **summarize** in **r()** into local macros that are subsequently used to compute the formulas. I recommend that you use **double**, instead of the default **float**, to compute all variables used in your formulas because it is almost always worth taking up the extra memory to gain the extra precision offered by **double** over **float**. (Essentially, each variable takes up twice as much space, but you get calculations that are correct to about \(10^{-16}\) instead of \(10^{-8}\).)

These calculations yield

**Example 1: Computing the average and its sampling variance**

. do mean1 . // version 1.0.0 20Oct2015 . version 14 . sysuse auto (1978 Automobile Data) . quietly summarize price . local sum = r(sum) . local N = r(N) . local mu = (1/`N')*`sum' . generate double e2 = (price - `mu')^2 . quietly summarize e2 . local V = (1/((`N')*(`N'-1)))*r(sum) . display "muhat = " `mu' muhat = 6165.2568 . display " V = " `V' V = 117561.16 . end of do-file

Now I verify that mean produces the same results.

**Example 2: Results from mean**

. mean price Mean estimation Number of obs = 74 -------------------------------------------------------------- | Mean Std. Err. [95% Conf. Interval] -------------+------------------------------------------------ price | 6165.257 342.8719 5481.914 6848.6 -------------------------------------------------------------- . matrix list e(b) symmetric e(b)[1,1] price y1 6165.2568 . matrix list e(V) symmetric e(V)[1,1] price price 117561.16

**A first ado-file**

The code in **mymean1.ado** performs the same calculations as **mean1.do**. (The file **mymean1.ado** is in my current working directory.)

**Code block 2: mymean1.ado**

*! version 1.0.0 20Oct2015 program define mymean1 version 14 quietly summarize price local sum = r(sum) local N = r(N) local mu = (1/`N')*`sum' capture drop e2 // Drop e2 if it exists generate double e2 = (price - `mu')^2 quietly summarize e2 local V = (1/((`N')*(`N'-1)))*r(sum) display "muhat = " `mu' display " V = " `V' end

Line 1 of **mymean1.ado** specifies that file defines the command **mymean1**. The command name must be the same as the file name that precedes the suffix **.ado**. The **mymean1** command performs the same computations as the do-file **mean1.do**.

**Example 3: Results from mymean1**

. mymean1 muhat = 6165.2568 V = 117561.16

**A slightly better command**

We want our command to be reusable; we want it to estimate the mean for any variable in memory, instead of only for price as performed by **mymean1.ado**. On line 5 of **mymean2.ado**, we use the **syntax** command to store the name of the variable specified by the user into the local macro **varlist**, which we use in the remainder of the computations.

**Code block 3: mymean2.ado**

*! version 2.0.0 20Oct2015 program define mymean2 version 14 syntax varlist display "The local macro varlist contains `varlist'" quietly summarize `varlist' local sum = r(sum) local N = r(N) local mu = (1/`N')*`sum' capture drop e2 // Drop e2 if it exists generate double e2 = (`varlist' - `mu')^2 quietly summarize e2 local V = (1/((`N')*(`N'-1)))*r(sum) display "The average of `varlist' is " `mu' display "The estimated variance of the average is " `V' end

The extremely powerful **syntax** command puts the elements of Stata syntax specified by the user into local macros and throws errors when the user makes a mistake. I will discuss **syntax** in greater detail in subsequent posts.

I begin by illustrating how to replicate the previous results.

**Example 4: Results from mymean2 price**

. mymean2 price The local macro varlist contains price The average of price is 6165.2568 The estimated variance of the average is 117561.16

I now illustrate that it works for another variable.

**Example 5: Results from mymean2 trunk**

. mymean2 trunk The local macro varlist contains trunk The average of trunk is 13.756757 The estimated variance of the average is .24724576 . mean trunk Mean estimation Number of obs = 74 -------------------------------------------------------------- | Mean Std. Err. [95% Conf. Interval] -------------+------------------------------------------------ trunk | 13.75676 .4972381 12.76576 14.74775 -------------------------------------------------------------- . display "The variance of the estimator is " (_se[trunk])^2 The variance of the estimator is .24724576

**Storing results in e()**

**mymean2.ado** does not save the results that it displays. We fix this problem in **mymean3.ado**. Line 2 specifies the option **e-class** on **program define** to make **mymean3** an e-class command. Line 18 uses **ereturn post** to move the matrix of point estimates (**b**) and the estimated variance-covariance of the estimator (**VCE**) into **e(b)** and **e(V)**. The estimation-postestimation framework uses parameter names for display, hypothesis tests, and other features. In lines 15 and 16, we put those names into the column stripes of the vector of estimates and the estimated **VCE**. In line 17, we put those names into the row stripe of the estimated **VCE**.

**Code block 4: mymean3.ado**

*! version 3.0.0 20Oct2015 program define mymean3, eclass version 14 syntax varlist quietly summarize `varlist' local sum = r(sum) local N = r(N) matrix b = (1/`N')*`sum' capture drop e2 // Drop e2 if it exists generate double e2 = (`varlist' - b[1,1])^2 quietly summarize e2 matrix V = (1/((`N')*(`N'-1)))*r(sum) matrix colnames b = `varlist' matrix colnames V = `varlist' matrix rownames V = `varlist' ereturn post b V ereturn display end

The **ereturn display** command on line 19 of **mymean3.ado** easily creates a standard output table using the results now stored in **e(b)** and **e(V)**.

**Example 6: Results from mymean3 trunk**

. mymean3 trunk ------------------------------------------------------------------------------ | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- trunk | 13.75676 .4972381 27.67 0.000 12.78219 14.73133 ------------------------------------------------------------------------------

**Estimation-postestimation**

**test**, **lincom**, **testnl**, **nlcom**, and other Wald-based estimation-postestimation features work after **mymean3** because all the required information is stored in **e(b)** and **e(V)**.

To illustrate, I perform a Wald of the null hypothesis that the mean of **trunk** is \(11\).

**Example 7: test works after mymean3**

. test _b[trunk]==11 ( 1) trunk = 11 chi2( 1) = 30.74 Prob > chi2 = 0.0000

The results stored in **e()** are the glue that holds the estimation-postestimation framework together. We have only stored **e(b)** and **e(V)** so far, so not all the standard features are working yet. (But we will get there in the #StataProgramming series.)

**Using temporary names for global objects**

Stata variables and matrices are global, as discussed in my previous blog post. We need some safe names for global objects. These safe names should not be in use elsewhere, and they should be temporary in that we want Stata to drop the corresponding objects when the command finishes. The **tempvar** and **tempname** commands put safe names into local macros and then drop the corresponding objects when the ado-file or do-file finishes. We explicitly dropped **e2**, if it existed, in line 9 of code block 2, in line 12 of code block 3, and in line 11 of code block 4. We do not need such a line in code block, because we are using temporary variable names.

In line 7 of **mymean4.ado**, the **tempvar** command puts a safe name into the local macro **e2**. In line 8 of **mymean4.ado**, the **tempname** command puts safe names into the local macros **b** and **V**. I illustrate the format followed by these safe names by displaying them on lines 9–11. The output reveals that a leading pair of underscores is followed by numbers and capital letters. Line 15 illustrates the use of these safe names. Instead of creating the matrix **b**, we create the matrix whose name is stored in the local macro **b**. In line 8, the **tempname** command created the local macro **b** to hold a safe name.

**Code block 5: mymean4.ado**

*! version 4.0.0 20Oct2015 program define mymean4, eclass version 14 syntax varlist tempvar e2 tempname b V display "The safe name in e2 is `e2'" display "The safe name in b is `b'" display "The safe name in V is `V'" quietly summarize `varlist' local sum = r(sum) local N = r(N) matrix `b' = (1/`N')*`sum' generate double `e2' = (`varlist' - `b'[1,1])^2 quietly summarize `e2' matrix `V' = (1/((`N')*(`N'-1)))*r(sum) matrix colnames `b' = `varlist' matrix colnames `V' = `varlist' matrix rownames `V' = `varlist' ereturn post `b' `V' ereturn display end

This code produces the output

**Example 8: Results from mymean4 trunk**

. mymean4 trunk The safe name in e2 is __000000 The safe name in b is __000001 The safe name in V is __000002 ------------------------------------------------------------------------------ | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- trunk | 13.75676 .4972381 27.67 0.000 12.78219 14.73133 ------------------------------------------------------------------------------

Removing the lines that display the safe names contained in the local macros yields **mymean5.ado**.

**Code block 6: mymean5.ado**

*! version 5.0.0 20Oct2015 program define mymean5, eclass version 14 syntax varlist tempvar e2 tempname b V quietly summarize `varlist' local sum = r(sum) local N = r(N) matrix `b' = (1/`N')*`sum' generate double `e2' = (`varlist' - `b'[1,1])^2 quietly summarize `e2' matrix `V' = (1/((`N')*(`N'-1)))*r(sum) matrix colnames `b' = `varlist' matrix colnames `V' = `varlist' matrix rownames `V' = `varlist' ereturn post `b' `V' ereturn display end

This code produces the output

**Example 9: Results from mymean5 trunk**

. mymean5 trunk ------------------------------------------------------------------------------ | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- trunk | 13.75676 .4972381 27.67 0.000 12.78219 14.73133 ------------------------------------------------------------------------------

**Done and undone**

I illustrated some basic ado-file programming techniques by implementing a command that estimates the mean of variable. Even though we have a command that produces correct, easy-to-read output that has some estimation-postestimation features, we have only scratched the surface of what we usually want to do in an estimation command. I dig a little deeper in the next few posts by developing a command that performs ordinary least-squares estimation.

Pingback: The Stata Blog » Programming an estimation command in Stata: A first command for OLS()

Pingback: The Stata Blog » Programming an estimation command in Stata: A first ado-command using Mata()