## Programming an estimation command in Stata: A first ado-command using Mata

I discuss a sequence of ado-commands that use Mata to estimate the mean of a variable. The commands illustrate a general structure for Stata/Mata programs. This post builds on Programming an estimation command in Stata: Mata 101, Programming an estimation command in Stata: Mata functions, and Programming an estimation command in Stata: A first ado-command.

This is the thirteenth post in the series **Programming an estimation command in Stata**. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

**Using Mata in ado-programs**

I begin by reviewing the structure in **mymean5.ado**, which I discussed in Programming an estimation command in Stata: A first ado-command.

**Code block 1: mymean5.ado**

*! version 5.0.0 20Oct2015 program define mymean5, eclass version 14 syntax varlist(max=1) tempvar e2 tempname b V quietly summarize `varlist' local sum = r(sum) local N = r(N) matrix `b' = (1/`N')*`sum' generate double `e2' = (`varlist' - `b'[1,1])^2 quietly summarize `e2' matrix `V' = (1/((`N')*(`N'-1)))*r(sum) matrix colnames `b' = `varlist' matrix colnames `V' = `varlist' matrix rownames `V' = `varlist' ereturn post `b' `V' ereturn display end

The **syntax** command on line 5 stores the name of the variable for which the command estimates the mean. The **tempvar** and **tempname** commands on lines 7 and 8 put temporary names into the local macros **e2**, **b**, and **V**. Lines 9-15 compute the point estimates and the estimated variance-covariance of the estimator (**VCE**), using the temporary names for objects, so as not to overwrite user-created objects. Lines 16–18 put the column name on the point estimate and row and column names on the estimated **VCE**. Line 19 posts the point estimate and the estimated **VCE** to **e(b)** and **e(V)**, respectively. Line 20 produces a standard output table from the information stored in **e(b)** and **e(V)**.

By the end of this post, I will have a command that replaces the Stata computations on lines 9–15 with Mata computations. To illustrate the structure of Stata-Mata programming, I start off only computing the point estimate in **myregress6**.

**Code block 2: mymean6.ado**

*! version 6.0.0 05Dec2015 program define mymean6, eclass version 14 syntax varlist(max=1) mata: x = st_data(., "`varlist'") mata: w = mean(x) mata: st_matrix("Q", w) display "The point estimates are in Q" matrix list Q end

Line 7 executes a one-line call to Mata; in this construction, Stata drops down to Mata, executes the Mata expression, and pops back up to Stata. Popping down to Mata and back up to Stata takes almost no time, but I would like to avoid doing it three times. (Lines 8 and 9 are also one-line calls to Mata.)

Line 7 puts a copy of all the observations on the variable for which the command estimates the mean in the Mata column vector named **x**, which is stored in global Mata memory. Line 8 stores the mean of the column vector in the 1\(\times\)1 matrix named **w**, which is also stored in global Mata memory. Line 9 copies the Mata matrix **w** to the Stata matrix named **Q**. Lines 11 and 12 display the results stored in Stata.

I illustrate what **myregress6** produces in example 1.

**Example 1: myregress6 uses global Mata memory**

. sysuse auto (1978 Automobile Data) . mymean6 price The point estimates are in Q symmetric Q[1,1] c1 r1 6165.2568 . matrix dir Q[1,1] . mata: mata describe # bytes type name and extent ------------------------------------------------------------------------------- 8 real scalar w 592 real colvector x[74] -------------------------------------------------------------------------------

I use **matrix dir** to illustrate that **Q** is a Stata matrix, and I use **mata describe** to illustrate that **x** and **w** are objects in global Mata memory. Using fixed names for an object in Stata memory or in global Mata memory should be avoided, because you can overwrite users’ data.

**mymean7** does not put anything in global Mata memory; all computations are done using objects that are local to the Mata function **mymean_work()**. **mymean7** uses temporary names for objects stored in Stata memory.

**Code block 3: mymean7.ado**

*! version 7.0.0 07Dec2015 program define mymean7, eclass version 14 syntax varlist(max=1) tempname b mata: mymean_work("`varlist'", "`b'") display "b is " matrix list `b' end mata: void mymean_work(string scalar vname, string scalar mname) { real vector x real scalar w x = st_data(., vname) w = mean(x) st_matrix(mname, w) } end

There are two parts to **mymean7.ado**: an ado-program and a Mata function. The ado-program is defined on lines 2–12. The Mata function **mymean_work()** is defined on lines 14–24. The Mata function **mymean_work()** is local to the ado-program **mymean7**.

Line 8 uses a one-line call to Mata to execute **mymean_work()**. **mymean_work()** does not return anything to global Mata memory, and we are passing in two arguments. The first argument is a string scalar containing the name of the variable for which the function should compute the estimate. The second argument is a string scalar containing the temporary name stored in the local macro **b**. This temporary name will be the name of a Stata matrix that stores the point estimate computed in **mymean_work()**.

Line 15 declares the function **mymean_work()**. Function declarations specify what the function returns, the name of the function, and the arguments that the function accepts; see Programming an estimation command in Stata: Mata functions for a quick introduction.

The word **void** on line 15 specifies that the function does not return an argument; in other words, it returns nothing. What precedes the **(** is the function name; thus, **mymean_work()** is the name of the function. The words **string scalar vname** specify that the first argument of **mymean_work()** is a string scalar that is known as **vname** inside **mymean_work()**. The comma separates the first argument from the second argument. The words **string scalar mname** specify that the second argument of **mymean_work()** is a string scalar that is known as **mname** inside **mymean_work()**. **)** closes the function declaration.

Lines 17-22 define **mymean_work()** because they are enclosed between the curly braces on lines 16 and 23. Lines 17 and 18 declare the real vector **x** and real scalar **w**, which are local to **mymean_work()**. Lines 20 and 21 compute the estimate. Line 22 copies the estimate stored in the scalar **w**, which is local to the Mata function **mymean_work()**, to the Stata matrix whose name is stored in the string scalar **mname**, which contains the temporary name contained in the local macro **b** that was passed to the function on line 8.

The structure used in **mymean7** ensures three important features.

- It does not use global Mata memory.
- It uses temporary names for global Stata objects.
- It leaves nothing behind in Mata memory or Stata memory.

Examples 2 and 3 combine to illustrate feature (3); example 2 clears Stata and Mata memory, and example 3 shows that **mymean7** leaves nothing in the previously cleared memory.

**Example 2: Removing objects from Stata and Mata memory**

. clear all . matrix dir . mata: mata describe # bytes type name and extent ------------------------------------------------------------------------------- -------------------------------------------------------------------------------

In detail, I use **clear all** to drop all objects from Stata and Mata memory, use **matrix dir** to illustrate that no matrices were left in Stata memory, and use **mata describe** to illustrate that no objects were left in Mata memory.

**Example 3: mymean7 leaves nothing in Stata or Mata memory**

. sysuse auto (1978 Automobile Data) . mymean7 price b is symmetric __000000[1,1] c1 r1 6165.2568 . matrix dir . mata: mata describe # bytes type name and extent ------------------------------------------------------------------------------- -------------------------------------------------------------------------------

In example 3, I use **mymean7** to estimate the mean, and use **matrix dir** and **mata describe** to illustrate that **mymean7** did not leave Stata matrices or Mata objects in memory. The output also illustrates that the temporary name **__000000** was used for the Stata matrix that held the result before the ado-program terminated.

While it is good that **mymean7** leaves nothing in global Stata or Mata memory, it is bad that **mymean7** does not leave the estimate behind somewhere, like in **e()**.

**mymean8** stores the results in **e()** and has the features of **mymean5**, but computes its results in Mata.

**Code block 4: mymean8.ado**

*! version 8.0.0 07Dec2015 program define mymean8, eclass version 14 syntax varlist(max=1) tempname b V mata: mymean_work("`varlist'", "`b'", "`V'") matrix colnames `b' = `varlist' matrix colnames `V' = `varlist' matrix rownames `V' = `varlist' ereturn post `b' `V' ereturn display end mata: void mymean_work( /// string scalar vname, /// string scalar mname, /// string scalar vcename) { real vector x, e2 real scalar w, n, v x = st_data(., vname) n = rows(x) w = mean(x) e2 = (x :- w):^2 v = (1/(n*(n-1)))*sum(e2) st_matrix(mname, w) st_matrix(vcename, v) } end

Line 8 is a one-line call to **mymean_work()**, which now has three arguments: the name of the variable whose mean is to be estimated, a temporary name for the Stata matrix that will hold the point estimate, and a temporary name for the Stata matrix that will hold the estimated **VCE**. The declaration **mymean_work()** on lines 17-20 has been adjusted accordingly; each of the three arguments is a string scalar. Lines 22 and 23 declare objects local to **mymean_work()**. Lines 25-29 compute the mean and the estimated **VCE**. Lines 30 and 31 copy these results to Stata matrices, under the temporary names in the second and third arguments.

There is a logic to the order of the arguments in **mymean_work()**; the first argument is the name of an input, the second and third arguments are temporary names for the outputs.

Returning to the ado-code, we see that lines 9–11 put row or column names on the point estimate or the estimated **VCE**. Line 12 posts the results to **e()**, which are displayed by line 13.

Example 4 illustrates that **mymean8** produces the same point estimate and standard error as produced by **mean**.

**Example 4: Comparing mymean8 and mean**

. mymean8 price ------------------------------------------------------------------------------ | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- price | 6165.257 342.8719 17.98 0.000 5493.24 6837.273 ------------------------------------------------------------------------------ . mean price Mean estimation Number of obs = 74 -------------------------------------------------------------- | Mean Std. Err. [95% Conf. Interval] -------------+------------------------------------------------ price | 6165.257 342.8719 5481.914 6848.6 --------------------------------------------------------------

The confidence intervals produced by **mymean8** differ from those produced by **mean** because **mymean8** uses a normal distribution while **mean** uses a \(t\) distribution. The **mymean_work()** in **mymean9** uses a fourth argument to remove this difference.

**Code block 5: mymean9.ado**

*! version 9.0.0 07Dec2015 program define mymean9, eclass version 14 syntax varlist(max=1) tempname b V dfr mata: mymean_work("`varlist'", "`b'", "`V'", "`dfr'") matrix colnames `b' = `varlist' matrix colnames `V' = `varlist' matrix rownames `V' = `varlist' ereturn post `b' `V' ereturn scalar df_r = `dfr' ereturn display end mata: void mymean_work( /// string scalar vname, /// string scalar mname, /// string scalar vcename, /// string scalar dfrname) { real vector x, e2 real scalar w, n, v x = st_data(., vname) n = rows(x) w = mean(x) e2 = (x :- w):^2 v = (1/(n*(n-1)))*sum(e2) st_matrix(mname, w) st_matrix(vcename, v) st_numscalar(dfrname, n-1) } end

On line 8, **mymean_work()** accepts four arguments. The fourth argument is new; it contains the temporary name that is used for the Stata scalar that holds the residual degrees of freedom. Line 34 copies the value of the expression **n-1** to the Stata numeric scalar whose name is stored in the string scalar **dfrname**; this Stata scalar now contains the residual degrees of freedom. Line 13 stores the residual degrees of freedom in **e(df_r)**, which causes **ereturn display** to use a \(t\) distribution instead of a normal distribution.

**Example 5: mymean9 uses a t distribution**

. mymean9 price ------------------------------------------------------------------------------ | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- price | 6165.257 342.8719 17.98 0.000 5481.914 6848.6 ------------------------------------------------------------------------------

**mymean9** has five basic parts.

- It parses the user input.
- It uses a one-line call to a Mata work routine to compute results and to store these results in Stata matrices whose temporary names are passed to the Mata work routine.
- It puts on the column and row names and stores the results in
**e()**. - It displays the results.
- It defines the Mata work routine after the
**end**that terminates the definition of the ado-program.

This structure can accommodate any estimator whose results we can store in **e()**. The details of each part become increasingly complicated, but the structure remains the same. In future posts, I discuss Stata/Mata programs with this structure that implement the ordinary least-squares (**OLS**) estimator and the Poisson quasi-maximum-likelihood estimator.

**Done and undone**

I discussed a sequence of ado-commands that use Mata to estimate the mean of a variable. The commands illustrated a general structure for Stata/Mata programs.

In the next post, I show some Mata computations that produce the point estimates, an **IID VCE**, a robust **VCE**, and a cluster-robust **VCE** for the **OLS** estimator.