## Programming an estimation command in Stata: A first command for OLS

\(

\newcommand{\betab}{\boldsymbol{\beta}}

\newcommand{\xb}{{\bf x}}

\newcommand{\yb}{{\bf y}}

\newcommand{\Xb}{{\bf X}}

\)

I show how to write a Stata estimation command that implements the ordinary least-squares (**OLS**) estimator by explaining the code. I use concepts that I introduced in previous #StataProgramming posts. In particular, I build on Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects, in which I recalled the **OLS** formulas and showed how to compute them using Stata matrix commands and functions, and on Programming an estimation command in Stata: A first ado command, in which I introduced some ado-programming concepts. Although I introduce some local macro tricks that I use all the time, I also build on Programing an estimation command in Stata: Where to store your stuff.

This is the sixth post in the series **Programming an estimation command in Stata**. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

**Local macro tricks**

I use lots of local macro tricks in my ado-files. In this section, I illustrate the ones that I use in the commands that I develop in this post. In every ado-file that I write, I ask questions about lists of variable names stored in local macros. I frequently use the extended macro functions and the **gettoken** command to ask these questions and store the results in a local macro.

The syntax for storing the result of an extended macro function in a local macro is

**local** *localname* : *extended_fcn*

Below, I use the extended macro function **word count** to count the number of elements in the list **a b c** and store the result in the local macro **count**.

**Example 1: Storing and extracting the result of an extended macro function**

    . local count : word count a b c

    . display "count contains `count'"
    count contains 3

There are many extended macro functions, but I illustrate just the one I use in this post; type **help extended_fcn** for a complete list.

A token is an element in a list. I frequently use the **gettoken** command to split lists apart. The **gettoken** command has the syntax

**gettoken** *localname1* [*localname2*] **:** *localname3*

**gettoken** stores the first token in the list stored in the local macro *localname3* into the local macro *localname1*. If the optional *localname2* is specified, the remaining tokens are stored in the local macro *localname2*.

I use **gettoken** to store the first token stored in **mylist** into the local macro **first**, whose contents I subsequently extract and display.

**Example 2: Using gettoken to store first token only**

    . local mylist y x1 x2

    . display "mylist contains `mylist'"
    mylist contains y x1 x2

    . gettoken first : mylist

    . display "first contains `first'"
    first contains y

Now, I use **gettoken** to store the first token stored in **mylist** into the local macro **first** and the remaining tokens into the local macro **left**. I subsequently extract and display the contents of **first** and **left**.

**Example 3: Using gettoken to store first and remaining tokens**

    . gettoken first left: mylist

    . display "first contains `first'"
    first contains y

    . display "left contains `left'"
    left contains x1 x2
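The same two tricks can be combined to walk through a list one token at a time. The following is a minimal sketch, not part of the commands developed in this post; the macro names **n**, **i**, and **tok** are hypothetical and chosen only for illustration.

```stata
* Hypothetical list; any space-separated list works
local mylist y x1 x2

* Count the tokens before the loop consumes the list
local n : word count `mylist'

local i = 0
while "`mylist'" != "" {
    * Move the first token into tok and write the remaining tokens back to mylist
    gettoken tok mylist : mylist
    local ++i
    display "token `i' of `n' is `tok'"
}
```

Because **gettoken** writes the remaining tokens back into **mylist**, the list shrinks by one token per pass and the loop stops when the list is empty.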

I frequently want to increase the value of a local macro by some fixed amount, say, \(3\). I now illustrate a solution that I use.

**Example 4: Local macro update**

    . local p = 1

    . local p = `p' + 3

    . display "p is now `p'"
    p is now 4

When the update value, also known as the increment value, is \(1\), we can use the increment operator, as below:

**Example 5: Local macro update**

    . local p = 1

    . local ++p

    . display "p is now `p'"
    p is now 2

**A first version of myregress**

The code in **myregress1** implements a version of the **OLS** formulas. I use **myregress1** in example 6. Below example 6, I discuss the code and the output.

**Code block 1: myregress1.ado**

    *! version 1.0.0  23Oct2015
    program define myregress1, eclass
        version 14

        syntax varlist
        display "The syntax command puts the variables specified by the "
        display " user into the local macro varlist"
        display " varlist contains `varlist'"

        gettoken depvar : varlist
        display "The dependent variable is `depvar'"

        matrix accum zpz = `varlist'
        display "matrix accum forms Z'Z"
        matrix list zpz

        local p : word count `varlist'
        local p = `p' + 1

        matrix xpx  = zpz[2..`p', 2..`p']
        matrix xpy  = zpz[2..`p', 1]
        matrix xpxi = syminv(xpx)
        matrix b    = (xpxi*xpy)'

        matrix score double xbhat = b
        generate double res  = (`depvar' - xbhat)
        generate double res2 = res^2
        summarize res2
        local N   = r(N)
        local sum = r(sum)
        local s2  = `sum'/(`N'-(`p'-1))
        matrix V  = `s2'*xpxi
        ereturn post b V
        ereturn local cmd "myregress1"
        ereturn display
    end

**Example 6: myregress1 output**

    . sysuse auto
    (1978 Automobile Data)

    . myregress1 price mpg trunk
    The syntax command puts the variables specified by the
     user into the local macro varlist
     varlist contains price mpg trunk
    The dependent variable is price
    (obs=74)
    matrix accum forms Z'Z

    symmetric zpz[4,4]
               price        mpg      trunk      _cons
    price  3.448e+09
      mpg    9132716      36008
    trunk    6565725      20630      15340
    _cons     456229       1576       1018         74

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
            res2 |         74     6674851    1.30e+07   11.24372   9.43e+07
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |  -220.1649   65.59262    -3.36   0.001    -348.7241    -91.6057
           trunk |   43.55851   88.71884     0.49   0.623    -130.3272    217.4442
           _cons |   10254.95   2349.084     4.37   0.000      5650.83    14859.07
    ------------------------------------------------------------------------------

Here are my comments on the code and the output in example 6.

- Line 2 specifies that **myregress1** is an e-class command that stores its results in **e()**.
- Lines 5–8 illustrate that the **syntax** command stores the names of the variables specified by the user in the local macro **varlist**. This behavior is also illustrated in example 6.
- Line 10 uses the **gettoken** command to store the first variable name stored in the local macro **varlist** in the local macro **depvar**. Line 11 displays this name, and the usage is illustrated in example 6.
- Line 13 uses **matrix accum** to put \((\Xb'\Xb)\) and \((\Xb'\yb)\) into a Stata matrix named **zpz**, as discussed in Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects and further illustrated in lines 14–15 and example 6.
- Line 17 stores the number of variables in the local macro **varlist** into the local macro **p**.
- Line 18 increments the local macro **p** by \(1\) to account for the constant term included by **matrix accum** by default.
- Lines 20–23 extract \((\Xb'\Xb)\) and \((\Xb'\yb)\) from **zpz** and put the vector of point estimates \(\widehat{\betab}\) into the Stata row vector **b**.
- Line 25 puts \(\Xb\widehat{\betab}\) into the variable **xbhat**.
- Lines 26 and 27 calculate the residuals and the squared residuals, respectively.
- Lines 28–32 calculate the estimated variance-covariance matrix of the estimator (**VCE**) from the sum of squared residuals.
- Line 33 stores **b** and **V** into **e(b)** and **e(V)**, respectively.
- Line 34 stores the name of the estimation command (**myregress1**) in **e(cmd)**.
- Line 35 produces a standard Stata output table from the results in **e(b)** and **e(V)**.
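For reference, the quantities computed by the matrix and residual steps of **myregress1** can be summarized as follows. This is a sketch in the notation of this post; the symbols \(N\), \(k\), \(s^2\), and \(\widehat{\bf V}\) are introduced here for illustration and are assumed to match the formulas recalled in the earlier post.

\[
\widehat{\betab} = (\Xb'\Xb)^{-1}\Xb'\yb, \qquad
s^2 = \frac{1}{N-k}\sum_{i=1}^{N}\left(y_i - \xb_i\widehat{\betab}\right)^2, \qquad
\widehat{\bf V} = s^2\,(\Xb'\Xb)^{-1}
\]

Here \(N\) is the number of observations and \(k\) is the number of columns of \(\Xb\), including the constant. In the code, the local macro **p** counts the dependent variable, the independent variables, and the constant, so \(k\) corresponds to `p'-1, which is why the degrees-of-freedom divisor is written as `N'-(`p'-1).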

**myregress1** contains code to help illustrate how it works, and it uses hard-coded names for global objects like Stata variables and Stata matrices. Users do not want to see the output from the illustration lines, so these lines must be removed. Users also do not want their global Stata matrices overwritten by a command they use, which is what **myregress1** would do to a matrix named **zpz**, **xpx**, **xpy**, **xpxi**, **b**, or **V**. Likewise, **myregress1** leaves the variables **xbhat**, **res**, and **res2** in the user's dataset.

The code in **myregress2** fixes these problems.

**Code block 2: myregress2.ado**

    *! version 2.0.0  26Oct2015
    program define myregress2, eclass
        version 14

        syntax varlist
        gettoken depvar : varlist

        tempname zpz xpx xpy xpxi b V
        tempvar  xbhat res res2

        quietly matrix accum `zpz' = `varlist'
        local p : word count `varlist'
        local p = `p' + 1
        matrix `xpx'  = `zpz'[2..`p', 2..`p']
        matrix `xpy'  = `zpz'[2..`p', 1]
        matrix `xpxi' = syminv(`xpx')
        matrix `b'    = (`xpxi'*`xpy')'
        quietly matrix score double `xbhat' = `b'
        generate double `res'  = (`depvar' - `xbhat')
        generate double `res2' = (`res')^2
        quietly summarize `res2'
        local N   = r(N)
        local sum = r(sum)
        local s2  = `sum'/(`N'-(`p'-1))
        matrix `V' = `s2'*`xpxi'
        ereturn post `b' `V'
        ereturn local cmd "myregress2"
        ereturn display
    end

- Line 8 uses **tempname** to put safe names into the local macros **zpz**, **xpx**, **xpy**, **xpxi**, **b**, and **V**.
- Line 9 uses **tempvar** to put safe names into the local macros **xbhat**, **res**, and **res2**.
- Lines 11, 14–18, and 25–26 use the safe names in the local macros created by **tempname** instead of the hard-coded names for the matrices.
- Lines 18–20 use the safe names in the local macros created by **tempvar** instead of the hard-coded names for the variables it creates.
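To see what **tempname** and **tempvar** put into the local macros, it can help to display the generated names in a short do-file. The following is a minimal sketch; the macro names **M** and **v** and the use of **mpg** from the auto dataset are assumptions made only for this illustration.

```stata
sysuse auto, clear

* Ask Stata for names that do not clash with existing matrices or variables
tempname M
tempvar  v

* Use the safe names by dereferencing the local macros
matrix `M' = (1, 2 \ 3, 4)
generate double `v' = mpg^2

* The macros hold the generated names, not the objects themselves
display "M holds the matrix name `M'"
display "v holds the variable name `v'"
```

Objects created under these names are dropped automatically when the do-file or program that created them finishes, so nothing is left behind in the user's session.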

Example 7 shows the output produced by **myregress2**.

**Example 7: myregress2 output**

    . myregress2 price mpg trunk
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |  -220.1649   65.59262    -3.36   0.001    -348.7241    -91.6057
           trunk |   43.55851   88.71884     0.49   0.623    -130.3272    217.4442
           _cons |   10254.95   2349.084     4.37   0.000      5650.83    14859.07
    ------------------------------------------------------------------------------
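One way to check results like these is to compare **e(b)** and **e(V)** with those from the official **regress** command. The following is a sketch of such a check, assuming **myregress2.ado** has been saved somewhere on the adopath; the temporary matrix names are hypothetical.

```stata
* Assumes myregress2.ado from this post is saved on the adopath
sysuse auto, clear

tempname b_my V_my b_reg V_reg

* Results from the command developed in this post
myregress2 price mpg trunk
matrix `b_my' = e(b)
matrix `V_my' = e(V)

* Results from the official command
quietly regress price mpg trunk
matrix `b_reg' = e(b)
matrix `V_reg' = e(V)

* mreldif() returns the maximum relative difference between two matrices
display "max relative difference in b: " mreldif(`b_my', `b_reg')
display "max relative difference in V: " mreldif(`V_my', `V_reg')
```

Both differences should be close to zero, up to numerical precision. The p-values and confidence intervals can still differ slightly between the two tables, because **myregress2** reports z statistics whereas **regress** reports t statistics.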

**Done and undone**

After reviewing some tricks with local macros that I use in most of the ado-files that I write, I discussed two versions of an ado-command that implements the **OLS** estimator. In the next post, I extend this command so that the user may request a robust **VCE**, suppress the constant term, or both.