Programming an estimation command in Stata: A first command for OLS

\(
\newcommand{\betab}{\boldsymbol{\beta}}
\newcommand{\xb}{{\bf x}}
\newcommand{\yb}{{\bf y}}
\newcommand{\Xb}{{\bf X}}
\)
I show how to write a Stata estimation command that implements the ordinary least-squares (OLS) estimator, and I explain its code. I use concepts that I introduced in previous #StataProgramming posts. In particular, I build on Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects, in which I recalled the OLS formulas and showed how to compute them using Stata matrix commands and functions, and on Programming an estimation command in Stata: A first ado command, in which I introduced some ado-programming concepts. I also build on Programming an estimation command in Stata: Where to store your stuff, although I introduce some additional local macro tricks that I use all the time.

This is the sixth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

Local macro tricks

I use lots of local macro tricks in my ado-files. In this section, I illustrate the ones that I use in the commands that I develop in this post. In every ado-file that I write, I ask questions about lists of variable names stored in local macros. I frequently use the extended macro functions and the gettoken command to ask these questions and to store the results in local macros.

The syntax for storing the result of an extended macro function in a local macro is

local localname : extended_fcn

Below, I use the extended macro function word count to count the number of words in the list a b c and store the result in the local macro count.

Example 1: Storing and extracting the result of an extended macro function

. local count : word count a b c

. display "count contains `count'"
count contains 3

There are many extended macro functions, but I illustrate only the one that I use in this post; type help extended_fcn for a complete list.

A token is an element in a list. I frequently use the gettoken command to split lists apart. The gettoken command has the syntax

gettoken localname1 [localname2] : localname3

gettoken stores the first token in the list stored in the local macro localname3 into the local macro localname1. If the optional localname2 is specified, the remaining tokens are stored in the local macro localname2.

Below, I use gettoken to store the first token of mylist in the local macro first, whose contents I subsequently extract and display.

Example 2: Using gettoken to store first token only

. local mylist y x1 x2

. display "mylist contains `mylist'"
mylist contains y x1 x2

. gettoken first : mylist

. display "first contains `first'"
first contains y

Now, I use gettoken to store the first token of mylist in the local macro first and the remaining tokens in the local macro left. I subsequently extract and display the contents of first and left.

Example 3: Using gettoken to store first and remaining tokens

. gettoken first left: mylist

. display "first contains `first'"
first contains y

. display "left  contains `left'"
left  contains  x1 x2

I frequently want to increase the value of a local macro by some fixed amount, say, \(3\). I now illustrate a solution that I use.

Example 4: Local macro update

. local p = 1

. local p = `p' + 3

. display "p is now `p'"
p is now 4
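The equals sign in example 4 is what makes local evaluate the expression. The short do-file sketch below illustrates the difference; the macro names q and r are illustrative choices and are not used by the commands in this post. Without =, local stores the text of the expression rather than its value.

```stata
* Compare local with and without the equals sign
local p = 1
local q `p' + 3      // no =: q holds the text "1 + 3"
local r = `p' + 3    // with =: r holds the value 4
display "q is `q'"
display "r is `r'"
```

Because q holds text, display "`q'" shows 1 + 3, whereas display `q' would evaluate the substituted expression and show 4.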

When the update value, also known as the increment value, is \(1\), we can use the increment operator, as below:

Example 5: Local macro increment operator

. local p = 1

. local ++p

. display "p is now `p'"
p is now 2
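The gettoken command and the increment operator combine naturally in a loop that peels tokens off a list one at a time. The sketch below is one way to write such a loop in a do-file; the macro names rest, tok, and n are hypothetical.

```stata
* Peel tokens off a list one at a time and count them
local mylist y x1 x2
local rest `mylist'
local n = 0
while `"`rest'"' != "" {
    gettoken tok rest : rest    // first token into tok, remainder back into rest
    local ++n                   // increment the counter by 1
    display "token `n' is `tok'"
}
display "mylist has `n' tokens"
```

For a plain count, local n : word count `mylist' is simpler; the loop pays off when each token needs its own processing.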

A first version of myregress

The code in myregress1 implements a version of the OLS formulas. I use myregress1 in example 6. Below example 6, I discuss the code and the output.

Code block 1: myregress1.ado

*! version 1.0.0  23Oct2015
program define myregress1, eclass
	version 14

	syntax varlist
	display "The syntax command puts the variables specified by the "
	display "    user into the local macro varlist"
	display "    varlist contains `varlist'"

	gettoken depvar : varlist
	display "The dependent variable is `depvar'"

	matrix accum zpz = `varlist'
	display "matrix accum forms Z'Z"
	matrix list zpz

	local p : word count `varlist'
	local p = `p' + 1

	matrix xpx                = zpz[2..`p', 2..`p']
	matrix xpy                = zpz[2..`p', 1]
	matrix xpxi               = syminv(xpx)
	matrix b                  = (xpxi*xpy)'

	matrix score double xbhat = b
	generate double res       = (`depvar' - xbhat)
	generate double res2      = res^2
	summarize res2
	local N                   = r(N)
	local sum                 = r(sum)
	local s2                  = `sum'/(`N'-(`p'-1))
	matrix V                  = `s2'*xpxi
	ereturn post b V
	ereturn local         cmd   "myregress1"
	ereturn display
end

Example 6: myregress1 output

. sysuse auto
(1978 Automobile Data)

. myregress1 price mpg trunk
The syntax command puts the variables specified by the 
    user into the local macro varlist
    varlist contains price mpg trunk
The dependent variable is price
(obs=74)
matrix accum forms Z'Z

symmetric zpz[4,4]
           price        mpg      trunk      _cons
price  3.448e+09
  mpg    9132716      36008
trunk    6565725      20630      15340
_cons     456229       1576       1018         74

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        res2 |         74     6674851    1.30e+07   11.24372   9.43e+07
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   65.59262    -3.36   0.001    -348.7241    -91.6057
       trunk |   43.55851   88.71884     0.49   0.623    -130.3272    217.4442
       _cons |   10254.95   2349.084     4.37   0.000      5650.83    14859.07
------------------------------------------------------------------------------

Here are my comments on the code and the output in example 6.

  • Line 2 specifies that myregress1 is an e-class command that stores its results in e().
  • Lines 5–8 illustrate that the syntax command stores the names of the variables specified by the user in the local macro varlist. This behavior is also illustrated in example 6.
  • Line 10 uses the gettoken command to store the first variable name in the local macro varlist in the local macro depvar. Line 11 displays this name, as example 6 illustrates.
  • Line 13 uses matrix accum to put \((\Xb'\Xb)\) and \((\Xb'\yb)\) into a Stata matrix named zpz, as discussed in Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects and further illustrated in lines 14–15 and example 6.
  • Line 17 stores the number of variables in the local macro varlist into the local macro p.
  • Line 18 increments the local macro p by \(1\) to account for the constant term included by matrix accum by default.
  • Lines 20–23 extract \((\Xb'\Xb)\) and \((\Xb'\yb)\) from zpz, invert \((\Xb'\Xb)\) with syminv(), and put the transpose of the vector of point estimates \(\widehat{\betab}=(\Xb'\Xb)^{-1}\Xb'\yb\) into the Stata row vector b.
  • Line 25 puts \(\Xb\widehat{\betab}\) into the variable xbhat.
  • Lines 26 and 27 calculate the residuals and the squared residuals, respectively.
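  • Lines 28–32 calculate the estimated variance-covariance matrix of the estimator (VCE). They compute \(s^2\), the sum of squared residuals divided by \(N\) minus the number of estimated parameters \((p-1)\), and multiply \((\Xb'\Xb)^{-1}\) by \(s^2\).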
  • Line 33 stores b and V into e(b) and e(V), respectively.
  • Line 34 stores the name of the estimation command (myregress1) in e(cmd).
  • Line 35 produces a standard Stata output table from the results in e(b) and e(V).
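One way to check myregress1 is to compare its results with those of Stata's regress command, whose default VCE is also \(s^2(\Xb'\Xb)^{-1}\). The sketch below assumes that myregress1.ado is on the ado-path; the matrix names b1, V1, b2, and V2 are arbitrary, and mreldif() returns the largest relative difference between two matrices.

```stata
* Compare myregress1 with Stata's built-in regress command
sysuse auto, clear
quietly myregress1 price mpg trunk
matrix b1 = e(b)
matrix V1 = e(V)
quietly regress price mpg trunk
matrix b2 = e(b)
matrix V2 = e(V)
display mreldif(b1, b2)    // largest relative difference in the point estimates
display mreldif(V1, V2)    // largest relative difference in the VCEs
```

The point estimates and standard errors should agree up to rounding. The test statistics differ in kind: ereturn display reports z statistics and normal-based intervals because myregress1 does not post residual degrees of freedom, whereas regress reports t statistics.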

myregress1 contains code that illustrates how it works, and it uses hard-coded names for global objects like Stata variables and Stata matrices. Users do not want to see the output from the illustration lines, so those lines must be removed. Users also do not want a command to overwrite their global Stata matrices, which is what myregress1 does to any matrices named zpz, xpx, xpy, xpxi, b, or V. Similarly, myregress1 leaves the variables xbhat, res, and res2 in the dataset, so running it a second time fails because matrix score will not create a variable that already exists.
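The sketch below makes these side effects visible. It assumes that myregress1.ado is on the ado-path; the matrix zpz = (1, 2) is a made-up stand-in for a matrix that a user happens to have.

```stata
* Expose the side effects of myregress1's hard-coded names
sysuse auto, clear
matrix zpz = (1, 2)                   // a user's own matrix that happens to be named zpz
quietly myregress1 price mpg trunk
matrix list zpz                       // overwritten: now the 4 x 4 Z'Z matrix
describe xbhat res res2               // variables left behind in the dataset
capture myregress1 price mpg trunk    // second run: xbhat already exists
local rc = _rc
display "return code from second run: `rc'"
```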

The code in myregress2 fixes these problems.

Code block 2: myregress2.ado

*! version 2.0.0  26Oct2015
program define myregress2, eclass
	version 14

	syntax varlist
	gettoken depvar : varlist

	tempname zpz xpx xpy xpxi b V
	tempvar  xbhat res res2 

	quietly matrix accum `zpz' = `varlist'
	local p : word count `varlist'
	local p = `p' + 1
	matrix `xpx'                = `zpz'[2..`p', 2..`p']
	matrix `xpy'                = `zpz'[2..`p', 1]
	matrix `xpxi'               = syminv(`xpx')
	matrix `b'                  = (`xpxi'*`xpy')'
	quietly matrix score double `xbhat' = `b'
	generate double `res'       = (`depvar' - `xbhat')
	generate double `res2'      = (`res')^2
	quietly summarize `res2'
	local N                     = r(N)
	local sum                   = r(sum)
	local s2                    = `sum'/(`N'-(`p'-1))
	matrix `V'                  = `s2'*`xpxi'
	ereturn post `b' `V'
	ereturn local         cmd   "myregress2"
	ereturn display
end
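
Here are my comments on the code in myregress2.

  • Line 8 uses tempname to put safe names into the local macros zpz, xpx, xpy, xpxi, b, and V.
  • Line 9 uses tempvar to put safe names into the local macros xbhat, res, and res2.
  • Lines 11, 14–18, and 25–26 use the safe names in the local macros created by tempname instead of the hard-coded names for the matrices.
  • Lines 18–20 use the safe names in the local macros created by tempvar instead of hard-coded names for the variables that myregress2 creates.
  • Lines 11, 18, and 21 use quietly to suppress intermediate output, and the illustration lines from myregress1 are gone.
  • Stata drops the temporary variables and matrices when myregress2 exits, so the command leaves nothing behind except the results that ereturn post stores in e().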
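A quick way to confirm that myregress2 no longer clobbers user objects is to create a matrix named zpz, run the command twice, and check what is left. The sketch below assumes that myregress2.ado is on the ado-path; the matrix zpz = (1, 2) is a made-up stand-in for a user's own matrix.

```stata
* Confirm that myregress2 leaves user objects alone
sysuse auto, clear
matrix zpz = (1, 2)                   // a user's own matrix that happens to be named zpz
quietly myregress2 price mpg trunk
quietly myregress2 price mpg trunk    // a second run succeeds: the tempvars were dropped
matrix list zpz                       // unchanged: still (1, 2)
capture confirm variable res
local rc = _rc                        // nonzero: no variable named res was created
display "return code from confirm variable: `rc'"
```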

Example 7 shows that myregress2 produces the same estimates as myregress1, without the illustration output.

Example 7: myregress2 output

. myregress2 price mpg trunk
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   65.59262    -3.36   0.001    -348.7241    -91.6057
       trunk |   43.55851   88.71884     0.49   0.623    -130.3272    217.4442
       _cons |   10254.95   2349.084     4.37   0.000      5650.83    14859.07
------------------------------------------------------------------------------
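Because myregress2 posts its results with ereturn post, tools that work from e(b) and e(V), such as test, should work after it. The sketch below assumes that myregress2.ado is on the ado-path; it accesses the stored results and runs a Wald test.

```stata
* Use the results that myregress2 stores in e()
sysuse auto, clear
quietly myregress2 price mpg trunk
display "`e(cmd)'"    // name of the estimation command
display _b[mpg]       // point estimate for mpg
display _se[mpg]      // standard error for mpg
test mpg trunk        // Wald test that both coefficients are zero
```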

Done and undone

After reviewing some tricks with local macros that I use in most of the ado-files that I write, I discussed two versions of an ado-command that implements the OLS estimator. In the next post, I extend this command so that the user may request a robust VCE, suppress the constant term, or both.