Programming an estimation command in Stata: A first command for OLS
\(
\newcommand{\betab}{\boldsymbol{\beta}}
\newcommand{\xb}{{\bf x}}
\newcommand{\yb}{{\bf y}}
\newcommand{\Xb}{{\bf X}}
\)I show how to write a Stata estimation command that implements the ordinary least-squares (OLS) estimator by explaining the code. I use concepts that I introduced in previous #StataProgramming posts. In particular, I build on Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects, in which I recalled the OLS formulas and showed how to compute them using Stata matrix commands and functions, and on Programming an estimation command in Stata: A first ado command, in which I introduced some ado-programming concepts. I also introduce some local macro tricks that I use all the time, and I build on Programming an estimation command in Stata: Where to store your stuff.
This is the sixth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.
Local macro tricks
I use lots of local macro tricks in my ado-files. In this section, I illustrate the ones that I use in the commands that I develop in this post. In every ado-file that I write, I ask questions about lists of variable names stored in local macros. I frequently use the extended macro functions and the gettoken command to ask these questions and store the results in a local macro.
The syntax for storing the result of an extended macro function in a local macro is
local localname : extended_fcn
Below, I use the extended macro function word count to count the number of elements in the list and store the result in the local macro count.
Example 1: Storing and extracting the result of an extended macro function
. local count : word count a b c
. display "count contains `count'"
count contains 3
There are many extended macro functions, but I illustrate just the one I use in this post; type help extended_fcn for a complete list.
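In an ado-file, the list being counted usually sits in a local macro rather than being typed out. The lines below are a minimal sketch of that case; the macro names varlist and k and the three variable names are chosen only for illustration.

```stata
* a list of names stored in a local macro (illustrative contents)
local varlist price mpg trunk

* count the elements of the list and store the result in the local macro k
local k : word count `varlist'

display "varlist contains `k' names"
```

Here the display command shows varlist contains 3 names.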
A token is an element in a list. I frequently use the gettoken command to split lists apart. The gettoken command has the syntax
gettoken localname1 [localname2] : localname3
gettoken stores the first token in the list stored in the local macro localname3 into the local macro localname1. If the optional localname2 is specified, the remaining tokens are stored in the local macro localname2.
I use gettoken to store the first token stored in mylist into the local macro first, whose contents I subsequently extract and display.
Example 2: Using gettoken to store first token only
. local mylist y x1 x2
. display "mylist contains `mylist'"
mylist contains y x1 x2
. gettoken first : mylist
. display "first contains `first'"
first contains y
Now, I use gettoken to store the first token stored in mylist into the local macro first and the remaining tokens into the local macro left. I subsequently extract and display the contents of first and left.
Example 3: Using gettoken to store first and remaining tokens
. gettoken first left: mylist
. display "first contains `first'"
first contains y
. display "left contains `left'"
left contains x1 x2
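The same two-macro form of gettoken can separate a dependent variable from the remaining variables in a list. The following is only a sketch: the macro names depvar, indepvars, and k are illustrative, and the list is typed in by hand instead of coming from the syntax command.

```stata
* an illustrative list: the dependent variable followed by the regressors
local varlist price mpg trunk

* the first token goes into depvar; the remaining tokens go into indepvars
gettoken depvar indepvars : varlist

* count the regressors that are left over
local k : word count `indepvars'

display "dependent variable: `depvar'"
display "number of regressors: `k'"
```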
I frequently want to increase the value of a local macro by some fixed amount, say, \(3\). I now illustrate a solution that I use.
Example 4: Local macro update
. local p = 1
. local p = `p' + 3
. display "p is now `p'"
p is now 4
When the update value, also known as the increment value, is \(1\), we can use the increment operator, as below:
Example 5: Local macro increment
. local p = 1
. local ++p
. display "p is now `p'"
p is now 2
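The tricks in this section combine naturally. As a rough sketch, the loop below uses gettoken to peel one token at a time off a list and the increment operator to count the tokens as it goes. The macro names rest, next, and n are made up for this illustration.

```stata
* start with the full list and a counter set to zero
local rest y x1 x2
local n = 0

* keep going until no tokens are left in rest
while "`rest'" != "" {
    * move the first token into next and leave the remainder in rest
    gettoken next rest : rest
    local ++n
    display "token `n' is `next'"
}

display "the list had `n' tokens"
```

Using the same macro on both sides of the colon overwrites the list with its own remainder, so the loop ends once the last token has been removed.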
A first version of myregress
The code in myregress1 implements a version of the OLS formulas. I use myregress1 in example 6. Below example 6, I discuss the code and the output.
*! version 1.0.0 23Oct2015
program define myregress1, eclass
    version 14

    syntax varlist
    display "The syntax command puts the variables specified by the "
    display " user into the local macro varlist"
    display " varlist contains `varlist'"

    gettoken depvar : varlist
    display "The dependent variable is `depvar'"

    matrix accum zpz = `varlist'
    display "matrix accum forms Z'Z"
    matrix list zpz

    local p : word count `varlist'
    local p = `p' + 1

    matrix xpx = zpz[2..`p', 2..`p']
    matrix xpy = zpz[2..`p', 1]
    matrix xpxi = syminv(xpx)
    matrix b = (xpxi*xpy)'

    matrix score double xbhat = b
    generate double res = (`depvar' - xbhat)
    generate double res2 = res^2
    summarize res2
    local N = r(N)
    local sum = r(sum)
    local s2 = `sum'/(`N'-(`p'-1))
    matrix V = `s2'*xpxi
    ereturn post b V
    ereturn local cmd "myregress1"
    ereturn display
end
Example 6: myregress1 output
. sysuse auto
(1978 Automobile Data)

. myregress1 price mpg trunk
The syntax command puts the variables specified by the
 user into the local macro varlist
 varlist contains price mpg trunk
The dependent variable is price
(obs=74)
matrix accum forms Z'Z

symmetric zpz[4,4]
           price        mpg      trunk      _cons
price  3.448e+09
  mpg    9132716      36008
trunk    6565725      20630      15340
_cons     456229       1576       1018         74

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        res2 |         74     6674851    1.30e+07   11.24372   9.43e+07
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   65.59262    -3.36   0.001    -348.7241    -91.6057
       trunk |   43.55851   88.71884     0.49   0.623    -130.3272    217.4442
       _cons |   10254.95   2349.084     4.37   0.000      5650.83    14859.07
------------------------------------------------------------------------------
Here are my comments on the code and the output in example 6.
- Line 2 specifies that myregress1 is an e-class command that stores its results in e().
- Lines 5–8 illustrate that the syntax command stores the names of the variables specified by the user in the local macro varlist. This behavior is also illustrated in example 6.
- Line 10 uses the gettoken command to store the first variable name stored in the local macro varlist in the local macro depvar. Line 11 displays this name, as illustrated in example 6.
- Line 13 uses matrix accum to put \((\Xb'\Xb)\) and \((\Xb'\yb)\) into a Stata matrix named zpz, as discussed in Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects and further illustrated in lines 14–15 and example 6.
- Line 17 stores the number of variables in the local macro varlist into the local macro p.
- Line 18 increments the local macro p by \(1\) to account for the constant term included by matrix accum by default.
- Lines 20–23 extract \((\Xb'\Xb)\) and \((\Xb'\yb)\) from zpz and put the vector of point estimates \(\widehat{\betab}\) into the Stata row vector b.
- Line 25 puts \(\Xb\widehat{\betab}\) into the variable xbhat.
- Lines 26 and 27 calculate the residuals and the squared residuals, respectively.
- Lines 28–32 calculate the estimated variance-covariance matrix of the estimator (VCE) from the sum of squared residuals.
- Line 33 stores b and V into e(b) and e(V), respectively.
- Line 34 stores the name of the estimation command (myregress1) in e(cmd).
- Line 35 produces a standard Stata output table from the results in e(b) and e(V).
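For reference, the quantities computed in lines 13–32 can be written out as below. This is a sketch in the notation of this post, with a few labels that are assumptions of the sketch: \({\bf Z}\) is the matrix whose first column is \(\yb\) and whose remaining columns are \(\Xb\) (the regressors followed by the constant), \(N\) is the number of observations, \(k\) is the number of columns of \(\Xb\) (the value of `p'-1 in the code), and \(\xb_i\) is the \(i\)th row of \(\Xb\).

```latex
\begin{align*}
{\bf Z}'{\bf Z} &=
  \begin{bmatrix}
    \yb'\yb & \yb'\Xb \\
    \Xb'\yb & \Xb'\Xb
  \end{bmatrix},
  \qquad {\bf Z} = [\,\yb \;\; \Xb\,] \\
\widehat{\betab} &= (\Xb'\Xb)^{-1}\Xb'\yb \\
\widehat{s}^2 &= \frac{1}{N-k}\sum_{i=1}^{N}\left(y_i - \xb_i\widehat{\betab}\right)^2 \\
\widehat{{\bf V}} &= \widehat{s}^2\,(\Xb'\Xb)^{-1}
\end{align*}
```

With this layout, rows and columns 2 through `p' of zpz hold \(\Xb'\Xb\), and rows 2 through `p' of the first column hold \(\Xb'\yb\), which is what lines 20 and 21 extract.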
myregress1 contains code to help illustrate how it works, and it uses hard-coded names for global objects like Stata variables and Stata matrices. Users do not want to see the output from the illustration lines, so those lines must be removed. Users also do not want their global Stata matrices overwritten by a command they use, which is what myregress1 would do to a matrix named zpz, xpx, xpy, xpxi, b, or V. Likewise, myregress1 exits with an error if the dataset already contains a variable named xbhat, res, or res2, and it leaves these three variables behind in the user's dataset.
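The variable-name problem is easy to reproduce without running myregress1 at all. The sketch below assumes only the auto dataset; it creates a variable with the hard-coded name res and then tries to create it again, which is what a second call to a command with hard-coded variable names would attempt.

```stata
sysuse auto, clear

* stands in for the variable left behind by an earlier run
generate double res = 0

* a second attempt to create the same name fails; capture traps the error
capture generate double res = 1

* return code 110 means that the name is already defined
display "return code: " _rc
```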
The code in myregress2 fixes these problems.
*! version 2.0.0 26Oct2015
program define myregress2, eclass
    version 14

    syntax varlist
    gettoken depvar : varlist

    tempname zpz xpx xpy xpxi b V
    tempvar xbhat res res2

    quietly matrix accum `zpz' = `varlist'
    local p : word count `varlist'
    local p = `p' + 1
    matrix `xpx' = `zpz'[2..`p', 2..`p']
    matrix `xpy' = `zpz'[2..`p', 1]
    matrix `xpxi' = syminv(`xpx')
    matrix `b' = (`xpxi'*`xpy')'
    quietly matrix score double `xbhat' = `b'
    generate double `res' = (`depvar' - `xbhat')
    generate double `res2' = (`res')^2
    quietly summarize `res2'
    local N = r(N)
    local sum = r(sum)
    local s2 = `sum'/(`N'-(`p'-1))
    matrix `V' = `s2'*`xpxi'
    ereturn post `b' `V'
    ereturn local cmd "myregress2"
    ereturn display
end
- Line 8 uses tempname to put safe names into the local macros zpz, xpx, xpy, xpxi, b, and V.
- Line 9 uses tempvar to put safe names into the local macros xbhat, res, and res2.
- Lines 11, 14–18, and 25–26 use the safe names in the local macros created by tempname instead of the hard-coded names for the matrices.
- Lines 18–20 use the safe names in the local macros created by tempvar instead of the hard-coded names for the variables it creates.
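To see what the safe names look like, tempname and tempvar can be tried in a short do-file. This is a sketch, not part of myregress2; the macro names m and sq and the use of the auto dataset are assumptions made for the illustration.

```stata
sysuse auto, clear

* ask Stata for one unused matrix name and one unused variable name
tempname m
tempvar  sq

* the local macros m and sq now hold those names; use them like any other name
matrix `m' = (1, 2 \ 3, 4)
generate double `sq' = mpg^2

* show the names that Stata picked
display "matrix name:   `m'"
display "variable name: `sq'"

summarize `sq'
```

Objects created under these names are dropped automatically when the do-file or program that created them ends, so nothing is left behind in the user's data or matrix space.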
Example 7 shows the output produced by myregress2.
Example 7: myregress2 output
. myregress2 price mpg trunk
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   65.59262    -3.36   0.001    -348.7241    -91.6057
       trunk |   43.55851   88.71884     0.49   0.623    -130.3272    217.4442
       _cons |   10254.95   2349.084     4.37   0.000      5650.83    14859.07
------------------------------------------------------------------------------
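One way to gain confidence in these point estimates is to compare them with those from regress. The sketch below does not call myregress2; it repeats the matrix steps interactively for the same three variables, so the hard-coded range 2..4 and the temporary names are assumptions tied to this example. It then uses the matrix function mreldif() to report the relative difference from e(b) as stored by regress.

```stata
sysuse auto, clear

tempname zpz xpx xpy b breg

* form Z'Z and extract X'X and X'y, as in myregress2 with p equal to 4
quietly matrix accum `zpz' = price mpg trunk
matrix `xpx' = `zpz'[2..4, 2..4]
matrix `xpy' = `zpz'[2..4, 1]

* OLS point estimates as a row vector
matrix `b' = (syminv(`xpx')*`xpy')'

* the same regression with Stata's own command
quietly regress price mpg trunk
matrix `breg' = e(b)

* a value close to zero indicates that the two vectors agree
display "relative difference: " mreldif(`b', `breg')
```

The test statistics reported by regress are labeled t rather than z because regress stores its residual degrees of freedom, whereas myregress2 posts only the point estimates and the VCE.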
Done and undone
After reviewing some tricks with local macros that I use in most of the ado-files that I write, I discussed two versions of an ado-command that implements the OLS estimator. In the next post, I extend this command so that the user may request a robust VCE, suppress the constant term, or both.