## Programming an estimation command in Stata: A better OLS command

I use the syntax command to improve the command that implements the ordinary least-squares (OLS) estimator that I discussed in Programming an estimation command in Stata: A first command for OLS. I show how to require that all variables be numeric and how to make the command accept time-series-operated variables.

This is the seventh post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

### Stata syntax and the syntax command

The myregress2 command described in Programming an estimation command in Stata: A first command for OLS has the syntax

myregress2 depvar [indepvars]

This syntax requires that the dependent variable be specified because depvar is not enclosed in square brackets. The independent variables are optional because indepvars is enclosed in square brackets. Type help language for an introduction to reading Stata syntax diagrams.

This syntax is implemented by the syntax command in line 5 of myregress2.ado, which I discussed at length in Programming an estimation command in Stata: A first command for OLS. The user must specify a list of variable names because varlist is not enclosed in square brackets. The syntax of the syntax command follows the rules of a syntax diagram.

*! version 2.0.0  26Oct2015
program define myregress2, eclass
    version 14

    syntax varlist
    gettoken depvar : varlist

    tempname zpz xpx xpy xpxi b V
    tempvar  xbhat res res2

    quietly matrix accum `zpz' = `varlist'
    local p : word count `varlist'
    local p = `p' + 1
    matrix `xpx'                = `zpz'[2..`p', 2..`p']
    matrix `xpy'                = `zpz'[2..`p', 1]
    matrix `xpxi'               = syminv(`xpx')
    matrix `b'                  = (`xpxi'*`xpy')'
    quietly matrix score double `xbhat' = `b'
    quietly generate double `res'       = (`depvar' - `xbhat')
    quietly generate double `res2'      = (`res')^2
    quietly summarize `res2'
    local N                     = r(N)
    local sum                   = r(sum)
    local s2                    = `sum'/(`N'-(`p'-1))
    matrix `V'                  = `s2'*`xpxi'
    ereturn post `b' `V'
    ereturn local         cmd   "myregress2"
    ereturn display
end
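
The parsing step at the top of myregress2 can be isolated in a small sketch. The program name showvars and the second macro indepvars are hypothetical and not part of this series; they only display what syntax and gettoken store in local macros.

```stata
* Hypothetical helper (not from the series): display the macros
* that syntax and gettoken create.
capture program drop showvars
program define showvars
    version 14

    // syntax checks the variable names and stores them in local macro varlist
    syntax varlist

    // gettoken puts the first name in depvar and the remainder in indepvars;
    // varlist itself is left unchanged
    gettoken depvar indepvars : varlist

    display "varlist:   `varlist'"
    display "depvar:    `depvar'"
    display "indepvars: `indepvars'"
end

sysuse auto, clear
showvars price mpg trunk
```

myregress2 itself uses the one-macro form, gettoken depvar : varlist, because it only needs the name of the dependent variable and passes the full varlist to matrix accum.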


Example 1 illustrates that myregress2 runs the requested regression when I specify a varlist.

Example 1: myregress2 with specified variables

. sysuse auto
(1978 Automobile Data)

. myregress2 price mpg trunk
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   65.59262    -3.36   0.001    -348.7241    -91.6057
       trunk |   43.55851   88.71884     0.49   0.623    -130.3272    217.4442
       _cons |   10254.95   2349.084     4.37   0.000      5650.83    14859.07
------------------------------------------------------------------------------


Example 2 illustrates that the syntax command displays an error message and stops execution when I do not specify a varlist. I use set trace on to see each line of code and the output it produces.

Example 2: myregress2 without a varlist

. set trace on

. myregress2
--------------------------------------------------------- begin myregress2 --
- version 14
- syntax varlist
varlist required
----------------------------------------------------------- end myregress2 --
r(100);


Example 3 illustrates that the syntax command checks that the specified variables are in the current dataset. The command exits with an error because DoesNotExist is not a variable in the current dataset.

Example 3: myregress2 with a variable not in this dataset

. set trace on

. myregress2 price mpg trunk DoesNotExist
--------------------------------------------------------- begin myregress2 --
- version 14
- syntax varlist
----------------------------------------------------------- end myregress2 --
r(111);

end of do-file

r(111);
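
When tracing is more detail than needed, the return codes from examples 2 and 3 can also be inspected with capture. This is a minimal sketch that assumes myregress2.ado is saved somewhere on the ado-path.

```stata
* Sketch, assuming myregress2.ado is installed on the ado-path.
sysuse auto, clear

// capture suppresses the error message and stores the return code in _rc
capture myregress2
display "no varlist:       _rc = " _rc

capture myregress2 price mpg trunk DoesNotExist
display "unknown variable: _rc = " _rc
```

The two displayed codes should correspond to the r(100) and r(111) reported in examples 2 and 3.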


Because the syntax command on line 5 does not restrict the specified variables to be numeric, I get the no observations error in example 4 instead of an error indicating the actual problem, which is the string variable make.

Example 4: myregress2 with a string variable

. describe make

              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
make            str18   %-18s                 Make and Model

. myregress2 price mpg trunk make
no observations
r(2000);

end of do-file

r(2000);


On line 5 of myregress3, I modify varlist to accept only numeric variables. This change produces a more informative error message when I try to include a string variable in the regression.

*! version 3.0.0  30Oct2015
program define myregress3, eclass
    version 14

    syntax varlist(numeric)
    gettoken depvar : varlist

    tempname zpz xpx xpy xpxi b V
    tempvar  xbhat res res2

    quietly matrix accum `zpz' = `varlist'
    local p : word count `varlist'
    local p = `p' + 1
    matrix `xpx'                = `zpz'[2..`p', 2..`p']
    matrix `xpy'                = `zpz'[2..`p', 1]
    matrix `xpxi'               = syminv(`xpx')
    matrix `b'                  = (`xpxi'*`xpy')'
    quietly matrix score double `xbhat' = `b'
    quietly generate double `res'       = (`depvar' - `xbhat')
    quietly generate double `res2'      = (`res')^2
    quietly summarize `res2'
    local N                     = r(N)
    local sum                   = r(sum)
    local s2                    = `sum'/(`N'-(`p'-1))
    matrix `V'                  = `s2'*`xpxi'
    ereturn post `b' `V'
    ereturn local         cmd   "myregress3"
    ereturn display
end


Example 5: myregress3 with a string variable

. set trace on

. myregress3 price mpg trunk make
--------------------------------------------------------- begin myregress3 --
- version 14
- syntax varlist(numeric)
string variables not allowed in varlist;
make is a string variable
----------------------------------------------------------- end myregress3 --
r(109);

end of do-file

r(109);
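
To see what the numeric specifier saves, here is a rough sketch of the same check written by hand with confirm. The program name checknumeric and its messages are hypothetical; this is not how syntax is implemented, only an illustration of the work that varlist(numeric) does in one word.

```stata
* Hypothetical helper (not from the series): a hand-written version of the
* check that varlist(numeric) performs.
capture program drop checknumeric
program define checknumeric
    version 14

    // accept any existing variables, numeric or string
    syntax varlist

    // test each variable and stop at the first one that is not numeric
    foreach v of local varlist {
        capture confirm numeric variable `v'
        if _rc {
            display as error "`v' is not a numeric variable"
            exit 109
        }
    }
    display "all variables are numeric"
end

sysuse auto, clear
checknumeric price mpg trunk
capture noisily checknumeric price mpg trunk make
display "return code: " _rc
```

Letting syntax do the check keeps the estimation command shorter and gives users the standard Stata error message.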


On line 5 of myregress4, I modify the varlist to accept time-series (ts) variables. The syntax command puts time-series variables in a canonical form that is stored in the local macro varlist, as illustrated in the display on line 6, whose output appears in example 6.

*! version 4.0.0  31Oct2015
program define myregress4, eclass
    version 14

    syntax varlist(numeric ts)
    display "varlist is `varlist'"
    gettoken depvar : varlist

    tempname zpz xpx xpy xpxi b V
    tempvar  xbhat res res2

    quietly matrix accum `zpz' = `varlist'
    local p : word count `varlist'
    local p = `p' + 1
    matrix `xpx'                = `zpz'[2..`p', 2..`p']
    matrix `xpy'                = `zpz'[2..`p', 1]
    matrix `xpxi'               = syminv(`xpx')
    matrix `b'                  = (`xpxi'*`xpy')'
    quietly matrix score double `xbhat' = `b'
    quietly generate double `res'       = (`depvar' - `xbhat')
    quietly generate double `res2'      = (`res')^2
    quietly summarize `res2'
    local N                     = r(N)
    local sum                   = r(sum)
    local s2                    = `sum'/(`N'-(`p'-1))
    matrix `V'                  = `s2'*`xpxi'
    ereturn post `b' `V'
    ereturn local         cmd   "myregress4"
    ereturn display
end


Example 6: myregress4 with time-series variables

. sysuse gnp96

. myregress4  L(0/3).gnp
varlist is gnp96 L.gnp96 L2.gnp96 L3.gnp96
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       gnp96 |
         L1. |   1.277086   .0860652    14.84   0.000     1.108402    1.445771
         L2. |   -.135549   .1407719    -0.96   0.336    -.4114568    .1403588
         L3. |  -.1368326   .0871645    -1.57   0.116    -.3076719    .0340067
             |
       _cons |   -2.94825   14.36785    -0.21   0.837    -31.10871    25.21221
------------------------------------------------------------------------------
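
The expansion of a time-series varlist can also be inspected outside of a program. The sketch below uses tsunab, a separate Stata command that is not used in this series, and the macro name expanded is only illustrative. It assumes the data are tsset, as the gnp96 dataset is.

```stata
* Sketch: expand a time-series varlist interactively with tsunab.
sysuse gnp96, clear

// store the expanded, unabbreviated varlist in local macro expanded
tsunab expanded : L(0/3).gnp
display "`expanded'"
```

The displayed list should resemble the varlist that myregress4 prints in example 6.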

### Done and undone

I used the syntax command to improve how myregress2 handles the variables specified by the user. I showed how to require that all variables be numeric and how to make the command accept time-series-operated variables. In the next post, I show how to make the command allow for sample restrictions, how to handle missing values, how to allow for factor-operated variables, and how to deal with perfectly collinear variables.
