Programming an estimation command in Stata: A better OLS command
I use the syntax command to improve the command that implements the ordinary least-squares (OLS) estimator that I discussed in Programming an estimation command in Stata: A first command for OLS. I show how to require that all variables be numeric variables and how to make the command accept time-series operated variables.
This is the seventh post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.
Stata syntax and the syntax command
The myregress2 command described in Programming an estimation command in Stata: A first command for OLS has the syntax
myregress2 depvar [indepvars]
This syntax requires that the dependent variable be specified because depvar is not enclosed in square brackets. The independent variables are optional because indepvars is enclosed in square brackets. Type
for an introduction to reading Stata syntax diagrams.
This syntax is implemented by the syntax command in line 5 of myregress2.ado, which I discussed at length in Programming an estimation command in Stata: A first command for OLS. The user must specify a list of variable names because varlist is not enclosed in square brackets. The syntax of the syntax command follows the rules of a syntax diagram.
*! version 2.0.0  26Oct2015
program define myregress2, eclass
    version 14

    syntax varlist
    gettoken depvar : varlist

    tempname zpz xpx xpy xpxi b V
    tempvar  xbhat res res2

    quietly matrix accum `zpz' = `varlist'
    local p : word count `varlist'
    local p = `p' + 1
    matrix `xpx'  = `zpz'[2..`p', 2..`p']
    matrix `xpy'  = `zpz'[2..`p', 1]
    matrix `xpxi' = syminv(`xpx')
    matrix `b'    = (`xpxi'*`xpy')'
    quietly matrix score double `xbhat' = `b'
    quietly generate double `res'  = (`depvar' - `xbhat')
    quietly generate double `res2' = (`res')^2
    quietly summarize `res2'
    local N   = r(N)
    local sum = r(sum)
    local s2  = `sum'/(`N'-(`p'-1))
    matrix `V' = `s2'*`xpxi'
    ereturn post `b' `V'
    ereturn local cmd "myregress2"
    ereturn display
end
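As a reminder of what the matrix commands in myregress2 compute, here is a sketch of the standard OLS formulas they correspond to. The notation is an assumption introduced for this summary: y is the dependent variable, X holds the independent variables followed by the column of ones that matrix accum appends for the constant, x_i is row i of X, N is the number of observations, and k = `p'-1 is the number of columns of X.

$$
Z'Z = \begin{pmatrix} y'y & y'X \\ X'y & X'X \end{pmatrix}, \qquad
\widehat{\beta} = (X'X)^{-1}X'y
$$

$$
\widehat{e}_i = y_i - x_i\widehat{\beta}, \qquad
s^2 = \frac{1}{N-k}\sum_{i=1}^{N} \widehat{e}_i^{\,2}, \qquad
\widehat{V} = s^2 (X'X)^{-1}
$$

In the code, `xpx' and `xpy' are the X'X and X'y blocks extracted from `zpz', `b' holds the transpose of the coefficient vector because Stata stores coefficient vectors as rows, and `s2' divides the residual sum of squares by N-(`p'-1), that is, by N-k.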
Example 1 illustrates that myregress2 runs the requested regression when I specify a varlist.
Example 1: myregress2 with specified variables
. sysuse auto
(1978 Automobile Data)
. myregress2 price mpg trunk
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   65.59262    -3.36   0.001    -348.7241    -91.6057
       trunk |   43.55851   88.71884     0.49   0.623    -130.3272    217.4442
       _cons |   10254.95   2349.084     4.37   0.000      5650.83    14859.07
------------------------------------------------------------------------------
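One way to sanity-check myregress2 is to compare its estimates with those from Stata's regress command on the same model. The sketch below assumes that myregress2.ado is saved where Stata can find it (for example, in the current directory or on the ado-path); the matrix names b_reg, V_reg, b_my, and V_my are arbitrary choices.

```stata
* Sketch: check myregress2 against regress on the model in example 1
* (assumes myregress2.ado is on the ado-path)
sysuse auto, clear
quietly regress price mpg trunk
matrix b_reg = e(b)
matrix V_reg = e(V)
quietly myregress2 price mpg trunk
matrix b_my = e(b)
matrix V_my = e(V)
* mreldif() returns the largest relative difference between two matrices
display mreldif(b_my, b_reg)
display mreldif(V_my, V_reg)
```

Both differences should be essentially zero, because regress estimates the same coefficients and the same variance matrix, s2 times the inverse of X'X. The test statistics and confidence intervals will not match, however: regress reports t statistics, whereas myregress2 posts no residual degrees of freedom, so ereturn display falls back on z statistics and normal-based intervals.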
Example 2 illustrates that the syntax command displays an error message and stops execution when I do not specify a varlist. I use set trace on to see each line of code and the output it produces.
Example 2: myregress2 without a varlist
. set trace on
. myregress2
  --------------------------------------------------------- begin myregress2 --
  - version 14
  - syntax varlist
varlist required
  ----------------------------------------------------------- end myregress2 --
r(100);
Example 3 illustrates that the syntax command is checking that the specified variables are in the current dataset. syntax throws an error because DoesNotExist is not a variable in the current dataset.
Example 3: myregress2 with a variable not in this dataset
. set trace on
. myregress2 price mpg trunk DoesNotExist
  --------------------------------------------------------- begin myregress2 --
  - version 14
  - syntax varlist
variable DoesNotExist not found
  ----------------------------------------------------------- end myregress2 --
r(111);
end of do-file
r(111);
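The bracket rule from syntax diagrams carries over to the syntax command itself: enclosing varlist in square brackets makes it optional, and when an optional varlist is omitted, syntax fills the local macro varlist with all the variables in the dataset. The two tiny programs below, showreq and showopt, are hypothetical illustrations and are not part of the myregress series.

```stata
* Hypothetical toy programs contrasting a required and an optional varlist
set trace off                      // turn off the tracing used in examples 2 and 3
sysuse auto, clear
capture program drop showreq showopt

program define showreq
    version 14
    syntax varlist                 // required: omitting it is an error
    display "varlist is `varlist'"
end

program define showopt
    version 14
    syntax [varlist]               // optional: defaults to all variables
    display "varlist is `varlist'"
end

showopt price mpg                  // displays: varlist is price mpg
showopt                            // displays every variable in auto.dta
capture showreq                    // same failure as in example 2
display _rc                        // displays 100
```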
Because the syntax command on line 5 does not restrict the specified variables to be numeric, example 4 produces a "no observations" error instead of an error that identifies the actual problem: the string variable make.
Example 4: myregress2 with a string variable
. describe make
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
make            str18   %-18s                 Make and Model
. myregress2 price mpg trunk make
no observations
r(2000);
end of do-file
r(2000);
On line 5 of myregress3, I modify varlist to accept only numeric variables. This change produces a more informative error message when I try to include a string variable in the regression.
*! version 3.0.0  30Oct2015
program define myregress3, eclass
    version 14

    syntax varlist(numeric)
    gettoken depvar : varlist

    tempname zpz xpx xpy xpxi b V
    tempvar  xbhat res res2

    quietly matrix accum `zpz' = `varlist'
    local p : word count `varlist'
    local p = `p' + 1
    matrix `xpx'  = `zpz'[2..`p', 2..`p']
    matrix `xpy'  = `zpz'[2..`p', 1]
    matrix `xpxi' = syminv(`xpx')
    matrix `b'    = (`xpxi'*`xpy')'
    quietly matrix score double `xbhat' = `b'
    quietly generate double `res'  = (`depvar' - `xbhat')
    quietly generate double `res2' = (`res')^2
    quietly summarize `res2'
    local N   = r(N)
    local sum = r(sum)
    local s2  = `sum'/(`N'-(`p'-1))
    matrix `V' = `s2'*`xpxi'
    ereturn post `b' `V'
    ereturn local cmd "myregress3"
    ereturn display
end
Example 5: myregress3 with a string variable
. set trace on
. myregress3 price mpg trunk make
  --------------------------------------------------------- begin myregress3 --
  - version 14
  - syntax varlist(numeric)
string variables not allowed in varlist;
make is a string variable
  ----------------------------------------------------------- end myregress3 --
r(109);
end of do-file
r(109);
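To compare examples 4 and 5 without reading trace output, you can capture each error and inspect the return code that Stata leaves in _rc. This sketch assumes that both myregress2.ado and myregress3.ado are on the ado-path.

```stata
* Sketch: compare the return codes from examples 4 and 5
set trace off
sysuse auto, clear
capture myregress2 price mpg trunk make
display _rc                  // 2000, the "no observations" error of example 4
capture myregress3 price mpg trunk make
display _rc                  // 109, the string-variable error of example 5
```

Checking _rc after capture is also how a calling program could react to these errors instead of stopping.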
On line 5 of myregress4, I modify varlist to also accept time-series-operated (ts) variables. The syntax command expands time-series-operated variables into a canonical form and stores the result in the local macro varlist, as illustrated by the display command on line 6, whose output appears in example 6.
*! version 4.0.0  31Oct2015
program define myregress4, eclass
    version 14

    syntax varlist(numeric ts)
    display "varlist is `varlist'"
    gettoken depvar : varlist

    tempname zpz xpx xpy xpxi b V
    tempvar  xbhat res res2

    quietly matrix accum `zpz' = `varlist'
    local p : word count `varlist'
    local p = `p' + 1
    matrix `xpx'  = `zpz'[2..`p', 2..`p']
    matrix `xpy'  = `zpz'[2..`p', 1]
    matrix `xpxi' = syminv(`xpx')
    matrix `b'    = (`xpxi'*`xpy')'
    quietly matrix score double `xbhat' = `b'
    quietly generate double `res'  = (`depvar' - `xbhat')
    quietly generate double `res2' = (`res')^2
    quietly summarize `res2'
    local N   = r(N)
    local sum = r(sum)
    local s2  = `sum'/(`N'-(`p'-1))
    matrix `V' = `s2'*`xpxi'
    ereturn post `b' `V'
    ereturn local cmd "myregress4"
    ereturn display
end
Example 6: myregress4 with time-series variables
. sysuse gnp96
. myregress4 L(0/3).gnp
varlist is gnp96 L.gnp96 L2.gnp96 L3.gnp96
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       gnp96 |
         L1. |   1.277086   .0860652    14.84   0.000     1.108402    1.445771
         L2. |   -.135549   .1407719    -0.96   0.336    -.4114568    .1403588
         L3. |  -.1368326   .0871645    -1.57   0.116    -.3076719    .0340067
             |
       _cons |   -2.94825   14.36785    -0.21   0.837    -31.10871    25.21221
------------------------------------------------------------------------------
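A similar check works for the time-series version. regress accepts the same operator syntax, so regressing gnp96 on its first three lags should reproduce the estimates in example 6. The sketch assumes that myregress4.ado is on the ado-path. The tsset line without arguments only reports the current time-series settings, and it exits with an error if the data are not tsset; example 6 applies L. operators right after sysuse gnp96, which suggests the dataset is already tsset.

```stata
* Sketch: compare myregress4 with regress on the model in example 6
* (assumes myregress4.ado is on the ado-path)
sysuse gnp96, clear
tsset                                   // report the time-series settings
quietly regress gnp96 L(1/3).gnp96
matrix b_reg = e(b)
matrix V_reg = e(V)
quietly myregress4 L(0/3).gnp           // quietly also hides the varlist display
matrix b_my = e(b)
matrix V_my = e(V)
display mreldif(b_my, b_reg)            // should be essentially zero
display mreldif(V_my, V_reg)            // should be essentially zero
```

Both commands drop the first three observations, whose lags are missing, so they use the same estimation sample. myregress4 passes the operated variables in varlist directly to matrix accum and matrix score, so the lagged regressors never have to be created as new variables.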
Done and undone
I used the syntax command to improve how myregress2 handles the variables specified by the user. I showed how to require that all variables be numeric variables and how to make the command accept time-series operated variables. In the next post, I show how to make the command allow for sample restrictions, how to handle missing values, how to allow for factor-operated variables, and how to deal with perfectly collinear variables.