Programming an estimation command in Stata: A better OLS command
I use the syntax command to improve the command that implements the ordinary least-squares (OLS) estimator that I discussed in Programming an estimation command in Stata: A first command for OLS. I show how to require that all variables be numeric variables and how to make the command accept time-series operated variables.
This is the seventh post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.
Stata syntax and the syntax command
The myregress2 command described in Programming an estimation command in Stata: A first command for OLS has the syntax
myregress2 depvar [indepvars]
This syntax requires that the dependent variable be specified because depvar is not enclosed in square brackets. The independent variables are optional because indepvars is enclosed in square brackets. Type
for an introduction to reading Stata syntax diagrams.
This syntax is implemented by the syntax command on line 5 of myregress2.ado, which I discussed at length in Programming an estimation command in Stata: A first command for OLS. Because varlist is not enclosed in square brackets, the user must specify a list of variable names. The syntax of the syntax command itself follows the rules of a syntax diagram.
*! version 2.0.0  26Oct2015
program define myregress2, eclass
    version 14

    syntax varlist
    gettoken depvar : varlist

    tempname zpz xpx xpy xpxi b V
    tempvar  xbhat res res2

    // cross products of (depvar, indepvars, constant)
    quietly matrix accum `zpz' = `varlist'
    local p : word count `varlist'
    local p = `p' + 1
    // extract X'X and X'y, then form the row vector of OLS estimates
    matrix `xpx'  = `zpz'[2..`p', 2..`p']
    matrix `xpy'  = `zpz'[2..`p', 1]
    matrix `xpxi' = syminv(`xpx')
    matrix `b'    = (`xpxi'*`xpy')'
    // residuals and the estimated error variance
    quietly matrix score double `xbhat' = `b'
    quietly generate double `res' = (`depvar' - `xbhat')
    quietly generate double `res2' = (`res')^2
    quietly summarize `res2'
    local N   = r(N)
    local sum = r(sum)
    local s2  = `sum'/(`N'-(`p'-1))
    matrix `V' = `s2'*`xpxi'
    // post and display the results
    ereturn post `b' `V'
    ereturn local cmd "myregress2"
    ereturn display
end
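For reference, here is a sketch of the algebra that the matrix commands above carry out, in notation introduced only for this summary. Let y be the dependent variable, let X hold the independent variables plus a column of ones for the constant, let N be the number of observations, and let k = `p'-1 be the number of columns of X. matrix accum forms the cross products, syminv() inverts X'X, and the remaining lines compute

```latex
\hat{\beta} = (X'X)^{-1}X'y, \qquad
\hat{e} = y - X\hat{\beta}, \qquad
s^2 = \frac{\hat{e}'\hat{e}}{N-k}, \qquad
\widehat{V}(\hat{\beta}) = s^2 (X'X)^{-1}
```

myregress2 stores the transpose of the estimated coefficient vector as the row vector `b', which is the form that ereturn post expects.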
Example 1 illustrates that myregress2 runs the requested regression when I specify a varlist.
Example 1: myregress2 with specified variables
. sysuse auto
(1978 Automobile Data)

. myregress2 price mpg trunk
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   65.59262    -3.36   0.001    -348.7241    -91.6057
       trunk |   43.55851   88.71884     0.49   0.623    -130.3272    217.4442
       _cons |   10254.95   2349.084     4.37   0.000      5650.83    14859.07
------------------------------------------------------------------------------
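One quick way to check myregress2 is to compare its results with those from regress. The sketch below assumes that myregress2.ado is on your ado-path and uses the mreldif() matrix function to report the largest relative difference between the two sets of estimates. The point estimates and the VCE should agree up to rounding error because both commands estimate the error variance as the residual sum of squares divided by N-k. The test statistics differ only because myregress2 reports z statistics, whereas regress reports t statistics.

```stata
* Sketch: compare myregress2 with the official regress command.
* Assumes myregress2.ado from this post is on the ado-path.
sysuse auto, clear

quietly myregress2 price mpg trunk
matrix b_my = e(b)
matrix V_my = e(V)

quietly regress price mpg trunk
matrix b_reg = e(b)
matrix V_reg = e(V)

* Largest relative differences; both should be close to machine precision
display "coefficients: " mreldif(b_my, b_reg)
display "VCE:          " mreldif(V_my, V_reg)
```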
Example 2 illustrates that the syntax command displays an error message and stops execution when I do not specify a varlist. I use set trace on to see each line of code and the output it produces.
Example 2: myregress2 without a varlist
. set trace on

. myregress2
--------------------------------------------------------- begin myregress2 --
- version 14
- syntax varlist
varlist required
----------------------------------------------------------- end myregress2 --
r(100);
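Example 2 fails because varlist is required. The sketch below contrasts this with an optional [varlist], following the same bracket rule as a syntax diagram. When an optional varlist is omitted, syntax fills the local macro varlist with all the variables in the dataset. The programs showreq and showopt are hypothetical and exist only for this illustration.

```stata
* Sketch: required versus optional varlist in the syntax command.
* showreq and showopt are hypothetical programs for illustration only.
capture program drop showreq
program define showreq
    version 14
    syntax varlist              // required, as in myregress2
    display "varlist is `varlist'"
end

capture program drop showopt
program define showopt
    version 14
    syntax [varlist]            // optional; defaults to all variables
    display "varlist is `varlist'"
end

sysuse auto, clear
showreq price mpg               // displays: varlist is price mpg
showopt                         // displays every variable in auto.dta
```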
Example 3 illustrates that the syntax command checks whether the specified variables exist in the current dataset. syntax exits with an error because DoesNotExist is not a variable in the current dataset.
Example 3: myregress2 with a variable not in this dataset
. set trace on

. myregress2 price mpg trunk DoesNotExist
--------------------------------------------------------- begin myregress2 --
- version 14
- syntax varlist
variable DoesNotExist not found
----------------------------------------------------------- end myregress2 --
r(111);

end of do-file

r(111);
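Examples 2 and 3 use set trace on to watch syntax stop the program. A calling program that must react to these errors, rather than halt, can run the command under capture and inspect the return code stored in _rc. This is a sketch that assumes myregress2.ado is on the ado-path. The expected codes are the r(100) and r(111) shown in examples 2 and 3.

```stata
* Sketch: trap the errors that syntax raises inside myregress2.
* Assumes myregress2.ado from this post is on the ado-path.
set trace off
sysuse auto, clear

capture noisily myregress2                          // no varlist
display "return code: " _rc                         // example 2 reports r(100)

capture noisily myregress2 price mpg DoesNotExist   // unknown variable
display "return code: " _rc                         // example 3 reports r(111)
```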
Because the syntax command on line 5 does not restrict the specified variables to be numeric, example 4 produces the uninformative error "no observations" instead of an error that identifies the actual problem: make is a string variable.
Example 4: myregress2 with a string variable
. describe make

              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
make            str18   %-18s                 Make and Model

. myregress2 price mpg trunk make
no observations
r(2000);

end of do-file

r(2000);
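The message in example 4 does not come from syntax. With syntax varlist, make passes the check, and the first command in myregress2 that touches the data is matrix accum, so that is where the "no observations" error arises. The sketch below runs that command directly on auto.dta to reproduce the error outside of myregress2. The matrix name zpz_check is hypothetical and is used only here.

```stata
* Sketch: reproduce the example 4 error outside of myregress2.
* zpz_check is a hypothetical matrix name used only for this check.
sysuse auto, clear
capture noisily matrix accum zpz_check = price mpg trunk make
display "return code: " _rc        // example 4 reports r(2000)
```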
On line 5 of myregress3, I modify varlist to accept only numeric variables. This change produces a more informative error message when I try to include a string variable in the regression.
*! version 3.0.0  30Oct2015
program define myregress3, eclass
    version 14

    syntax varlist(numeric)
    gettoken depvar : varlist

    tempname zpz xpx xpy xpxi b V
    tempvar  xbhat res res2

    quietly matrix accum `zpz' = `varlist'
    local p : word count `varlist'
    local p = `p' + 1
    matrix `xpx'  = `zpz'[2..`p', 2..`p']
    matrix `xpy'  = `zpz'[2..`p', 1]
    matrix `xpxi' = syminv(`xpx')
    matrix `b'    = (`xpxi'*`xpy')'
    quietly matrix score double `xbhat' = `b'
    quietly generate double `res' = (`depvar' - `xbhat')
    quietly generate double `res2' = (`res')^2
    quietly summarize `res2'
    local N   = r(N)
    local sum = r(sum)
    local s2  = `sum'/(`N'-(`p'-1))
    matrix `V' = `s2'*`xpxi'
    ereturn post `b' `V'
    ereturn local cmd "myregress3"
    ereturn display
end
Example 5: myregress3 with a string variable
. set trace on

. myregress3 price mpg trunk make
--------------------------------------------------------- begin myregress3 --
- version 14
- syntax varlist(numeric)
string variables not allowed in varlist; make is a string variable
----------------------------------------------------------- end myregress3 --
r(109);

end of do-file

r(109);
On line 5 of myregress4, I add the ts option to varlist so that the command accepts time-series-operated variables. The syntax command expands time-series operators into a canonical form and stores the result in the local macro varlist. The display command on line 6 shows this canonical form, and its output appears in example 6. Note that syntax also expands the abbreviation gnp to gnp96.
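To see what the ts option adds, the sketch below runs myregress3 and myregress4 on the same time-series-operated varlist. It assumes that both ado-files from this post are on the ado-path and relies on gnp96.dta being tsset, as example 6 does. Without ts, syntax rejects the lag operator; with ts, syntax accepts it and expands it.

```stata
* Sketch: varlist(numeric) versus varlist(numeric ts).
* Assumes myregress3.ado and myregress4.ado from this post are on the ado-path.
sysuse gnp96, clear

capture noisily myregress3 gnp96 L.gnp96     // no ts option: syntax rejects L.
display "myregress3 return code: " _rc

capture noisily myregress4 gnp96 L.gnp96     // ts option: syntax accepts L.gnp96
display "myregress4 return code: " _rc
```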
*! version 4.0.0  31Oct2015
program define myregress4, eclass
    version 14

    syntax varlist(numeric ts)
    display "varlist is `varlist'"
    gettoken depvar : varlist

    tempname zpz xpx xpy xpxi b V
    tempvar  xbhat res res2

    quietly matrix accum `zpz' = `varlist'
    local p : word count `varlist'
    local p = `p' + 1
    matrix `xpx'  = `zpz'[2..`p', 2..`p']
    matrix `xpy'  = `zpz'[2..`p', 1]
    matrix `xpxi' = syminv(`xpx')
    matrix `b'    = (`xpxi'*`xpy')'
    quietly matrix score double `xbhat' = `b'
    quietly generate double `res' = (`depvar' - `xbhat')
    quietly generate double `res2' = (`res')^2
    quietly summarize `res2'
    local N   = r(N)
    local sum = r(sum)
    local s2  = `sum'/(`N'-(`p'-1))
    matrix `V' = `s2'*`xpxi'
    ereturn post `b' `V'
    ereturn local cmd "myregress4"
    ereturn display
end
Example 6: myregress4 with time-series variables
. sysuse gnp96

. myregress4 L(0/3).gnp
varlist is gnp96 L.gnp96 L2.gnp96 L3.gnp96
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       gnp96 |
         L1. |   1.277086   .0860652    14.84   0.000     1.108402    1.445771
         L2. |   -.135549   .1407719    -0.96   0.336    -.4114568    .1403588
         L3. |  -.1368326   .0871645    -1.57   0.116    -.3076719    .0340067
             |
       _cons |   -2.94825   14.36785    -0.21   0.837    -31.10871    25.21221
------------------------------------------------------------------------------
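You can also check myregress4 against regress on the same lags. This is a sketch that assumes myregress4.ado is on the ado-path. Both commands should use the same observations: matrix accum omits observations with a missing value in any of the lagged variables, summarize then counts only the nonmissing squared residuals, and regress drops the same observations.

```stata
* Sketch: compare myregress4 with regress on a time-series varlist.
* Assumes myregress4.ado from this post is on the ado-path.
sysuse gnp96, clear

quietly myregress4 L(0/3).gnp96
matrix b_my = e(b)
matrix V_my = e(V)

quietly regress gnp96 L(1/3).gnp96
matrix b_reg = e(b)
matrix V_reg = e(V)

* Largest relative differences; both should be close to machine precision
display "coefficients: " mreldif(b_my, b_reg)
display "VCE:          " mreldif(V_my, V_reg)
```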
Done and undone
I used the syntax command to improve how myregress2 handles the variables specified by the user. I showed how to require that all variables be numeric variables and how to make the command accept time-series operated variables. In the next post, I show how to make the command allow for sample restrictions, how to handle missing values, how to allow for factor-operated variables, and how to deal with perfectly collinear variables.