## Programming an estimation command in Stata: A better OLS command

I use the **syntax** command to improve the command that implements the ordinary least-squares (**OLS**) estimator that I discussed in Programming an estimation command in Stata: A first command for OLS. I show how to require that all variables be numeric and how to make the command accept time-series-operated variables.

This is the seventh post in the series **Programming an estimation command in Stata**. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

**Stata syntax and the syntax command**

The **myregress2** command described in Programming an estimation command in Stata: A first command for OLS has the syntax

**myregress2** *depvar* [*indepvars*]

This syntax requires that the dependent variable be specified because *depvar* is not enclosed in square brackets. The independent variables are optional because *indepvars* is enclosed in square brackets. Type **help language** for an introduction to reading Stata syntax diagrams.

This syntax is implemented by the **syntax** command in line 5 of **myregress2.ado**, which I discussed at length in Programming an estimation command in Stata: A first command for OLS. The user must specify a list of variable names because **varlist** is not enclosed in square brackets. The syntax of the **syntax** command follows the rules of a syntax diagram.
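As a reading aid, here is a minimal sketch of what **syntax varlist** and **gettoken** leave behind in local macros. The program name **showvars** and the local macro **indepvars** are hypothetical names introduced only for this illustration; they are not part of the series' code.

```stata
* Hypothetical helper, not part of myregress2: it only displays
* the local macros that -syntax- and -gettoken- create.
capture program drop showvars
program define showvars
    version 14

    // -syntax- checks the typed variable names and stores them in `varlist'
    syntax varlist

    // the first token goes into `depvar'; the remainder goes into `indepvars'
    gettoken depvar indepvars : varlist

    display "varlist   is `varlist'"
    display "depvar    is `depvar'"
    display "indepvars is `indepvars'"
end

sysuse auto, clear
showvars price mpg trunk
```

Because **varlist** is not enclosed in square brackets, typing **showvars** with no variables should stop with the same **varlist required** error that **myregress2** produces.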

**Code block 1: myregress2.ado**

    *! version 2.0.0 26Oct2015
    program define myregress2, eclass
        version 14

        syntax varlist
        gettoken depvar : varlist

        tempname zpz xpx xpy xpxi b V
        tempvar xbhat res res2

        quietly matrix accum `zpz' = `varlist'
        local p : word count `varlist'
        local p = `p' + 1
        matrix `xpx' = `zpz'[2..`p', 2..`p']
        matrix `xpy' = `zpz'[2..`p', 1]
        matrix `xpxi' = syminv(`xpx')
        matrix `b' = (`xpxi'*`xpy')'
        quietly matrix score double `xbhat' = `b'
        quietly generate double `res' = (`depvar' - `xbhat')
        quietly generate double `res2' = (`res')^2
        quietly summarize `res2'
        local N = r(N)
        local sum = r(sum)
        local s2 = `sum'/(`N'-(`p'-1))
        matrix `V' = `s2'*`xpxi'
        ereturn post `b' `V'
        ereturn local cmd "myregress2"
        ereturn display
    end
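As a reading aid, the matrix steps in code block 1 can be written in standard OLS notation. The symbols are assumptions made for this sketch and do not appear in the code: $\mathbf{y}$ is the dependent variable, $\mathbf{X}$ holds the independent variables plus the constant, $\mathbf{x}_i$ is row $i$ of $\mathbf{X}$, $N$ is the number of observations, and $k$ is the number of columns of $\mathbf{X}$ (the value of `p'-1 after the increment).

$$
\begin{aligned}
\widehat{\mathbf{b}} &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \\
s^2 &= \frac{1}{N-k}\sum_{i=1}^{N}\left(y_i - \mathbf{x}_i\widehat{\mathbf{b}}\right)^2 \\
\widehat{\mathbf{V}} &= s^2\,(\mathbf{X}'\mathbf{X})^{-1}
\end{aligned}
$$

In the code, $\mathbf{X}'\mathbf{X}$ and $\mathbf{X}'\mathbf{y}$ are the submatrices **xpx** and **xpy** extracted from the accumulated cross-product matrix **zpz**, the inverse is computed by **syminv()** (which coincides with the ordinary inverse when $\mathbf{X}'\mathbf{X}$ has full rank), and the point estimates are stored as a row vector, which is why the product is transposed.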

Example 1 illustrates that **myregress2** runs the requested regression when I specify a varlist.

**Example 1: myregress2 with specified variables**

    . sysuse auto
    (1978 Automobile Data)

    . myregress2 price mpg trunk
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |  -220.1649   65.59262    -3.36   0.001    -348.7241    -91.6057
           trunk |   43.55851   88.71884     0.49   0.623    -130.3272    217.4442
           _cons |   10254.95   2349.084     4.37   0.000      5650.83    14859.07
    ------------------------------------------------------------------------------

Example 2 illustrates that the **syntax** command displays an error message and stops execution when I do not specify a varlist. I use **set trace on** to see each line of code and the output it produces.

**Example 2: myregress2 without a varlist**

    . set trace on

    . myregress2
    --------------------------------------------------------- begin myregress2 --
    - version 14
    - syntax varlist
    varlist required
    ----------------------------------------------------------- end myregress2 --
    r(100);

Example 3 illustrates that the **syntax** command checks that the specified variables are in the current dataset. **syntax** exits with an error because **DoesNotExist** is not a variable in the current dataset.

**Example 3: myregress2 with a variable not in this dataset**

    . set trace on

    . myregress2 price mpg trunk DoesNotExist
    --------------------------------------------------------- begin myregress2 --
    - version 14
    - syntax varlist
    variable DoesNotExist not found
    ----------------------------------------------------------- end myregress2 --
    r(111);

    end of do-file

    r(111);

Because the **syntax** command on line 5 does not restrict the specified variables to be numeric, example 4 produces the uninformative **no observations** error instead of an error that identifies the actual problem, which is the string variable **make**.

**Example 4: myregress2 with a string variable**

    . describe make

                  storage   display    value
    variable name   type    format     label      variable label
    -------------------------------------------------------------------------------
    make            str18   %-18s                 Make and Model

    . myregress2 price mpg trunk make
    no observations
    r(2000);

    end of do-file

    r(2000);

On line 5 of **myregress3**, I modify **varlist** to accept only numeric variables. This change produces a more informative error message when I try to include a string variable in the regression, as example 5 shows.

**Code block 2: myregress3.ado**

    *! version 3.0.0 30Oct2015
    program define myregress3, eclass
        version 14

        syntax varlist(numeric)
        gettoken depvar : varlist

        tempname zpz xpx xpy xpxi b V
        tempvar xbhat res res2

        quietly matrix accum `zpz' = `varlist'
        local p : word count `varlist'
        local p = `p' + 1
        matrix `xpx' = `zpz'[2..`p', 2..`p']
        matrix `xpy' = `zpz'[2..`p', 1]
        matrix `xpxi' = syminv(`xpx')
        matrix `b' = (`xpxi'*`xpy')'
        quietly matrix score double `xbhat' = `b'
        quietly generate double `res' = (`depvar' - `xbhat')
        quietly generate double `res2' = (`res')^2
        quietly summarize `res2'
        local N = r(N)
        local sum = r(sum)
        local s2 = `sum'/(`N'-(`p'-1))
        matrix `V' = `s2'*`xpxi'
        ereturn post `b' `V'
        ereturn local cmd "myregress3"
        ereturn display
    end

**Example 5: myregress3 with a string variable**

    . set trace on

    . myregress3 price mpg trunk make
    --------------------------------------------------------- begin myregress3 --
    - version 14
    - syntax varlist(numeric)
    string variables not allowed in varlist;
    make is a string variable
    ----------------------------------------------------------- end myregress3 --
    r(109);

    end of do-file

    r(109);
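The same check can be tried in isolation. The sketch below uses a hypothetical program named **numonly** (not part of the series) that does nothing except parse a numeric **varlist**, and it uses **capture noisily** so that the error message is shown while the return code stays available in **_rc**.

```stata
* Hypothetical helper: parses a numeric varlist and reports what it accepted.
capture program drop numonly
program define numonly
    version 14
    syntax varlist(numeric)
    display "accepted: `varlist'"
end

sysuse auto, clear

// all three variables are numeric, so -syntax- accepts them
numonly price mpg trunk

// make is a string variable, so -syntax- should stop here;
// -capture noisily- shows the message and stores the return code in _rc
capture noisily numonly price mpg trunk make
display "return code: " _rc
```

If the helper behaves like **myregress3** in example 5, the second call reports return code 109.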

On line 5 of **myregress4**, I add **ts** to the **varlist** specification so that it accepts time-series-operated variables. The **syntax** command puts time-series-operated variables in a canonical form that is stored in the local macro **varlist**, as illustrated by the **display** command on line 6, whose output appears in example 6.

**Code block 3: myregress4.ado**

    *! version 4.0.0 31Oct2015
    program define myregress4, eclass
        version 14

        syntax varlist(numeric ts)
        display "varlist is `varlist'"
        gettoken depvar : varlist

        tempname zpz xpx xpy xpxi b V
        tempvar xbhat res res2

        quietly matrix accum `zpz' = `varlist'
        local p : word count `varlist'
        local p = `p' + 1
        matrix `xpx' = `zpz'[2..`p', 2..`p']
        matrix `xpy' = `zpz'[2..`p', 1]
        matrix `xpxi' = syminv(`xpx')
        matrix `b' = (`xpxi'*`xpy')'
        quietly matrix score double `xbhat' = `b'
        quietly generate double `res' = (`depvar' - `xbhat')
        quietly generate double `res2' = (`res')^2
        quietly summarize `res2'
        local N = r(N)
        local sum = r(sum)
        local s2 = `sum'/(`N'-(`p'-1))
        matrix `V' = `s2'*`xpxi'
        ereturn post `b' `V'
        ereturn local cmd "myregress4"
        ereturn display
    end

**Example 6: myregress4 with time-series variables**

    . sysuse gnp96

    . myregress4 L(0/3).gnp
    varlist is gnp96 L.gnp96 L2.gnp96 L3.gnp96
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           gnp96 |
             L1. |   1.277086   .0860652    14.84   0.000     1.108402    1.445771
             L2. |   -.135549   .1407719    -0.96   0.336    -.4114568    .1403588
             L3. |  -.1368326   .0871645    -1.57   0.116    -.3076719    .0340067
                 |
           _cons |   -2.94825   14.36785    -0.21   0.837    -31.10871    25.21221
    ------------------------------------------------------------------------------
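To see what the **ts** specifier changes, the following sketch compares two hypothetical parsing-only programs, **tsdemo** and **notsdemo** (names invented for this illustration). They differ only in whether **ts** appears in the **varlist** specification.

```stata
* Hypothetical helpers: identical except for the ts specifier.
capture program drop tsdemo
program define tsdemo
    version 14
    syntax varlist(numeric ts)
    display "varlist is `varlist'"
end

capture program drop notsdemo
program define notsdemo
    version 14
    syntax varlist(numeric)
    display "varlist is `varlist'"
end

sysuse gnp96, clear

// with ts, the operated varlist is expanded and stored in `varlist'
tsdemo L(0/3).gnp96

// without ts, -syntax- should reject the time-series operators;
// -capture noisily- shows the message and keeps the return code in _rc
capture noisily notsdemo L(0/3).gnp96
display "return code without ts: " _rc
```

The first call should display the same expanded list that **myregress4** shows in example 6, and the second call should end with a nonzero return code.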

**Done and undone**

I used the **syntax** command to improve how **myregress2** handles the variables specified by the user. I showed how to require that all variables be numeric and how to make the command accept time-series-operated variables. In the next post, I show how to make the command allow for sample restrictions, how to handle missing values, how to allow for factor-operated variables, and how to deal with perfectly collinear variables.