## Programming an estimation command in Stata: Allowing for options

I make three improvements to the command that implements the ordinary least-squares (**OLS**) estimator that I discussed in Programming an estimation command in Stata: Allowing for sample restrictions and factor variables. First, I allow the user to request a robust estimator of the variance-covariance of the estimator (**VCE**). Second, I allow the user to suppress the constant term. Third, I store the residual degrees of freedom in **e(df_r)** so that **test** will use the \(t\) or \(F\) distribution instead of the normal or \(\chi^2\) distribution to compute the \(p\)-value of Wald tests.

**Allowing for robust standard errors**

The syntax of **myregress6**, which I discussed in Programming an estimation command in Stata: Allowing for sample restrictions and factor variables, is

**myregress6** *depvar* [*indepvars*] [if] [in]

where the independent variables can be time-series or factor variables. **myregress7** has the syntax

**myregress7** *depvar* [*indepvars*] [if] [in] [, __r__obust ]

By default, **myregress7** estimates the **VCE** assuming that the errors are independently and identically distributed (**IID**). If the option **robust** is specified, **myregress7** uses the robust estimator of the **VCE**. See Cameron and Trivedi (2005), Stock and Watson (2010), Wooldridge (2015) for introductions to **OLS**; see Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects for the formulas and Stata matrix implementations. Click on the file name to download any code block. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.

**Code block 1: myregress7.ado**

*! version 7.0.0 30Nov2015 program define myregress7, eclass version 14 syntax varlist(numeric ts fv) [if] [in] [, Robust] marksample touse gettoken depvar indeps : varlist _fv_check_depvar `depvar' tempname zpz xpx xpy xpxi b V tempvar xbhat res res2 quietly matrix accum `zpz' = `varlist' if `touse' local N = r(N) local p = colsof(`zpz') matrix `xpx' = `zpz'[2..`p', 2..`p'] matrix `xpy' = `zpz'[2..`p', 1] matrix `xpxi' = syminv(`xpx') local k = `p' - diag0cnt(`xpxi') - 1 matrix `b' = (`xpxi'*`xpy')' quietly matrix score double `xbhat' = `b' if `touse' quietly generate double `res' = (`depvar' - `xbhat') if `touse' quietly generate double `res2' = (`res')^2 if `touse' if "`robust'" == "" { quietly summarize `res2' if `touse' , meanonly local sum = r(sum) local s2 = `sum'/(`N'-(`k')) matrix `V' = `s2'*`xpxi' } else { tempname M quietly matrix accum `M' = `indeps' [iweight=`res2'] if `touse' matrix `V' = (`N'/(`N'-(`k')))*`xpxi'*`M'*`xpxi' local vce "robust" local vcetype "Robust" } ereturn post `b' `V', esample(`touse') buildfvinfo ereturn scalar N = `N' ereturn scalar rank = `k' ereturn local vce "`vce'" ereturn local vcetype "`vcetype'" ereturn local cmd "myregress7" ereturn display end

A user may specify the **robust** option by typing **robust**, **robus**, **robu**, **rob**, **ro**, or **r**. In other words, **r** is the minimal abbreviation of the option **robust**. Line 5 of **myregress7** implements this syntax. Specifying **robust** is optional because **Robust** is enclosed in the square brackets. **r** is the minimal abbreviation because the **R** is in uppercase and the remaining letters are in lowercase.

If the user specifies **robust**, or a valid abbreviation thereof, the local macro **robust** contains the word “robust”; otherwise, the local macro **robust** is empty. Line 25 uses this fact to determine which **VCE** should be computed; it specifies that lines 26–31 should be executed if the local macro **robust** is empty and that lines 32-36 should otherwise be executed. Lines 26-31 compute the **IID** estimator of the **VCE**. Lines 32-34 compute the robust estimator of the **VCE**. Lines 35 and 36 respectively put “robust” and “Robust” into the local macros **vce** and **vcetype**.

Line 41 puts the contents of the local macro **vce** into the local macro **e(vce)**, which informs users and postestimation commands which **VCE** estimator was used. By convention, **e(vce)** is empty for the **IID** case. Line 42 puts the contents of the local macro **vcetype** into the local macro **e(vcetype)**, which is used by **ereturn display** to correctly label the standard errors as robust.

I now run a regression with robust standard errors.

**Example 1: myregress7 with robust standard errors**

. sysuse auto (1978 Automobile Data) . myregress7 price mpg trunk i.rep78, robust ------------------------------------------------------------------------------ | Robust | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- mpg | -262.7053 74.75538 -3.51 0.000 -409.2232 -116.1875 trunk | 41.75706 73.71523 0.57 0.571 -102.7221 186.2362 | rep78 | 2 | 654.7905 1132.425 0.58 0.563 -1564.721 2874.302 3 | 1170.606 823.9454 1.42 0.155 -444.2979 2785.509 4 | 1473.352 650.4118 2.27 0.023 198.5679 2748.135 5 | 2896.888 937.6981 3.09 0.002 1059.034 4734.743 | _cons | 9726.377 2040.335 4.77 0.000 5727.393 13725.36 ------------------------------------------------------------------------------

**Suppressing the constant term**

**myregress8** has the syntax

**myregress8** *depvar* [*indepvars*] [if] [in] [, __r__obust __nocons__tant ]

**Code block 2: myregress8.ado**

*! version 8.0.0 30Nov2015 program define myregress8, eclass version 14 syntax varlist(numeric ts fv) [if] [in] [, Robust noCONStant ] marksample touse gettoken depvar indeps : varlist _fv_check_depvar `depvar' tempname zpz xpx xpy xpxi b V tempvar xbhat res res2 quietly matrix accum `zpz' = `varlist' if `touse' , `constant' local N = r(N) local p = colsof(`zpz') matrix `xpx' = `zpz'[2..`p', 2..`p'] matrix `xpy' = `zpz'[2..`p', 1] matrix `xpxi' = syminv(`xpx') local k = `p' - diag0cnt(`xpxi') - 1 matrix `b' = (`xpxi'*`xpy')' quietly matrix score double `xbhat' = `b' if `touse' quietly generate double `res' = (`depvar' - `xbhat') if `touse' quietly generate double `res2' = (`res')^2 if `touse' if "`robust'" == "" { quietly summarize `res2' if `touse' , meanonly local sum = r(sum) local s2 = `sum'/(`N'-(`k')) matrix `V' = `s2'*`xpxi' } else { tempname M quietly matrix accum `M' = `indeps' [iweight=`res2'] /// if `touse' , `constant' matrix `V' = (`N'/(`N'-(`k')))*`xpxi'*`M'*`xpxi' local vce "robust" local vcetype "Robust" } ereturn post `b' `V', esample(`touse') buildfvinfo ereturn scalar N = `N' ereturn scalar rank = `k' ereturn local vce "`vce'" ereturn local vcetype "`vcetype'" ereturn local cmd "myregress8" ereturn display end

The syntax command on line 5 puts “noconstant” into the local macro **constant** if the user types **nocons**, **noconst**, **noconsta**, **noconstan**, or **noconstant**; otherwise, the local macro **constant** is empty. The minimal abbreviation of option **noconstant** is **nocons** because the lowercase **no** is followed by **CONStant**. Note that specifying the option creates the local macro **constant** because the **no** is followed by uppercase letters specifying the minimum abbreviation.

To implement the option, I specified what is contained in the local macro **constant** as an option on the **matrix accum** command on line 14 and on the **matrix accum** command spread over lines 33 and 34. The **matrix accum** command that begins on line 33 is too long for one line. I used **///** to comment out the end-of-line character and continue the command on line 34.

I now illustrate the **noconstant** option.

**Example 2: myregress8 with option noconstant**

. myregress8 price mpg trunk ibn.rep78, noconstant ------------------------------------------------------------------------------ | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- mpg | -262.7053 73.49434 -3.57 0.000 -406.7516 -118.6591 trunk | 41.75706 93.9671 0.44 0.657 -142.4151 225.9292 | rep78 | 1 | 9726.377 2790.009 3.49 0.000 4258.06 15194.69 2 | 10381.17 2607.816 3.98 0.000 5269.943 15492.39 3 | 10896.98 2555.364 4.26 0.000 5888.561 15905.4 4 | 11199.73 2588.19 4.33 0.000 6126.97 16272.49 5 | 12623.27 2855.763 4.42 0.000 7026.073 18220.46 ------------------------------------------------------------------------------

**Using t or F distributions**

The output tables reported in examples 1 and 2 use the normal distribution to compute \(p\)-values and confidence intervals, because Wald-based postestimation commmands like **test** and **ereturn display** use the normal or the \(\chi^2\) distribution unless the residual degrees of freedom are stored in **e(df_r)**.

**Code block 3: myregress9.ado**

*! version 9.0.0 30Nov2015 program define myregress9, eclass version 14 syntax varlist(numeric ts fv) [if] [in] [, Robust noCONStant ] marksample touse gettoken depvar indeps : varlist _fv_check_depvar `depvar' tempname zpz xpx xpy xpxi b V tempvar xbhat res res2 quietly matrix accum `zpz' = `varlist' if `touse' , `constant' local N = r(N) local p = colsof(`zpz') matrix `xpx' = `zpz'[2..`p', 2..`p'] matrix `xpy' = `zpz'[2..`p', 1] matrix `xpxi' = syminv(`xpx') local k = `p' - diag0cnt(`xpxi') - 1 matrix `b' = (`xpxi'*`xpy')' quietly matrix score double `xbhat' = `b' if `touse' quietly generate double `res' = (`depvar' - `xbhat') if `touse' quietly generate double `res2' = (`res')^2 if `touse' if "`robust'" == "" { quietly summarize `res2' if `touse' , meanonly local sum = r(sum) local s2 = `sum'/(`N'-(`k')) matrix `V' = `s2'*`xpxi' } else { tempname M quietly matrix accum `M' = `indeps' [iweight=`res2'] /// if `touse' , `constant' matrix `V' = (`N'/(`N'-(`k')))*`xpxi'*`M'*`xpxi' local vce "robust" local vcetype "Robust" } ereturn post `b' `V', esample(`touse') buildfvinfo ereturn scalar N = `N' ereturn scalar rank = `k' ereturn scalar df_r = `N'-`k' ereturn local vce "`vce'" ereturn local vcetype "`vcetype'" ereturn local cmd "myregress8" ereturn display end

Line 42 of **myregress9.ado** stores the residual degrees of freedom in **e(df_r)**. Example 3 illustrates that **ereturn display** and **test** now use the \(t\) and \(F\) distributions.

**Example 3: t or F distributions after myregress9**

. myregress9 price mpg trunk ibn.rep78, noconstant ------------------------------------------------------------------------------ | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- mpg | -262.7053 73.49434 -3.57 0.001 -409.6184 -115.7923 trunk | 41.75706 93.9671 0.44 0.658 -146.0805 229.5946 | rep78 | 1 | 9726.377 2790.009 3.49 0.001 4149.229 15303.53 2 | 10381.17 2607.816 3.98 0.000 5168.219 15594.12 3 | 10896.98 2555.364 4.26 0.000 5788.882 16005.08 4 | 11199.73 2588.19 4.33 0.000 6026.011 16373.45 5 | 12623.27 2855.763 4.42 0.000 6914.677 18331.85 ------------------------------------------------------------------------------ . test trunk ( 1) trunk = 0 F( 1, 62) = 0.20 Prob > F = 0.6583

**Done and undone**

I added an option for the robust estimator of the **VCE**, I added an option to suppress the constant term, and I stored the residual degrees of freedom in **e(df_r)** so that Wald-based postestimation commands will use \(t\) or \(F\) distributions. I illustrated option parsing by example, but I skipped the general theory and many details. Type . help syntax for more details about parsing options using the **syntax** command.

In the next post, I implement the modern syntax for robust and cluster-robust standard errors.

**References**

Cameron, A. C., and P. K. Trivedi. 2005. *Microeconometrics: Methods and applications*. Cambridge: Cambridge University Press.

Stock, J. H., and M. W. Watson. 2010. *Introduction to Econometrics*. 3rd ed. Boston, MA: Addison Wesley New York.

Wooldridge, J. M. 2015. *Introductory Econometrics: A Modern Approach*. 6th ed. Cincinnati, Ohio: South-Western.