Programming an estimation command in Stata: Allowing for options
I make three improvements to the command that implements the ordinary least-squares (OLS) estimator that I discussed in Programming an estimation command in Stata: Allowing for sample restrictions and factor variables. First, I allow the user to request a robust estimator of the variance-covariance of the estimator (VCE). Second, I allow the user to suppress the constant term. Third, I store the residual degrees of freedom in e(df_r) so that test will use the \(t\) or \(F\) distribution instead of the normal or \(\chi^2\) distribution to compute the \(p\)-value of Wald tests.
This is the ninth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.
Allowing for robust standard errors
The syntax of myregress6, which I discussed in Programming an estimation command in Stata: Allowing for sample restrictions and factor variables, is
myregress6 depvar [indepvars] [if] [in]
where the independent variables can be time-series or factor variables. myregress7 has the syntax
myregress7 depvar [indepvars] [if] [in] [, robust ]
By default, myregress7 estimates the VCE assuming that the errors are independently and identically distributed (IID). If the option robust is specified, myregress7 uses the robust estimator of the VCE. See Cameron and Trivedi (2005), Stock and Watson (2010), Wooldridge (2015) for introductions to OLS; see Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects for the formulas and Stata matrix implementations. Click on the file name to download any code block. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.
*! version 7.0.0 30Nov2015 program define myregress7, eclass version 14 syntax varlist(numeric ts fv) [if] [in] [, Robust] marksample touse gettoken depvar indeps : varlist _fv_check_depvar `depvar' tempname zpz xpx xpy xpxi b V tempvar xbhat res res2 quietly matrix accum `zpz' = `varlist' if `touse' local N = r(N) local p = colsof(`zpz') matrix `xpx' = `zpz'[2..`p', 2..`p'] matrix `xpy' = `zpz'[2..`p', 1] matrix `xpxi' = syminv(`xpx') local k = `p' - diag0cnt(`xpxi') - 1 matrix `b' = (`xpxi'*`xpy')' quietly matrix score double `xbhat' = `b' if `touse' quietly generate double `res' = (`depvar' - `xbhat') if `touse' quietly generate double `res2' = (`res')^2 if `touse' if "`robust'" == "" { quietly summarize `res2' if `touse' , meanonly local sum = r(sum) local s2 = `sum'/(`N'-(`k')) matrix `V' = `s2'*`xpxi' } else { tempname M quietly matrix accum `M' = `indeps' [iweight=`res2'] if `touse' matrix `V' = (`N'/(`N'-(`k')))*`xpxi'*`M'*`xpxi' local vce "robust" local vcetype "Robust" } ereturn post `b' `V', esample(`touse') buildfvinfo ereturn scalar N = `N' ereturn scalar rank = `k' ereturn local vce "`vce'" ereturn local vcetype "`vcetype'" ereturn local cmd "myregress7" ereturn display end
A user may specify the robust option by typing robust, robus, robu, rob, ro, or r. In other words, r is the minimal abbreviation of the option robust. Line 5 of myregress7 implements this syntax. Specifying robust is optional because Robust is enclosed in the square brackets. r is the minimal abbreviation because the R is in uppercase and the remaining letters are in lowercase.
If the user specifies robust, or a valid abbreviation thereof, the local macro robust contains the word “robust”; otherwise, the local macro robust is empty. Line 25 uses this fact to determine which VCE should be computed; it specifies that lines 26–31 should be executed if the local macro robust is empty and that lines 32-36 should otherwise be executed. Lines 26-31 compute the IID estimator of the VCE. Lines 32-34 compute the robust estimator of the VCE. Lines 35 and 36 respectively put “robust” and “Robust” into the local macros vce and vcetype.
Line 41 puts the contents of the local macro vce into the local macro e(vce), which informs users and postestimation commands which VCE estimator was used. By convention, e(vce) is empty for the IID case. Line 42 puts the contents of the local macro vcetype into the local macro e(vcetype), which is used by ereturn display to correctly label the standard errors as robust.
I now run a regression with robust standard errors.
Example 1: myregress7 with robust standard errors
. sysuse auto (1978 Automobile Data) . myregress7 price mpg trunk i.rep78, robust ------------------------------------------------------------------------------ | Robust | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- mpg | -262.7053 74.75538 -3.51 0.000 -409.2232 -116.1875 trunk | 41.75706 73.71523 0.57 0.571 -102.7221 186.2362 | rep78 | 2 | 654.7905 1132.425 0.58 0.563 -1564.721 2874.302 3 | 1170.606 823.9454 1.42 0.155 -444.2979 2785.509 4 | 1473.352 650.4118 2.27 0.023 198.5679 2748.135 5 | 2896.888 937.6981 3.09 0.002 1059.034 4734.743 | _cons | 9726.377 2040.335 4.77 0.000 5727.393 13725.36 ------------------------------------------------------------------------------
Suppressing the constant term
myregress8 has the syntax
myregress8 depvar [indepvars] [if] [in] [, robust noconstant ]
*! version 8.0.0 30Nov2015 program define myregress8, eclass version 14 syntax varlist(numeric ts fv) [if] [in] [, Robust noCONStant ] marksample touse gettoken depvar indeps : varlist _fv_check_depvar `depvar' tempname zpz xpx xpy xpxi b V tempvar xbhat res res2 quietly matrix accum `zpz' = `varlist' if `touse' , `constant' local N = r(N) local p = colsof(`zpz') matrix `xpx' = `zpz'[2..`p', 2..`p'] matrix `xpy' = `zpz'[2..`p', 1] matrix `xpxi' = syminv(`xpx') local k = `p' - diag0cnt(`xpxi') - 1 matrix `b' = (`xpxi'*`xpy')' quietly matrix score double `xbhat' = `b' if `touse' quietly generate double `res' = (`depvar' - `xbhat') if `touse' quietly generate double `res2' = (`res')^2 if `touse' if "`robust'" == "" { quietly summarize `res2' if `touse' , meanonly local sum = r(sum) local s2 = `sum'/(`N'-(`k')) matrix `V' = `s2'*`xpxi' } else { tempname M quietly matrix accum `M' = `indeps' [iweight=`res2'] /// if `touse' , `constant' matrix `V' = (`N'/(`N'-(`k')))*`xpxi'*`M'*`xpxi' local vce "robust" local vcetype "Robust" } ereturn post `b' `V', esample(`touse') buildfvinfo ereturn scalar N = `N' ereturn scalar rank = `k' ereturn local vce "`vce'" ereturn local vcetype "`vcetype'" ereturn local cmd "myregress8" ereturn display end
The syntax command on line 5 puts “noconstant” into the local macro constant if the user types nocons, noconst, noconsta, noconstan, or noconstant; otherwise, the local macro constant is empty. The minimal abbreviation of option noconstant is nocons because the lowercase no is followed by CONStant. Note that specifying the option creates the local macro constant because the no is followed by uppercase letters specifying the minimum abbreviation.
To implement the option, I specified what is contained in the local macro constant as an option on the matrix accum command on line 14 and on the matrix accum command spread over lines 33 and 34. The matrix accum command that begins on line 33 is too long for one line. I used /// to comment out the end-of-line character and continue the command on line 34.
I now illustrate the noconstant option.
Example 2: myregress8 with option noconstant
. myregress8 price mpg trunk ibn.rep78, noconstant ------------------------------------------------------------------------------ | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- mpg | -262.7053 73.49434 -3.57 0.000 -406.7516 -118.6591 trunk | 41.75706 93.9671 0.44 0.657 -142.4151 225.9292 | rep78 | 1 | 9726.377 2790.009 3.49 0.000 4258.06 15194.69 2 | 10381.17 2607.816 3.98 0.000 5269.943 15492.39 3 | 10896.98 2555.364 4.26 0.000 5888.561 15905.4 4 | 11199.73 2588.19 4.33 0.000 6126.97 16272.49 5 | 12623.27 2855.763 4.42 0.000 7026.073 18220.46 ------------------------------------------------------------------------------
Using t or F distributions
The output tables reported in examples 1 and 2 use the normal distribution to compute \(p\)-values and confidence intervals, because Wald-based postestimation commmands like test and ereturn display use the normal or the \(\chi^2\) distribution unless the residual degrees of freedom are stored in e(df_r).
*! version 9.0.0 30Nov2015 program define myregress9, eclass version 14 syntax varlist(numeric ts fv) [if] [in] [, Robust noCONStant ] marksample touse gettoken depvar indeps : varlist _fv_check_depvar `depvar' tempname zpz xpx xpy xpxi b V tempvar xbhat res res2 quietly matrix accum `zpz' = `varlist' if `touse' , `constant' local N = r(N) local p = colsof(`zpz') matrix `xpx' = `zpz'[2..`p', 2..`p'] matrix `xpy' = `zpz'[2..`p', 1] matrix `xpxi' = syminv(`xpx') local k = `p' - diag0cnt(`xpxi') - 1 matrix `b' = (`xpxi'*`xpy')' quietly matrix score double `xbhat' = `b' if `touse' quietly generate double `res' = (`depvar' - `xbhat') if `touse' quietly generate double `res2' = (`res')^2 if `touse' if "`robust'" == "" { quietly summarize `res2' if `touse' , meanonly local sum = r(sum) local s2 = `sum'/(`N'-(`k')) matrix `V' = `s2'*`xpxi' } else { tempname M quietly matrix accum `M' = `indeps' [iweight=`res2'] /// if `touse' , `constant' matrix `V' = (`N'/(`N'-(`k')))*`xpxi'*`M'*`xpxi' local vce "robust" local vcetype "Robust" } ereturn post `b' `V', esample(`touse') buildfvinfo ereturn scalar N = `N' ereturn scalar rank = `k' ereturn scalar df_r = `N'-`k' ereturn local vce "`vce'" ereturn local vcetype "`vcetype'" ereturn local cmd "myregress8" ereturn display end
Line 42 of myregress9.ado stores the residual degrees of freedom in e(df_r). Example 3 illustrates that ereturn display and test now use the \(t\) and \(F\) distributions.
Example 3: t or F distributions after myregress9
. myregress9 price mpg trunk ibn.rep78, noconstant ------------------------------------------------------------------------------ | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- mpg | -262.7053 73.49434 -3.57 0.001 -409.6184 -115.7923 trunk | 41.75706 93.9671 0.44 0.658 -146.0805 229.5946 | rep78 | 1 | 9726.377 2790.009 3.49 0.001 4149.229 15303.53 2 | 10381.17 2607.816 3.98 0.000 5168.219 15594.12 3 | 10896.98 2555.364 4.26 0.000 5788.882 16005.08 4 | 11199.73 2588.19 4.33 0.000 6026.011 16373.45 5 | 12623.27 2855.763 4.42 0.000 6914.677 18331.85 ------------------------------------------------------------------------------ . test trunk ( 1) trunk = 0 F( 1, 62) = 0.20 Prob > F = 0.6583
Done and undone
I added an option for the robust estimator of the VCE, I added an option to suppress the constant term, and I stored the residual degrees of freedom in e(df_r) so that Wald-based postestimation commands will use \(t\) or \(F\) distributions. I illustrated option parsing by example, but I skipped the general theory and many details. Type . help syntax for more details about parsing options using the syntax command.
In the next post, I implement the modern syntax for robust and cluster-robust standard errors.
References
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and applications. Cambridge: Cambridge University Press.
Stock, J. H., and M. W. Watson. 2010. Introduction to Econometrics. 3rd ed. Boston, MA: Addison Wesley New York.
Wooldridge, J. M. 2015. Introductory Econometrics: A Modern Approach. 6th ed. Cincinnati, Ohio: South-Western.