Programming an estimation command in Stata: Using a subroutine to parse a complex option
I make two improvements to the command that implements the ordinary least-squares (OLS) estimator that I discussed in Programming an estimation command in Stata: Allowing for options. First, I add an option for a cluster-robust estimator of the variance-covariance of the estimator (VCE). Second, I make the command accept the modern syntax for either a robust or a cluster-robust estimator of the VCE. In the process, I use subroutines in my ado-program to facilitate the parsing, and I discuss some advanced parsing tricks.
This is the tenth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.
Allowing for a robust or a cluster-robust VCE
The syntax of myregress9, which I discussed in Programming an estimation command in Stata: Allowing for options, is
myregress9 depvar [indepvars] [if] [in] [, robust noconstant]
The syntax of myregress10, which I discuss here, is
myregress10 depvar [indepvars] [if] [in] [, vce(robust | cluster clustervar) noconstant]
By default, myregress10 estimates the VCE assuming that the errors are independently and identically distributed (IID). If the option vce(robust) is specified, myregress10 uses the robust estimator of the VCE. If the option vce(cluster clustervar) is specified, myregress10 uses the cluster-robust estimator of the VCE. See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2010, 2015) for introductions to OLS; see Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects for the formulas and Stata matrix implementations.
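The three ways of calling the command can be sketched in a short do-file. This is only an illustration: it assumes that myregress10.ado has been saved somewhere on the ado-path, and it uses the auto dataset shipped with Stata, with rep78 standing in for a cluster variable.

```stata
* Assumes myregress10.ado is on the ado-path; run from a do-file.
sysuse auto, clear

myregress10 price mpg trunk                      // default: IID estimator of the VCE
myregress10 price mpg trunk, noconstant          // IID estimator, no constant term
myregress10 price mpg trunk, vce(robust)         // robust estimator of the VCE
myregress10 price mpg trunk, vce(cluster rep78)  // cluster-robust estimator of the VCE
```

The point estimates from the first, third, and fourth calls should be the same; only the estimated VCE, and hence the standard errors, should differ.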
I recommend that you download the code for myregress10.ado. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.
*! version 10.0.0  02Dec2015
program define myregress10, eclass sortpreserve
    version 14

    syntax varlist(numeric ts fv) [if] [in] [, vce(string) noCONStant ]
    marksample touse

    gettoken depvar indeps : varlist
    _fv_check_depvar `depvar'

    tempname zpz xpx xpy xpxi b V
    tempvar  xbhat res res2

    if `"`vce'"' != "" {
        my_vce_parse , vce(`vce')
        local vcetype     "robust"
        local clustervar  "`r(clustervar)'"
        if "`clustervar'" != "" {
            markout `touse' `clustervar'
            sort `clustervar'
        }
    }

    quietly matrix accum `zpz' = `varlist' if `touse' , `constant'
    local N                    = r(N)
    local p                    = colsof(`zpz')
    matrix `xpx'               = `zpz'[2..`p', 2..`p']
    matrix `xpy'               = `zpz'[2..`p', 1]
    matrix `xpxi'              = syminv(`xpx')
    matrix `b'                 = (`xpxi'*`xpy')'
    local k                    = `p' - diag0cnt(`xpxi') - 1
    quietly matrix score double `xbhat' = `b' if `touse'
    quietly generate double `res'       = (`depvar' - `xbhat') if `touse'
    quietly generate double `res2'      = (`res')^2 if `touse'

    if "`vcetype'" == "robust" {
        if "`clustervar'" == "" {
            tempname M
            quietly matrix accum `M' = `indeps'      ///
                [iweight=`res2'] if `touse' , `constant'
            local fac                = (`N'/(`N'-`k'))
            local df_r               = (`N'-`k')
        }
        else  {
            tempvar idvar
            tempname M
            quietly egen `idvar' = group(`clustervar') if `touse'
            quietly summarize `idvar' if `touse', meanonly
            local Nc   = r(max)
            local fac  = ((`N'-1)/(`N'-`k')*(`Nc'/(`Nc'-1)))
            local df_r = (`Nc'-1)
            matrix opaccum `M' = `indeps' if `touse'     ///
                , group(`clustervar') opvar(`res') `constant'
        }
        matrix `V' = (`fac')*`xpxi'*`M'*`xpxi'
        local vce                   "robust"
        local vcetype               "Robust"
    }
    else {                            // IID Case
        quietly summarize `res2' if `touse' , meanonly
        local sum          = r(sum)
        local s2           = `sum'/(`N'-`k')
        local df_r         = (`N'-`k')
        matrix `V'         = `s2'*`xpxi'
    }

    ereturn post `b' `V', esample(`touse') buildfvinfo
    ereturn scalar N       = `N'
    ereturn scalar rank    = `k'
    ereturn scalar df_r    = `df_r'
    ereturn local  vce     "`vce'"
    ereturn local  vcetype "`vcetype'"
    ereturn local clustvar "`clustervar'"
    ereturn local  cmd     "myregress10"
    ereturn display
end

program define my_vce_parse, rclass
    syntax  [, vce(string) ]

    local case : word count `vce'

    if `case' > 2 {
        my_vce_error , typed(`vce')
    }

    local 0 `", `vce'"'
    syntax  [, Robust CLuster * ]

    if `case' == 2 {
        if "`robust'" == "robust" | "`cluster'" == "" {
            my_vce_error , typed(`vce')
        }

        capture confirm numeric variable `options'
        if _rc {
            my_vce_error , typed(`vce')
        }

        local clustervar "`options'"
    }
    else {    // case = 1
        if "`robust'" == "" {
            my_vce_error , typed(`vce')
        }

    }

    return clear
    return local clustervar "`clustervar'"
end

program define my_vce_error
    syntax , typed(string)

    display `"{red}{bf:vce(`typed')} invalid"'
    error 498
end
The syntax command on line 5 puts whatever the user encloses in vce() into a local macro called vce. For example, if the user types
. myregress10 price mpg trunk , vce(hello there)
the local macro vce will contain “hello there”. If the user does not specify something in the vce() option, the local macro vce will be empty. Line 14 uses this condition to execute lines 15–21 only if the user has specified something in option vce().
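To see this behavior in isolation, here is a minimal sketch. The program show_vce is hypothetical and exists only for this illustration; it uses the same vce(string) option specification as line 5 and the same emptiness test as line 14.

```stata
capture program drop show_vce
program define show_vce
    version 14
    syntax varlist [, vce(string) ]
    if `"`vce'"' != "" {
        // something was typed inside vce()
        display `"vce contains: `vce'"'
    }
    else {
        // vce() was not specified, so the local macro is empty
        display "vce is empty"
    }
end

sysuse auto, clear
show_vce price mpg trunk, vce(hello there)   // displays: vce contains: hello there
show_vce price mpg trunk                     // displays: vce is empty
```

Note that syntax accepts any string here; deciding whether the string is valid is left to later code.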
When the user specifies something in the vce() option, line 15 calls the ado subroutine my_vce_parse to parse what is in the local macro vce. my_vce_parse stores the name of the cluster variable in r(clustervar) and deals with error conditions, as I discuss below. Line 16 stores “robust” in the local macro vcetype, and line 17 stores the contents of r(clustervar), which was created by my_vce_parse, in the local macro clustervar.
If the user does not specify something in vce(), the local macro vcetype will be empty and line 36 ensures that myregress10 will compute an IID estimator of the VCE.
Lines 19 and 20 are only executed if the local macro clustervar is not empty. Line 19 updates the touse variable, whose name is stored in the local macro touse, to account for missing values in the cluster variable, whose name is stored in clustervar. Line 20 sorts the dataset in the ascending order of the cluster variable. Users do not want estimation commands resorting their datasets. On line 2, I specified the sortpreserve option on program define to keep the dataset in the order it was in when myregress10 was executed by the user.
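The interaction of markout, sort, and sortpreserve can be sketched with a small stand-alone program. The program demo_touse and the variable obsno are hypothetical names used only here; the cluster() option is a simplified stand-in for the parsing that myregress10 does through vce().

```stata
capture program drop demo_touse
program define demo_touse, rclass sortpreserve
    version 14
    syntax varlist(numeric) [if] [in] , CLuster(varname numeric)
    marksample touse                 // 0 where any variable in varlist is missing
    markout `touse' `cluster'        // also 0 where the cluster variable is missing
    sort `cluster'                   // undone on exit because of sortpreserve
    quietly count if `touse'
    return scalar N = r(N)
end

sysuse auto, clear
generate long obsno = _n             // records the original order
demo_touse price mpg, cluster(rep78)
display r(N)                         // 69: rep78 is missing in 5 of the 74 observations
assert obsno == _n                   // the original sort order was restored
```

Without the sortpreserve option, the final assert would be expected to fail, because the dataset would be left sorted by rep78.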
Lines 36–65 compute the requested estimator for the VCE. Recall that the local macro vcetype is empty or it contains “robust” and that the local macro clustervar is empty or it contains the name of the cluster variable. The if and else statements use the values stored in vcetype and clustervar to execute one of three blocks of code.
- Lines 38–42 compute a robust estimator of the VCE when vcetype contains “robust” and clustervar is empty.
- Lines 45–53 compute a cluster-robust estimator of the VCE when vcetype contains “robust” and clustervar contains the name of the cluster variable.
- Lines 60–64 compute an IID estimator of the VCE when vcetype does not contain “robust”.
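As a rough check on the cluster-robust block, the same steps can be carried out by hand and compared with what regress reports. This is a sketch under assumptions: it uses the auto dataset with rep78 as the cluster variable, the names touse, res, xpx, xpxi, M, V, and Vreg are chosen only for this illustration, and it borrows N, the number of clusters, and the rank from regress instead of computing them as myregress10 does.

```stata
sysuse auto, clear
quietly regress price mpg trunk, vce(cluster rep78)
matrix Vreg = e(V)
local N  = e(N)
local Nc = e(N_clust)
local k  = e(rank)
generate byte touse = e(sample)
quietly predict double res if touse, residuals

* (X'X)^(-1), as on lines 27 and 29 of myregress10.ado
quietly matrix accum xpx = mpg trunk if touse
matrix xpxi = syminv(xpx)

* Middle matrix summed over clusters, as on lines 52 and 53
sort rep78
quietly matrix opaccum M = mpg trunk if touse, group(rep78) opvar(res)

* Finite-sample adjustment from line 50 and the sandwich from line 55
local fac = ((`N'-1)/(`N'-`k'))*(`Nc'/(`Nc'-1))
matrix V = `fac'*xpxi*M*xpxi

display mreldif(V, Vreg)             // should be very close to zero
```

If the two matrices agree, the factor and the sandwich form used in lines 45–55 match the cluster-robust VCE that regress computes for this example.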
Line 73 stores the name of the cluster variable in e(clustvar), if the local macro clustervar is not empty.
Lines 78–111 define the rclass ado-subroutine my_vce_parse, which performs two tasks. First, it stores the name of the cluster variable in r(clustervar) when the user specifies vce(cluster clustervar). Second, it detects syntax errors in what the user specified in vce() and exits with an error in such cases.
Putting these parsing details into a subroutine makes the main command much easier to follow. I recommend that you encapsulate details in subroutines.
The ado-subroutine my_vce_parse is local to the ado-command myregress10; the name my_vce_parse is in a namespace local to myregress10, and my_vce_parse can only be executed from within myregress10.
Line 79 uses syntax to store whatever the user specified in the option vce() in the local macro vce. Line 81 puts the number of words in vce into the local macro case. Lines 83–85 cause the ado-subroutine my_vce_error to display an error message and return error code 498 when there are more than two words in vce. (Recall that vce should contain either robust or cluster clustervar.)
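The word-count step can be tried on its own in a do-file. The macro names below (vce1, vce2, vce3, case1, case2, case3) are used only for this illustration.

```stata
local vce1 "robust"
local vce2 "cluster rep78"
local vce3 "cluster rep78 extra"

local case1 : word count `vce1'
local case2 : word count `vce2'
local case3 : word count `vce3'

display "`case1' `case2' `case3'"    // displays: 1 2 3
```

Only the third string would trigger the error branch, because it has more than two words.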
Having ruled out the cases with more than two words, line 87 stores a comma followed by the contents of the local macro vce in the local macro 0, so that syntax will treat those contents as options. Line 88 uses syntax to parse what is in the local macro 0. If the user specified vce(robust), or a valid abbreviation thereof, syntax stores “robust” in the local macro robust; otherwise, the local macro robust is empty. If the user specified vce(cluster something), or a valid abbreviation of cluster, syntax stores “cluster” in the local macro cluster; otherwise, the local macro cluster is empty. The option * causes syntax to put any remaining options into the local macro options. In this case, syntax will store the something in the local macro options.
Remember the trick used in lines 87 and 88. Option parsing is frequently made much easier by storing what a local macro contains in the local macro 0 and using syntax to parse it.
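The trick can be sketched outside of myregress10 with a small program. The name parse_demo is hypothetical; the two syntax statements and the assignment to the local macro 0 mirror lines 79, 87, and 88.

```stata
capture program drop parse_demo
program define parse_demo
    version 14
    syntax [, vce(string) ]
    local 0 `", `vce'"'              // make the contents of vce look like options
    syntax [, Robust CLuster * ]     // reparse them with syntax
    display `"robust  = |`robust'|"'
    display `"cluster = |`cluster'|"'
    display `"options = |`options'|"'
end

parse_demo, vce(robust)          // robust = |robust|, cluster = ||, options = ||
parse_demo, vce(r)               // same as above; r is a valid abbreviation
parse_demo, vce(cluster rep78)   // robust = ||, cluster = |cluster|, options = |rep78|
parse_demo, vce(cl rep78)        // same as above; cl is a valid abbreviation
```

The second syntax statement overwrites the macros created by the first, which is why the trick works best inside a subroutine that has its own local macros.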
When there are two words in the local macro vce, lines 91–100 ensure that the first word is “cluster” and that the second word, stored in the local macro options, is the name of a numeric variable. When all is well, line 100 stores the name of this numeric variable in the local macro clustervar. Lines 95–98 use a subtle construction to display a custom error message. Rather than let confirm display an error message, lines 95–98 use capture and an if condition to display our custom error message. In detail, line 95 uses confirm to confirm that the local macro options contains the name of a numeric variable. capture puts the return code produced by confirm in the scalar _rc. When options contains the name of a numeric variable, confirm produces the return code 0 and capture stores 0 in _rc; otherwise, confirm produces a positive return code, and capture stores this positive return code in _rc.
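The capture and _rc construction can be tried directly in a do-file. The sketch below uses the auto dataset, in which rep78 is numeric and make is a string variable; nosuchvar is a made-up name that does not exist in the dataset.

```stata
sysuse auto, clear

capture confirm numeric variable rep78
display _rc                          // 0, because rep78 is a numeric variable

capture confirm numeric variable make
display _rc                          // nonzero, because make is a string variable

capture confirm numeric variable nosuchvar
if _rc {
    // confirm's own message was suppressed, so we can display ours
    display as error "nosuchvar is not a numeric variable"
}
```

Because capture suppresses the output from confirm, the only message the user sees is the one displayed inside the if block.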
When all is well, line 109 clears whatever was in r(), and line 110 stores the name of the cluster variable in r(clustervar).
Lines 113–118 define the ado-subroutine my_vce_error, which displays a custom error message. Like my_vce_parse, my_vce_error is local to myregress10.ado.
Done and undone
I added an option for a cluster-robust estimator of the VCE, and I made myregress10 accept the modern syntax for either a robust or a cluster-robust estimator of the VCE. In the process, I used subroutines in myregress10.ado to facilitate the parsing, and I discussed some advanced parsing tricks.
myregress10.ado would have been more difficult to read if I had not used subroutines to simplify the main routine.
Although it may seem that I have covered every possible nuance, I have only dealt with a few. Type help syntax for more details about parsing options using the syntax command.
References
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and applications. Cambridge: Cambridge University Press.
Stock, J. H., and M. W. Watson. 2010. Introduction to Econometrics. 3rd ed. Boston, MA: Addison-Wesley.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.
Wooldridge, J. M. 2015. Introductory Econometrics: A Modern Approach. 6th ed. Cincinnati, Ohio: South-Western.