Home > Programming > Programming an estimation command in Stata: Allowing for options

Programming an estimation command in Stata: Allowing for options

I make three improvements to the command that implements the ordinary least-squares (OLS) estimator that I discussed in Programming an estimation command in Stata: Allowing for sample restrictions and factor variables. First, I allow the user to request a robust estimator of the variance-covariance of the estimator (VCE). Second, I allow the user to suppress the constant term. Third, I store the residual degrees of freedom in e(df_r) so that test will use the \(t\) or \(F\) distribution instead of the normal or \(\chi^2\) distribution to compute the \(p\)-value of Wald tests.

This is the ninth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

Allowing for robust standard errors

The syntax of myregress6, which I discussed in Programming an estimation command in Stata: Allowing for sample restrictions and factor variables, is

myregress6 depvar [indepvars] [if] [in]

where the independent variables can be time-series or factor variables. myregress7 has the syntax

myregress7 depvar [indepvars] [if] [in] [, robust ]

By default, myregress7 estimates the VCE assuming that the errors are independently and identically distributed (IID). If the option robust is specified, myregress7 uses the robust estimator of the VCE. See Cameron and Trivedi (2005), Stock and Watson (2010), Wooldridge (2015) for introductions to OLS; see Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects for the formulas and Stata matrix implementations. Click on the file name to download any code block. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.

Code block 1: myregress7.ado

*! version 7.0.0  30Nov2015
program define myregress7, eclass
    version 14

    syntax varlist(numeric ts fv) [if] [in] [, Robust]
    marksample touse

    gettoken depvar indeps : varlist
    _fv_check_depvar `depvar'

    tempname zpz xpx xpy xpxi b V
    tempvar  xbhat res res2 

    quietly matrix accum `zpz' = `varlist' if `touse'
    local N                    = r(N)
    local p                    = colsof(`zpz')
    matrix `xpx'               = `zpz'[2..`p', 2..`p']
    matrix `xpy'               = `zpz'[2..`p', 1]
    matrix `xpxi'              = syminv(`xpx')
    local k                    = `p' - diag0cnt(`xpxi') - 1
    matrix `b'                 = (`xpxi'*`xpy')'
    quietly matrix score double `xbhat' = `b' if `touse'
    quietly generate double `res'       = (`depvar' - `xbhat') if `touse'
    quietly generate double `res2'      = (`res')^2 if `touse'
    if "`robust'" == "" {
        quietly summarize `res2' if `touse' , meanonly
        local sum           = r(sum)
        local s2            = `sum'/(`N'-(`k'))
        matrix `V'          = `s2'*`xpxi'
    }
    else {
        tempname M
        quietly matrix accum `M' = `indeps' [iweight=`res2'] if `touse'
        matrix `V'               = (`N'/(`N'-(`k')))*`xpxi'*`M'*`xpxi'
        local vce                   "robust"          
        local vcetype               "Robust"          
    }
    ereturn post `b' `V', esample(`touse') buildfvinfo
    ereturn scalar N       = `N'
    ereturn scalar rank    = `k'
    ereturn local  vce     "`vce'"
    ereturn local  vcetype "`vcetype'"
    ereturn local  cmd     "myregress7"
    ereturn display
end

A user may specify the robust option by typing robust, robus, robu, rob, ro, or r. In other words, r is the minimal abbreviation of the option robust. Line 5 of myregress7 implements this syntax. Specifying robust is optional because Robust is enclosed in the square brackets. r is the minimal abbreviation because the R is in uppercase and the remaining letters are in lowercase.

If the user specifies robust, or a valid abbreviation thereof, the local macro robust contains the word “robust”; otherwise, the local macro robust is empty. Line 25 uses this fact to determine which VCE should be computed; it specifies that lines 26–31 should be executed if the local macro robust is empty and that lines 32-36 should otherwise be executed. Lines 26-31 compute the IID estimator of the VCE. Lines 32-34 compute the robust estimator of the VCE. Lines 35 and 36 respectively put “robust” and “Robust” into the local macros vce and vcetype.

Line 41 puts the contents of the local macro vce into the local macro e(vce), which informs users and postestimation commands which VCE estimator was used. By convention, e(vce) is empty for the IID case. Line 42 puts the contents of the local macro vcetype into the local macro e(vcetype), which is used by ereturn display to correctly label the standard errors as robust.

I now run a regression with robust standard errors.

Example 1: myregress7 with robust standard errors

. sysuse auto
(1978 Automobile Data)

. myregress7 price mpg trunk i.rep78, robust
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -262.7053   74.75538    -3.51   0.000    -409.2232   -116.1875
       trunk |   41.75706   73.71523     0.57   0.571    -102.7221    186.2362
             |
       rep78 |
          2  |   654.7905   1132.425     0.58   0.563    -1564.721    2874.302
          3  |   1170.606   823.9454     1.42   0.155    -444.2979    2785.509
          4  |   1473.352   650.4118     2.27   0.023     198.5679    2748.135
          5  |   2896.888   937.6981     3.09   0.002     1059.034    4734.743
             |
       _cons |   9726.377   2040.335     4.77   0.000     5727.393    13725.36
------------------------------------------------------------------------------

Suppressing the constant term

myregress8 has the syntax

myregress8 depvar [indepvars] [if] [in] [, robust noconstant ]

Code block 2: myregress8.ado

*! version 8.0.0  30Nov2015
program define myregress8, eclass
    version 14

    syntax varlist(numeric ts fv) [if] [in] [, Robust noCONStant ]
    marksample touse

    gettoken depvar indeps : varlist
    _fv_check_depvar `depvar'

    tempname zpz xpx xpy xpxi b V
    tempvar  xbhat res res2 

    quietly matrix accum `zpz' = `varlist' if `touse' , `constant'
    local N                    = r(N)
    local p                    = colsof(`zpz')
    matrix `xpx'               = `zpz'[2..`p', 2..`p']
    matrix `xpy'               = `zpz'[2..`p', 1]
    matrix `xpxi'              = syminv(`xpx')
    local k                    = `p' - diag0cnt(`xpxi') - 1
    matrix `b'                 = (`xpxi'*`xpy')'
    quietly matrix score double `xbhat' = `b' if `touse'
    quietly generate double `res'       = (`depvar' - `xbhat') if `touse'
    quietly generate double `res2'      = (`res')^2 if `touse'
    if "`robust'" == "" {
        quietly summarize `res2' if `touse' , meanonly
        local sum           = r(sum)
        local s2            = `sum'/(`N'-(`k'))
        matrix `V'          = `s2'*`xpxi'
    }
    else {
        tempname M
        quietly matrix accum `M' = `indeps' [iweight=`res2']     ///
            if `touse' , `constant'
        matrix `V'               = (`N'/(`N'-(`k')))*`xpxi'*`M'*`xpxi'
        local vce                   "robust"          
        local vcetype               "Robust"          
    }
    ereturn post `b' `V', esample(`touse') buildfvinfo
    ereturn scalar N       = `N'
    ereturn scalar rank    = `k'
    ereturn local  vce     "`vce'"
    ereturn local  vcetype "`vcetype'"
    ereturn local  cmd     "myregress8"
    ereturn display
end

The syntax command on line 5 puts “noconstant” into the local macro constant if the user types nocons, noconst, noconsta, noconstan, or noconstant; otherwise, the local macro constant is empty. The minimal abbreviation of option noconstant is nocons because the lowercase no is followed by CONStant. Note that specifying the option creates the local macro constant because the no is followed by uppercase letters specifying the minimum abbreviation.

To implement the option, I specified what is contained in the local macro constant as an option on the matrix accum command on line 14 and on the matrix accum command spread over lines 33 and 34. The matrix accum command that begins on line 33 is too long for one line. I used /// to comment out the end-of-line character and continue the command on line 34.

I now illustrate the noconstant option.

Example 2: myregress8 with option noconstant

. myregress8 price mpg trunk ibn.rep78, noconstant
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -262.7053   73.49434    -3.57   0.000    -406.7516   -118.6591
       trunk |   41.75706    93.9671     0.44   0.657    -142.4151    225.9292
             |
       rep78 |
          1  |   9726.377   2790.009     3.49   0.000      4258.06    15194.69
          2  |   10381.17   2607.816     3.98   0.000     5269.943    15492.39
          3  |   10896.98   2555.364     4.26   0.000     5888.561     15905.4
          4  |   11199.73    2588.19     4.33   0.000      6126.97    16272.49
          5  |   12623.27   2855.763     4.42   0.000     7026.073    18220.46
------------------------------------------------------------------------------

Using t or F distributions

The output tables reported in examples 1 and 2 use the normal distribution to compute \(p\)-values and confidence intervals, because Wald-based postestimation commmands like test and ereturn display use the normal or the \(\chi^2\) distribution unless the residual degrees of freedom are stored in e(df_r).

Code block 3: myregress9.ado

*! version 9.0.0  30Nov2015
program define myregress9, eclass
    version 14

    syntax varlist(numeric ts fv) [if] [in] [, Robust noCONStant ]
    marksample touse

    gettoken depvar indeps : varlist
    _fv_check_depvar `depvar'

    tempname zpz xpx xpy xpxi b V
    tempvar  xbhat res res2 

    quietly matrix accum `zpz' = `varlist' if `touse' , `constant'
    local N                    = r(N)
    local p                    = colsof(`zpz')
    matrix `xpx'               = `zpz'[2..`p', 2..`p']
    matrix `xpy'               = `zpz'[2..`p', 1]
    matrix `xpxi'              = syminv(`xpx')
    local k                    = `p' - diag0cnt(`xpxi') - 1
    matrix `b'                 = (`xpxi'*`xpy')'
    quietly matrix score double `xbhat' = `b' if `touse'
    quietly generate double `res'       = (`depvar' - `xbhat') if `touse'
    quietly generate double `res2'      = (`res')^2 if `touse'
    if "`robust'" == "" {
        quietly summarize `res2' if `touse' , meanonly
        local sum           = r(sum)
        local s2            = `sum'/(`N'-(`k'))
        matrix `V'          = `s2'*`xpxi'
    }
    else {
        tempname M
        quietly matrix accum `M' = `indeps' [iweight=`res2']     ///
            if `touse' , `constant'
        matrix `V'               = (`N'/(`N'-(`k')))*`xpxi'*`M'*`xpxi'
        local vce                   "robust"          
        local vcetype               "Robust"          
    }
    ereturn post `b' `V', esample(`touse') buildfvinfo
    ereturn scalar N       = `N'
    ereturn scalar rank    = `k'
    ereturn scalar df_r    = `N'-`k'
    ereturn local  vce     "`vce'"
    ereturn local  vcetype "`vcetype'"
    ereturn local  cmd     "myregress8"
    ereturn display
end

Line 42 of myregress9.ado stores the residual degrees of freedom in e(df_r). Example 3 illustrates that ereturn display and test now use the \(t\) and \(F\) distributions.

Example 3: t or F distributions after myregress9

. myregress9 price mpg trunk ibn.rep78, noconstant
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -262.7053   73.49434    -3.57   0.001    -409.6184   -115.7923
       trunk |   41.75706    93.9671     0.44   0.658    -146.0805    229.5946
             |
       rep78 |
          1  |   9726.377   2790.009     3.49   0.001     4149.229    15303.53
          2  |   10381.17   2607.816     3.98   0.000     5168.219    15594.12
          3  |   10896.98   2555.364     4.26   0.000     5788.882    16005.08
          4  |   11199.73    2588.19     4.33   0.000     6026.011    16373.45
          5  |   12623.27   2855.763     4.42   0.000     6914.677    18331.85
------------------------------------------------------------------------------

. test trunk

 ( 1)  trunk = 0

       F(  1,    62) =    0.20
            Prob > F =    0.6583

Done and undone

I added an option for the robust estimator of the VCE, I added an option to suppress the constant term, and I stored the residual degrees of freedom in e(df_r) so that Wald-based postestimation commands will use \(t\) or \(F\) distributions. I illustrated option parsing by example, but I skipped the general theory and many details. Type . help syntax for more details about parsing options using the syntax command.

In the next post, I implement the modern syntax for robust and cluster-robust standard errors.

References

Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and applications. Cambridge: Cambridge University Press.

Stock, J. H., and M. W. Watson. 2010. Introduction to Econometrics. 3rd ed. Boston, MA: Addison Wesley New York.

Wooldridge, J. M. 2015. Introductory Econometrics: A Modern Approach. 6th ed. Cincinnati, Ohio: South-Western.