Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects
\(\newcommand{\epsilonb}{\boldsymbol{\epsilon}}
\newcommand{\ebi}{\boldsymbol{\epsilon}_i}
\newcommand{\Sigmab}{\boldsymbol{\Sigma}}
\newcommand{\betab}{\boldsymbol{\beta}}
\newcommand{\eb}{{\bf e}}
\newcommand{\xb}{{\bf x}}
\newcommand{\zb}{{\bf z}}
\newcommand{\yb}{{\bf y}}
\newcommand{\Xb}{{\bf X}}
\newcommand{\Mb}{{\bf M}}
\newcommand{\Eb}{{\bf E}}
\newcommand{\Xtb}{\tilde{\bf X}}
\newcommand{\Vb}{{\bf V}}\)I present the formulas for computing the ordinary least-squares (OLS) estimator, and I discuss some do-file implementations of them. I discuss the formulas and the computation of independence-based standard errors, robust standard errors, and cluster-robust standard errors. I introduce the Stata matrix commands and matrix functions that I use in ado-commands that I discuss in upcoming posts.
This is the fifth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.
OLS formulas
Recall that the OLS point estimates are given by
\[
\widehat{\betab} =
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\left(
\sum_{i=1}^N \xb_i'y_i
\right)
\]
where \({\bf x}_i\) is the \(1\times k\) vector of independent variables, \(y_i\) is the dependent variable for each of the \(N\) sample observations, and the model for \(y_i\) is
\[
y_i = \xb_i\betab + \epsilon_i
\]
If the \(\epsilon_i\) are independently and identically distributed, we estimate the variance-covariance matrix of the estimator (VCE) by
\[
\widehat{\Vb} = \widehat{s}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]
where \(\widehat{s} = \frac{1}{N-k}\sum_{i=1}^N e_i^2\) and \(e_i=y_i-\xb_i\widehat{\betab}\).
See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2015) for introductions to OLS.
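Before turning to the Stata matrix commands, it may help to see the formulas above written out numerically. The following is a minimal NumPy sketch, not the post's Stata implementation; the data are synthetic and the variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 100, 3                        # k includes the constant term
X = np.column_stack([rng.normal(size=(N, k - 1)), np.ones(N)])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(size=N)

xpx = X.T @ X                        # sum_i x_i' x_i
xpy = X.T @ y                        # sum_i x_i' y_i
b = np.linalg.solve(xpx, xpy)        # OLS point estimates

e = y - X @ b                        # residuals
s = e @ e / (N - k)                  # s-hat = (1/(N-k)) * sum_i e_i^2
V = s * np.linalg.inv(xpx)           # independence-based VCE

se = np.sqrt(np.diag(V))             # standard errors
```

Solving the linear system instead of explicitly inverting \(\sum_i \xb_i'\xb_i\) for the point estimates is the numerically preferred route, although the inverse itself is still needed for the VCE.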
Stata matrix implementation
I use the matrix accum command to compute the sum of the products over the observations. Typing
. matrix accum zpz = z1 z2 z3
puts \(\left( \sum_{i=1}^N {\bf z}_i'{\bf z}_i \right)\) into the Stata matrix zpz, where \({\bf z}_i=( {\tt z1}_i, {\tt z2}_i, {\tt z3}_i, 1)\). The \(1\) appears because matrix accum, like almost all estimation commands, includes the constant term by default.
Below, I use matrix accum to compute \(\left( \sum_{i=1}^N {\bf z}_i'{\bf z}_i \right)\), which contains \(\left( \sum_{i=1}^N {\bf x}_i'{\bf x}_i \right)\) and \(\left( \sum_{i=1}^N {\bf x}_i'y_i \right)\).
Example 1: Using matrix accum
. sysuse auto
(1978 Automobile Data)

. matrix accum zpz = price mpg trunk
(obs=74)

. matrix list zpz

symmetric zpz[4,4]
            price        mpg      trunk      _cons
price   3.448e+09
  mpg     9132716      36008
trunk     6565725      20630      15340
_cons      456229       1576       1018         74
Now, I extract \(\left( \sum_{i=1}^N {\bf x}_i'{\bf x}_i \right)\) from rows 2–4 and columns 2–4 of zpz and \(\left( \sum_{i=1}^N {\bf x}_i'y_i \right)\) from rows 2–4 and column 1 of zpz.
Example 2: Extracting submatrices
. matrix xpx = zpz[2..4, 2..4]

. matrix xpy = zpz[2..4, 1]

. matrix list xpx

symmetric xpx[3,3]
         mpg  trunk  _cons
  mpg  36008
trunk  20630  15340
_cons   1576   1018     74

. matrix list xpy

xpy[3,1]
          price
  mpg   9132716
trunk   6565725
_cons    456229
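For readers who think in matrix notation rather than Stata syntax, the accumulate-then-partition steps of examples 1 and 2 can be mimicked in NumPy. This sketch uses synthetic stand-in variables, not the auto dataset.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 74
price = rng.normal(6000, 2500, N)    # synthetic stand-ins, not the auto data
mpg = rng.normal(21, 6, N)
trunk = rng.normal(14, 4, N)

# matrix accum appends the constant term and forms Z'Z
Z = np.column_stack([price, mpg, trunk, np.ones(N)])
zpz = Z.T @ Z

# extract X'X (rows/cols 2-4) and X'y (rows 2-4, column 1)
xpx = zpz[1:4, 1:4]
xpy = zpz[1:4, 0]

b = np.linalg.solve(xpx, xpy)        # OLS point estimates
```

The bottom-right element of zpz is the sum of squared ones, which is why the sample size \(N\) shows up in the _cons/_cons cell of the Stata output in example 1.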
I now compute \(\widehat{{\boldsymbol \beta}}\) from the matrices formed in example 2.
Example 3: Computing \(\widehat{\betab}\)
. matrix xpxi = invsym(xpx)

. matrix b = xpxi*xpy

. matrix list b

b[3,1]
            price
  mpg  -220.16488
trunk    43.55851
_cons    10254.95

. matrix b = b'

. matrix list b

b[1,3]
              mpg       trunk       _cons
price  -220.16488    43.55851    10254.95
I transposed b to make it a row vector because point estimates in Stata are stored as row vectors.
Example 3 illustrates that the Stata matrix b contains both the estimated coefficients and the names of the variables to which those coefficients belong. To clarify, our model is
\[
\Eb[{\tt price}|{\tt mpg}, {\tt trunk} ] = {\tt mpg}*\beta_{\tt mpg}
+ {\tt trunk}*\beta_{\tt trunk} + {\tt \_cons}
\]
and b contains the information that \(-220.16\) is the estimated coefficient on mpg, that \(43.56\) is the estimated coefficient on trunk, and that \(10254.95\) is the estimated constant. We can compute the linear combination \(\xb_i\widehat{\betab}\) over the observations using the information in b, because b contains both the value and the name for each coefficient.
I use matrix score to compute this linear combination for each observation, and I use generate to compute the same linear combination by hand.
Example 4: Using matrix score to compute \(\xb_i\widehat{\betab}\)
. matrix score double xbhat1 = b

. generate double xbhat2 = mpg*(-220.16488) + trunk*(43.55851) + 10254.95

. list xbhat1 xbhat2 in 1/4

     +-----------------------+
     |    xbhat1      xbhat2 |
     |-----------------------|
  1. | 5890.4661   5890.4663 |
  2. | 6991.2905   6991.2907 |
  3. | 5934.0246   5934.0248 |
  4. | 6548.5884   6548.5886 |
     +-----------------------+
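What makes matrix score work is that b pairs each coefficient value with a variable name, so the linear combination can be evaluated by name for each observation. The following is a hypothetical Python analogue of that name-based scoring; the coefficient values come from example 3, but the data rows are made up.

```python
# Coefficient values paired with variable names, as in the Stata matrix b.
# The estimates are from example 3; the observation rows are invented.
coef = {"mpg": -220.16488, "trunk": 43.55851, "_cons": 10254.95}

rows = [{"mpg": 22.0, "trunk": 11.0},
        {"mpg": 17.0, "trunk": 16.0}]

def score(row, coef):
    """Form x_i * b-hat by matching coefficient names to variables."""
    slope_part = sum(row[name] * val
                     for name, val in coef.items() if name != "_cons")
    return slope_part + coef["_cons"]

xbhat = [score(r, coef) for r in rows]
```

The point is only that the names stored in b, not just the numbers, are what let the linear combination be computed observation by observation.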
I use the predictions for \(\Eb[{\tt price}|{\tt mpg}, {\tt trunk} ]\) in xbhat1 to compute the residuals and the estimated VCE.
Example 5: Computing the estimated VCE
. generate double res = (price - xbhat1)

. generate double res2 = res^2

. summarize res2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        res2 |         74     6674851    1.30e+07   11.24372   9.43e+07

. return list

scalars:
                  r(N) =  74
              r(sum_w) =  74
               r(mean) =  6674850.504745401
                r(Var) =  168983977867533.1
                 r(sd) =  12999383.74952956
                r(min) =  11.24371634723049
                r(max) =  94250157.2111593
                r(sum) =  493938937.3511598

. local N = r(N)

. local sum = r(sum)

. local s2 = `sum'/(`N'-3)

. matrix V = (`s2')*xpxi
(See Programming an estimation command in Stata: Where to store your stuff for discussions of using results from r-class commands and using local macros.)
I verify that my computations for \(\widehat{\betab}\) and the VCE match those of regress.
Example 6: Comparing against regress
. regress price mpg trunk

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     10.14
       Model |   141126459         2  70563229.4   Prob > F        =    0.0001
    Residual |   493938937        71  6956886.44   R-squared       =    0.2222
-------------+----------------------------------   Adj R-squared   =    0.2003
       Total |   635065396        73  8699525.97   Root MSE        =    2637.6

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   65.59262    -3.36   0.001    -350.9529    -89.3769
       trunk |   43.55851   88.71884     0.49   0.625    -133.3418    220.4589
       _cons |   10254.95   2349.084     4.37   0.000      5571.01    14938.89
------------------------------------------------------------------------------

. matrix list e(b)

e(b)[1,3]
             mpg      trunk      _cons
y1    -220.16488   43.55851   10254.95

. matrix list b

b[1,3]
              mpg       trunk       _cons
price  -220.16488    43.55851    10254.95

. matrix list e(V)

symmetric e(V)[3,3]
              mpg       trunk       _cons
  mpg   4302.3924
trunk   3384.4186   7871.0326
_cons  -138187.95  -180358.85   5518194.7

. matrix list V

symmetric V[3,3]
              mpg       trunk       _cons
  mpg   4302.3924
trunk   3384.4186   7871.0326
_cons  -138187.95  -180358.85   5518194.7
Robust standard errors
The frequently used robust estimator of the VCE is given by
\[
\widehat{V}_{robust}=\frac{N}{N-k}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\Mb
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]
where
\[\Mb=\sum_{i=1}^N \widehat{e}_i^2\xb_i'\xb_i\]
See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2015) for derivations and discussions.
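The sandwich structure of \(\widehat{V}_{robust}\) can be sketched numerically as well. This NumPy sketch uses synthetic heteroskedastic data and is an illustration of the formula, not the post's Stata implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
N, k = 100, 3                        # k includes the constant term
X = np.column_stack([rng.normal(size=(N, k - 1)), np.ones(N)])
# heteroskedastic errors: the noise scale grows with the first regressor
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=N) * (1 + X[:, 0] ** 2)

xpxi = np.linalg.inv(X.T @ X)
e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)

# M = sum_i e_i^2 x_i' x_i, computed by weighting each row of X by e_i^2
M = (X * (e ** 2)[:, None]).T @ X
V_robust = (N / (N - k)) * xpxi @ M @ xpxi
```

Weighting the rows of X by \(\widehat{e}_i^2\) before forming the cross product is exactly what passing res2 as iweights to matrix accum accomplishes in the Stata example below.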
Specifying \(\widehat{e}_i^2\) as weights to matrix accum computes \(\Mb\). Below, I use matrix accum to compute \(\Mb\) and then form \(\widehat{V}_{robust}\).
Example 7: A robust VCE
. matrix accum M = mpg trunk [iweight=res2]
(obs=493938937.4)

. matrix V2 = (`N'/(`N'-3))*xpxi*M*xpxi
I now verify that my computations match those reported by regress.
Example 8: Comparing computations of robust VCE
. regress price mpg trunk, robust

Linear regression                               Number of obs     =         74
                                                F(2, 71)          =      11.59
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2222
                                                Root MSE          =     2637.6

------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   72.45388    -3.04   0.003    -364.6338   -75.69595
       trunk |   43.55851    71.4537     0.61   0.544    -98.91613    186.0331
       _cons |   10254.95   2430.641     4.22   0.000      5408.39    15101.51
------------------------------------------------------------------------------

. matrix list e(V)

symmetric e(V)[3,3]
              mpg       trunk       _cons
  mpg   5249.5646
trunk   3569.5316   5105.6316
_cons  -169049.76  -147284.49   5908013.8

. matrix list V2

symmetric V2[3,3]
              mpg       trunk       _cons
  mpg   5249.5646
trunk   3569.5316   5105.6316
_cons  -169049.76  -147284.49   5908013.8
Cluster-robust standard errors
The cluster-robust estimator of the VCE is frequently used when the data have a panel structure, also known as a longitudinal structure. This VCE accounts for the within-group correlation of the errors, and it is given by
\[
\widehat{V}_{cluster}=\frac{N-1}{N-k}\frac{g}{g-1}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\Mb_c
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]
where
\[\Mb_c=\sum_{j=1}^g
\Xb_j'
(\widehat{\eb}_j \widehat{\eb}_j')
\Xb_j \]
\(\Xb_j\) is the \(n_j\times k\) matrix of observations on \(\xb_i\) in group \(j\), \(\widehat{\eb}_j\) is the \(n_j\times 1\) vector of residuals in group \(j\), and \(g\) is the number of groups. See Cameron and Trivedi (2005), Wooldridge (2010), and [R] regress for derivations and discussions.
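Because \(\Xb_j'\widehat{\eb}_j\) is a \(k\times 1\) vector, each group's contribution to \(\Mb_c\) is the outer product of that vector with itself. The following NumPy sketch of the cluster-robust formula uses synthetic data with invented groups; it illustrates the algebra rather than the Stata implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
N, k, g = 90, 3, 6                   # 6 groups of 15 observations each
cvar = np.repeat(np.arange(g), N // g)
X = np.column_stack([rng.normal(size=(N, k - 1)), np.ones(N)])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=N)

xpxi = np.linalg.inv(X.T @ X)
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

# M_c = sum_j X_j' e_j e_j' X_j = sum_j (X_j' e_j)(X_j' e_j)'
Mc = np.zeros((k, k))
for j in range(g):
    s = X[cvar == j].T @ e[cvar == j]    # k-vector of group-j scores
    Mc += np.outer(s, s)

V_cluster = ((N - 1) / (N - k)) * (g / (g - 1)) * xpxi @ Mc @ xpxi
```

Summing outer products of per-group score vectors is the same operation that matrix opaccum performs, which is why the data must be sorted by the group variable before calling it.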
matrix opaccum computes the formula for \(\Mb_c\). Below, I create the group variable cvar from rep78 and use matrix opaccum to compute \(\Mb_c\) and \(\widehat{V}_{cluster}\).
Example 9: A cluster-robust VCE
. generate cvar = cond( missing(rep78), 6, rep78)

. tab cvar

       cvar |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.70        2.70
          2 |          8       10.81       13.51
          3 |         30       40.54       54.05
          4 |         18       24.32       78.38
          5 |         11       14.86       93.24
          6 |          5        6.76      100.00
------------+-----------------------------------
      Total |         74      100.00

. local Nc = r(r)

. sort cvar

. matrix opaccum M2 = mpg trunk , group(cvar) opvar(res)

. matrix V2 = ((`N'-1)/(`N'-3))*(`Nc'/(`Nc'-1))*xpxi*M2*xpxi
I now verify that my computations match those reported by regress.
Example 10: Comparing computations of cluster-robust VCE
. regress price mpg trunk, vce(cluster cvar)

Linear regression                               Number of obs     =         74
                                                F(2, 5)           =       9.54
                                                Prob > F          =     0.0196
                                                R-squared         =     0.2222
                                                Root MSE          =     2637.6

                                  (Std. Err. adjusted for 6 clusters in cvar)
------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   93.28127    -2.36   0.065     -459.952    19.62226
       trunk |   43.55851   58.89644     0.74   0.493    -107.8396    194.9566
       _cons |   10254.95   2448.547     4.19   0.009     3960.758    16549.14
------------------------------------------------------------------------------

. matrix list e(V)

symmetric e(V)[3,3]
              mpg       trunk       _cons
  mpg   8701.3957
trunk   4053.5381   3468.7911
_cons     -223021  -124190.97   5995384.3

. matrix list V2

symmetric V2[3,3]
              mpg       trunk       _cons
  mpg   8701.3957
trunk   4053.5381   3468.7911
_cons     -223021  -124190.97   5995384.3
Done and undone
I reviewed the formulas that underlie the OLS estimator and showed how to compute them using Stata matrix commands and functions. In the next two posts, I write an ado-command that implements these formulas.
References
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and applications. Cambridge: Cambridge University Press.
Stock, J. H., and M. W. Watson. 2010. Introduction to Econometrics. 3rd ed. Boston, MA: Addison-Wesley.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.
Wooldridge, J. M. 2015. Introductory Econometrics: A Modern Approach. 6th ed. Cincinnati, Ohio: South-Western.