Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects
\(\newcommand{\epsilonb}{\boldsymbol{\epsilon}}
\newcommand{\ebi}{\boldsymbol{\epsilon}_i}
\newcommand{\Sigmab}{\boldsymbol{\Sigma}}
\newcommand{\betab}{\boldsymbol{\beta}}
\newcommand{\eb}{{\bf e}}
\newcommand{\xb}{{\bf x}}
\newcommand{\zb}{{\bf z}}
\newcommand{\yb}{{\bf y}}
\newcommand{\Xb}{{\bf X}}
\newcommand{\Mb}{{\bf M}}
\newcommand{\Eb}{{\bf E}}
\newcommand{\Xtb}{\tilde{\bf X}}
\newcommand{\Vb}{{\bf V}}\)I present the formulas for computing the ordinary least-squares (OLS) estimator, and I discuss some do-file implementations of them. I discuss the formulas and the computation of independence-based standard errors, robust standard errors, and cluster-robust standard errors. I introduce the Stata matrix commands and matrix functions that I use in ado-commands that I discuss in upcoming posts.
This is the fifth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.
OLS formulas
Recall that the OLS point estimates are given by
\[
\widehat{\betab} =
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\left(
\sum_{i=1}^N \xb_i'y_i
\right)
\]
where \({\bf x}_i\) is the \(1\times k\) vector of independent variables, \(y_i\) is the dependent variable for each of the \(N\) sample observations, and the model for \(y_i\) is
\[
y_i = \xb_i\betab + \epsilon_i
\]
If the \(\epsilon_i\) are independently and identically distributed, we estimate the variance-covariance matrix of the estimator (VCE) by
\[
\widehat{\Vb} = \widehat{s}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]
where \(\widehat{s} = \frac{1}{N-k}\sum_{i=1}^N e_i^2\) and \(e_i=y_i-{\bf x}_i\widehat{{\boldsymbol \beta}}\).
See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2015) for introductions to OLS.
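Before turning to the Stata matrix commands, the formulas can be sketched numerically. Below is a minimal pure-Python illustration (not the Stata code this post builds toward), using a made-up two-regressor dataset and a hand-coded \(2\times 2\) inverse:

```python
# Pure-Python sketch of the OLS formulas above (an illustration, not the
# post's Stata code): bhat = (sum x_i'x_i)^{-1}(sum x_i'y_i) and
# Vhat = s * (sum x_i'x_i)^{-1} with s = sum(e_i^2)/(N - k).
# The data are made up; each row of X is (x_i, 1), so k = 2.

def inv2(a):
    """Invert a 2x2 matrix."""
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / d, -a[0][1] / d],
            [-a[1][0] / d, a[0][0] / d]]

X = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
y = [3.0, 5.0, 7.0, 9.0]          # exactly y = 2*x + 1, so residuals are 0
N, k = len(X), 2

# Accumulate sum x_i'x_i and sum x_i'y_i observation by observation
xpx = [[sum(X[i][r] * X[i][c] for i in range(N)) for c in range(k)]
       for r in range(k)]
xpy = [sum(X[i][r] * y[i] for i in range(N)) for r in range(k)]

xpxi = inv2(xpx)
bhat = [sum(xpxi[r][c] * xpy[c] for c in range(k)) for r in range(k)]

e = [y[i] - sum(X[i][c] * bhat[c] for c in range(k)) for i in range(N)]
s = sum(ei * ei for ei in e) / (N - k)
V = [[s * xpxi[r][c] for c in range(k)] for r in range(k)]
# bhat is approximately [2.0, 1.0]
```

The same accumulate-invert-multiply sequence is what the Stata matrix commands below carry out.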
Stata matrix implementation
I use the matrix accum command to compute the sum of the products over the observations. Typing
. matrix accum zpz = z1 z2 z3
puts \(\left( \sum_{i=1}^N {\bf z}_i'{\bf z}_i \right)\) into the Stata matrix zpz, where \({\bf z}_i=( {\tt z1}_i, {\tt z2}_i, {\tt z3}_i, 1)\). The \(1\) appears because matrix accum, like almost all estimation commands, includes the constant term by default.
Below, I use matrix accum to compute \(\left( \sum_{i=1}^N {\bf z}_i'{\bf z}_i \right)\), which contains \(\left( \sum_{i=1}^N {\bf x}_i'{\bf x}_i \right)\) and \(\left( \sum_{i=1}^N {\bf x}_i'y_i \right)\).
Example 1: Using matrix accum
. sysuse auto
(1978 Automobile Data)
. matrix accum zpz = price mpg trunk
(obs=74)
. matrix list zpz
symmetric zpz[4,4]
price mpg trunk _cons
price 3.448e+09
mpg 9132716 36008
trunk 6565725 20630 15340
_cons 456229 1576 1018 74
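To see what the listing above contains, here is a sketch of the accumulation that matrix accum performs, written in Python with made-up data (an illustration of the formula, not the command's actual implementation):

```python
# Sketch of what `matrix accum` computes: append the constant 1 to each
# observation vector z_i, then sum the outer products z_i'z_i.
# Python illustration with made-up data, not the command's source.
obs = [[2.0, 5.0], [3.0, 1.0], [4.0, 7.0]]    # z1 and z2 for three observations

k = 3                                          # two variables plus _cons
zpz = [[0.0] * k for _ in range(k)]
for row in obs:
    z = row + [1.0]                            # constant term added by default
    for r in range(k):
        for c in range(k):
            zpz[r][c] += z[r] * z[c]

# As in the zpz listing above, the last diagonal entry is the observation
# count, and the last row and column hold the variable sums.
```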
Now, I extract \(\left( \sum_{i=1}^N {\bf x}_i'{\bf x}_i \right)\) from rows 2–4 and columns 2–4 of zpz and \(\left( \sum_{i=1}^N {\bf x}_i'y_i \right)\) from rows 2–4 and column 1 of zpz.
Example 2: Extracting submatrices
. matrix xpx = zpz[2..4, 2..4]
. matrix xpy = zpz[2..4, 1]
. matrix list xpx
symmetric xpx[3,3]
mpg trunk _cons
mpg 36008
trunk 20630 15340
_cons 1576 1018 74
. matrix list xpy
xpy[3,1]
price
mpg 9132716
trunk 6565725
_cons 456229
I now compute \(\widehat{{\boldsymbol \beta}}\) from the matrices formed in example 2.
Example 3: Computing \(\widehat{\betab}\)
. matrix xpxi = invsym(xpx)
. matrix b = xpxi*xpy
. matrix list b
b[3,1]
price
mpg -220.16488
trunk 43.55851
_cons 10254.95
. matrix b = b'
. matrix list b
b[1,3]
mpg trunk _cons
price -220.16488 43.55851 10254.95
I transposed b to make it a row vector because point estimates in Stata are stored as row vectors.
Example 3 illustrates that the Stata matrix b contains both the estimated coefficients and the names of the variables to which each coefficient belongs. To clarify, our model is
\[
\Eb[{\tt price}|{\tt mpg}, {\tt trunk} ] = {\tt mpg}*\beta_{\tt mpg}
+ {\tt trunk}*\beta_{\tt trunk} + {\tt \_cons}
\]
and b contains the information that \(-220.16\) is the estimated coefficient on mpg, that \(43.56\) is the estimated coefficient on trunk, and that \(10254.95\) is the estimated constant. We can compute the linear combination \(\xb_i\widehat{\betab}\) over the observations using the information in b, because b contains both the value and the name for each coefficient.
I use matrix score to compute this linear combination for each observation, and I use generate to spell out what this linear combination is.
Example 4: Using matrix score to compute \(\xb_i\widehat{\betab}'\)
. matrix score double xbhat1 = b
. generate double xbhat2 = mpg*(-220.16488) + trunk*(43.55851) + 10254.95
. list xbhat1 xbhat2 in 1/4
+-----------------------+
| xbhat1 xbhat2 |
|-----------------------|
1. | 5890.4661 5890.4663 |
2. | 6991.2905 6991.2907 |
3. | 5934.0246 5934.0248 |
4. | 6548.5884 6548.5886 |
+-----------------------+
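The point of matrix score is that b carries names as well as values. A hypothetical Python rendering of that bookkeeping follows (the regressor values 22 and 11 are chosen to reproduce the first fitted value listed above):

```python
# Sketch of the matrix score step: b stores a value *and* a name for each
# coefficient, and the fitted value is the named linear combination.
# Python illustration, not Stata's implementation.
b = {"mpg": -220.16488, "trunk": 43.55851, "_cons": 10254.95}
row = {"mpg": 22.0, "trunk": 11.0}

# _cons is matched against an implicit regressor equal to 1
xbhat = sum(coef * row.get(name, 1.0) for name, coef in b.items())
# xbhat is approximately 5890.4663, the first value of xbhat2 above
```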
I use the predictions for \(\Eb[{\tt price}|{\tt mpg}, {\tt trunk} ]\) in xbhat1 to compute the residuals and the estimated VCE.
Example 5: Computing the estimated VCE
. generate double res = (price - xbhat1)
. generate double res2 = res^2
. summarize res2
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
res2 | 74 6674851 1.30e+07 11.24372 9.43e+07
. return list
scalars:
r(N) = 74
r(sum_w) = 74
r(mean) = 6674850.504745401
r(Var) = 168983977867533.1
r(sd) = 12999383.74952956
r(min) = 11.24371634723049
r(max) = 94250157.2111593
r(sum) = 493938937.3511598
. local N = r(N)
. local sum = r(sum)
. local s2 = `sum'/(`N'-3)
. matrix V = (`s2')*xpxi
(See Programming an estimation command in Stata: Where to store your stuff for discussions of using results from r-class commands and using local macros.)
I verify that my computations for \(\widehat{\betab}\) and the VCE match those of regress.
Example 6: Comparing against regress
. regress price mpg trunk
Source | SS df MS Number of obs = 74
-------------+---------------------------------- F(2, 71) = 10.14
Model | 141126459 2 70563229.4 Prob > F = 0.0001
Residual | 493938937 71 6956886.44 R-squared = 0.2222
-------------+---------------------------------- Adj R-squared = 0.2003
Total | 635065396 73 8699525.97 Root MSE = 2637.6
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mpg | -220.1649 65.59262 -3.36 0.001 -350.9529 -89.3769
trunk | 43.55851 88.71884 0.49 0.625 -133.3418 220.4589
_cons | 10254.95 2349.084 4.37 0.000 5571.01 14938.89
------------------------------------------------------------------------------
. matrix list e(b)
e(b)[1,3]
mpg trunk _cons
y1 -220.16488 43.55851 10254.95
. matrix list b
b[1,3]
mpg trunk _cons
price -220.16488 43.55851 10254.95
. matrix list e(V)
symmetric e(V)[3,3]
mpg trunk _cons
mpg 4302.3924
trunk 3384.4186 7871.0326
_cons -138187.95 -180358.85 5518194.7
. matrix list V
symmetric V[3,3]
mpg trunk _cons
mpg 4302.3924
trunk 3384.4186 7871.0326
_cons -138187.95 -180358.85 5518194.7
Robust standard errors
The frequently used robust estimator of the VCE is given by
\[
\widehat{V}_{robust}=\frac{N}{N-k}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\Mb
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]
where
\[\Mb=\sum_{i=1}^N \widehat{e}_i^2\xb_i'\xb_i\]
See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2015) for derivations and discussions.
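As a cross-check on the formula, here is a pure-Python sketch of the sandwich computation with made-up data and hand-coded \(2\times 2\) helpers; it is an illustration, not the Stata implementation:

```python
# Pure-Python sketch of the robust ("sandwich") VCE formula above:
# V = N/(N-k) * (X'X)^{-1} M (X'X)^{-1}, with M = sum e_i^2 x_i'x_i.
# Made-up data; each row of X is (x_i, 1), so k = 2.

def inv2(a):
    """Invert a 2x2 matrix."""
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / d, -a[0][1] / d],
            [-a[1][0] / d, a[0][0] / d]]

def mul(a, b):
    """Multiply two 2x2 matrices."""
    return [[sum(a[r][j] * b[j][c] for j in range(2)) for c in range(2)]
            for r in range(2)]

X = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
y = [1.0, 2.0, 4.0]
N, k = len(X), 2

xpx = [[sum(X[i][r] * X[i][c] for i in range(N)) for c in range(k)]
       for r in range(k)]
xpy = [sum(X[i][r] * y[i] for i in range(N)) for r in range(k)]
xpxi = inv2(xpx)
bhat = [sum(xpxi[r][c] * xpy[c] for c in range(k)) for r in range(k)]
e = [y[i] - sum(X[i][c] * bhat[c] for c in range(k)) for i in range(N)]

# M is "matrix accum with weights e_i^2" in the text
M = [[sum(e[i] ** 2 * X[i][r] * X[i][c] for i in range(N)) for c in range(k)]
     for r in range(k)]
V = [[N / (N - k) * v for v in row] for row in mul(mul(xpxi, M), xpxi)]
```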
matrix accum with weights \(\widehat{e}_i^2\) computes the formula for \(\Mb\). Below, I use matrix accum to compute \(\Mb\) and then \(\widehat{V}_{robust}\).
Example 7: A robust VCE
. matrix accum M = mpg trunk [iweight=res2]
(obs=493938937.4)
. matrix V2 = (`N'/(`N'-3))*xpxi*M*xpxi
I now verify that my computations match those reported by regress.
Example 8: Comparing computations of robust VCE
. regress price mpg trunk, robust
Linear regression Number of obs = 74
F(2, 71) = 11.59
Prob > F = 0.0000
R-squared = 0.2222
Root MSE = 2637.6
------------------------------------------------------------------------------
| Robust
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mpg | -220.1649 72.45388 -3.04 0.003 -364.6338 -75.69595
trunk | 43.55851 71.4537 0.61 0.544 -98.91613 186.0331
_cons | 10254.95 2430.641 4.22 0.000 5408.39 15101.51
------------------------------------------------------------------------------
. matrix list e(V)
symmetric e(V)[3,3]
mpg trunk _cons
mpg 5249.5646
trunk 3569.5316 5105.6316
_cons -169049.76 -147284.49 5908013.8
. matrix list V2
symmetric V2[3,3]
mpg trunk _cons
mpg 5249.5646
trunk 3569.5316 5105.6316
_cons -169049.76 -147284.49 5908013.8
Cluster-robust standard errors
The cluster-robust estimator of the VCE is frequently used when the data have a panel structure, also known as a longitudinal structure. This VCE accounts for the within-group correlation of the errors, and it is given by
\[
\widehat{V}_{cluster}=\frac{N-1}{N-k}\frac{g}{g-1}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\Mb_c
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]
where
\[\Mb_c=\sum_{j=1}^g
\Xb_j'
(\widehat{\eb}_j \widehat{\eb}_j')
\Xb_j \]
\(\Xb_j\) is the \(n_j\times k\) matrix of observations on \(\xb_i\) in group \(j\), \(\widehat{\eb}_j\) is the \(n_j\times 1\) vector of residuals in group \(j\), and \(g\) is the number of groups. See Cameron and Trivedi (2005), Wooldridge (2010), and [R] regress for derivations and discussions.
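The formula for \(\Mb_c\) can be sketched in Python with made-up data (an illustration, not the opaccum implementation). Because \(\Xb_j'(\widehat{\eb}_j \widehat{\eb}_j')\Xb_j = (\Xb_j'\widehat{\eb}_j)(\Xb_j'\widehat{\eb}_j)'\), each cluster contributes the outer product of a \(k\times 1\) vector:

```python
# Sketch of M_c accumulated group by group (Python, made-up data).
# Each cluster j contributes the outer product of s_j = X_j'e_j.
groups = {
    1: ([[1.0, 1.0], [2.0, 1.0]], [0.5, -0.25]),   # (rows of X_j, e_j)
    2: ([[3.0, 1.0]], [0.1]),                      # a singleton cluster
}
k = 2
Mc = [[0.0] * k for _ in range(k)]
for Xj, ej in groups.values():
    # s is the k-vector X_j'e_j for this cluster
    s = [sum(Xj[i][c] * ej[i] for i in range(len(ej))) for c in range(k)]
    for r in range(k):
        for c in range(k):
            Mc[r][c] += s[r] * s[c]
```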
matrix opaccum computes the formula for \(\Mb_c\). Below, I create the group variable cvar from rep78 and use matrix opaccum to compute \(\Mb_c\) and then \(\widehat{V}_{cluster}\).
Example 9: A cluster-robust VCE
. generate cvar = cond( missing(rep78), 6, rep78)
. tab cvar
cvar | Freq. Percent Cum.
------------+-----------------------------------
1 | 2 2.70 2.70
2 | 8 10.81 13.51
3 | 30 40.54 54.05
4 | 18 24.32 78.38
5 | 11 14.86 93.24
6 | 5 6.76 100.00
------------+-----------------------------------
Total | 74 100.00
. local Nc = r(r)
. sort cvar
. matrix opaccum M2 = mpg trunk , group(cvar) opvar(res)
. matrix V2 = ((`N'-1)/(`N'-3))*(`Nc'/(`Nc'-1))*xpxi*M2*xpxi
I now verify that my computations match those reported by regress.
Example 10: Comparing computations of cluster-robust VCE
. regress price mpg trunk, vce(cluster cvar)
Linear regression Number of obs = 74
F(2, 5) = 9.54
Prob > F = 0.0196
R-squared = 0.2222
Root MSE = 2637.6
(Std. Err. adjusted for 6 clusters in cvar)
------------------------------------------------------------------------------
| Robust
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mpg | -220.1649 93.28127 -2.36 0.065 -459.952 19.62226
trunk | 43.55851 58.89644 0.74 0.493 -107.8396 194.9566
_cons | 10254.95 2448.547 4.19 0.009 3960.758 16549.14
------------------------------------------------------------------------------
. matrix list e(V)
symmetric e(V)[3,3]
mpg trunk _cons
mpg 8701.3957
trunk 4053.5381 3468.7911
_cons -223021 -124190.97 5995384.3
. matrix list V2
symmetric V2[3,3]
mpg trunk _cons
mpg 8701.3957
trunk 4053.5381 3468.7911
_cons -223021 -124190.97 5995384.3
Done and undone
I reviewed the formulas that underlie the OLS estimator and showed how to compute them using Stata matrix commands and functions. In the next two posts, I write an ado-command that implements these formulas.
References
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and applications. Cambridge: Cambridge University Press.
Stock, J. H., and M. W. Watson. 2010. Introduction to Econometrics. 3rd ed. Boston, MA: Addison-Wesley.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.
Wooldridge, J. M. 2015. Introductory Econometrics: A Modern Approach. 6th ed. Cincinnati, Ohio: South-Western.