## Programming an estimation command in Stata: Computing OLS objects in Mata

\(\newcommand{\epsilonb}{\boldsymbol{\epsilon}}

\newcommand{\ebi}{\boldsymbol{\epsilon}_i}

\newcommand{\Sigmab}{\boldsymbol{\Sigma}}

\newcommand{\betab}{\boldsymbol{\beta}}

\newcommand{\eb}{{\bf e}}

\newcommand{\xb}{{\bf x}}

\newcommand{\xbit}{{\bf x}_{it}}

\newcommand{\xbi}{{\bf x}_{i}}

\newcommand{\zb}{{\bf z}}

\newcommand{\zbi}{{\bf z}_i}

\newcommand{\wb}{{\bf w}}

\newcommand{\yb}{{\bf y}}

\newcommand{\ub}{{\bf u}}

\newcommand{\Xb}{{\bf X}}

\newcommand{\Mb}{{\bf M}}

\newcommand{\Xtb}{\tilde{\bf X}}

\newcommand{\Wb}{{\bf W}}

\newcommand{\Vb}{{\bf V}}\)

I present the formulas for computing the ordinary least-squares (**OLS**) estimator and show how to compute them in Mata. This post is a Mata version of Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects. I discuss the formulas and the computation of independence-based standard errors, robust standard errors, and cluster-robust standard errors.

This is the fourteenth post in the series **Programming an estimation command in Stata**. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

**OLS formulas**

Recall that the **OLS** point estimates are given by

\[
\widehat{\betab} =
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\left( \sum_{i=1}^N \xb_i' y_i \right)
\]

where \(\xb_i\) is the \(1\times k\) vector of independent variables, \(y_i\) is the dependent variable for each of the \(N\) sample observations, and the model for \(y_i\) is

\[
y_i = \xb_i\betab + \epsilon_i
\]

If the \(\epsilon_i\) are independently and identically distributed (**IID**), we estimate the variance-covariance matrix of the estimator (**VCE**) by

\[
\widehat{\Vb} = \widehat{s}^2
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]

where \(\widehat{s}^2 = 1/(N-k)\sum_{i=1}^N e_i^2\) and \(e_i=y_i-\xb_i\widehat{\betab}\). See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2015) for introductions to **OLS**.
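These two formulas can be checked outside Mata with a short NumPy sketch (not part of the post; the data here are synthetic stand-ins, not the auto dataset):

```python
import numpy as np

# Synthetic data playing the role of the auto dataset (hypothetical values).
rng = np.random.default_rng(0)
N, k = 74, 3
X = np.column_stack([rng.normal(size=(N, 2)), np.ones(N)])  # covariates plus constant
beta_true = np.array([-220.0, 43.0, 10000.0])
y = X @ beta_true + rng.normal(scale=100.0, size=N)

# Point estimates: b = (X'X)^{-1} X'y
XpXi = np.linalg.inv(X.T @ X)
b = XpXi @ (X.T @ y)

# IID VCE: V = s^2 (X'X)^{-1}, with s^2 = e'e/(N - k)
e = y - X @ b
s2 = e @ e / (N - k)
V = s2 * XpXi
se = np.sqrt(np.diag(V))
print(b)
print(se)
```

The computation mirrors the Mata steps below: form the cross products, invert, multiply, then scale the inverse cross-product matrix by the residual variance estimate.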

**Mata implementation**

I compute the **OLS** point estimates in Mata in example 1.

**Example 1: Computing OLS point estimates in Mata**

```
. sysuse auto
(1978 Automobile Data)

. mata:
------------------------------------------------- mata (type end to exit) ------
: y = st_data(., "price")

: X = st_data(., "mpg trunk")

: n = rows(X)

: X = X,J(n,1,1)

: XpX = quadcross(X, X)

: XpXi = invsym(XpX)

: b = XpXi*quadcross(X, y)

: end
--------------------------------------------------------------------------------
```

I used **st_data()** to put a copy of the observations on **price** into the Mata vector **y** and to put a copy of the observations on **mpg** and **trunk** into the Mata matrix **X**. I used **rows(X)** to put the number of observations into **n**. After adding a column of ones onto **X** for the constant term, I used **quadcross()** to calculate \(\Xb'\Xb\) in quad precision. After using **invsym()** to compute **XpXi**, the inverse of the symmetric matrix **XpX**, I calculated the point estimates from the **OLS** formula.

In example 1, I computed the **OLS** point estimates after forming the cross products. As discussed in Lange (2010, chapter 7), I could compute more accurate estimates using a QR decomposition; type help mf_qrd for details about computing QR decompositions in Mata. By computing the cross products in quad precision, I obtained point estimates that are almost as accurate as those obtainable from a QR decomposition in double precision, but that is a topic for another post.
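The accuracy difference between the two routes can be seen in a NumPy sketch (illustrative only, not from the post): solving the normal equations versus solving via a QR decomposition, which avoids forming \(\Xb'\Xb\) and its squared condition number.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(size=(50, 2)), np.ones(50)])
y = rng.normal(size=50)

# Normal-equations route: solve (X'X) b = X'y
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# QR route: X = QR, so solve R b = Q'y (R is upper triangular)
Q, R = np.linalg.qr(X)
b_qr = np.linalg.solve(R, Q.T @ y)

print(np.max(np.abs(b_normal - b_qr)))
```

On a well-conditioned problem like this one the two agree to many digits; the QR route retains its accuracy as **X** becomes ill conditioned.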

Here are the point estimates I computed in Mata and comparable results from **regress**.

**Example 2: Results from Mata and regress**

```
. mata: b'
                 1             2             3
    +----------------------------------------------+
  1 |  -220.1648801   43.55851009   10254.94983    |
    +----------------------------------------------+

. regress price mpg trunk

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     10.14
       Model |   141126459         2  70563229.4   Prob > F        =    0.0001
    Residual |   493938937        71  6956886.44   R-squared       =    0.2222
-------------+----------------------------------   Adj R-squared   =    0.2003
       Total |   635065396        73  8699525.97   Root MSE        =    2637.6

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   65.59262    -3.36   0.001    -350.9529    -89.3769
       trunk |   43.55851   88.71884     0.49   0.625    -133.3418    220.4589
       _cons |   10254.95   2349.084     4.37   0.000      5571.01    14938.89
------------------------------------------------------------------------------
```

Given the **OLS** point estimates, I can now compute the **IID** estimator of the **VCE**.

**Example 3: Computing the IID VCE**

```
. mata:
------------------------------------------------- mata (type end to exit) ------
: e = y - X*b

: e2 = e:^2

: k = cols(X)

: V = (quadsum(e2)/(n-k))*XpXi

: sqrt(diagonal(V))'
                 1             2             3
    +-------------------------------------------+
  1 |  65.59262431   88.71884015    2349.08381  |
    +-------------------------------------------+

: end
--------------------------------------------------------------------------------
```

I put the residuals into the Mata vector **e** and their element-wise squares into **e2**. I used **cols(X)** to put the number of covariates into **k**. I used **quadsum()** to compute the sum of the squared residuals in quad precision when computing **V**, an **IID** estimator for the **VCE**. The standard errors displayed by **sqrt(diagonal(V))'** are the same as the ones displayed by **regress** in example 2.

**Robust standard errors**

The frequently used robust estimator of the **VCE** is given by

\[
\widehat{V}_{robust}=\frac{N}{N-k}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\Mb
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]

where

\[
\Mb=\sum_{i=1}^N \widehat{e}_i^2\xb_i'\xb_i
\]

See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2015) for derivations and discussions.
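The sandwich structure of this estimator can be sketched in NumPy (not part of the post; the heteroskedastic data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(2)
N, k = 74, 3
X = np.column_stack([rng.normal(size=(N, 2)), np.ones(N)])
# Errors whose variance depends on the first covariate (heteroskedastic by design)
y = X @ np.array([1.0, -2.0, 5.0]) + rng.normal(size=N) * (1 + X[:, 0] ** 2)

XpXi = np.linalg.inv(X.T @ X)
b = XpXi @ (X.T @ y)
e = y - X @ b

# M = sum_i e_i^2 x_i' x_i, computed without a loop as X' diag(e^2) X
M = X.T @ (e[:, None] ** 2 * X)

# Sandwich: (N/(N-k)) * (X'X)^{-1} M (X'X)^{-1}
V_robust = (N / (N - k)) * XpXi @ M @ XpXi
se_robust = np.sqrt(np.diag(V_robust))
```

The `X.T @ (e[:, None] ** 2 * X)` expression plays the role of Mata's **quadcross(X, e2, X)** in the example that follows.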

Example 4 implements this estimator in Mata.

**Example 4: A robust VCE**

```
. mata:
------------------------------------------------- mata (type end to exit) ------
: M = quadcross(X, e2, X)

: V = (n/(n-k))*XpXi*M*XpXi

: sqrt(diagonal(V))'
                 1             2             3
    +-------------------------------------------+
  1 |  72.45387946   71.45370224   2430.640607  |
    +-------------------------------------------+

: end
--------------------------------------------------------------------------------
```

Using **quadcross(X, e2, X)** to compute **M** is more accurate and faster than looping over the observations. The accuracy comes from the quad precision offered by **quadcross()**. The speed comes from performing the loops in compiled C code instead of compiled Mata code. Mata is fast but C is faster, because C imposes much more structure and because C is compiled using much more platform-specific information than Mata.
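The identity that makes the vectorized form possible — a weighted cross product equals the observation-by-observation sum — can be verified in a few NumPy lines (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(10, 3))
w = rng.normal(size=10) ** 2   # plays the role of e2, so weights are nonnegative

# Vectorized weighted cross product: X' diag(w) X
M_fast = X.T @ (w[:, None] * X)

# The observation loop it replaces: sum_i w_i x_i' x_i
M_loop = np.zeros((3, 3))
for i in range(10):
    M_loop += w[i] * np.outer(X[i], X[i])
```

Both give the same matrix; the vectorized version simply moves the loop into compiled library code.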

**quadcross()** is also faster because it has been parallelized, like many Mata functions. For example, a call to **quadcross()** from Stata/MP with 2 processors will run about twice as fast as a call to **quadcross()** from Stata/SE when there are many rows in **X**. A detailed discussion of the performance increases offered by Mata in Stata/MP is a subject for another post.

I now verify that my computations match those reported by **regress**.

**Example 5: Comparing computations of robust VCE**

```
. regress price mpg trunk, vce(robust)

Linear regression                               Number of obs     =         74
                                                F(2, 71)          =      11.59
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2222
                                                Root MSE          =     2637.6

------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   72.45388    -3.04   0.003    -364.6338   -75.69595
       trunk |   43.55851    71.4537     0.61   0.544    -98.91613    186.0331
       _cons |   10254.95   2430.641     4.22   0.000      5408.39    15101.51
------------------------------------------------------------------------------
```

**Cluster-robust standard errors**

The cluster-robust estimator of the **VCE** is frequently used when the data have a group structure, also known as a panel structure or as a longitudinal structure. This **VCE** accounts for the within-group correlation of the errors, and it is given by

\[
\widehat{V}_{cluster}=\frac{N-1}{N-k}\frac{g}{g-1}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\Mb_c
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]

where

\[
\Mb_c=\sum_{j=1}^g \Xb_j' (\widehat{\eb}_j \widehat{\eb}_j') \Xb_j
\]

\(\Xb_j\) is the \(n_j\times k\) matrix of observations on \(\xb_i\) in group \(j\), \(\widehat{\eb}_j\) is the \(n_j\times 1\) vector of residuals in group \(j\), and \(g\) is the number of groups. See Cameron and Trivedi (2005), Wooldridge (2010), and [R] regress for derivations and discussions.
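A NumPy sketch of the cluster-robust formula (not from the post; data and group assignments are synthetic) accumulates \(\Mb_c\) group by group, just as the Mata loop below does:

```python
import numpy as np

rng = np.random.default_rng(3)
N, k, g = 74, 3, 6
id_ = np.sort(rng.integers(0, g, size=N))        # sorted group ids, as in the post
X = np.column_stack([rng.normal(size=(N, 2)), np.ones(N)])
y = X @ np.array([1.0, -2.0, 5.0]) + rng.normal(size=N)

XpXi = np.linalg.inv(X.T @ X)
b = XpXi @ (X.T @ y)
e = y - X @ b

# M_c = sum_j X_j' (e_j e_j') X_j, one rank-one-in-scores term per group
M = np.zeros((k, k))
for j in np.unique(id_):
    Xj = X[id_ == j]
    ej = e[id_ == j]
    M += Xj.T @ np.outer(ej, ej) @ Xj

nc = len(np.unique(id_))
V = ((N - 1) / (N - k)) * (nc / (nc - 1)) * XpXi @ M @ XpXi
se_cluster = np.sqrt(np.diag(V))
```

Because \(\Xb_j'(\widehat{\eb}_j\widehat{\eb}_j')\Xb_j = (\Xb_j'\widehat{\eb}_j)(\Xb_j'\widehat{\eb}_j)'\), each group contributes the outer product of its score vector.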

Computing \(\Mb_c\) requires sorting the data by group. I use **rep78**, with the missing values replaced by **6**, as the group variable in my example. In example 6, I sort the dataset in Stata, put a copy of the observations on the modified **rep78** into the column vector **id**, and recompute the **OLS** objects that I need. I could have sorted the dataset in Mata, but I usually sort it in Stata, so that is what I illustrated. Type help mf_sort for sorting in Mata. In a real program, I would not need to recompute everything. I do here because I did not want to discuss the group variable or sorting the dataset until I discussed cluster-robust standard errors.

**Example 6: Setup for computing M**

```
. replace rep78=6 if missing(rep78)
(5 real changes made)

. sort rep78

. mata:
------------------------------------------------- mata (type end to exit) ------
: id = st_data(., "rep78")

: y = st_data(., "price")

: X = st_data(., "mpg trunk")

: n = rows(X)

: X = X,J(n,1,1)

: k = cols(X)

: XpX = quadcross(X, X)

: XpXi = invsym(XpX)

: b = XpXi*quadcross(X, y)

: e = y - X*b

: end
--------------------------------------------------------------------------------
```

The Mata function **panelsetup(Q,p)** returns a matrix describing the group structure of the data when **Q** is sorted by the group variable in column **p**. I illustrate this function in example 7.

**Example 7: panelsetup()**

```
. list rep78 if rep78<3, sepby(rep78)

     +-------+
     | rep78 |
     |-------|
  1. |     1 |
  2. |     1 |
     |-------|
  3. |     2 |
  4. |     2 |
  5. |     2 |
  6. |     2 |
  7. |     2 |
  8. |     2 |
  9. |     2 |
 10. |     2 |
     +-------+

. mata:
------------------------------------------------- mata (type end to exit) ------
: info = panelsetup(id, 1)

: info
        1    2
    +-----------+
  1 |   1    2  |
  2 |   3   10  |
  3 |  11   40  |
  4 |  41   58  |
  5 |  59   69  |
  6 |  70   74  |
    +-----------+

: end
--------------------------------------------------------------------------------
```

I begin by listing the group variable, **rep78**, in Stata for the first two groups. I then use **panelsetup()** to create **info**, which has one row for each group: the first column contains the first row of that group, and the second column contains the last row of that group. I display **info** to illustrate what it contains. The first row of **info** specifies that the first group starts in row 1 and ends in row 2, which matches the results produced by **list**. The second row of **info** specifies that the second group starts in row 3 and ends in row 10, which also matches the results produced by **list**.
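The bookkeeping that **panelsetup()** performs can be reproduced in a few NumPy lines (an illustrative analogue, not Mata's implementation), given a sorted group variable:

```python
import numpy as np

# A sorted group variable like the modified rep78 (hypothetical values)
id_ = np.array([1, 1, 2, 2, 2, 3, 3, 3, 3, 4])

# A group starts wherever the id differs from the previous row (row 1 always starts one)
starts = np.flatnonzero(np.r_[True, id_[1:] != id_[:-1]]) + 1   # 1-based, like Mata

# Each group ends just before the next one starts; the last ends at the last row
ends = np.r_[starts[1:] - 1, len(id_)]

info = np.column_stack([starts, ends])
print(info)
```

For this **id_** vector, **info** is `[[1, 2], [3, 5], [6, 9], [10, 10]]`: one row per group, first and last row of each group, matching the layout **panelsetup()** returns.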

Having created **info**, I can use it and **panelsubmatrix()** to compute \(\Mb_c\).

**Example 8: A cluster-robust VCE**

```
. mata:
------------------------------------------------- mata (type end to exit) ------
: nc = rows(info)

: M = J(k, k, 0)

: for(i=1; i<=nc; i++) {
>         xi = panelsubmatrix(X,i,info)
>         ei = panelsubmatrix(e,i,info)
>         M  = M + xi'*(ei*ei')*xi
> }

: V = ((n-1)/(n-k))*(nc/(nc-1))*XpXi*M*XpXi

: sqrt(diagonal(V))'
                 1             2             3
    +-------------------------------------------+
  1 |  93.28127184   58.89644366   2448.547376  |
    +-------------------------------------------+

: end
--------------------------------------------------------------------------------
```

After storing the number of groups in **nc**, I initialized **M** to be a **k** \(\times\) **k** matrix of zeros. For each group, I used **panelsubmatrix()** to extract that group's covariates from **X** and its residuals from **e**, and I added the group's contribution into **M**. After looping over the groups, I computed **V** and displayed the standard errors.

I now verify that my computations match those reported by **regress**.

**Example 9: Comparing computations of cluster-robust VCE**

```
. regress price mpg trunk, vce(cluster rep78)

Linear regression                               Number of obs     =         74
                                                F(2, 5)           =       9.54
                                                Prob > F          =     0.0196
                                                R-squared         =     0.2222
                                                Root MSE          =     2637.6

                                  (Std. Err. adjusted for 6 clusters in rep78)
------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   93.28127    -2.36   0.065     -459.952    19.62226
       trunk |   43.55851   58.89644     0.74   0.493    -107.8396    194.9566
       _cons |   10254.95   2448.547     4.19   0.009     3960.758    16549.14
------------------------------------------------------------------------------
```

**Done and undone**

I reviewed the formulas that underlie the **OLS** estimator and showed how to compute them in Mata. In the next two posts, I write an ado-command that implements these formulas.

**References**

Cameron, A. C., and P. K. Trivedi. 2005. *Microeconometrics: Methods and applications*. Cambridge: Cambridge University Press.

Lange, K. 2010. *Numerical Analysis for Statisticians*. 2nd ed. New York: Springer.

Stock, J. H., and M. W. Watson. 2010. *Introduction to Econometrics*. 3rd ed. Boston, MA: Addison-Wesley.

Wooldridge, J. M. 2010. *Econometric Analysis of Cross Section and Panel Data*. 2nd ed. Cambridge, Massachusetts: MIT Press.

Wooldridge, J. M. 2015. *Introductory Econometrics: A Modern Approach*. 6th ed. Cincinnati, Ohio: South-Western.