
Programming an estimation command in Stata: Computing OLS objects in Mata

\(\newcommand{\epsilonb}{\boldsymbol{\epsilon}}
\newcommand{\ebi}{\boldsymbol{\epsilon}_i}
\newcommand{\Sigmab}{\boldsymbol{\Sigma}}
\newcommand{\betab}{\boldsymbol{\beta}}
\newcommand{\eb}{{\bf e}}
\newcommand{\xb}{{\bf x}}
\newcommand{\xbit}{{\bf x}_{it}}
\newcommand{\xbi}{{\bf x}_{i}}
\newcommand{\zb}{{\bf z}}
\newcommand{\zbi}{{\bf z}_i}
\newcommand{\wb}{{\bf w}}
\newcommand{\yb}{{\bf y}}
\newcommand{\ub}{{\bf u}}
\newcommand{\Xb}{{\bf X}}
\newcommand{\Mb}{{\bf M}}
\newcommand{\Xtb}{\tilde{\bf X}}
\newcommand{\Wb}{{\bf W}}
\newcommand{\Vb}{{\bf V}}\)I present the formulas for computing the ordinary least-squares (OLS) estimator and show how to compute them in Mata. This post is a Mata version of Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects. I discuss the formulas and the computation of independence-based standard errors, robust standard errors, and cluster-robust standard errors.

This is the fourteenth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

OLS formulas

Recall that the OLS point estimates are given by

\[
\widehat{\betab} =
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\left(
\sum_{i=1}^N \xb_i'y_i
\right)
\]

where \(\xb_i\) is the \(1\times k\) vector of independent variables, \(y_i\) is the dependent variable for each of the \(N\) sample observations, and the model for \(y_i\) is

\[
y_i = \xb_i\betab + \epsilon_i
\]

If the \(\epsilon_i\) are independently and identically distributed (IID), we estimate the variance-covariance matrix of the estimator (VCE) by

\[
\widehat{\Vb} = \widehat{s}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]

where \(\widehat{s} = \frac{1}{N-k}\sum_{i=1}^N e_i^2\) and \(e_i=y_i-\xb_i\widehat{\betab}\). See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2015) for introductions to OLS.
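As a quick numerical illustration of the point-estimate formula, here is a minimal sketch in Python/NumPy on made-up data (the data and variable names are mine, purely illustrative; the Mata implementation follows below):

```python
import numpy as np

# Small made-up dataset: N = 5 observations, k = 2 regressors
# (second column of ones plays the role of the constant term).
X = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0],
              [4.0, 1.0],
              [5.0, 1.0]])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# bhat = (X'X)^{-1} (X'y), the OLS formula above
bhat = np.linalg.solve(X.T @ X, X.T @ y)
```

Because this toy outcome is exactly \(y_i = 2 x_i\), the sketch recovers a slope of 2 and an intercept of 0.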

Mata implementation

I compute the OLS point estimates in Mata in example 1.

Example 1: Computing OLS point estimates in Mata

. sysuse auto
(1978 Automobile Data)

. mata:
------------------------------------------------- mata (type end to exit) ------
: y    = st_data(., "price")

: X    = st_data(., "mpg trunk")

: n    = rows(X)

: X    = X,J(n,1,1)

: XpX  = quadcross(X, X)

: XpXi = invsym(XpX)

: b    = XpXi*quadcross(X, y)

: end
--------------------------------------------------------------------------------

I used st_data() to put a copy of the observations on price into the Mata vector y and to put a copy of the observations on mpg and trunk into the Mata matrix X. I used rows(X) to put the number of observations into n. After appending a column of ones to X for the constant term, I used quadcross() to calculate \(\Xb'\Xb\) in quad precision. After using invsym() to compute XpXi, the inverse of the symmetric matrix XpX, I calculated the point estimates from the OLS formula.

In example 1, I computed the OLS point estimates after forming the cross products. As discussed in Lange (2010, chapter 7), I could compute more accurate estimates using a QR decomposition; type help mf_qrd for details about computing QR decompositions in Mata. By computing the cross products in quad precision, I obtained point estimates that are almost as accurate as those obtainable from a QR decomposition in double precision, but that is a topic for another post.
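For readers curious about the QR route, here is a short sketch in Python/NumPy (my own illustration on simulated data, not Mata code; numpy.linalg.qr plays the role that Mata's QR functions would play):

```python
import numpy as np

# Simulated data: 50 observations, one regressor plus a constant
rng = np.random.default_rng(0)
X = np.column_stack([rng.standard_normal(50), np.ones(50)])
y = X @ np.array([1.5, -0.5]) + 0.1 * rng.standard_normal(50)

# Normal-equations solution: b = (X'X)^{-1} (X'y)
b_ne = np.linalg.solve(X.T @ X, X.T @ y)

# QR solution: X = QR, so b solves R b = Q'y.
# Forming X'X squares the condition number of X; the QR route avoids that.
Q, R = np.linalg.qr(X)
b_qr = np.linalg.solve(R, Q.T @ y)
```

On well-conditioned data like this the two routes agree to many digits; the QR route's advantage appears when X is nearly collinear.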

Here are the point estimates I computed in Mata and comparable results from regress.

Example 2: Results from Mata and regress

. mata: b'
                  1              2              3
    +----------------------------------------------+
  1 |  -220.1648801    43.55851009    10254.94983  |
    +----------------------------------------------+

. regress price mpg trunk

      Source |       SS           df       MS      Number of obs   =        74
-------------+----------------------------------   F(2, 71)        =     10.14
       Model |   141126459         2  70563229.4   Prob > F        =    0.0001
    Residual |   493938937        71  6956886.44   R-squared       =    0.2222
-------------+----------------------------------   Adj R-squared   =    0.2003
       Total |   635065396        73  8699525.97   Root MSE        =    2637.6

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   65.59262    -3.36   0.001    -350.9529    -89.3769
       trunk |   43.55851   88.71884     0.49   0.625    -133.3418    220.4589
       _cons |   10254.95   2349.084     4.37   0.000      5571.01    14938.89
------------------------------------------------------------------------------

Given the OLS point estimates, I can now compute the IID estimator of the VCE.

Example 3: Computing the IID VCE

. mata:
------------------------------------------------- mata (type end to exit) ------
: e    = y - X*b

: e2   = e:^2

: k    = cols(X)

: V    = (quadsum(e2)/(n-k))*XpXi

: sqrt(diagonal(V))'
                 1             2             3
    +-------------------------------------------+
  1 |  65.59262431   88.71884015    2349.08381  |
    +-------------------------------------------+

: end
--------------------------------------------------------------------------------

I put the residuals into the Mata vector e and stored their element-wise squares in e2. I used cols(X) to put the number of covariates into k. I used quadsum() to compute the sum of the squared residuals in quad precision when computing V, an IID estimator for the VCE. The standard errors displayed by sqrt(diagonal(V)) are the same as the ones displayed by regress in example 2.
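The same IID VCE calculation can be sketched in Python/NumPy (again my own illustration on simulated data, not the author's code):

```python
import numpy as np

# Simulated data with a known coefficient vector
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([rng.standard_normal(n), np.ones(n)])
y = X @ np.array([2.0, 1.0]) + rng.standard_normal(n)
k = X.shape[1]

XpXi = np.linalg.inv(X.T @ X)
b = XpXi @ (X.T @ y)
e = y - X @ b

# V = (sum of e_i^2 / (n - k)) * (X'X)^{-1}, as in example 3
V = (e @ e / (n - k)) * XpXi
se = np.sqrt(np.diag(V))
```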

Robust standard errors

The frequently used robust estimator of the VCE is given by

\[
\widehat{\Vb}_{robust}=\frac{N}{N-k}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\Mb
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]

where
\[\Mb=\sum_{i=1}^N \widehat{e}_i^2\xb_i'\xb_i\]

See Cameron and Trivedi (2005), Stock and Watson (2010), or Wooldridge (2015) for derivations and discussions.
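A Python/NumPy sketch of this robust formula, on simulated data of my own (not the auto dataset):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([rng.standard_normal(n), np.ones(n)])
y = X @ np.array([1.0, 0.5]) + rng.standard_normal(n)
k = X.shape[1]

XpXi = np.linalg.inv(X.T @ X)
b = XpXi @ (X.T @ y)
e = y - X @ b

# M = sum_i e_i^2 x_i' x_i, computed in one shot as X' diag(e^2) X
M = X.T @ (e[:, None] ** 2 * X)

# V_robust = N/(N-k) * (X'X)^{-1} M (X'X)^{-1}
V = n / (n - k) * XpXi @ M @ XpXi
se = np.sqrt(np.diag(V))
```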

Example 4 implements this estimator in Mata.

Example 4: A robust VCE

. mata:
------------------------------------------------- mata (type end to exit) ------
: M    = quadcross(X, e2, X)

: V    = (n/(n-k))*XpXi*M*XpXi

: sqrt(diagonal(V))'
                 1             2             3
    +-------------------------------------------+
  1 |  72.45387946   71.45370224   2430.640607  |
    +-------------------------------------------+

: end
--------------------------------------------------------------------------------

Using quadcross(X, e2, X) to compute M is more accurate and faster than looping over the observations. The accuracy comes from the quad precision offered by quadcross(). The speed comes from performing the loops in compiled C code instead of compiled Mata code. Mata is fast but C is faster, because C imposes much more structure and because C is compiled using much more platform-specific information than Mata.
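The claim that the one-shot weighted cross product equals the observation-by-observation sum is easy to check numerically; here is a small demonstration in Python/NumPy (my own, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((10, 3))
e2 = rng.standard_normal(10) ** 2   # stand-in squared residuals

# One-shot weighted cross product, the analogue of quadcross(X, e2, X)
M_cross = X.T @ (e2[:, None] * X)

# Explicit loop over observations: M = sum_i e2_i * x_i' x_i
M_loop = np.zeros((3, 3))
for i in range(10):
    xi = X[i:i + 1, :]              # 1 x k row vector
    M_loop += e2[i] * (xi.T @ xi)
```

The two computations agree up to rounding; the one-shot version simply pushes the loop into compiled library code.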

quadcross() is also faster because it has been parallelized, like many Mata functions. For example, a call to quadcross() from Stata/MP with 2 processors will run about twice as fast as a call to quadcross() from Stata/SE when there are many rows in X. A detailed discussion of the performance increases offered by Mata in Stata/MP is a subject for another post.

I now verify that my computations match those reported by regress.

Example 5: Comparing computations of robust VCE

. regress price mpg trunk, vce(robust)

Linear regression                               Number of obs     =         74
                                                F(2, 71)          =      11.59
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2222
                                                Root MSE          =     2637.6

------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   72.45388    -3.04   0.003    -364.6338   -75.69595
       trunk |   43.55851    71.4537     0.61   0.544    -98.91613    186.0331
       _cons |   10254.95   2430.641     4.22   0.000      5408.39    15101.51
------------------------------------------------------------------------------

Cluster-robust standard errors

The cluster-robust estimator of the VCE is frequently used when the data have a group structure, also known as a panel structure or as a longitudinal structure. This VCE accounts for the within-group correlation of the errors, and it is given by

\[
\widehat{\Vb}_{cluster}=\frac{N-1}{N-k}\frac{g}{g-1}
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\Mb_c
\left( \sum_{i=1}^N \xb_i'\xb_i \right)^{-1}
\]

where
\[
\Mb_c=\sum_{j=1}^g
\Xb_j'
(\widehat{\eb}_j \widehat{\eb}_j')
\Xb_j
\]

\(\Xb_j\) is the \(n_j\times k\) matrix of observations on \(\xb_i\) in group \(j\), \(\widehat{\eb}_j\) is the \(n_j\times 1\) vector of residuals in group \(j\), and \(g\) is the number of groups. See Cameron and Trivedi (2005), Wooldridge (2010), and [R] regress for derivations and discussions.
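A Python/NumPy sketch of the cluster-robust formula, using simulated data and group labels of my own (the data are already sorted by group, as the formula's group-wise sums require):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, g = 120, 2, 6
id_ = np.repeat(np.arange(g), n // g)          # sorted group variable
X = np.column_stack([rng.standard_normal(n), np.ones(n)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(n)

XpXi = np.linalg.inv(X.T @ X)
b = XpXi @ (X.T @ y)
e = y - X @ b

# M_c = sum_j X_j' (e_j e_j') X_j, summing over the g groups
M = np.zeros((k, k))
for j in range(g):
    rows = id_ == j
    Xj, ej = X[rows], e[rows]
    M += Xj.T @ np.outer(ej, ej) @ Xj

# V_cluster = (N-1)/(N-k) * g/(g-1) * (X'X)^{-1} M_c (X'X)^{-1}
V = (n - 1) / (n - k) * g / (g - 1) * XpXi @ M @ XpXi
se = np.sqrt(np.diag(V))
```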

Computing \(\Mb_c\) requires sorting the data by group. I use rep78, with the missing values replaced by 6, as the group variable in my example. In example 6, I sort the dataset in Stata, put a copy of the observations on the modified rep78 into the column vector id, and recompute the OLS objects that I need. I could have sorted the dataset in Mata, but I usually sort it in Stata, so that is what I illustrated. Type help mf_sort for sorting in Mata. In a real program, I would not need to recompute everything. I do here because I did not want to discuss the group variable or sorting the dataset until I discussed cluster-robust standard errors.

Example 6: Setup for computing M

. replace rep78=6 if missing(rep78)
(5 real changes made)

. sort rep78

. mata:
------------------------------------------------- mata (type end to exit) ------
: id   = st_data(., "rep78")

: y    = st_data(., "price")

: X    = st_data(., "mpg trunk")

: n    = rows(X)

: X    = X,J(n,1,1)

: k    = cols(X)

: XpX  = quadcross(X, X)

: XpXi = invsym(XpX)

: b    = XpXi*quadcross(X, y)

: e    = y - X*b

: end
--------------------------------------------------------------------------------

The Mata function panelsetup(Q,p) returns a matrix describing the group structure of the data when Q is sorted by the group variable in column p. I illustrate this function in example 7.

Example 7: panelsetup()

. list rep78 if rep78<3, sepby(rep78)

     +-------+
     | rep78 |
     |-------|
  1. |     1 |
  2. |     1 |
     |-------|
  3. |     2 |
  4. |     2 |
  5. |     2 |
  6. |     2 |
  7. |     2 |
  8. |     2 |
  9. |     2 |
 10. |     2 |
     +-------+

. mata:
------------------------------------------------- mata (type end to exit) ------
: info = panelsetup(id, 1)

: info
        1    2
    +-----------+
  1 |   1    2  |
  2 |   3   10  |
  3 |  11   40  |
  4 |  41   58  |
  5 |  59   69  |
  6 |  70   74  |
    +-----------+

: end
--------------------------------------------------------------------------------

I begin by listing out the group variable, rep78, in Stata for the first two groups. I then use panelsetup() to create info, which has one row for each group, with the first column containing the first row of that group and the second column containing the last row of that group. I display info to illustrate what it contains. The first row of info specifies that the first group starts in row 1 and ends in row 2, which matches the results produced by list. The second row of info specifies that the second group starts in row 3 and ends in row 10, which also matches the results produced by list.
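The same first-row/last-row table can be emulated in a few lines of Python (a hypothetical helper of my own, not a translation of Mata's internals), using 1-based row numbers as Mata does:

```python
def panel_info(ids):
    """Return [first_row, last_row] pairs (1-based) for each run of
    equal values in the sorted sequence ids, like Mata's panelsetup()."""
    info = []
    start = 1
    for i in range(1, len(ids)):
        if ids[i] != ids[i - 1]:
            info.append([start, i])   # previous group ends at row i
            start = i + 1
    info.append([start, len(ids)])    # close the final group
    return info

# Group sizes matching the sorted rep78 of example 7:
# 2 ones, 8 twos, 30 threes, 18 fours, 11 fives, 5 sixes
ids = [1] * 2 + [2] * 8 + [3] * 30 + [4] * 18 + [5] * 11 + [6] * 5
```

Applied to these 74 labels, the helper reproduces the six first/last pairs shown in the info matrix above.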

Having created info, I can use it and panelsubmatrix() to compute \(\Mb_c\).

Example 8: A cluster-robust VCE

. mata:
------------------------------------------------- mata (type end to exit) ------
: nc   = rows(info)

: M    = J(k, k, 0)

: for(i=1; i<=nc; i++) {
>     xi = panelsubmatrix(X,i,info)
>     ei = panelsubmatrix(e,i,info)
>     M  = M + xi'*(ei*ei')*xi
> }

: V    = ((n-1)/(n-k))*(nc/(nc-1))*XpXi*M*XpXi

: sqrt(diagonal(V))'
                 1             2             3
    +-------------------------------------------+
  1 |  93.28127184   58.89644366   2448.547376  |
    +-------------------------------------------+

: end
--------------------------------------------------------------------------------

After storing the number of groups in nc, I initialized M to be a k \(\times\) k matrix of zeros. For each group, I used panelsubmatrix() to extract the covariates for that group from X and the residuals for that group from e, and I added that group's contribution to M. After looping over the groups, I computed V and displayed the standard errors.

I now verify that my computations match those reported by regress.

Example 9: Comparing computations of cluster-robust VCE

. regress price mpg trunk, vce(cluster rep78)

Linear regression                               Number of obs     =         74
                                                F(2, 5)           =       9.54
                                                Prob > F          =     0.0196
                                                R-squared         =     0.2222
                                                Root MSE          =     2637.6

                                  (Std. Err. adjusted for 6 clusters in rep78)
------------------------------------------------------------------------------
             |               Robust
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   93.28127    -2.36   0.065     -459.952    19.62226
       trunk |   43.55851   58.89644     0.74   0.493    -107.8396    194.9566
       _cons |   10254.95   2448.547     4.19   0.009     3960.758    16549.14
------------------------------------------------------------------------------

Done and undone

I reviewed the formulas that underlie the OLS estimator and showed how to compute them in Mata. In the next two posts, I write an ado-command that implements these formulas.

References

Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and applications. Cambridge: Cambridge University Press.

Lange, K. 2010. Numerical Analysis for Statisticians. 2nd ed. New York: Springer.

Stock, J. H., and M. W. Watson. 2010. Introduction to Econometrics. 3rd ed. Boston, MA: Addison-Wesley.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.

Wooldridge, J. M. 2015. Introductory Econometrics: A Modern Approach. 6th ed. Cincinnati, Ohio: South-Western.