In the frequentist approach to statistics, estimators are random variables because they are functions of random data. The finite-sample distributions of most of the estimators used in applied work are not known, because the estimators are complicated nonlinear functions of random data. These estimators have large-sample convergence properties that we use to approximate their behavior in finite samples.

Two key convergence properties are consistency and asymptotic normality. A consistent estimator gets arbitrarily close in probability to the true value. The distribution of an asymptotically normal estimator gets arbitrarily close to a normal distribution as the sample size increases. We use a recentered and rescaled version of this normal distribution to approximate the finite-sample distribution of our estimators.

I illustrate the meaning of consistency and asymptotic normality by Monte Carlo simulation (MCS). I use some of the Stata mechanics I discussed in Monte Carlo simulations using Stata.

**Consistent estimator**

A consistent estimator gets arbitrarily close in probability to the true value as you increase the sample size. In other words, the probability that a consistent estimator is outside a neighborhood of the true value goes to zero as the sample size increases. Figure 1 illustrates this convergence for an estimator \(\widehat{\theta}\) at sample sizes 100, 1,000, and 5,000, when the true value is 0. As the sample size increases, the density is more tightly distributed around the true value. As the sample size becomes infinite, the density collapses to a spike at the true value.
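Although the simulations in this post are in Stata, the shrinking-density pattern in figure 1 is easy to reproduce in a few lines of any language. Here is a minimal Python sketch (my analogue, not the post's code) using the sample average of standard-normal data, so the true value is 0:

```python
import numpy as np

rng = np.random.default_rng(12345)

# For each sample size, draw 2,000 samples of standard-normal data and
# compute the sample average of each.  The spread of the estimates and
# the chance of landing outside a fixed neighborhood of the true value
# (0 here) both shrink as the sample size grows.
true_value = 0.0
spread, outside = {}, {}
for n in (100, 1_000, 5_000):
    estimates = rng.standard_normal((2_000, n)).mean(axis=1)
    spread[n] = estimates.std()
    outside[n] = np.mean(np.abs(estimates - true_value) > 0.05)
    print(n, spread[n], outside[n])
```

The printed spread falls roughly like \(1/\sqrt{N}\), which is the tightening that figure 1 illustrates.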

**Figure 1: Densities of an estimator for sample sizes 100, 1,000, 5,000, and \(\infty\)**

I now illustrate that the sample average is a consistent estimator for the mean of an independently and identically distributed (i.i.d.) random variable with a finite mean and a finite variance. In this example, the data are i.i.d. draws from a \(\chi^2\) distribution with 1 degree of freedom. The true value is 1, because the mean of a \(\chi^2(1)\) is 1.

Code block 1 contains **mean1000.do**, which implements an MCS of the sample average for the mean from samples of size 1,000 of i.i.d. \(\chi^2(1)\) variates.

    clear all
    set seed 12345
    postfile sim m1000 using sim1000, replace

    forvalues i = 1/1000 {
            quietly capture drop y
            quietly set obs 1000
            quietly generate y = rchi2(1)
            quietly summarize y
            quietly post sim (r(mean))
    }
    postclose sim

Line 1 clears Stata, and line 2 sets the seed of the random number generator. Line 3 uses **postfile** to create a place in memory named **sim**, in which I store observations on the variable **m1000**, which will be the new dataset **sim1000**. Note that the keyword **using** separates the name of the new variable from the name of the new dataset. The **replace** option specifies that **sim1000.dta** be replaced, if it already exists.

Lines 5 and 11 use **forvalues** to repeat the code on lines 6–10 a total of 1,000 times. Each time through the **forvalues** loop, line 6 drops **y**, line 7 sets the number of observations to 1,000, line 8 generates a sample of size 1,000 of i.i.d. \(\chi^2(1)\) variates, line 9 estimates the mean of **y** in this sample, and line 10 uses **post** to store the estimated mean in what will be the new variable **m1000**. Line 12 writes everything stored in **sim** to the new dataset **sim1000.dta**. See Monte Carlo simulations using Stata for more details about using **post** to implement an MCS in Stata.

In example 1, I run **mean1000.do** and then summarize the results.

**Example 1: Estimating the mean from a sample of size 1,000**

    . do mean1000

    . clear all

    . set seed 12345

    . postfile sim m1000 using sim1000, replace

    . forvalues i = 1/1000 {
      2.         quietly capture drop y
      3.         quietly set obs 1000
      4.         quietly generate y = rchi2(1)
      5.         quietly summarize y
      6.         quietly post sim (r(mean))
      7. }

    . postclose sim

    .
    end of do-file

    . use sim1000, clear

    . summarize m1000

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
           m1000 |      1,000     1.00017    .0442332   .8480308   1.127382

The mean of the 1,000 estimates is close to 1. The standard deviation of the 1,000 estimates is 0.0442, which measures how tightly the estimator is distributed around the true value of 1.
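As a quick sanity check (mine, not from the original post), the simulated standard deviation can be compared with its theoretical value: the sample average of \(N\) i.i.d. draws with variance \(\sigma^2\) has standard deviation \(\sigma/\sqrt{N}\), and the \(\chi^2(1)\) variance is 2.

```python
import math

# Theoretical sd of the sample average of N = 1,000 i.i.d. chi-square(1)
# draws: the chi-square(1) variance is 2, so sd = sqrt(2/N).
sd_theory = math.sqrt(2 / 1000)
print(round(sd_theory, 4))  # 0.0447, close to the simulated 0.0442
```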

Code block 2 contains **mean100000.do**, which implements the analogous MCS with a sample size of 100,000.

    clear all
    // no seed, just keep drawing
    postfile sim m100000 using sim100000, replace

    forvalues i = 1/1000 {
            quietly capture drop y
            quietly set obs 100000
            quietly generate y = rchi2(1)
            quietly summarize y
            quietly post sim (r(mean))
    }
    postclose sim

Example 2 runs **mean100000.do** and summarizes the results.

**Example 2: Estimating the mean from a sample of size 100,000**

    . do mean100000

    . clear all

    . // no seed, just keep drawing

    . postfile sim m100000 using sim100000, replace

    . forvalues i = 1/1000 {
      2.         quietly capture drop y
      3.         quietly set obs 100000
      4.         quietly generate y = rchi2(1)
      5.         quietly summarize y
      6.         quietly post sim (r(mean))
      7. }

    . postclose sim

    .
    end of do-file

    . use sim100000, clear

    . summarize m100000

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
         m100000 |      1,000    1.000008    .0043458   .9837129   1.012335

The standard deviation of 0.0043 indicates that the estimator with a sample size of 100,000 is much more tightly distributed around the true value of 1 than the estimator with a sample size of 1,000.

Example 3 merges the two datasets of estimates and plots the densities of the estimator for the two sample sizes in figure 2. The distribution of the estimator for the sample size of 100,000 is much tighter around 1 than the estimator for the sample size of 1,000.

**Example 3: Densities of sample-average estimator for 1,000 and 100,000**

    . merge 1:1 _n using sim1000

        Result                           # of obs.
        -----------------------------------------
        not matched                             0
        matched                             1,000  (_merge==3)
        -----------------------------------------

    . kdensity m1000, n(500) generate(x_1000 f_1000) kernel(gaussian) nograph

    . label variable f_1000 "N=1000"

    . kdensity m100000, n(500) generate(x_100000 f_100000) kernel(gaussian) nograph

    . label variable f_100000 "N=100000"

    . graph twoway (line f_1000 x_1000) (line f_100000 x_100000)

**Figure 2: Densities of the sample-average estimator for sample sizes 1,000 and 100,000**

The sample average is a consistent estimator for the mean of an i.i.d. \(\chi^2(1)\) random variable because a weak law of large numbers applies. This theorem specifies that the sample average converges in probability to the true mean if the data are i.i.d., the mean is finite, and the variance is finite. Other versions of this theorem weaken the i.i.d. assumption or the moment assumptions; see Cameron and Trivedi (2005, sec. A.3), Wasserman (2003, sec. 5.3), and Wooldridge (2010, 41–42) for details.

**Asymptotic normality**

So the good news is that the distribution of a consistent estimator is arbitrarily tight around the true value. The bad news is that the distribution of the estimator changes with the sample size, as illustrated in figures 1 and 2.

If I knew the distribution of my estimator for every sample size, I could use it to perform inference using this finite-sample distribution, also known as the exact distribution. But the finite-sample distribution of most of the estimators used in applied research is unknown. Fortunately, the distribution of a recentered and rescaled version of these estimators gets arbitrarily close to a normal distribution as the sample size increases. Estimators for which a recentered and rescaled version converges to a normal distribution are said to be asymptotically normal. We use this large-sample distribution to approximate the finite-sample distribution of the estimator.

Figure 2 shows that the distribution of the sample average becomes increasingly tight around the true value as the sample size increases. Instead of looking at the distribution of the estimator \(\widehat{\theta}_N\) for sample size \(N\), let’s look at the distribution of \(\sqrt{N}(\widehat{\theta}_N - \theta_0)\), where \(\theta_0\) is the true value for which \(\widehat{\theta}_N\) is consistent.

Example 4 estimates the densities of the recentered and rescaled estimators, which are shown in figure 3.

**Example 4: Densities of the recentered and rescaled estimator**

    . generate double m1000n = sqrt(1000)*(m1000 - 1)

    . generate double m100000n = sqrt(100000)*(m100000 - 1)

    . kdensity m1000n, n(500) generate(x_1000n f_1000n) kernel(gaussian) nograph

    . label variable f_1000n "N=1000"

    . kdensity m100000n, n(500) generate(x_100000n f_100000n) kernel(gaussian) ///
    >         nograph

    . label variable f_100000n "N=100000"

    . graph twoway (line f_1000n x_1000n) (line f_100000n x_100000n)

**Figure 3: Densities of the recentered and rescaled estimator for sample sizes 1,000 and 100,000**

The densities of the recentered and rescaled estimators in figure 3 are indistinguishable from each other and look close to a normal density. The Lindeberg–Lévy central limit theorem guarantees that the distribution of the recentered and rescaled sample average of i.i.d. random variables with finite mean \(\mu\) and finite variance \(\sigma^2\) gets arbitrarily close to a normal distribution with mean 0 and variance \(\sigma^2\) as the sample size increases. In other words, the distribution of \(\sqrt{N}(\widehat{\theta}_N-\mu)\) gets arbitrarily close to a \(N(0,\sigma^2)\) distribution as \(N\rightarrow\infty\), where \(\widehat{\theta}_N=1/N\sum_{i=1}^N y_i\) and the \(y_i\) are realizations of the i.i.d. random variable. This convergence in distribution justifies our use of the distribution \(\widehat{\theta}_N\sim N(\mu,\frac{\sigma^2}{N})\) in practice.
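A short Python analogue (my sketch, not the post's code) makes the theorem concrete: after recentering and rescaling the sample averages of \(\chi^2(1)\) data, the simulated mean and variance should be near 0 and 2.

```python
import numpy as np

rng = np.random.default_rng(12345)

# Draw 5,000 samples of N i.i.d. chi-square(1) variates (mean 1,
# variance 2), then recenter and rescale the sample averages.  The
# Lindeberg-Levy CLT says sqrt(N)*(mean - 1) is approximately N(0, 2).
N, reps = 1_000, 5_000
means = rng.chisquare(1, size=(reps, N)).mean(axis=1)
z = np.sqrt(N) * (means - 1.0)
print(z.mean(), z.var())  # near 0 and near 2
```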

Given that \(\sigma^2=2\) for the \(\chi^2(1)\) distribution, in example 5, we add a plot of a normal density with mean 0 and variance 2 for comparison.

**Example 5: Densities of the recentered and rescaled estimator**

    . twoway (line f_1000n x_1000n)                        ///
    >        (line f_100000n x_100000n)                    ///
    >        (function normalden(x, sqrt(2)), range(-4 4)) ///
    >        , legend(label(3 "Normal(0, 2)") cols(3))

We see that the densities of recentered and rescaled estimators are indistinguishable from the density of a normal distribution with mean 0 and variance 2, as predicted by the theory.

**Figure 4: Densities of the recentered and rescaled estimates and a Normal(0,2)**

Other versions of the central limit theorem weaken the i.i.d. assumption or the moment assumptions; see Cameron and Trivedi (2005, sec. A.3), Wasserman (2003, sec. 5.3), and Wooldridge (2010, 41–42) for details.

**Done and undone**

I used MCS to illustrate that the sample average is consistent and asymptotically normal for data drawn from an i.i.d. process with finite mean and variance.

Many method-of-moments estimators, maximum likelihood estimators, and M-estimators are consistent and asymptotically normal under assumptions about the true data-generating process and the estimators themselves. See Cameron and Trivedi (2005, sec. 5.3), Newey and McFadden (1994), Wasserman (2003, chap. 9), and Wooldridge (2010, chap. 12) for discussions.

Cameron, A. C., and P. K. Trivedi. 2005. *Microeconometrics: Methods and Applications*. Cambridge: Cambridge University Press.

Newey, W. K., and D. McFadden. 1994. Large sample estimation and hypothesis testing. In *Handbook of Econometrics*, ed. R. F. Engle and D. McFadden, vol. 4, 2111–2245. Amsterdam: Elsevier.

Wasserman, L. A. 2003. *All of Statistics: A Concise Course in Statistical Inference*. New York: Springer.

Wooldridge, J. M. 2010. *Econometric Analysis of Cross Section and Panel Data*. 2nd ed. Cambridge, Massachusetts: MIT Press.

Before you use or distribute your estimation command, you should verify that it produces correct results and write a do-file that certifies that it does so. I discuss the processes of verifying and certifying an estimation command, and I present some techniques for writing a do-file that certifies it.

This is the twenty-fifth post in the series **Programming an estimation command in Stata**. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

**Verification versus certification**

Verification is the process of establishing that a command produces the correct results. Verification produces true values that can be compared with the values produced by a command. Certification is the process of checking that the differences between the verified true results and the results produced by a command are sufficiently small.

Verification can be easy or difficult. If there is another command or program that you trust, you can use it to create verified values. For example, I trust the **poisson** command, so I can use it to create true test-case values for **mypoisson5**. When another command or program is not available, I use simulation techniques to obtain verified values. See Monte Carlo simulations using Stata and Efficiency comparisons by Monte Carlo simulation for discussions of Monte Carlo simulations.

I certify a command by writing do-files that check that the results of a command are close to the verified true values in many specific cases. These do-files are called certification scripts, and I run them every time I make any change to my command. The process that I present is a greatly simplified version of that used to certify Stata; see Gould (2001) for another introduction to certification and for more about Stata certification.

**Comparing numbers**

The **assert** command checks that a logical expression is true. Here I use it to check that two integer values are equal or that two noninteger values are sufficiently close.

I check for equality between integer values and closeness between noninteger values because of how computers do math. You cannot fit the entire real-number line on a computer; there are too many real numbers. Computers use finite-precision base-two approximations to the real numbers. Integers have an exact representation in this approximation, and integer calculations can be performed without approximation error. Most noninteger values do not have an exact representation in the base-two approximation used on computers, and noninteger calculations are performed with approximation error. See The Penultimate Guide to Precision for details.
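The integer-versus-noninteger point is easy to demonstrate in any language; here is a minimal Python illustration (mine, not from the post):

```python
# Integer-valued arithmetic is exact in binary floating point, but most
# noninteger decimals are not: 0.1 has no exact base-two representation.
assert 2.0 + 3.0 == 5.0                # integer-valued: exact
print(0.1 + 0.2 == 0.3)                # False because of rounding error
print(abs((0.1 + 0.2) - 0.3) < 1e-12)  # True: compare with a tolerance
```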

Example 1 illustrates that **assert** produces no output or error when asked to assert that a true logical expression is true.

**Example 1: assert a true expression**

    . assert 3==3

In contrast, example 2 illustrates that **assert** produces an error when asked to assert that a false logical expression is true.

**Example 2: assert a false expression**

    . assert 3==2
    assertion is false
    r(9);

In example 3, I use **mreldif()** to compute the maximum of the element-wise relative differences between two integer-valued vectors, and I then use **assert** to check for equality.

**Example 3: assert and mreldif()**

    . matrix a = (1, 2, 3)

    . matrix b = a

    . display mreldif(a, b)
    0

    . assert mreldif(a,b)==0
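Stata documents **mreldif(X,Y)** as the maximum over the elements of \(|x-y|/(|y|+1)\). A small Python sketch of that formula (my analogue, not a Stata API) reproduces the zero result for identical matrices:

```python
import numpy as np

def mreldif(x, y):
    # Maximum element-wise relative difference, following Stata's
    # documented formula: max over elements of |x - y| / (|y| + 1).
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.max(np.abs(x - y) / (np.abs(y) + 1.0))

a = np.array([1.0, 2.0, 3.0])
b = a.copy()
print(mreldif(a, b))  # 0.0 for identical matrices
```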

Examples 1–3 illustrated **assert** by comparing integers. In certification, we usually compare noninteger values. Because of finite-precision approximation errors, a small change in how the results are computed—such as using 1 processor instead of 8 processors in Stata/MP, changing the sort order of the data, or using a Mac instead of a PC—may cause the results to change slightly. These changes can be surprising if your intuition is guided by infinite-precision math, but they should be small enough to be ignorable. Example 4 illustrates this point by comparing the point estimates obtained using 8 processors and 1 processor.

**Example 4: The effect of using 1 instead of 8 processors**

    . clear all

    . use accident3

    . gsort - cvalue

    . set processors 8
        The maximum number of processors or cores being used is changed from
        1 to 8.  It can be set to any number between 1 and 8

    . mypoisson5 accidents cvalue kids traffic
    Iteration 0:   f(p) = -851.18669
    Iteration 1:   f(p) = -556.66855
    Iteration 2:   f(p) = -555.81731
    Iteration 3:   f(p) = -555.81538
    Iteration 4:   f(p) = -555.81538
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
            kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
         traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
           _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
    ------------------------------------------------------------------------------

    . matrix b1 = e(b)

    . set processors 1
        The maximum number of processors or cores being used is changed from
        8 to 1.  It can be set to any number between 1 and 8

    . mypoisson5 accidents cvalue kids traffic
    Iteration 0:   f(p) = -851.18669
    Iteration 1:   f(p) = -556.66855
    Iteration 2:   f(p) = -555.81731
    Iteration 3:   f(p) = -555.81538
    Iteration 4:   f(p) = -555.81538
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
            kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
         traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
           _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
    ------------------------------------------------------------------------------

    . matrix b2 = e(b)

    . display mreldif(b1, b2)
    2.420e-17

Because differences like these are unimportant, I check that the computed results are close to the verified results instead of requiring that they be exactly the same. (You might not see these differences if you run this example on your 8-processor machine. The differences depend on what else your computer is doing when you run the example.)

**Writing a certification script**

I routinely use the following four techniques to write a certification script:

- I check that my command reproduces results that I have previously verified.
- I check that my command produces results that are close to those produced by a series of hand calculations.
- I check my command against itself.
- I check that my command produces results sufficiently close to another Stata command.

**Certifying my command against previously verified results**

Consider the results produced by **mypoisson5.ado** displayed in example 5.

**Example 5: mypoisson5 results**

    . clear all

    . use accident3

    . mypoisson5 accidents cvalue kids traffic
    Iteration 0:   f(p) = -851.18669
    Iteration 1:   f(p) = -556.66855
    Iteration 2:   f(p) = -555.81731
    Iteration 3:   f(p) = -555.81538
    Iteration 4:   f(p) = -555.81538
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
            kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
         traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
           _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
    ------------------------------------------------------------------------------

The results displayed in example 5 are stored in **e()**. I previously verified that these results are correct by comparing the higher-precision results displayed in example 6 with other results.

**Example 6: mypoisson5 e() results**

    . ereturn list

    scalars:
                      e(N) =  505
                   e(rank) =  4

    macros:
                    e(cmd) : "mypoisson5"
                e(predict) : "mypoisson5_p"
             e(properties) : "b V"

    matrices:
                      e(b) :  1 x 4
                      e(V) :  4 x 4

    functions:
                 e(sample)

    . matrix list e(b), format(%16.15g)

    e(b)[1,4]
                cvalue             kids          traffic            _cons
    y1  -.65588706902223  -1.0090169724739    .1467114650851   .57435412474423

I could use the results from example 6 to create a certification script like **test1.do**.

    clear all
    use accident3
    mypoisson5 accidents cvalue kids traffic
    matrix b1 = e(b)
    matrix btrue = (-.65588706902223, -1.0090169724739, ///
            .1467114650851, .57435412474423)
    display mreldif(b1, btrue)
    assert mreldif(b1, btrue) < 1e-14

Running **test1.do** produces

**Example 7: test1**

    . do test1

    . clear all

    . use accident3

    . mypoisson5 accidents cvalue kids traffic
    Iteration 0:   f(p) = -851.18669
    Iteration 1:   f(p) = -556.66855
    Iteration 2:   f(p) = -555.81731
    Iteration 3:   f(p) = -555.81538
    Iteration 4:   f(p) = -555.81538
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
            kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
         traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
           _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
    ------------------------------------------------------------------------------

    . matrix b1 = e(b)

    . matrix btrue = (-.65588706902223, -1.0090169724739, ///
    >         .1467114650851, .57435412474423)

    . display mreldif(b1, btrue)
    6.742e-15

    . assert mreldif(b1, btrue) < 1e-14

    .
    end of do-file

Note the process. After verifying the results produced by **mypoisson5**, I write a certification script to ensure that **mypoisson5** will always produce approximately these numbers. Following this process protects me from accidentally causing my command to produce incorrect results as I make it "better" or faster. Do not underestimate the importance of this protection. Putting bugs into your calculations as you attempt to improve your command is remarkably easy. This process also documents that I have checked this particular case. If someone claims to have a program that differs from mine in this case, I can ask that person to compute the results for this example in which I know that my command works. This request almost always yields a discussion in which that person debugs his or her own program so that it produces my verified results.

Here I copied and pasted numbers from a log file into a do-file to create **test1.do**. The copy-and-paste method is error-prone and tedious, and it should be avoided. The **mkassert** command solves this problem. **mkassert** creates **assert** commands that certify results stored in **e()**, **r()**, the dataset, or other Stata objects. I use it all the time.
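The idea behind **mkassert**, generating the checks from verified stored results rather than typing them by hand, carries over to any language. Here is a toy Python sketch of the pattern; the `make_asserts` helper and the result names are hypothetical, for illustration only:

```python
# Toy version of the mkassert idea: given a dict of verified results,
# emit assert statements that a certification script can replay later.
# The helper and the result names are hypothetical illustrations.
def make_asserts(results, tol=1e-12):
    lines = []
    for name, value in results.items():
        if isinstance(value, str):
            lines.append(f"assert results[{name!r}] == {value!r}")
        elif float(value).is_integer():
            lines.append(f"assert results[{name!r}] == {value}")
        else:
            lines.append(f"assert abs(results[{name!r}] - {value!r}) < {tol}")
    return "\n".join(lines)

verified = {"cmd": "mypoisson5", "N": 505, "b_cvalue": -0.6558870690222316}
script = make_asserts(verified)
print(script)

results = dict(verified)  # replaying the generated checks should pass
exec(script)
```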

I begin writing a certification script using **mkassert** with code like that in **test2.do**, whose output appears in example 8.

    clear all
    use accident3
    mypoisson5 accidents cvalue kids traffic
    mkassert eclass, mtol(1e-12) saving(test3.do, replace)

**Example 8: Using mkassert**

    . do test2

    . clear all

    . use accident3

    . mypoisson5 accidents cvalue kids traffic
    Iteration 0:   f(p) = -851.18669
    Iteration 1:   f(p) = -556.66855
    Iteration 2:   f(p) = -555.81731
    Iteration 3:   f(p) = -555.81538
    Iteration 4:   f(p) = -555.81538
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
            kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
         traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
           _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
    ------------------------------------------------------------------------------

    . mkassert eclass, mtol(1e-12) saving(test3.do, replace)

    end of do-file

**test2.do** produces results that I previously verified and uses **mkassert** to write an **assert** command for every result stored in **e()** in the file **test3.do**, which is in code block 2.

    assert `"`e(cmd)'"' == `"mypoisson5"'
    assert `"`e(predict)'"' == `"mypoisson5_p"'
    assert `"`e(properties)'"' == `"b V"'

    assert e(N) == 505
    assert e(rank) == 4

    qui {
    mat T_b = J(1,4,0)
    mat T_b[1,1] = -.6558870690222316
    mat T_b[1,2] = -1.009016972473914
    mat T_b[1,3] = .1467114650851019
    mat T_b[1,4] = .5743541247442324
    }
    matrix C_b = e(b)
    assert mreldif( C_b , T_b ) < 1e-12
    _assert_streq `"`: rowfullnames C_b'"' `"y1"'
    _assert_streq `"`: colfullnames C_b'"' `"cvalue kids traffic _cons"'
    mat drop C_b T_b

    qui {
    mat T_V = J(4,4,0)
    mat T_V[1,1] = .0049911902167341
    mat T_V[1,2] = .0002953642487161
    mat T_V[1,3] = -.0000506909358346
    mat T_V[1,4] = -.0089523155601508
    mat T_V[2,1] = .0002953642487161
    mat T_V[2,2] = .0065280055261688
    mat T_V[2,3] = .0002050149836939
    mat T_V[2,4] = -.0054776138886792
    mat T_V[3,1] = -.0000506909358346
    mat T_V[3,2] = .0002050149836939
    mat T_V[3,3] = .0009844631577381
    mat T_V[3,4] = -.0075052131640854
    mat T_V[4,1] = -.0089523155601508
    mat T_V[4,2] = -.0054776138886792
    mat T_V[4,3] = -.0075052131640854
    mat T_V[4,4] = .0806284655627814
    }
    matrix C_V = e(V)
    assert mreldif( C_V , T_V ) < 1e-12
    _assert_streq `"`: rowfullnames C_V'"' `"cvalue kids traffic _cons"'
    _assert_streq `"`: colfullnames C_V'"' `"cvalue kids traffic _cons"'
    mat drop C_V T_V

Each **assert** command checks that what is currently in **e()** is sufficiently close to the corresponding value stored in **e()** by **mypoisson5**. The first three **assert** commands check the stored macros, the next two check the scalars, and the remaining two blocks check **e(b)** and **e(V)**, respectively.

I create the script **test4.do**, which checks this case by replacing the **mkassert** command in **test2.do** with the **assert** commands it created in **test3.do**; see code block 4.

    // Test case 1
    clear all
    use accident3
    mypoisson5 accidents cvalue kids traffic

    assert `"`e(cmd)'"' == `"mypoisson5"'
    assert `"`e(predict)'"' == `"mypoisson5_p"'
    assert `"`e(properties)'"' == `"b V"'

    assert e(N) == 505
    assert e(rank) == 4

    qui {
    mat T_b = J(1,4,0)
    mat T_b[1,1] = -.6558870690222316
    mat T_b[1,2] = -1.009016972473914
    mat T_b[1,3] = .1467114650851019
    mat T_b[1,4] = .5743541247442324
    }
    matrix C_b = e(b)
    assert mreldif( C_b , T_b ) < 1e-12
    _assert_streq `"`: rowfullnames C_b'"' `"y1"'
    _assert_streq `"`: colfullnames C_b'"' `"cvalue kids traffic _cons"'
    mat drop C_b T_b

    qui {
    mat T_V = J(4,4,0)
    mat T_V[1,1] = .0049911902167341
    mat T_V[1,2] = .0002953642487161
    mat T_V[1,3] = -.0000506909358346
    mat T_V[1,4] = -.0089523155601508
    mat T_V[2,1] = .0002953642487161
    mat T_V[2,2] = .0065280055261688
    mat T_V[2,3] = .0002050149836939
    mat T_V[2,4] = -.0054776138886792
    mat T_V[3,1] = -.0000506909358346
    mat T_V[3,2] = .0002050149836939
    mat T_V[3,3] = .0009844631577381
    mat T_V[3,4] = -.0075052131640854
    mat T_V[4,1] = -.0089523155601508
    mat T_V[4,2] = -.0054776138886792
    mat T_V[4,3] = -.0075052131640854
    mat T_V[4,4] = .0806284655627814
    }
    matrix C_V = e(V)
    assert mreldif( C_V , T_V ) < 1e-12
    _assert_streq `"`: rowfullnames C_V'"' `"cvalue kids traffic _cons"'
    _assert_streq `"`: colfullnames C_V'"' `"cvalue kids traffic _cons"'
    mat drop C_V T_V

Every time I run **test4.do**, it checks that **mypoisson5** produces correct results for this one case. The more cases that I verify and certify, the more certain I am that my command works.

I summarize this important process below.

- I write a do-file, here called **test2.do**, that produces results for a case in which I have verified that my command produces correct results.
- At the end of **test2.do**, I use **mkassert** to create another do-file, here called **test3.do**, that contains **assert** commands for each result that my command stored in **e()**.
- I replace the **mkassert** command in **test2.do** with the commands it created in **test3.do** to create the certification script, here called **test4.do**.

This method assumes that I have already verified that my command produces correct results for a specific example. The common case of verification by simulation makes this method more applicable than you might think.

**Certifying my command against hand-calculated results**

I can almost always find another way to compute estimation results in Stata that should be numerically equivalent. In the Poisson-regression case at hand, I can use **gmm**. As discussed by Cameron and Trivedi (2005) and Wooldridge (2010), Poisson regression finds the \(\widehat{\boldsymbol{\beta}}\) that solves the score equations,

$$
\sum_{i=1}^N \left[y_i - \exp(\mathbf{x}_i\widehat{\boldsymbol{\beta}})\right]\mathbf{x}_i = {\bf 0}
$$
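To make the score equations concrete, here is a Python sketch (my illustration on simulated data, not the post's accident3 dataset or the mypoisson5 code) that solves them with Newton-Raphson and verifies that the score is numerically zero at the solution:

```python
import numpy as np

rng = np.random.default_rng(12345)

# Simulated Poisson-regression data: one regressor plus a constant.
n = 500
X = np.column_stack([rng.normal(size=n), np.ones(n)])
beta_true = np.array([0.5, -0.2])
y = rng.poisson(np.exp(X @ beta_true))

# Newton-Raphson on the score sum_i [y_i - exp(x_i b)] x_i = 0, using
# the Poisson log-likelihood Hessian -X' diag(exp(Xb)) X.
beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)
    score = X.T @ (y - mu)
    hessian = -(X * mu[:, None]).T @ X
    beta -= np.linalg.solve(hessian, score)

print(beta)  # close to beta_true
print(np.abs(X.T @ (y - np.exp(X @ beta))).max())  # score is ~0
```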

We showed how to use the **gmm** command for similar problems in Understanding the generalized method of moments (GMM): A simple example. In example 9, I use **gmm**, and I use **assert** to check that the point estimates produced by **gmm** and **mypoisson5** are sufficiently close.

**Example 9: Using gmm to certify mypoisson5**

    . clear all

    . use accident3

    . mypoisson5 accidents cvalue kids traffic
    Iteration 0:   f(p) = -851.18669
    Iteration 1:   f(p) = -556.66855
    Iteration 2:   f(p) = -555.81731
    Iteration 3:   f(p) = -555.81538
    Iteration 4:   f(p) = -555.81538
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
            kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
         traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
           _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
    ------------------------------------------------------------------------------

    . matrix b1 = e(b)

    . gmm (accidents - exp({xb:cvalue kids traffic _cons})), ///
    >         instruments(cvalue kids traffic) onestep

    Step 1
    Iteration 0:   GMM criterion Q(b) =  .57041592
    Iteration 1:   GMM criterion Q(b) =  .01710408
    Iteration 2:   GMM criterion Q(b) =  .00015313
    Iteration 3:   GMM criterion Q(b) =  2.190e-08
    Iteration 4:   GMM criterion Q(b) =  3.362e-16

    note: model is exactly identified

    GMM estimation

    Number of parameters =   4
    Number of moments    =   4
    Initial weight matrix: Unadjusted            Number of obs     =        505

    ------------------------------------------------------------------------------
                 |               Robust
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6558871   .1094934    -5.99   0.000    -.8704901    -.441284
            kids |  -1.009017   .1884791    -5.35   0.000    -1.378429   -.6396047
         traffic |   .1467115   .0923401     1.59   0.112    -.0342718    .3276947
           _cons |   .5743542   .6039059     0.95   0.342    -.6092797    1.757988
    ------------------------------------------------------------------------------
    Instruments for equation 1: cvalue kids traffic _cons

    . matrix b2 = e(b)

    . display mreldif(b1, b2)
    5.554e-08

    . assert mreldif(b1, b2) < 1e-7

I used a weak tolerance when comparing the two vectors of point estimates because the commands use different algorithms to find their solutions. If I reduced the convergence tolerance in each command, the solutions would be closer to each other.

For a real certification script, I would also check everything else stored in **e()** by **mypoisson5** against a value computed by **gmm**. I skip these details to present other methods.

**Certifying my command against itself**

Almost all estimation commands accept **if** or **in** sample restrictions, and these restrictions can usually be tested by comparing other results produced by the same command. Example 10 provides an example.

**Example 10: Testing a command against itself**

    . clear all

    . use accident3

    . mypoisson5 accidents cvalue kids traffic if cvalue <=3
    Iteration 0:   f(p) = -712.62548
    Iteration 1:   f(p) = -540.56297
    Iteration 2:   f(p) = -529.54572
    Iteration 3:   f(p) = -529.44627
    Iteration 4:   f(p) = -529.44618
    Iteration 5:   f(p) = -529.44618
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.3646368   .0872777    -4.18   0.000    -.5356979   -.1935756
            kids |  -.9874777   .0805708   -12.26   0.000    -1.145394   -.8295618
         traffic |   .1488243   .0317338     4.69   0.000     .0866272    .2110214
           _cons |   .1081705   .3015328     0.36   0.720    -.4828229    .6991638
    ------------------------------------------------------------------------------

    . matrix b1 = e(b)

    . keep if cvalue <=3
    (121 observations deleted)

    . mypoisson5 accidents cvalue kids traffic
    Iteration 0:   f(p) = -712.62548
    Iteration 1:   f(p) = -540.56297
    Iteration 2:   f(p) = -529.54572
    Iteration 3:   f(p) = -529.44627
    Iteration 4:   f(p) = -529.44618
    Iteration 5:   f(p) = -529.44618
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.3646368   .0872777    -4.18   0.000    -.5356979   -.1935756
            kids |  -.9874777   .0805708   -12.26   0.000    -1.145394   -.8295618
         traffic |   .1488243   .0317338     4.69   0.000     .0866272    .2110214
           _cons |   .1081705   .3015328     0.36   0.720    -.4828229    .6991638
    ------------------------------------------------------------------------------

    . matrix b2 = e(b)

    . display mreldif(b1, b2)
    0

    . assert mreldif(b1, b2) < 1e-14

I begin by storing the point estimates obtained from the sample in which **cvalue<=3** in **b1**. Next, I keep only these observations in the sample and use **mypoisson5** without an **if** restriction to compute the point estimates stored in **b2**. Finally, I assert that **b1** and **b2** are sufficiently close. In this case, the results are exactly the same, but I only test that they are close because I should not rely on this equality. (I am using Stata/MP, and other jobs on my computer could change the number of processors I effectively have, which can cause the results to change slightly.)

An analogous process works for testing **in** restrictions and integer weights.
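For instance, an **in** restriction on **mypoisson5** could be certified by the same pattern as example 10. The sketch below is an outline I have not run, so treat the commands as a template rather than verified output:

```stata
. use accident3, clear
. mypoisson5 accidents cvalue kids traffic in 1/200
. matrix b1 = e(b)
. keep in 1/200
. mypoisson5 accidents cvalue kids traffic
. matrix b2 = e(b)
. assert mreldif(b1, b2) < 1e-14
```

The same idea extends to integer frequency weights: expanding the dataset by the weight and comparing the unweighted fit with the weighted fit on the original data.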

**Certifying my command against another Stata command**

Sometimes constraining a parameter of the new estimator produces the same results as another estimator already implemented in Stata. For example, a random-effects estimator may reduce to a cross-sectional estimator when the variance of the random effect is constrained to zero.

In the case at hand, I could check that my command produces the same values as **poisson**, as shown in example 11.

**Example 11: Certifying against an existing command**

```
. clear all

. use accident3

. mypoisson5 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66855
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
------------------------------------------------------------------------------

. matrix b1 = e(b)

. poisson accidents cvalue kids traffic
Iteration 0:   log likelihood = -555.86605
Iteration 1:   log likelihood =  -555.8154
Iteration 2:   log likelihood = -555.81538

Poisson regression                              Number of obs     =        505
                                                LR chi2(3)        =     340.20
                                                Prob > chi2       =     0.0000
Log likelihood = -555.81538                     Pseudo R2         =     0.2343

------------------------------------------------------------------------------
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506594
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |    .574354   .2839515     2.02   0.043     .0178193    1.130889
------------------------------------------------------------------------------

. matrix b2 = e(b)

. display mreldif(b1, b2)
1.081e-07

. assert mreldif(b1, b2) <1e-6
```

**Done and undone**

I presented some techniques that I use to write certification scripts. A real certification script would cover many more cases. In the next post, I discuss using and creating Mata libraries.

**References**

Cameron, A. C., and P. K. Trivedi. 2005. *Microeconometrics: Methods and Applications*. Cambridge: Cambridge University Press.

Gould, W. 2001. Statistical software certification. *Stata Journal* 1: 29–50.

Wooldridge, J. M. 2010. *Econometric Analysis of Cross Section and Panel Data*. 2nd ed. Cambridge, Massachusetts: MIT Press.

As of update 03 Mar 2016, **bayesmh** provides a more convenient way of fitting distributions to the outcome variable. By design, **bayesmh** is a regression command, which models the mean of the outcome distribution as a function of predictors. There are cases when we do not have any predictors and want to model the outcome distribution directly. For example, we may want to fit a Poisson distribution or a binomial distribution to our outcome. This can now be done by specifying one of the four new distributions supported by **bayesmh** in the **likelihood()** option: **dexponential()**, **dbernoulli()**, **dbinomial()**, or **dpoisson()**. Previously, the suboption **noglmtransform** of **bayesmh**'s option **likelihood()** was used to fit the exponential, binomial, and Poisson distributions to the outcome variable. This suboption continues to work but is now undocumented.
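A minimal sketch of the new specification; the outcome name **y**, the flat prior, and the seed are my own choices for illustration, not from the update announcement:

```stata
. bayesmh y, likelihood(dpoisson({mu})) prior({mu}, flat) rseed(14)
```

Here **dpoisson({mu})** fits a Poisson distribution with mean parameter **{mu}** directly to **y**, with no predictors involved.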

For examples, see *Beta-binomial model*, *Bayesian analysis of change-point problem*, and *Item response theory* under *Remarks and examples* in **[BAYES] bayesmh**.

We have also updated our earlier “Bayesian binary item response theory models using bayesmh” blog entry to use the new **dbernoulli()** specification when fitting 3PL, 4PL, and 5PL IRT models.

This is the twenty-fourth post in the series **Programming an estimation command in Stata**. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

**An ado-command that computes predictions**

The syntax of **mypoisson5_p** is

**mypoisson5_p** [*type*] *newvarname* [*if*] [*in*] [**, n xb**]

**mypoisson5_p** computes the expected number of counts when option **n** is specified, and it computes the linear predictions when option **xb** is specified. **n** is the default if the user specifies neither **xb** nor **n**. Despite the syntax diagram, the user may not specify both **xb** and **n**.

Now consider the code for this command in code block 1.

```
*! version 1.0.0 10Mar2016
program define mypoisson5_p
        version 14

        syntax newvarname [if] [in] , [ xb n ]

        marksample touse, novarlist

        local nopts : word count `xb' `n'
        if `nopts' >1 {
                display "{err}only one statistic may be specified"
                exit 498
        }

        if `nopts' == 0 {
                local n n
                display "expected counts"
        }

        if "`xb'" != "" {
                _predict `typlist' `varlist' if `touse' , xb
        }
        else {
                tempvar xbv
                quietly _predict double `xbv' if `touse' , xb
                generate `typlist' `varlist' = exp(`xbv') if `touse'
        }
end
```

Line 5 uses **syntax** to parse the command line according to the syntax diagram above: **mypoisson5_p** requires the name of a new variable, allows an **if** or **in** condition, and accepts the options **xb** and **n**. **syntax newvarname** specifies that the user must specify a name for a variable that is not in the dataset in memory. **syntax** stores the name of the new variable in the local macro **varlist**. If the user specifies a variable type in addition to the variable name, the type is stored in the local macro **typlist**. For example, if the user specified

. mypoisson5_p double yhat

the local macro **varlist** would contain “yhat” and the local macro **typlist** would contain “double”. If the user does not specify a type, the local macro **typlist** contains nothing.
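A minimal sketch of this parsing behavior, using a throwaway program name (**shownew**) of my own:

```stata
program shownew
        version 14
        syntax newvarname [if] [in]
        display "typlist = |`typlist'|"     // the type, if one was specified
        display "varlist = |`varlist'|"     // the new variable's name
end
```

Typing **shownew double yhat** then displays the parsed type and name, while typing **shownew yhat** leaves **typlist** empty.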

Line 7 uses **marksample** to create a sample-identification variable whose name will be in the local macro **touse**. Unlike the examples in Programming an estimation command in Stata: Allowing for sample restrictions and factor variables, I specified the option **novarlist** on **marksample** so that **marksample** will use only the user-specified **if** or **in** restrictions to create the sample-identification variable and not use the nonexistent observations in the new variable.

The options **xb** and **n** specify which statistic to compute. The syntax command on line 5 allows users to specify

1. the **xb** option,
2. the **n** option,
3. both the **xb** option and the **n** option, or
4. neither the **xb** option nor the **n** option.

In case (1), the local macro **xb** will contain “xb” and the local macro **n** will contain nothing. In case (2), the local macro **xb** will contain nothing and the local macro **n** will contain “n”. In case (3), the local macro **xb** will contain “xb” and the local macro **n** will contain “n”. In case (4), the local macro **xb** will contain nothing and the local macro **n** will contain nothing.

The syntax diagram and its discussion imply that cases (1), (2), and (4) are valid, but that case (3) would be an error. Line 9 puts the number of options specified by the user in the local macro **nopts**. The rest of the code uses **nopts**, **xb**, and **n** to handle cases (1)–(4).
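The **word count** extended macro function on line 9 simply counts the words in the string after macro expansion, so an unspecified option's empty macro contributes nothing. A quick interactive sketch:

```stata
. local xb xb                        // as if the user specified the xb option
. local n                            // n was not specified, so the macro is empty
. local nopts : word count `xb' `n'
. display `nopts'
1
```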

Lines 10–13 handle case (3) by exiting with a polite error message when **nopts** contains 2.

Lines 15–18 handle case (4) by putting “n” in the local macro **n** when **nopts** contains 0.

At this point, we have handled cases (3) and (4), and we use **xb** and **n** to handle cases (1) and (2), because either **xb** is not empty and **n** is empty, or **xb** is empty and **n** is not empty.

Lines 20–22 handle case (1) by using **_predict** to compute the **xb** predictions when the local macro **xb** is not empty. Note that the predictions are computed at the precision specified by the user.

Lines 23–27 handle case (2) by using **_predict** to compute **xb** in a temporary variable that is subsequently used to compute **n**. Note that the temporary variable for **xb** is always computed in double precision and that **n** is computed at the precision specified by the user.

**Storing the name of the prediction command in e(predict)**

To compute the **xb** statistic, users type

. predict double yhat, xb

instead of typing

. mypoisson5_p double yhat, xb

This syntax works because the **predict** command uses the ado-command whose name is stored in **e(predict)**. On line 50 of **mypoisson5** in code block 2, I store “mypoisson5_p” in **e(predict)**. This addition is the only difference between **mypoisson5.ado** in code block 2 and **mypoisson4.ado** in code block 5 in Programming an estimation command in Stata: Adding analytical derivatives to a poisson command using Mata.

```
*! version 5.0.0 10Mar2016
program define mypoisson5, eclass sortpreserve
        version 14

        syntax varlist(numeric ts fv min=2) [if] [in] [, noCONStant vce(string) ]
        marksample touse

        _vce_parse `touse' , optlist(Robust) argoptlist(CLuster) : , vce(`vce')
        local vce        "`r(vce)'"
        local clustervar "`r(cluster)'"
        if "`vce'" == "robust" | "`vce'" == "cluster" {
                local vcetype "Robust"
        }
        if "`clustervar'" != "" {
                capture confirm numeric variable `clustervar'
                if _rc {
                        display in red "invalid vce() option"
                        display in red "cluster variable {bf:`clustervar'} is " ///
                                "string variable instead of a numeric variable"
                        exit(198)
                }
                sort `clustervar'
        }

        gettoken depvar indepvars : varlist
        _fv_check_depvar `depvar'

        tempname b mo V N rank

        getcinfo `indepvars' , `constant'
        local cnames "`r(cnames)'"
        matrix `mo' = r(mo)

        mata: mywork("`depvar'", "`cnames'", "`touse'", "`constant'",  ///
                "`b'", "`V'", "`N'", "`rank'", "`mo'", "`vce'", "`clustervar'")

        if "`constant'" == "" {
                local cnames "`cnames' _cons"
        }
        matrix colnames `b' = `cnames'
        matrix colnames `V' = `cnames'
        matrix rownames `V' = `cnames'

        ereturn post `b' `V', esample(`touse') buildfvinfo
        ereturn scalar N        = `N'
        ereturn scalar rank     = `rank'
        ereturn local  vce       "`vce'"
        ereturn local  vcetype   "`vcetype'"
        ereturn local  clustvar  "`clustervar'"
        ereturn local  predict   "mypoisson5_p"
        ereturn local  cmd       "mypoisson5"

        ereturn display
end

program getcinfo, rclass
        syntax varlist(ts fv), [ noCONStant ]
        _rmcoll `varlist' , `constant' expand
        local cnames `r(varlist)'
        local p : word count `cnames'
        if "`constant'" == "" {
                local p = `p' + 1
                local cons _cons
        }

        tempname b mo
        matrix `b' = J(1, `p', 0)
        matrix colnames `b' = `cnames' `cons'
        _ms_omit_info `b'
        matrix `mo' = r(omit)

        return local  cnames "`cnames'"
        return matrix mo = `mo'
end

mata:
void mywork( string scalar depvar,  string scalar indepvars,
             string scalar touse,   string scalar constant,
             string scalar bname,   string scalar Vname,
             string scalar nname,   string scalar rname,
             string scalar mo,      string scalar vcetype,
             string scalar clustervar)
{
        real vector y, b, cvar
        real matrix X, V, Ct
        real scalar n, p, rank

        y = st_data(., depvar, touse)
        n = rows(y)
        X = st_data(., indepvars, touse)
        if (constant == "") {
                X = X,J(n, 1, 1)
        }
        p  = cols(X)
        Ct = makeCt(mo)

        S = optimize_init()
        optimize_init_argument(S, 1, y)
        optimize_init_argument(S, 2, X)
        optimize_init_evaluator(S, &plleval3())
        optimize_init_evaluatortype(S, "gf2")
        optimize_init_params(S, J(1, p, .01))
        optimize_init_constraints(S, Ct)
        b = optimize(S)

        if (vcetype == "robust") {
                V = optimize_result_V_robust(S)
        }
        else if (vcetype == "cluster") {
                cvar = st_data(., clustervar, touse)
                optimize_init_cluster(S, cvar)
                V = optimize_result_V_robust(S)
        }
        else {                          // vcetype must be iid
                V = optimize_result_V_oim(S)
        }
        rank = p - diag0cnt(invsym(V))

        st_matrix(bname, b)
        st_matrix(Vname, V)
        st_numscalar(nname, n)
        st_numscalar(rname, rank)
}

real matrix makeCt(string scalar mo)
{
        real vector mo_v
        real scalar ko, j, p

        mo_v = st_matrix(mo)
        p    = cols(mo_v)
        ko   = sum(mo_v)
        if (ko>0) {
                Ct = J(0, p, .)
                for(j=1; j<=p; j++) {
                        if (mo_v[j]==1) {
                                Ct = Ct \ e(j, p)
                        }
                }
                Ct = Ct, J(ko, 1, 0)
        }
        else {
                Ct = J(0,p+1,.)
        }
        return(Ct)
}

void plleval3(real scalar todo, real vector b,     ///
              real vector y, real matrix X,        ///
              val, grad, hess)
{
        real vector  xb, mu

        xb  = X*b'
        mu  = exp(xb)
        val = (-mu + y:*xb - lnfactorial(y))

        if (todo>=1) {
                grad = (y - mu):*X
        }
        if (todo==2) {
                hess = -quadcross(X, mu, X)
        }
}
end
```

Example 1 illustrates that our implementation works by comparing the predictions obtained after **mypoisson5** with those obtained after **poisson**.

**Example 1: predict after mypoisson5**

```
. clear all

. use accident3

. quietly poisson accidents cvalue kids traffic

. predict double n1
(option n assumed; predicted number of events)

. quietly mypoisson5 accidents cvalue kids traffic

. predict double n2
expected counts

. list n1 n2 in 1/5

     +-----------------------+
     |        n1          n2 |
     |-----------------------|
  1. | .15572052   .15572052 |
  2. | .47362502   .47362483 |
  3. | .46432954   .46432946 |
  4. | .84841301   .84841286 |
  5. | .40848207   .40848209 |
     +-----------------------+
```

**Done and undone**

I made **predict** work after **mypoisson5** by writing an ado-command that computes the prediction and by storing the name of this ado-command in **e(predict)**.

In my next post, I discuss how to check that a working command is still working, a topic known as certification.

I describe how to generate random numbers and discuss some features added in Stata 14. In particular, Stata 14 includes a new default random-number generator (RNG) called the Mersenne Twister (Matsumoto and Nishimura 1998), a new function that generates random integers, the ability to generate random numbers from an interval, and several new functions that generate random variates from nonuniform distributions.

**Random numbers from the uniform distribution**

In the example below, we use **runiform()** to create a simulated dataset with 10,000 observations on a (0,1)-uniform variable. Prior to using **runiform()**, we set the seed so that the results are reproducible.

```
. set obs 10000
number of observations (_N) was 0, now 10,000

. set seed 98034

. generate u1 = runiform()
```

The mean of a (0,1)-uniform is .5, and the standard deviation is \(\sqrt{1/12}\approx .289\). The estimates from the simulated data reported in the output below are close to the true values.

```
. summarize u1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          u1 |     10,000    .5004244    .2865088   .0000502    .999969
```

To draw uniform variates over (a, b) instead of over (0, 1), we specify **runiform(a, b)**. In the example below, we draw uniform variates over (1, 2) and then estimate the mean and the standard deviation, which we could compare with their theoretical values of 1.5 and \(\sqrt{1/12} \approx .289\).

```
. generate u2 = runiform(1, 2)

. summarize u2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          u2 |     10,000    1.495698    .2887136   1.000088   1.999899
```

To draw integers uniformly over {a, a+1, …, b}, we specify **runiformint(a, b)**. In the example below, we draw integers uniformly over {0, 1, …, 100} and then estimate the mean and the standard deviation, which we could compare with their theoretical values of 50 and \(\sqrt{(101^2-1)/12}\approx 29.155\).

```
. generate u3 = runiformint(0, 100)

. summarize u3

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          u3 |     10,000     49.9804    29.19094          0        100
```

**Set the seed and make results reproducible**

We use **set seed** *#* to obtain the same random numbers, which makes the subsequent results reproducible. RNGs come from a recursive formula. The “random” numbers produced are actually deterministic, but they appear to be random. Setting the seed specifies a starting place for the recursion, which causes the random numbers to be the same, as in the example below.

```
. drop _all

. set obs 6
number of observations (_N) was 0, now 6

. set seed 12345

. generate x = runiform()

. set seed 12345

. generate y = runiform()

. list x y

     +---------------------+
     |        x          y |
     |---------------------|
  1. | .3576297   .3576297 |
  2. | .4004426   .4004426 |
  3. | .6893833   .6893833 |
  4. | .5597356   .5597356 |
  5. | .5744513   .5744513 |
     |---------------------|
  6. | .2076905   .2076905 |
     +---------------------+
```

Every time Stata is launched, the seed is set to 123456789.

After generating \(N\) random numbers, the RNG wraps around and starts generating the same sequence all over again. \(N\) is called the *period* of the RNG. Larger periods are better because we get more random numbers before the sequence wraps. The period of the Mersenne Twister is \(2^{19937}-1\), which is huge. Large periods are important when performing complicated simulation studies.

In Stata, the seed is a positive integer (between 0 and \(2^{31}-1\)) that Stata maps onto the state of the RNG. The state of an RNG corresponds to a spot in the sequence. The mapping is not one to one because there are more states than seeds. If you want to pick up where you left off in the sequence, you need to restore the state, as in the example below.

```
. drop _all

. set obs 3
number of observations (_N) was 0, now 3

. set seed 12345

. generate x = runiform()

. local state `c(rngstate)'

. generate y = runiform()

. set rngstate `state'

. generate z = runiform()

. list

     +--------------------------------+
     |        x          y          z |
     |--------------------------------|
  1. | .3576297   .5597356   .5597356 |
  2. | .4004426   .5744513   .5744513 |
  3. | .6893833   .2076905   .2076905 |
     +--------------------------------+
```

After dropping the data and setting the number of observations to 3, we use **generate** to put random variates in **x**, store the state of the RNG in the local macro **state**, and then put random numbers in **y**. Next, we use **set rngstate** to restore the state to what it was before we generated **y**, and then we generate **z**. The random numbers in **z** are the same as those in **y** because restoring the state caused Stata to start at the same place in the sequence as before we generated **y**. See Programming an estimation command in Stata: Where to store your stuff for an introduction to local macros.

**Random variates from various distributions**

So far, we have talked about generating uniformly distributed random numbers. Stata also provides functions that generate random numbers from other distributions. The function names are easy to remember: the letter *r* followed by the name of the distribution. Some common examples are **rnormal()**, **rbeta()**, and **rweibull()**. In the example below, we draw 5,000 observations from a standard normal distribution and summarize the results.

```
. drop _all

. set seed 12345

. set obs 5000
number of observations (_N) was 0, now 5,000

. generate w = rnormal()

. summarize w

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           w |      5,000    .0008946    .9903156  -3.478898   3.653764
```

The estimated mean and standard deviation are close to their true values of 0 and 1.

**A note on precision**

So far, we generated random numbers with the default data type of *float*. Generating the random numbers with type *double* makes ties occur less frequently. Ties can still occur with type *double* because the huge period of the Mersenne Twister vastly exceeds the number of distinct values the generator can return (it draws at a resolution of \(2^{-53}\)), so a long enough sequence of random numbers will contain repeated values.
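One way to see the effect of the storage type is to count duplicate values in a *float* and a *double* variable drawn from the same generator. This is only a sketch of the idea; the exact counts depend on the seed:

```stata
. drop _all
. set seed 98034
. set obs 100000
. generate float  xf = runiform()    // default type; fewer distinct values
. generate double xd = runiform()    // double type; far more distinct values
. duplicates report xf
. duplicates report xd
```

With 100,000 draws, **xf** will typically show some duplicated values, while duplicates in **xd** are far less likely.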

**Conclusion**

In this post, I showed how to generate random numbers using random-number functions in Stata. I also discussed how to make results reproducible by setting the seed. In subsequent posts, I will delve into other aspects of RNGs, including methods to generate random variates from other distributions and in Mata.

**Reference**

Matsumoto, M., and T. Nishimura. 1998. Mersenne Twister: A 623-dimensionally equidistributed uniform pseudo-random number generator. *ACM Transactions on Modeling and Computer Simulation* 8: 3–30.

Using analytically computed derivatives can greatly reduce the time required to solve a nonlinear estimation problem. I show how to use analytically computed derivatives with Mata's **optimize()** function.

This is the twenty-third post in the series **Programming an estimation command in Stata**. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

**Analytically computed derivatives for Poisson**

The contribution of the *i*(th) observation to the log-likelihood function for the Poisson maximum-likelihood estimator is

$$
L_i = -\exp(\mathbf{x}_i\boldsymbol{\beta}') + y_i\mathbf{x}_i\boldsymbol{\beta}' - \ln(y_i!)
$$

The vector of observation-level contributions can be coded in Mata by

```
xb  = X*b'
mu  = exp(xb)
val = (-mu + y:*xb - lnfactorial(y))
```

where **X** is the matrix of observations on the covariates, **b** is the row vector of parameters, **y** is the vector of observations on the dependent variable, **xb** is the vector of linear predictions **X*b'**, **mu** is the vector **exp(xb)** of conditional means, and **val** is the vector of observation-level contributions.

The gradient for the *i*(th) observation is

$$
g_i = \left(y_i-\exp(\mathbf{x}_i\boldsymbol{\beta}')\right)\mathbf{x}_i
$$

The vector of all the observation-level gradients can be coded in Mata by **(y-mu):*X**.

The sum of the Hessians calculated at each observation *i* is

$$
H = -\sum_{i=1}^N \exp(\mathbf{x}_i\boldsymbol{\beta}')\,\mathbf{x}_i'\mathbf{x}_i
$$

which can be coded in Mata by **-quadcross(X, mu, X)**.
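The three Mata expressions above can be exercised on a tiny hand-made dataset; the values of **y**, **X**, and **b** below are arbitrary and serve only to illustrate the shapes involved:

```stata
mata:
y = (0 \ 1 \ 2)                         // dependent variable, 3 x 1
X = (1, .5 \ 1, -.2 \ 1, .3)            // constant plus one covariate, 3 x 2
b = (.1, .2)                            // trial parameter row vector, 1 x 2

xb  = X*b'
mu  = exp(xb)
val = (-mu + y:*xb - lnfactorial(y))    // 3 x 1 observation-level contributions
g   = (y - mu):*X                       // 3 x 2 observation-level gradients
H   = -quadcross(X, mu, X)              // 2 x 2 sum of observation-level Hessians
end
```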

**Using analytically computed gradients in optimize()**

The code in **dex1.do** implements the observation-level gradients in the evaluator function **plleval3()** used by **optimize()** in **dowork()** to maximize the Poisson log-likelihood function for the given data.

```
mata:

void plleval3(real scalar todo, real vector b,     ///
              real vector y, real matrix X,        ///
              val, grad, hess)
{
        real vector  xb, mu

        xb  = X*b'
        mu  = exp(xb)
        val = (-mu + y:*xb - lnfactorial(y))

        if (todo>=1) {
                grad = (y - mu):*X
        }
}

void dowork( )
{
        real vector  y, b
        real matrix  X
        real scalar  n, p
        transmorphic S

        y = st_data(., "accidents")
        X = st_data(., "cvalue kids traffic ")
        n = rows(y)
        X = X, J(n, 1, 1)
        p = cols(X)

        S = optimize_init()
        optimize_init_argument(S, 1, y)
        optimize_init_argument(S, 2, X)
        optimize_init_evaluator(S, &plleval3())
        optimize_init_evaluatortype(S, "gf1debug")
        optimize_init_params(S, J(1, p, .01))

        b = optimize(S)

}

dowork()
end
```

Lines 2–16 define the evaluator function **plleval3()**, which stores the observation-level contributions to the log likelihood in **val** and the observation-level gradients in **grad**. **grad** is only calculated when **todo>=1**.

**optimize()** uses **todo** to tell the evaluator function what it needs. At some points in the optimization process, **optimize()** needs only the value of the objective function, which **optimize()** communicates to the evaluator by setting **todo=0**. At other points in the optimization process, **optimize()** needs the value of the objective function and the gradient, which **optimize()** communicates to the evaluator by setting **todo=1**. At still other points in the optimization process, **optimize()** needs the value of the objective function, the gradient, and the Hessian, which **optimize()** communicates to the evaluator by setting **todo=2**. An evaluator function that calculates the gradient analytically must compute it when **todo=1** or **todo=2**. Coding **>=** instead of **==** on line 13 is crucial.

Lines 18–40 define **dowork()**, which implements a call to **optimize()** to maximize the Poisson log-likelihood function for these data. Line 35 differs from the examples that I previously discussed; it sets the evaluator type to **gf1debug**. This evaluator type has two parts: **gf1** and **debug**. **gf1** specifies that the evaluator return observation-level contributions to the objective function and that it return a matrix of observation-level gradients when **todo==1** or **todo==2**. Appending **debug** to **gf1** tells **optimize()** to produce a report comparing the analytically computed derivatives with those computed numerically by **optimize()** and to use the numerically computed derivatives for the optimization.

Example 1 illustrates the derivative comparison report.

**Example 1: gf1debug output**

```
. clear all

. use accident3

. do dex1

. mata:
------------------------------------------------- mata (type end to exit) ------
:
: void plleval3(real scalar todo, real vector b,     ///
>               real vector y, real matrix X,        ///
>               val, grad, hess)
> {
>         real vector  xb, mu
>
>         xb  = X*b'
>         mu  = exp(xb)
>         val = (-mu + y:*xb - lnfactorial(y))
>
>         if (todo>=1) {
>                 grad = (y - mu):*X
>         }
> }
note: argument hess unused
:
: void dowork( )
> {
>         real vector  y, b
>         real matrix  X
>         real scalar  n, p
>         transmorphic S
>
>         y = st_data(., "accidents")
>         X = st_data(., "cvalue kids traffic ")
>         n = rows(y)
>         X = X, J(n, 1, 1)
>         p = cols(X)
>
>         S = optimize_init()
>         optimize_init_argument(S, 1, y)
>         optimize_init_argument(S, 2, X)
>         optimize_init_evaluator(S, &plleval3())
>         optimize_init_evaluatortype(S, "gf1debug")
>         optimize_init_params(S, J(1, p, .01))
>
>         b = optimize(S)
>
> }
note: variable b set but not used
:
: dowork()
gf1debug: Begin derivative-comparison report ----------------------------------
gf1debug: mreldif(gradient vectors) = 9.91e-07
gf1debug: Warning: evaluator did not compute Hessian matrix
gf1debug: End derivative-comparison report ------------------------------------
Iteration 0:   f(p) = -851.18669
gf1debug: Begin derivative-comparison report ----------------------------------
gf1debug: mreldif(gradient vectors) = 2.06e-10
gf1debug: Warning: evaluator did not compute Hessian matrix
gf1debug: End derivative-comparison report ------------------------------------
Iteration 1:   f(p) = -556.66874
gf1debug: Begin derivative-comparison report ----------------------------------
gf1debug: mreldif(gradient vectors) = 1.59e-07
gf1debug: Warning: evaluator did not compute Hessian matrix
gf1debug: End derivative-comparison report ------------------------------------
Iteration 2:   f(p) = -555.81731
gf1debug: Begin derivative-comparison report ----------------------------------
gf1debug: mreldif(gradient vectors) = .0000267
gf1debug: Warning: evaluator did not compute Hessian matrix
gf1debug: End derivative-comparison report ------------------------------------
Iteration 3:   f(p) = -555.81538
gf1debug: Begin derivative-comparison report ----------------------------------
gf1debug: mreldif(gradient vectors) = .0000272
gf1debug: Warning: evaluator did not compute Hessian matrix
gf1debug: End derivative-comparison report ------------------------------------
Iteration 4:   f(p) = -555.81538
:
: end
--------------------------------------------------------------------------------

.
end of do-file
```

For each iteration, **mreldif(gradient vectors)** reports the maximum relative difference between the analytically and numerically computed gradients. Away from the optimum, a correctly coded analytical gradient will yield an **mreldif** of about 1e-08 or smaller. The numerically computed gradients are imperfect approximations to the true gradients, and 1e-08 is about the best we can reliably hope for when using double-precision numbers. Rely on the **mreldif** reports from iterations away from the optimum: because the gradient is almost zero at the optimum, the **mreldif** calculation produces an oversized difference for iterations near the optimum.

In the example at hand, the **mreldif** calculations of **9.91e-07**, **2.06e-10**, and **1.59e-07** for iterations 0, 1, and 2 indicate that the analytically computed derivatives are correct.
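As a reminder of the metric being reported, **mreldif(X, Y)** returns the maximum over the elements of \(|x-y|/(|y|+1)\). The matrices below are made-up values purely to illustrate the calculation:

```stata
. matrix A = (1, 100)
. matrix B = (1.00001, 100.001)
. display mreldif(A, B)
```

Because each elementwise difference is scaled by \(|y|+1\) rather than \(|y|\), the measure behaves like an absolute difference when the elements are near zero, which is why near-zero gradients at the optimum inflate the reported value.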

The code in **dex2.do** differs from that in **dex1.do** by specifying the evaluator type **gf1** instead of **gf1debug** on line 37. A **gf1** evaluator type differs from a **gf1debug** evaluator type in that it uses the analytically computed gradients in the optimization, does not compute the numerical gradients, and produces no derivative-comparison reports.

```
mata:

mata drop plleval3() dowork()

void plleval3(real scalar todo, real vector b,     ///
              real vector y, real matrix X,        ///
              val, grad, hess)
{
        real vector  xb, mu

        xb  = X*b'
        mu  = exp(xb)
        val = (-mu + y:*xb - lnfactorial(y))

        if (todo>=1) {
                grad = (y - mu):*X
        }
}

void dowork( )
{
        real vector  y, b
        real matrix  X
        real scalar  n, p
        transmorphic S

        y = st_data(., "accidents")
        X = st_data(., "cvalue kids traffic ")
        n = rows(y)
        X = X, J(n, 1, 1)
        p = cols(X)

        S = optimize_init()
        optimize_init_argument(S, 1, y)
        optimize_init_argument(S, 2, X)
        optimize_init_evaluator(S, &plleval3())
        optimize_init_evaluatortype(S, "gf1")
        optimize_init_params(S, J(1, p, .01))

        b = optimize(S)

}

dowork()
end
```

Example 2 illustrates the output.

**Example 2: gf1 output**

```
. do dex2

. mata:
------------------------------------------------- mata (type end to exit) ------
:
: mata drop plleval3() dowork()
:
: void plleval3(real scalar todo, real vector b,     ///
>               real vector y, real matrix X,        ///
>               val, grad, hess)
> {
>         real vector  xb, mu
>
>         xb  = X*b'
>         mu  = exp(xb)
>         val = (-mu + y:*xb - lnfactorial(y))
>
>         if (todo>=1) {
>                 grad = (y - mu):*X
>         }
> }
note: argument hess unused
:
: void dowork( )
> {
>         real vector  y, b
>         real matrix  X
>         real scalar  n, p
>         transmorphic S
>
>         y = st_data(., "accidents")
>         X = st_data(., "cvalue kids traffic ")
>         n = rows(y)
>         X = X, J(n, 1, 1)
>         p = cols(X)
>
>         S = optimize_init()
>         optimize_init_argument(S, 1, y)
>         optimize_init_argument(S, 2, X)
>         optimize_init_evaluator(S, &plleval3())
>         optimize_init_evaluatortype(S, "gf1")
>         optimize_init_params(S, J(1, p, .01))
>
>         b = optimize(S)
>
> }
note: variable b set but not used
:
: dowork()
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66855
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
:
: end
--------------------------------------------------------------------------------

.
end of do-file
```

**Using an analytically computed Hessian in optimize()**

The code in **dex3.do** adds the sum of the observation-level Hessians to the evaluator function **plleval3()** used by **optimize()** in **dowork()**.

```
mata:

mata drop plleval3() dowork()

void plleval3(real scalar todo, real vector b,     ///
              real vector y, real matrix X,        ///
              val, grad, hess)
{
        real vector  xb, mu

        xb  = X*b'
        mu  = exp(xb)
        val = (-mu + y:*xb - lnfactorial(y))

        if (todo>=1) {
                grad = (y - mu):*X
        }
        if (todo==2) {
                hess = -quadcross(X, mu, X)
        }

}

void dowork( )
{
        real vector  y, b
        real matrix  X
        real scalar  n, p
        transmorphic S

        y = st_data(., "accidents")
        X = st_data(., "cvalue kids traffic ")
        n = rows(y)
        X = X, J(n, 1, 1)
        p = cols(X)

        S = optimize_init()
        optimize_init_argument(S, 1, y)
        optimize_init_argument(S, 2, X)
        optimize_init_evaluator(S, &plleval3())
        optimize_init_evaluatortype(S, "gf2debug")
        optimize_init_params(S, J(1, p, .01))

        b = optimize(S)

}

dowork()
end
```

Lines 18–20 are new to **dex3.do**, and they compute the Hessian when **todo==2**. Line 41 in **dex3.do** specifies a **gf2debug** evaluator type instead of the **gf1** evaluator type specified on line 37 of **dex2.do**.

The **gf2debug** evaluator type is a second-derivative version of the **gf1debug** evaluator type; it specifies that the evaluator return observation-level contributions to the objective function, that it return a matrix of observation-level gradients when **todo==1** or **todo==2**, and that it return a matrix containing the sum of observation-level Hessians when **todo==2**. The **gf2debug** evaluator type also specifies that **optimize()** will produce a derivative-comparison report for the gradient and the Hessian and that **optimize()** will use the numerically computed derivatives for the optimization.

Example 3 illustrates the output.

**Example 3: gf2debug output**

    . do dex3

    . mata:
    ------------------------------------------------- mata (type end to exit) ------
    : mata drop plleval3() dowork()

    : void plleval3(real scalar todo, real vector b,   ///
    >               real vector y, real matrix X,      ///
    >               val, grad, hess)
    > {
    >         real vector xb, mu
    >
    >         xb  = X*b'
    >         mu  = exp(xb)
    >         val = (-mu + y:*xb - lnfactorial(y))
    >
    >         if (todo>=1) {
    >                 grad = (y - mu):*X
    >         }
    >         if (todo==2) {
    >                 hess = -quadcross(X, mu, X)
    >         }
    > }

    : void dowork( )
    > {
    >         real vector  y, b
    >         real matrix  X
    >         real scalar  n, p
    >         transmorphic S
    >
    >         y = st_data(., "accidents")
    >         X = st_data(., "cvalue kids traffic ")
    >         n = rows(y)
    >         X = X, J(n, 1, 1)
    >         p = cols(X)
    >
    >         S = optimize_init()
    >         optimize_init_argument(S, 1, y)
    >         optimize_init_argument(S, 2, X)
    >         optimize_init_evaluator(S, &plleval3())
    >         optimize_init_evaluatortype(S, "gf2debug")
    >         optimize_init_params(S, J(1, p, .01))
    >
    >         b = optimize(S)
    > }
    note: variable b set but not used

    : dowork()
    gf2debug:  Begin derivative-comparison report ----------------------------------
    gf2debug:  mreldif(gradient vectors) =  9.91e-07
    gf2debug:  mreldif(Hessian matrices) =  1.53e-06
    gf2debug:  End derivative-comparison report ------------------------------------
    Iteration 0:  f(p) = -851.18669
    gf2debug:  Begin derivative-comparison report ----------------------------------
    gf2debug:  mreldif(gradient vectors) =  2.06e-10
    gf2debug:  mreldif(Hessian matrices) =  .0001703
    gf2debug:  End derivative-comparison report ------------------------------------
    Iteration 1:  f(p) = -556.66874
    gf2debug:  Begin derivative-comparison report ----------------------------------
    gf2debug:  mreldif(gradient vectors) =  1.59e-07
    gf2debug:  mreldif(Hessian matrices) =  5.42e-07
    gf2debug:  End derivative-comparison report ------------------------------------
    Iteration 2:  f(p) = -555.81731
    gf2debug:  Begin derivative-comparison report ----------------------------------
    gf2debug:  mreldif(gradient vectors) =  .0000267
    gf2debug:  mreldif(Hessian matrices) =  2.45e-07
    gf2debug:  End derivative-comparison report ------------------------------------
    Iteration 3:  f(p) = -555.81538
    gf2debug:  Begin derivative-comparison report ----------------------------------
    gf2debug:  mreldif(gradient vectors) =  .0000272
    gf2debug:  mreldif(Hessian matrices) =  2.46e-07
    gf2debug:  End derivative-comparison report ------------------------------------
    Iteration 4:  f(p) = -555.81538

    : end
    --------------------------------------------------------------------------------

    .
    end of do-file

For the gradient, I check the **mreldif** calculations at every iteration, but for the Hessian I look most closely at the **mreldif** calculations near the optimum, because the Hessian must be full rank at the optimum. In this example, the **mreldif** calculations near the optimum are on the order of **1e-07**, indicating a correctly coded analytical Hessian.

Now consider **dex4.do**, which differs from **dex3.do** in that line 40 specifies a **gf2** evaluator type instead of a **gf2debug** evaluator type. A **gf2** evaluator type extends a **gf1** evaluator type by supplying the second derivatives in addition to the first derivatives. A **gf2** evaluator type differs from a **gf2debug** evaluator type in that the optimization uses the analytically computed gradient and Hessian, no numerical derivatives are computed, and no derivative-comparison reports are produced.

    mata:
    mata drop plleval3() dowork()
    void plleval3(real scalar todo, real vector b,   ///
                  real vector y, real matrix X,      ///
                  val, grad, hess)
    {
            real vector xb, mu

            xb  = X*b'
            mu  = exp(xb)
            val = (-mu + y:*xb - lnfactorial(y))

            if (todo>=1) {
                    grad = (y - mu):*X
            }
            if (todo==2) {
                    hess = -quadcross(X, mu, X)
            }
    }
    void dowork( )
    {
            real vector  y, b
            real matrix  X
            real scalar  n, p
            transmorphic S

            y = st_data(., "accidents")
            X = st_data(., "cvalue kids traffic ")
            n = rows(y)
            X = X, J(n, 1, 1)
            p = cols(X)

            S = optimize_init()
            optimize_init_argument(S, 1, y)
            optimize_init_argument(S, 2, X)
            optimize_init_evaluator(S, &plleval3())
            optimize_init_evaluatortype(S, "gf2")
            optimize_init_params(S, J(1, p, .01))

            b = optimize(S)
    }
    dowork()
    end

Example 4 illustrates the output.

**Example 4: gf2 output**

    . do dex4

    . mata:
    ------------------------------------------------- mata (type end to exit) ------
    : mata drop plleval3() dowork()

    : void plleval3(real scalar todo, real vector b,   ///
    >               real vector y, real matrix X,      ///
    >               val, grad, hess)
    > {
    >         real vector xb, mu
    >
    >         xb  = X*b'
    >         mu  = exp(xb)
    >         val = (-mu + y:*xb - lnfactorial(y))
    >
    >         if (todo>=1) {
    >                 grad = (y - mu):*X
    >         }
    >         if (todo==2) {
    >                 hess = -quadcross(X, mu, X)
    >         }
    > }

    : void dowork( )
    > {
    >         real vector  y, b
    >         real matrix  X
    >         real scalar  n, p
    >         transmorphic S
    >
    >         y = st_data(., "accidents")
    >         X = st_data(., "cvalue kids traffic ")
    >         n = rows(y)
    >         X = X, J(n, 1, 1)
    >         p = cols(X)
    >
    >         S = optimize_init()
    >         optimize_init_argument(S, 1, y)
    >         optimize_init_argument(S, 2, X)
    >         optimize_init_evaluator(S, &plleval3())
    >         optimize_init_evaluatortype(S, "gf2")
    >         optimize_init_params(S, J(1, p, .01))
    >
    >         b = optimize(S)
    > }
    note: variable b set but not used

    : dowork()
    Iteration 0:  f(p) = -851.18669
    Iteration 1:  f(p) = -556.66855
    Iteration 2:  f(p) = -555.81731
    Iteration 3:  f(p) = -555.81538
    Iteration 4:  f(p) = -555.81538

    : end
    --------------------------------------------------------------------------------

    .
    end of do-file

**Including analytical derivatives in the command**

**mypoisson4** is like **mypoisson3**, except that it computes the derivatives analytically. In the remainder of this post, I briefly discuss the code for **mypoisson4.ado**.

    *! version 4.0.0 28Feb2016
    program define mypoisson4, eclass sortpreserve
        version 14
        syntax varlist(numeric ts fv min=2) [if] [in] [, noCONStant vce(string) ]
        marksample touse

        _vce_parse `touse' , optlist(Robust) argoptlist(CLuster) : , vce(`vce')
        local vce        "`r(vce)'"
        local clustervar "`r(cluster)'"
        if "`vce'" == "robust" | "`vce'" == "cluster" {
            local vcetype "Robust"
        }
        if "`clustervar'" != "" {
            capture confirm numeric variable `clustervar'
            if _rc {
                display in red "invalid vce() option"
                display in red "cluster variable {bf:`clustervar'} is " ///
                    "string variable instead of a numeric variable"
                exit(198)
            }
            sort `clustervar'
        }

        gettoken depvar indepvars : varlist
        _fv_check_depvar `depvar'

        tempname b mo V N rank
        getcinfo `indepvars' , `constant'
        local cnames "`r(cnames)'"
        matrix `mo' = r(mo)

        mata: mywork("`depvar'", "`cnames'", "`touse'", "`constant'",      ///
            "`b'", "`V'", "`N'", "`rank'", "`mo'", "`vce'", "`clustervar'")

        if "`constant'" == "" {
            local cnames "`cnames' _cons"
        }
        matrix colnames `b' = `cnames'
        matrix colnames `V' = `cnames'
        matrix rownames `V' = `cnames'

        ereturn post `b' `V', esample(`touse') buildfvinfo
        ereturn scalar N        = `N'
        ereturn scalar rank     = `rank'
        ereturn local  vce       "`vce'"
        ereturn local  vcetype   "`vcetype'"
        ereturn local  clustvar  "`clustervar'"
        ereturn local  cmd       "mypoisson4"

        ereturn display
    end

    program getcinfo, rclass
        syntax varlist(ts fv), [ noCONStant ]

        _rmcoll `varlist' , `constant' expand
        local cnames `r(varlist)'
        local p : word count `cnames'
        if "`constant'" == "" {
            local p = `p' + 1
            local cons _cons
        }

        tempname b mo
        matrix `b' = J(1, `p', 0)
        matrix colnames `b' = `cnames' `cons'
        _ms_omit_info `b'
        matrix `mo' = r(omit)

        return local  cnames "`cnames'"
        return matrix mo     = `mo'
    end

    mata:
    void mywork( string scalar depvar,  string scalar indepvars,
                 string scalar touse,   string scalar constant,
                 string scalar bname,   string scalar Vname,
                 string scalar nname,   string scalar rname,
                 string scalar mo,      string scalar vcetype,
                 string scalar clustervar)
    {
        real vector y, b
        real matrix X, V, Ct
        real scalar n, p, rank

        y = st_data(., depvar, touse)
        n = rows(y)
        X = st_data(., indepvars, touse)
        if (constant == "") {
            X = X,J(n, 1, 1)
        }
        p  = cols(X)
        Ct = makeCt(mo)

        S = optimize_init()
        optimize_init_argument(S, 1, y)
        optimize_init_argument(S, 2, X)
        optimize_init_evaluator(S, &plleval3())
        optimize_init_evaluatortype(S, "gf2")
        optimize_init_params(S, J(1, p, .01))
        optimize_init_constraints(S, Ct)

        b = optimize(S)

        if (vcetype == "robust") {
            V = optimize_result_V_robust(S)
        }
        else if (vcetype == "cluster") {
            cvar = st_data(., clustervar, touse)
            optimize_init_cluster(S, cvar)
            V = optimize_result_V_robust(S)
        }
        else {                              // vcetype must be IID
            V = optimize_result_V_oim(S)
        }
        rank = p - diag0cnt(invsym(V))

        st_matrix(bname, b)
        st_matrix(Vname, V)
        st_numscalar(nname, n)
        st_numscalar(rname, rank)
    }

    real matrix makeCt(string scalar mo)
    {
        real vector mo_v
        real scalar ko, j, p

        mo_v = st_matrix(mo)
        p    = cols(mo_v)
        ko   = sum(mo_v)
        if (ko>0) {
            Ct = J(0, p, .)
            for(j=1; j<=p; j++) {
                if (mo_v[j]==1) {
                    Ct = Ct \ e(j, p)
                }
            }
            Ct = Ct, J(ko, 1, 0)
        }
        else {
            Ct = J(0, p+1, .)
        }
        return(Ct)
    }

    void plleval3(real scalar todo, real vector b,   ///
                  real vector y, real matrix X,      ///
                  val, grad, hess)
    {
        real vector xb, mu

        xb  = X*b'
        mu  = exp(xb)
        val = (-mu + y:*xb - lnfactorial(y))

        if (todo>=1) {
            grad = (y - mu):*X
        }
        if (todo==2) {
            hess = -quadcross(X, mu, X)
        }
    }
    end

Only a few lines of **mypoisson4.ado** differ from their counterparts in **mypoisson3.ado**. Line 106 of **mypoisson4.ado** specifies a **gf2** evaluator type, while line 106 of **mypoisson3.ado** specifies a **gf0** evaluator type. Lines 166–171 in **mypoisson4.ado** compute the gradient and the Hessian analytically, and they have no counterparts in **mypoisson3.ado**.

The output in examples 5 and 6 confirms that **mypoisson4** produces the same results as **poisson** when the option **vce(cluster id)** is specified.

**Example 5: mypoisson4 results**

    . mypoisson4 accidents cvalue kids traffic , vce(cluster id)
    Iteration 0:  f(p) = -851.18669
    Iteration 1:  f(p) = -556.66855
    Iteration 2:  f(p) = -555.81731
    Iteration 3:  f(p) = -555.81538
    Iteration 4:  f(p) = -555.81538
                                       (Std. Err. adjusted for clustering on id)
    ------------------------------------------------------------------------------
                 |               Robust
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6558871   .1125223    -5.83   0.000    -.8764267   -.4353475
            kids |  -1.009017   .1805639    -5.59   0.000    -1.362916   -.6551182
         traffic |   .1467115    .092712     1.58   0.114    -.0350008    .3284237
           _cons |   .5743541   .6238015     0.92   0.357    -.6482744    1.796983
    ------------------------------------------------------------------------------

**Example 6: poisson results**

    . poisson accidents cvalue kids traffic , vce(cluster id)
    Iteration 0:   log pseudolikelihood = -555.86605
    Iteration 1:   log pseudolikelihood =  -555.8154
    Iteration 2:   log pseudolikelihood = -555.81538

    Poisson regression                              Number of obs     =        505
                                                    Wald chi2(3)      =     103.53
                                                    Prob > chi2       =     0.0000
    Log pseudolikelihood = -555.81538               Pseudo R2         =     0.2343

                                    (Std. Err. adjusted for 285 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
       accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6558871   .1125223    -5.83   0.000    -.8764266   -.4353475
            kids |  -1.009017   .1805639    -5.59   0.000    -1.362915   -.6551181
         traffic |   .1467115    .092712     1.58   0.114    -.0350008    .3284237
           _cons |    .574354   .6238015     0.92   0.357    -.6482744    1.796982
    ------------------------------------------------------------------------------

**Done and undone**

I showed how to compute derivatives analytically when using **optimize()**, and I included analytically computed derivatives in **mypoisson4.ado**. In my next post, I show how to make **predict** work after **mypoisson4**.

I only discuss what is new in the code for **mypoisson3.ado**, assuming that you are familiar with **mypoisson2.ado**.

This is the twenty-second post in the series **Programming an estimation command in Stata**. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

**A poisson command with options for a robust or a cluster–robust VCE**

**mypoisson3** computes Poisson-regression results in Mata. The syntax of the **mypoisson3** command is

**mypoisson3** *depvar* *indepvars* [*if*] [*in*] [**, noconstant vce(robust** | **cluster** *clustvar***)**]

where *indepvars* can contain factor variables or time-series variables.

In the remainder of this post, I discuss the code for **mypoisson3.ado**. I recommend that you click on the filename to download the code. To avoid scrolling, view the code in the Do-file Editor, or your favorite text editor, to see the line numbers.

    *! version 3.0.0 21Feb2016
    program define mypoisson3, eclass sortpreserve
        version 14
        syntax varlist(numeric ts fv min=2) [if] [in] [, noCONStant vce(string) ]
        marksample touse

        _vce_parse `touse' , optlist(Robust) argoptlist(CLuster) : , vce(`vce')
        local vce        "`r(vce)'"
        local clustervar "`r(cluster)'"
        if "`vce'" == "robust" | "`vce'" == "cluster" {
            local vcetype "Robust"
        }
        if "`clustervar'" != "" {
            capture confirm numeric variable `clustervar'
            if _rc {
                display in red "invalid vce() option"
                display in red "cluster variable {bf:`clustervar'} is " ///
                    "string variable instead of a numeric variable"
                exit(198)
            }
            sort `clustervar'
        }

        gettoken depvar indepvars : varlist
        _fv_check_depvar `depvar'

        tempname b mo V N rank
        getcinfo `indepvars' , `constant'
        local cnames "`r(cnames)'"
        matrix `mo' = r(mo)

        mata: mywork("`depvar'", "`cnames'", "`touse'", "`constant'",      ///
            "`b'", "`V'", "`N'", "`rank'", "`mo'", "`vce'", "`clustervar'")

        if "`constant'" == "" {
            local cnames "`cnames' _cons"
        }
        matrix colnames `b' = `cnames'
        matrix colnames `V' = `cnames'
        matrix rownames `V' = `cnames'

        ereturn post `b' `V', esample(`touse') buildfvinfo
        ereturn scalar N        = `N'
        ereturn scalar rank     = `rank'
        ereturn local  vce       "`vce'"
        ereturn local  vcetype   "`vcetype'"
        ereturn local  clustvar  "`clustervar'"
        ereturn local  cmd       "mypoisson3"

        ereturn display
    end

    program getcinfo, rclass
        syntax varlist(ts fv), [ noCONStant ]

        _rmcoll `varlist' , `constant' expand
        local cnames `r(varlist)'
        local p : word count `cnames'
        if "`constant'" == "" {
            local p = `p' + 1
            local cons _cons
        }

        tempname b mo
        matrix `b' = J(1, `p', 0)
        matrix colnames `b' = `cnames' `cons'
        _ms_omit_info `b'
        matrix `mo' = r(omit)

        return local  cnames "`cnames'"
        return matrix mo     = `mo'
    end

    mata:
    void mywork( string scalar depvar,  string scalar indepvars,
                 string scalar touse,   string scalar constant,
                 string scalar bname,   string scalar Vname,
                 string scalar nname,   string scalar rname,
                 string scalar mo,      string scalar vcetype,
                 string scalar clustervar)
    {
        real vector y, b
        real matrix X, V, Ct
        real scalar n, p, rank

        y = st_data(., depvar, touse)
        n = rows(y)
        X = st_data(., indepvars, touse)
        if (constant == "") {
            X = X,J(n, 1, 1)
        }
        p  = cols(X)
        Ct = makeCt(mo)

        S = optimize_init()
        optimize_init_argument(S, 1, y)
        optimize_init_argument(S, 2, X)
        optimize_init_evaluator(S, &plleval3())
        optimize_init_evaluatortype(S, "gf0")
        optimize_init_params(S, J(1, p, .01))
        optimize_init_constraints(S, Ct)

        b = optimize(S)

        if (vcetype == "robust") {
            V = optimize_result_V_robust(S)
        }
        else if (vcetype == "cluster") {
            cvar = st_data(., clustervar, touse)
            optimize_init_cluster(S, cvar)
            V = optimize_result_V_robust(S)
        }
        else {                              // vcetype must be IID
            V = optimize_result_V_oim(S)
        }
        rank = p - diag0cnt(invsym(V))

        st_matrix(bname, b)
        st_matrix(Vname, V)
        st_numscalar(nname, n)
        st_numscalar(rname, rank)
    }

    real matrix makeCt(string scalar mo)
    {
        real vector mo_v
        real scalar ko, j, p

        mo_v = st_matrix(mo)
        p    = cols(mo_v)
        ko   = sum(mo_v)
        if (ko>0) {
            Ct = J(0, p, .)
            for(j=1; j<=p; j++) {
                if (mo_v[j]==1) {
                    Ct = Ct \ e(j, p)
                }
            }
            Ct = Ct, J(ko, 1, 0)
        }
        else {
            Ct = J(0, p+1, .)
        }
        return(Ct)
    }

    void plleval3(real scalar todo, real vector b,   ///
                  real vector y, real matrix X,      ///
                  val, grad, hess)
    {
        real vector xb

        xb  = X*b'
        val = (-exp(xb) + y:*xb - lnfactorial(y))
    }
    end

Only a few lines of **mypoisson3.ado** differ from their counterparts in **mypoisson2.ado**, and I put these changes into the following groups.

- Line 5 allows **vce()** on the **syntax** command, and lines 8–23 parse this option. I discussed the techniques used in these changes in Programming an estimation command in Stata: Adding robust and cluster–robust VCEs to our Mata-based OLS command, where I used them in **myregress12.ado**. These lines
  - put the specified VCE in the local macro **vce**;
  - put a label for the specified VCE in the local macro **vcetype**;
  - put the name of a specified cluster variable in the local macro **clustervar**; and
  - handle any errors when the user misspecifies the **vce()** option.
- Line 35 passes the contents of the local macros **vce** and **clustervar** to the Mata work function **mywork()**.
- Lines 47–49 store the local macros **vce**, **vcetype**, and **clustvar** in **e()** results.
- Line 84 parses the new arguments **vcetype** and **clustervar**. The string scalar **vcetype** contains the type of VCE to be estimated, and the string scalar **clustervar** contains the name of the Stata variable identifying the clusters, if one was specified.
- Lines 112–122 use the contents of **vcetype** to return an OIM, a robust, or a cluster–robust estimator of the VCE. The contents of **vcetype** determine which **optimize()** function is called to compute the estimated VCE. If **vcetype** contains **robust**, line 113 uses **optimize_result_V_robust()** to compute a robust estimator of the VCE. If **vcetype** contains **cluster**, lines 116 and 117 put a copy of the Stata cluster variable into the optimize object, and line 118 then uses **optimize_result_V_robust()** to compute a cluster–robust estimator of the VCE. Finally, if **vcetype** is empty, line 121 uses **optimize_result_V_oim()** to compute the default correct-specification estimator of the VCE.
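The robust and cluster–robust computations differ only in what gets summed before the outer product of the scores. Here is a stripped-down numeric sketch in Python of that sandwich arithmetic for a hypothetical one-parameter problem; the scores, Hessian, and cluster ids are made up, and this is not the **optimize()** implementation, only the idea behind it.

```python
from collections import defaultdict

# Hypothetical observation-level scores s_i and Hessian H at the optimum
scores   = [0.5, -0.3, 0.1, -0.4, 0.2, -0.1]
clusters = [1, 1, 2, 2, 3, 3]
H = -8.0   # second derivative of the objective at the optimum

# Robust sandwich: V = H^{-1} * (sum_i s_i^2) * H^{-1}
V_robust = (1 / H) * sum(s * s for s in scores) * (1 / H)

# Cluster-robust: sum the scores within each cluster first, then square
totals = defaultdict(float)
for s, g in zip(scores, clusters):
    totals[g] += s
V_cluster = (1 / H) * sum(t * t for t in totals.values()) * (1 / H)

print(V_robust > V_cluster)  # True here: within-cluster scores partly cancel
```

Whether the cluster–robust variance is larger or smaller than the robust one depends on the within-cluster correlation of the scores; in this toy example they partly cancel within clusters.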

The output in examples 1 and 2 confirms that **mypoisson3** produces the same results as **poisson** when the option **vce(cluster id)** is specified.

**Example 1: mypoisson3 results**

    . clear all

    . use accident3

    . mypoisson3 accidents cvalue i.kids traffic, vce(cluster id)
    Iteration 0:  f(p) = -847.19028
    Iteration 1:  f(p) =  -573.7331
    Iteration 2:  f(p) = -545.76673
    Iteration 3:  f(p) = -545.11357
    Iteration 4:  f(p) = -545.10898
    Iteration 5:  f(p) = -545.10898
                                       (Std. Err. adjusted for clustering on id)
    ------------------------------------------------------------------------------
                 |               Robust
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6582924   .1128794    -5.83   0.000    -.8795319   -.4370529
                 |
            kids |
              1  |  -1.662351   .4309205    -3.86   0.000     -2.50694   -.8177623
              2  |  -1.574691   .4164515    -3.78   0.000    -2.390921   -.7584611
              3  |  -3.233933   .4685643    -6.90   0.000    -4.152302   -2.315564
                 |
         traffic |   .1383976   .0876168     1.58   0.114    -.0333282    .3101235
           _cons |   .7157579   .5970943     1.20   0.231    -.4545254    1.886041
    ------------------------------------------------------------------------------

**Example 2: poisson results**

    . poisson accidents cvalue i.kids traffic, vce(cluster id)
    Iteration 0:   log pseudolikelihood = -546.35782
    Iteration 1:   log pseudolikelihood = -545.11016
    Iteration 2:   log pseudolikelihood = -545.10898
    Iteration 3:   log pseudolikelihood = -545.10898

    Poisson regression                              Number of obs     =        505
                                                    Wald chi2(5)      =     118.06
                                                    Prob > chi2       =     0.0000
    Log pseudolikelihood = -545.10898               Pseudo R2         =     0.2491

                                    (Std. Err. adjusted for 285 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
       accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6582924   .1128793    -5.83   0.000    -.8795317    -.437053
                 |
            kids |
              1  |  -1.662351   .4309205    -3.86   0.000     -2.50694   -.8177622
              2  |  -1.574691   .4164515    -3.78   0.000    -2.390921    -.758461
              3  |  -3.233932   .4685642    -6.90   0.000    -4.152301   -2.315563
                 |
         traffic |   .1383977   .0876167     1.58   0.114    -.0333279    .3101232
           _cons |   .7157576    .597093     1.20   0.231    -.4545232    1.886038
    ------------------------------------------------------------------------------

**Done and undone**

I discussed **mypoisson3**, which has options for a robust or a cluster–robust estimator of the variance–covariance of the estimator. In my next post, I discuss how to have the evaluator function compute the derivatives to speed up the optimization.

\newcommand{\mub}{{\boldsymbol{\mu}}}

\newcommand{\thetab}{{\boldsymbol{\theta}}}

\newcommand{\Thetab}{{\boldsymbol{\Theta}}}

\newcommand{\etab}{{\boldsymbol{\eta}}}

\newcommand{\Sigmab}{{\boldsymbol{\Sigma}}}

\newcommand{\Phib}{{\boldsymbol{\Phi}}}

\newcommand{\Phat}{\hat{{\bf P}}}

\newcommand{\epsb}{{\boldsymbol{\epsilon}}}

Vector autoregression (VAR) is a useful tool for analyzing the dynamics of multiple time series. A VAR expresses a vector of observed variables as a function of its own lags.

**Simulation**

Let’s begin by simulating a bivariate VAR(2) process using the following specification,

\[

\begin{bmatrix} y_{1,t}\\ y_{2,t}

\end{bmatrix}

= \mub + {\bf A}_1 \begin{bmatrix} y_{1,t-1}\\ y_{2,t-1}

\end{bmatrix} + {\bf A}_2 \begin{bmatrix} y_{1,t-2}\\ y_{2,t-2}

\end{bmatrix} + \epsb_t

\]

where \(y_{1,t}\) and \(y_{2,t}\) are the observed series at time \(t\), \(\mub\) is a \(2 \times 1\) vector of intercepts, \({\bf A}_1\) and \({\bf A}_2\) are \(2\times 2\) parameter matrices, and \(\epsb_t\) is a \(2\times 1\) vector of innovations that is uncorrelated over time. I assume a \(N({\bf 0},\Sigmab)\) distribution for the innovations \(\epsb_t\), where \(\Sigmab\) is a \(2\times 2\) covariance matrix.

I set my sample size to 1,100 and generate variables to hold the observed series and innovations.

    . clear all

    . set seed 2016

    . local T = 1100

    . set obs `T'
    number of observations (_N) was 0, now 1,100

    . gen time = _n

    . tsset time
            time variable:  time, 1 to 1100
                    delta:  1 unit

    . generate y1 = .
    (1,100 missing values generated)

    . generate y2 = .
    (1,100 missing values generated)

    . generate eps1 = .
    (1,100 missing values generated)

    . generate eps2 = .
    (1,100 missing values generated)

In lines 1–6, I set the seed for the random-number generator, set my sample size to 1,100, and generate a time variable, **time**. In the remaining lines, I generate variables **y1**, **y2**, **eps1**, and **eps2** to hold the observed series and innovations.

**Setting parameter values**

I choose the parameter values for the VAR(2) model as follows:

\[

\mub = \begin{bmatrix} 0.1 \\ 0.4 \end{bmatrix}, \quad {\bf A}_1 =

\begin{bmatrix} 0.6 & -0.3 \\ 0.4 & 0.2 \end{bmatrix}, \quad {\bf A}_2 =

\begin{bmatrix} 0.2 & 0.3 \\ -0.1 & 0.1 \end{bmatrix}, \quad \Sigmab =

\begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}

\]

    . mata:
    ------------------------------------------------- mata (type end to exit) -----
    : mu = (0.1\0.4)

    : A1 = (0.6,-0.3\0.4,0.2)

    : A2 = (0.2,0.3\-0.1,0.1)

    : Sigma = (1,0.5\0.5,1)

    : end
    -------------------------------------------------------------------------------

In Mata, I create matrices **mu**, **A1**, **A2**, and **Sigma** to hold the parameter values. Before generating my data, I check whether these values correspond to a stable VAR(2) process. Let

\[

{\bf F} = \begin{bmatrix} {\bf A}_1 & {\bf A_2} \\ {\bf I}_2 & {\bf 0}

\end{bmatrix}

\]

denote a \(4\times 4\) matrix where \({\bf I}_2\) is a \(2\times 2\) identity matrix and \({\bf 0}\) is a \(2\times 2\) matrix of zeros. The VAR(2) process is stable if every eigenvalue of \({\bf F}\) has modulus less than 1. The code below computes the eigenvalues.

    . mata:
    ------------------------------------------------- mata (type end to exit) -----
    : K = p = 2            // K = number of variables; p = number of lags

    : F = J(K*p,K*p,0)

    : F[1..2,1..2] = A1

    : F[1..2,3..4] = A2

    : F[3..4,1..2] = I(K)

    : X = L = .

    : eigensystem(F,X,L)

    : L'
                               1
        +----------------------------+
      1 |   .858715598               |
      2 |  -.217760515 + .32727213i  |
      3 |  -.217760515 - .32727213i  |
      4 |   .376805431               |
        +----------------------------+

    : end
    -------------------------------------------------------------------------------

I construct the matrix **F** as defined above and use the function **eigensystem()** to compute its eigenvalues. The matrix **X** holds the eigenvectors, and **L** holds the eigenvalues. Every eigenvalue in **L** has modulus less than 1. The second and third eigenvalues form a complex-conjugate pair with modulus \(\sqrt{r^2 + c^2} = 0.3931\), where \(r\) is the real part and \(c\) is the imaginary part. Having verified the stability condition, I generate draws for the VAR(2) model.
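The modulus calculation is easy to double-check outside Stata. This short Python snippet takes the eigenvalues printed by **eigensystem()** and verifies the stability condition directly:

```python
# Eigenvalues of the companion matrix F, as reported by Mata's eigensystem()
eigenvalues = [complex(0.858715598, 0.0),
               complex(-0.217760515, 0.32727213),
               complex(-0.217760515, -0.32727213),
               complex(0.376805431, 0.0)]

# abs() of a complex number is its modulus, sqrt(r^2 + c^2)
moduli = [abs(ev) for ev in eigenvalues]
print([round(m, 4) for m in moduli])  # [0.8587, 0.3931, 0.3931, 0.3768]
print(all(m < 1 for m in moduli))     # True: the VAR(2) is stable
```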

**Drawing innovations from a multivariate normal**

I draw random normal values from \(N({\bf 0},\Sigmab)\) and assign them to Stata variables **eps1** and **eps2**.

    . mata:
    ------------------------------------------------- mata (type end to exit) -----
    : T = strtoreal(st_local("T"))

    : u = rnormal(T,2,0,1)*cholesky(Sigma)

    : epsmat = .

    : st_view(epsmat,.,"eps1 eps2")

    : epsmat[1..T,.] = u

    : end
    -------------------------------------------------------------------------------

I assign the sample size, defined in Stata as the local macro **T**, to a Mata variable. This simplifies changing the sample size later; I need to set it only once at the beginning. In Mata, I combine two functions to assign the sample size: **st_local()** obtains the contents of a Stata macro as a string, and **strtoreal()** converts that string to a real value.

The second line draws a \(1100 \times 2\) matrix of normal errors from a \(N({\bf 0},\Sigmab)\) distribution. I use the **st_view()** function to assign the draws to the Stata variables **eps1** and **eps2**. This function creates a matrix that is a view on the current Stata dataset. I create a null matrix **epsmat** and use **st_view()** to modify **epsmat** based on the values of the Stata variables **eps1** and **eps2**. Finally, I assign this matrix to hold the draws stored in **u**, effectively populating the Stata variables **eps1** and **eps2** with the random draws.
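The Cholesky trick works because a factor of \(\Sigmab\) reproduces \(\Sigmab\) when multiplied by its transpose, so applying the factor (oriented appropriately) to i.i.d. standard normal draws induces the desired covariance. A miniature Python check for the assumed \(\Sigmab\):

```python
import math

Sigma = [[1.0, 0.5], [0.5, 1.0]]

# Cholesky factor of a 2x2 covariance matrix, computed by hand:
# C = [[sqrt(s11), 0], [s21/sqrt(s11), sqrt(s22 - s21^2/s11)]]
c11 = math.sqrt(Sigma[0][0])
c21 = Sigma[1][0] / c11
c22 = math.sqrt(Sigma[1][1] - c21 ** 2)
C = [[c11, 0.0], [c21, c22]]

# C times its transpose recovers Sigma, which is why transformed
# iid N(0,1) draws have covariance Sigma.
CCt = [[sum(C[i][k] * C[j][k] for k in range(2)) for j in range(2)]
       for i in range(2)]
print([[round(v, 10) for v in row] for row in CCt])  # [[1.0, 0.5], [0.5, 1.0]]
```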

**Generating the observed series**

Following Lütkepohl (2005, 708), I generate the first two observations so that their correlation structure is the same as the rest of the sample. I assume a bivariate normal distribution with mean equal to the unconditional mean \(\thetab = ({\bf I}_K - {\bf A}_1 - {\bf A}_2)^{-1}\mub\). The covariance matrix of the first two observations of the two series is

\[

\text{vec}(\Sigmab_y) = ({\bf I}_{16} - {\bf F} \otimes {\bf F})^{-1}\text{vec}(\Sigmab_{\epsilon})

\]

where \(\text{vec}()\) is an operator that stacks matrix columns, \({\bf I}_{16}\) is a \(16\times 16\) identity matrix, and \(\Sigmab_{\epsilon} = \begin{bmatrix} \Sigmab & {\bf 0}\\ {\bf 0} & {\bf 0} \end{bmatrix}\) is a \(4\times 4\) matrix. The first two observations are generated as

\[

\begin{bmatrix} {\bf y}_0 \\ {\bf y}_{-1} \end{bmatrix} = {\bf Q} \etab + \Thetab

\]

where **Q** is a \(4\times 4\) matrix such that \({\bf Q}{\bf Q}' = \Sigmab_y\), \(\etab\) is a \(4\times 1\) vector of standard normal innovations, and \(\Thetab = \begin{bmatrix} \thetab \\ \thetab \end{bmatrix}\) is a \(4\times 1\) vector of means.

The following code generates the first two observations and assigns the values to the Stata variables **y1** and **y2**.

    . mata:
    ------------------------------------------------- mata (type end to exit) -----
    : Sigma_e = J(K*p,K*p,0)

    : Sigma_e[1..K,1..K] = Sigma

    : Sigma_y = luinv(I((K*p)^2)-F#F)*vec(Sigma_e)

    : Sigma_y = rowshape(Sigma_y,K*p)'

    : theta = luinv(I(K)-A1-A2)*mu

    : Q = cholesky(Sigma_y)*rnormal(K*p,1,0,1)

    : data = .

    : st_view(data,.,"y1 y2")

    : data[1..p,.] = ((Q[3..4],Q[1..2]):+mu)'

    : end
    -------------------------------------------------------------------------------
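As a quick cross-check of the unconditional mean \(\thetab = ({\bf I}_K - {\bf A}_1 - {\bf A}_2)^{-1}\mub\) that **luinv()** computes above, the same arithmetic in plain Python for the assumed parameter values:

```python
# M = I - A1 - A2, element by element, for the assumed parameter values
M = [[1 - 0.6 - 0.2, 0 - (-0.3) - 0.3],
     [0 - 0.4 - (-0.1), 1 - 0.2 - 0.1]]
mu = [0.1, 0.4]

# 2x2 inverse via the adjugate formula
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
Minv = [[M[1][1] / det, -M[0][1] / det],
        [-M[1][0] / det, M[0][0] / det]]

# theta = Minv * mu, the unconditional mean of the VAR(2) process
theta = [Minv[0][0] * mu[0] + Minv[0][1] * mu[1],
         Minv[1][0] * mu[0] + Minv[1][1] * mu[1]]
print([round(t, 4) for t in theta])  # [0.5, 0.7857]
```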

After generating the first two observations, I can generate the rest of the series from a VAR(2) process in Stata as follows:

    . forvalues i=3/`T' {
      2.     qui {
      3.         replace y1 = 0.1 + 0.6*l.y1 - 0.3*l.y2 + 0.2*l2.y1 + 0.3*l2.y2 + eps1 in `i'
      4.         replace y2 = 0.4 + 0.4*l.y1 + 0.2*l.y2 - 0.1*l2.y1 + 0.1*l2.y2 + eps2 in `i'
      5.     }
      6. }

    . drop in 1/100
    (100 observations deleted)

I added the **quietly** statement to suppress the output generated after the **replace** command. Finally, I drop the first 100 observations as burn-in to mitigate the effect of initial values.
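The same recursion can be sketched outside Stata. This Python version uses the assumed parameter values and a hand-coded Cholesky factor; the start-up values are crude zeros, which the burn-in washes out, so it is an illustration of the mechanics rather than a replication of the Stata draws.

```python
import math
import random

random.seed(2016)

# Assumed parameter values from the setup above
mu = [0.1, 0.4]
A1 = [[0.6, -0.3], [0.4, 0.2]]
A2 = [[0.2, 0.3], [-0.1, 0.1]]
# Cholesky factor of Sigma = [[1, .5], [.5, 1]]
C = [[1.0, 0.0], [0.5, math.sqrt(0.75)]]

T, burn = 1100, 100
y = [[0.0, 0.0], [0.0, 0.0]]   # crude start-up values; the burn-in washes them out
for t in range(2, T):
    z = [random.gauss(0, 1), random.gauss(0, 1)]
    eps = [C[0][0] * z[0], C[1][0] * z[0] + C[1][1] * z[1]]  # N(0, Sigma) draw
    new = [mu[i]
           + A1[i][0] * y[-1][0] + A1[i][1] * y[-1][1]
           + A2[i][0] * y[-2][0] + A2[i][1] * y[-2][1]
           + eps[i] for i in range(2)]
    y.append(new)
y = y[burn:]           # drop the burn-in
print(len(y))          # 1000
```

Because the process is stable, the simulated series fluctuates around its unconditional mean instead of wandering off.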

**Estimation**

I use the **var** command to fit a VAR(2) model.

    . var y1 y2

    Vector autoregression

    Sample:  103 - 1100                             Number of obs     =        998
    Log likelihood =   -2693.949                    AIC               =   5.418735
    FPE            =    .7733536                    HQIC              =    5.43742
    Det(Sigma_ml)  =    .7580097                    SBIC              =   5.467891

    Equation           Parms      RMSE     R-sq      chi2     P>chi2
    ----------------------------------------------------------------
    y1                    5     1.14546   0.5261   1108.039   0.0000
    y2                    5     .865602   0.4794   919.1433   0.0000
    ----------------------------------------------------------------

    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    y1           |
              y1 |
             L1. |   .5510793   .0324494    16.98   0.000     .4874797     .614679
             L2. |   .2749983   .0367192     7.49   0.000       .20303    .3469667
                 |
              y2 |
             L1. |  -.3080881    .046611    -6.61   0.000    -.3994439   -.2167323
             L2. |   .2551285   .0425803     5.99   0.000     .1716727    .3385844
                 |
           _cons |   .1285357   .0496933     2.59   0.010     .0311387    .2259327
    -------------+----------------------------------------------------------------
    y2           |
              y1 |
             L1. |   .3890191   .0245214    15.86   0.000      .340958    .4370801
             L2. |  -.0190324    .027748    -0.69   0.493    -.0734175    .0353527
                 |
              y2 |
             L1. |   .1944531    .035223     5.52   0.000     .1254172     .263489
             L2. |   .0459445   .0321771     1.43   0.153    -.0171215    .1090106
                 |
           _cons |   .4603854   .0375523    12.26   0.000     .3867843    .5339865
    ------------------------------------------------------------------------------

By default, **var** estimates the parameters of a VAR model with two lags. The parameter estimates are significant and are similar to the true values used to generate the bivariate series.

**Inference: Impulse response functions**

Impulse–response functions (IRFs) are useful for analyzing how the endogenous variables in a VAR respond over time to an exogenous impulse to one of the innovations. For example, in a bivariate VAR of inflation and the interest rate, IRFs can trace out the response of the interest rate to an exogenous shock to the inflation equation.

Let’s consider the bivariate model that I used earlier. Suppose I want to estimate the effect of a unit shock in \(\epsb_t\) on the endogenous variables in my system. I do this by converting the VAR(2) process into an MA(\(\infty\)) process as

\[

{\bf y}_t = \sum_{p=0}^\infty \Phib_p \mub + \sum_{p=0}^\infty \Phib_p \epsb_{t-p}

\]

where \(\Phib_p\) is the MA(\(\infty\)) coefficient matrix for the *p*th lag. Following Lütkepohl (2005, 22–23), the MA coefficient matrices are related to the AR matrices as follows:

\begin{align*}

\Phib_0 &= {\bf I}_2 \\

\Phib_1 &= {\bf A}_1 \\

\Phib_2 &= \Phib_1 {\bf A}_1 + {\bf A}_2 = {\bf A}_1^2 + {\bf A}_2\\

&\vdots \\

\Phib_i &= \Phib_{i-1} {\bf A}_1 + \Phib_{i-2}{\bf A}_2

\end{align*}

More formally, the response of the *i*th variable \(h\) periods ahead to a unit shock in the *j*th equation at time \(t\) is given by

\[

\frac{\partial y_{i,t+h}}{\partial \epsilon_{j,t}} = \{\Phib_h\}_{i,j}

\]

The responses are the collection of elements in the *i*th row and *j*th column of the MA(\(\infty\)) coefficient matrices. For the VAR(2) model, the first few responses using the estimated AR parameters are as follows:

\begin{align*}

\begin{bmatrix} y_{1,0} \\ y_{2,0} \end{bmatrix} &=

\Phib_0 = {\bf I}_2 =

\begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} \\

\begin{bmatrix} y_{1,1} \\ y_{2,1} \end{bmatrix} &=

\hat{\Phib}_1 = \hat{{\bf A}}_1 =

\begin{bmatrix} 0.5510 & -0.3081 \\ 0.3890 & 0.1945 \end{bmatrix} \\

\begin{bmatrix} y_{1,2} \\ y_{2,2} \end{bmatrix} &= \hat{\Phib}_2 =

\hat{{\bf A}}_1^2 + \hat{{\bf A}}_2 =

\begin{bmatrix} 0.4588 & 0.0254 \\ 0.2710 & -0.0361 \end{bmatrix}

\end{align*}

Future responses of \({\bf y}_t\) for \(t>2\) are computed using similar recursions. The response of the first variable to an impulse in the first equation is the vector \((1, 0.5510, 0.4588, \dots)\). I obtain the impulse responses using the **irf create** command as follows:

    . irf create firstirf, set(myirf)
    (file myirf.irf created)
    (file myirf.irf now active)
    (file myirf.irf updated)

This command estimates the IRFs and other statistics, stores them under the name **firstirf**, and saves them in the file **myirf.irf**. The **set()** option makes **myirf.irf** the active file. I can list the table of responses of **y1** due to an impulse in the same equation by typing

```
. irf table irf, impulse(y1) response(y1) noci

Results from firstirf

+--------------------+
|        |    (1)    |
|  step  |    irf    |
|--------+-----------|
|0       | 1         |
|1       | .551079   |
|2       | .458835   |
|3       | .42016    |
|4       | .353356   |
|5       | .305343   |
|6       | .263868   |
|7       | .227355   |
|8       | .196142   |
+--------------------+
(1) irfname = firstirf, impulse = y1, and response = y1
```

The default horizon is 8, and I specify the **noci** option to suppress the confidence limits. Notice that the first few responses are similar to the ones I computed earlier. IRFs are better analyzed using a graph. I can visualize this in a graph along with the 95% confidence bands by typing

. irf graph irf, impulse(y1) response(y1)

A unit impulse to the first equation increases **y1** by a unit contemporaneously. The response of **y1** slowly declines over time to its long-run level.

**Orthogonalized impulse–response functions**

In the previous section, I showed the response of **y1** due to a unit impulse on the same equation while holding the other impulse constant. The variance–covariance matrix \(\Sigmab\), however, implies a strong positive correlation between the two equations. I list the contents of the estimated variance–covariance matrix by typing

```
. matrix list e(Sigma)

symmetric e(Sigma)[2,2]
           y1         y2
y1  1.3055041
y2   .4639629  .74551376
```

The estimated covariance of the two equations is positive. This implies that I cannot assume the other impulse will remain constant. An impulse to the **y2** equation has a contemporaneous effect on **y1**, and vice versa.

Orthogonalized impulse–response functions (OIRFs) address this by decomposing the estimated variance–covariance matrix \(\hat{\Sigmab}\) into a lower triangular matrix. This type of decomposition isolates the contemporaneous response of **y1** arising solely because of an impulse in the same equation. Nonetheless, the impulse on the first equation will still contemporaneously affect **y2**. For example, if **y1** is inflation and **y2** is the interest rate, this decomposition implies that a shock to inflation affects both inflation and interest rates. However, a shock to the interest-rate equation affects only interest rates.

To estimate the OIRFs, let \({\bf P}\) denote the Cholesky decomposition of \(\Sigmab\) such that \({\bf P}{\bf P}'=\Sigmab\). Let \({\bf u}_t\) denote a \(2\times 1\) vector such that \({\bf P}{\bf u}_t = \epsb_t\), which also implies \({\bf u}_t = {\bf P}^{-1}\epsb_t\). The errors in \({\bf u}_t\) are uncorrelated by construction because \(E({\bf u}_t{\bf u}_t') = {\bf P}^{-1}E(\epsb_t\epsb_t')({\bf P}^{-1})'={\bf I}_2\). This allows us to interpret the OIRFs as responses to a one-standard-deviation impulse to \({\bf u}_t\).

I rewrite the MA(\(\infty\)) representation from earlier in terms of the \({\bf u}_t\) vector as

\[

{\bf y}_t = \sum_{p=0}^\infty \Phib_p \mub + \sum_{p=0}^\infty \Phib_p {\bf P u}_{t-p}

\]

The OIRFs are simply

\[

\frac{\partial y_{i,t+h}}{\partial u_{j,t}} = \{\Phib_h {\bf P}\}_{i,j}

\]

the product of the MA coefficient matrices and the lower triangular matrix **P**.

I obtain the estimate of \(\Phat\) by typing

```
. matrix Sigma_hat = e(Sigma)

. matrix P_hat = cholesky(Sigma_hat)

. matrix list P_hat

P_hat[2,2]
            y1          y2
y1   1.1425866           0
y2   .40606367   .76198823
```
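As an aside, the decomposition is easy to verify outside Stata. The NumPy sketch below (my own check, using the estimated \(\hat{\Sigmab}\) listed above) recomputes \(\Phat\) and confirms that \({\bf P}^{-1}\hat{\Sigmab}({\bf P}^{-1})' = {\bf I}_2\), so the orthogonalized errors are uncorrelated with unit variances:

```python
import numpy as np

# Estimated Sigma from e(Sigma) above
Sigma = np.array([[1.3055041, 0.4639629],
                  [0.4639629, 0.74551376]])

# Lower-triangular Cholesky factor: P P' = Sigma
P = np.linalg.cholesky(Sigma)
print(np.round(P, 7))  # matches P_hat listed above

# u_t = P^{-1} eps_t has identity covariance by construction
Pinv = np.linalg.inv(P)
I2 = Pinv @ Sigma @ Pinv.T
```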

Using this matrix, I compute the first few responses as follows:

\begin{align*}

\begin{bmatrix} y_{1,0} \\ y_{2,0} \end{bmatrix} &=

\Phib_0 \Phat = {\bf I}_2 \Phat =

\begin{bmatrix} 1.1426 & 0\\ 0.4061 & 0.7620 \end{bmatrix} \\

\begin{bmatrix} y_{1,1} \\ y_{2,1} \end{bmatrix} &=

\hat{\Phib}_1 \Phat = \hat{{\bf A}}_1 \Phat =

\begin{bmatrix} 0.5046 & -0.2348 \\ 0.5234 & 0.1482 \end{bmatrix} \\

\begin{bmatrix} y_{1,2} \\ y_{2,2} \end{bmatrix} &= \hat{\Phib}_2 \Phat

= (\hat{{\bf A}}_1^2 + \hat{{\bf A}}_2) \Phat =

\begin{bmatrix} 0.5346 & 0.0194 \\ 0.2950 & -0.0275 \end{bmatrix}

\end{align*}

I list all the OIRFs in a table and plot the response of **y1** due to an impulse in **y1**.

```
. irf table oirf, noci

Results from firstirf

+--------------------------------------------------------+
|        |    (1)    |    (2)    |    (3)    |    (4)    |
|  step  |   oirf    |   oirf    |   oirf    |   oirf    |
|--------+-----------+-----------+-----------+-----------|
|0       | 1.14259   | .406064   | 0         | .761988   |
|1       | .504552   | .523448   | -.23476   | .148171   |
|2       | .534588   | .294977   | .019384   | -.027504  |
|3       | .476019   | .279771   | -.0076    | .013468   |
|4       | .398398   | .242961   | -.010024  | -.00197   |
|5       | .346978   | .206023   | -.003571  | -.003519  |
|6       | .299284   | .178623   | -.004143  | -.001973  |
|7       | .257878   | .154023   | -.003555  | -.002089  |
|8       | .222533   | .13278    | -.002958  | -.001801  |
+--------------------------------------------------------+
(1) irfname = firstirf, impulse = y1, and response = y1
(2) irfname = firstirf, impulse = y1, and response = y2
(3) irfname = firstirf, impulse = y2, and response = y1
(4) irfname = firstirf, impulse = y2, and response = y2
```

**irf table oirf** requests the OIRFs. Notice that the estimates in the first three rows are the same as the ones computed above.

. irf graph oirf, impulse(y1) response(y1)

The graph corresponds to the response of **y1** as a result of a one standard-deviation impulse to the same equation.

**Conclusion**

In this post, I showed how to simulate data from a stable VAR(2) model. I estimated the parameters of this model using the **var** command. I showed how to estimate IRFs and OIRFs. The latter obtains the responses using the lower triangular decomposition of the covariance matrix.

**Reference**

Lütkepohl, H. 2005. *New Introduction to Multiple Time Series Analysis*. New York: Springer.

This is the twenty-first post in the series **Programming an estimation command in Stata**. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

**A Poisson command with Mata computations**

**mypoisson2** computes Poisson regression results in Mata. The syntax of the **mypoisson2** command is

**mypoisson2** *depvar* *indepvars* [*if*] [*in*] [, **noconstant**]

where *indepvars* can contain factor variables or time-series variables.

In the remainder of this post, I discuss the code for **mypoisson2.ado**. I recommend that you click on the filename to download the code. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.

```
*! version 2.0.0 07Feb2016
program define mypoisson2, eclass sortpreserve
    version 14
    syntax varlist(numeric ts fv min=2) [if] [in] [, noCONStant ]
    marksample touse

    gettoken depvar indepvars : varlist
    _fv_check_depvar `depvar'

    tempname b mo V N rank

    getcinfo `indepvars' , `constant'
    local cnames "`r(cnames)'"
    matrix `mo' = r(mo)

    mata: mywork("`depvar'", "`cnames'", "`touse'", "`constant'", ///
       "`b'", "`V'", "`N'", "`rank'", "`mo'")

    if "`constant'" == "" {
        local cnames "`cnames' _cons"
    }
    matrix colnames `b' = `cnames'
    matrix colnames `V' = `cnames'
    matrix rownames `V' = `cnames'

    ereturn post `b' `V', esample(`touse') buildfvinfo
    ereturn scalar N    = `N'
    ereturn scalar rank = `rank'
    ereturn local  cmd  "mypoisson2"

    ereturn display
end

program getcinfo, rclass
    syntax varlist(ts fv), [ noCONStant ]

    _rmcoll `varlist' , `constant' expand
    local cnames `r(varlist)'
    local p : word count `cnames'
    if "`constant'" == "" {
        local p = `p' + 1
        local cons _cons
    }

    tempname b mo

    matrix `b' = J(1, `p', 0)
    matrix colnames `b' = `cnames' `cons'
    _ms_omit_info `b'
    matrix `mo' = r(omit)

    return local  cnames "`cnames'"
    return matrix mo     = `mo'
end

mata:

void mywork( string scalar depvar,  string scalar indepvars,
             string scalar touse,   string scalar constant,
             string scalar bname,   string scalar Vname,
             string scalar nname,   string scalar rname,
             string scalar mo)
{
    real vector y, b
    real matrix X, V, Ct
    real scalar n, p, rank

    y = st_data(., depvar, touse)
    n = rows(y)
    X = st_data(., indepvars, touse)
    if (constant == "") {
        X = X,J(n, 1, 1)
    }
    p  = cols(X)
    Ct = makeCt(mo)

    S = optimize_init()
    optimize_init_argument(S, 1, y)
    optimize_init_argument(S, 2, X)
    optimize_init_evaluator(S, &plleval2())
    optimize_init_params(S, J(1, p, .01))
    optimize_init_constraints(S, Ct)
    b    = optimize(S)
    V    = optimize_result_V_oim(S)
    rank = p - diag0cnt(invsym(V))

    st_matrix(bname, b)
    st_matrix(Vname, V)
    st_numscalar(nname, n)
    st_numscalar(rname, rank)
}

real matrix makeCt(string scalar mo)
{
    real vector mo_v
    real scalar ko, j, p

    mo_v = st_matrix(mo)
    p    = cols(mo_v)
    ko   = sum(mo_v)
    if (ko>0) {
        Ct = J(0, p, .)
        for(j=1; j<=p; j++) {
            if (mo_v[j]==1) {
                Ct = Ct \ e(j, p)
            }
        }
        Ct = Ct, J(ko, 1, 0)
    }
    else {
        Ct = J(0, p+1, .)
    }
    return(Ct)
}

void plleval2(real scalar todo, real vector b,      ///
              real vector y,    real matrix X,      ///
              val, grad, hess)
{
    real vector xb

    xb  = X*b'
    val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}

end
```

As with programs that I have previously discussed, there is an ado part and Mata part. Lines 2–56 are the ado part; they define **mypoisson2** and the subroutine **getcinfo**. Lines 58–133 are the Mata part; they define the Mata work function **mywork()** used in **mypoisson2**, the **makeCt** function used in **mywork()**, and the evaluator function **plleval2()** used in **mywork()**.

The ado-command **mypoisson2** has the following parts:

- Lines 5–11 parse what the user typed, identify the sample, and create temporary names for Stata objects used in the computations or returned by our Mata work function.
- Lines 13–15 use the subroutine **getcinfo** to get information about the user-specified covariates and then store this information in the local **cnames** and a Stata matrix.
- Lines 17–18 call the Mata work function.
- Lines 20–30 post the results returned by the Mata work function to **e()**.
- Line 32 displays the results.

The Mata function **mywork()** has the following parts:

- Lines 60–65 parse the arguments.
- Lines 67–69 declare vectors, matrices, and scalars that are local to **mywork()**.
- Lines 71–90 compute the results.
- Lines 92–95 copy the computed results to Stata, using the names that were passed as arguments.

I now discuss the ado-code in some detail, focusing on only the aspects that are new to **mypoisson2.ado**.

The subroutine **getcinfo** encapsulates the computations performed in examples 3, 4, and 5 in Programming an estimation command in Stata: Handling factor variables in optimize(). **getcinfo** uses **_rmcoll** to identify which covariates must be omitted, stores the resulting covariate names (with omitted variables flagged) in the local macro **cnames**, and then uses **_ms_omit_info** to create a vector containing a 1 for omitted variables and a 0 otherwise. **getcinfo** puts **cnames** into **r(cnames)** and the vector identifying the omitted variables into **r(mo)**.

Lines 14–15 store the information put into **r()** by **getcinfo** in the local macro **cnames** and in the Stata vector whose name is contained in the local macro **mo**. Lines 23–25 use **cnames** to put column names on the vector of point estimates and row and column names on the estimated variance–covariance matrix of the estimator (VCE). Line 18 passes the vector to **mywork()**.

I now discuss the Mata code in some detail, again focusing on only the new aspects. Line 79 gets the constraint matrix **Ct** needed to handle any omitted variables from **makeCt()**. Lines 98–121 define **makeCt()**, which encapsulates the computations that form **Cm** in example 6 in Programming an estimation command in Stata: Handling factor variables in optimize(). Line 86 uses **optimize_init_constraints()** to put **Ct** in the **optimize()** object. **Ct** contains a matrix with zero rows when there are no constraints, and putting a constraint matrix with zero rows into the **optimize()** object tells **optimize()** that there are no constraints.
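To make the constraint-building logic concrete, here is a Python sketch of what **makeCt()** computes (illustrative only; the function name and the example omit vector are mine). Each omitted coefficient contributes one row of the form \((e_j, 0)\), which **optimize()** reads as the linear constraint that coefficient \(j\) equals zero; when nothing is omitted, the constraint matrix has zero rows:

```python
import numpy as np

def make_ct(omit):
    """Build a constraint matrix in optimize()'s [C : c] layout:
    one row per omitted coefficient, constraining it to zero."""
    omit = np.asarray(omit)
    p = omit.size
    # one unit row vector e_j per omitted coefficient
    rows = [np.eye(p)[j] for j in range(p) if omit[j] == 1]
    if rows:
        C = np.vstack(rows)
        return np.hstack([C, np.zeros((len(rows), 1))])
    return np.empty((0, p + 1))  # zero rows: no constraints

# Hypothetical example: third of four coefficients is omitted
Ct = make_ct([0, 0, 1, 0])
print(Ct)  # one row: [0. 0. 1. 0. 0.]
```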

The output in examples 1 and 2 confirms that **mypoisson2** produces the same results as **poisson** when a full set of indicator variables is included in a model with a constant term.

**Example 1: mypoisson2 results**

```
. clear all

. use accident3

. mypoisson2 accidents cvalue ibn.kids traffic, noconstant
Iteration 0:   f(p) = -843.66874
Iteration 1:   f(p) = -573.50561
Iteration 2:   f(p) = -545.86215
Iteration 3:   f(p) = -545.11765
Iteration 4:   f(p) = -545.10899
Iteration 5:   f(p) = -545.10898
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6582923   .0703823    -9.35   0.000    -.7962391   -.5203456
             |
        kids |
          0  |   .7157575    .282144     2.54   0.011     .1627653     1.26875
          1  |  -.9465934   .3111915    -3.04   0.002    -1.556518   -.3366693
          2  |  -.8589336   .3097583    -2.77   0.006    -1.466049   -.2518184
          3  |  -2.518175   .4366261    -5.77   0.000    -3.373947   -1.662404
             |
     traffic |   .1383977   .0307285     4.50   0.000      .078171    .1986243
------------------------------------------------------------------------------
```

**Example 2: poisson results**

```
. poisson accidents cvalue ibn.kids traffic, noconstant

Iteration 0:   log likelihood = -1250.3959
Iteration 1:   log likelihood = -553.73534
Iteration 2:   log likelihood = -545.14915
Iteration 3:   log likelihood = -545.10902
Iteration 4:   log likelihood = -545.10898

Poisson regression                              Number of obs     =        505
                                                Wald chi2(6)      =     285.69
Log likelihood = -545.10898                     Prob > chi2       =     0.0000
------------------------------------------------------------------------------
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6582924   .0703822    -9.35   0.000     -.796239   -.5203457
             |
        kids |
          0  |   .7157576   .2821434     2.54   0.011     .1627666    1.268749
          1  |  -.9465933    .311191    -3.04   0.002    -1.556516   -.3366701
          2  |  -.8589334   .3097578    -2.77   0.006    -1.466048   -.2518192
          3  |   -2.51817    .436625    -5.77   0.000     -3.37394   -1.662401
             |
     traffic |   .1383977   .0307284     4.50   0.000     .0781711    .1986242
------------------------------------------------------------------------------
```

**Done and undone**

I discussed **mypoisson2**, which handles factor variables and uses Mata to compute Poisson regression results. In my next post, I add robust and cluster–robust estimators of the VCE.

The command **gmm** is used to estimate the parameters of a model using the generalized method of moments (GMM). GMM can be used to estimate the parameters of models that have more identification conditions than parameters; such models are said to be overidentified. The specification of these models can be evaluated using Hansen’s *J* statistic (Hansen, 1982).

We use **gmm** to estimate the parameters of a Poisson model with an endogenous regressor. More instruments than regressors are available, so the model is overidentified. We then use **estat overid** to calculate Hansen’s *J* statistic and test the validity of the overidentification restrictions.

In previous posts (see Estimating parameters by maximum likelihood and method of moments using mlexp and gmm and Understanding the generalized method of moments (GMM): A simple example), the interactive version of **gmm** has been used to estimate simple single-equation models. For more complex models, it can be easier to use the moment-evaluator program version of **gmm**. We demonstrate how to use this version of **gmm**.

**Poisson model with endogenous regressors**

In this post, the Poisson regression of \(y_i\) on exogenous \({\bf x}_i\) and endogenous \({\bf y}_{2,i}\) has the form

\begin{equation*}

E(y_i \vert {\bf x}_i,{\bf y}_{2,i},\epsilon_i)= \exp({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i}) + \epsilon_i

\end{equation*}

where \(\epsilon_i\) is a zero-mean error term. The endogenous regressors \({\bf y}_{2,i}\) may be correlated with \(\epsilon_i\). This is the same formulation used by **ivpoisson** with additive errors; see **[R] ivpoisson** for more details. For more information on Poisson models with endogenous regressors, see Mullahy (1997), Cameron and Trivedi (2013), Windmeijer and Santos Silva (1997), and Wooldridge (2010).

Moment conditions are expected values that specify the model parameters in terms of the true moments. GMM finds the parameter values that are closest to satisfying the sample equivalent of the moment conditions. In this model, we define moment conditions using an error function,

\begin{equation*}

u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2) = y_i - \exp({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i})

\end{equation*}

Let \({\bf x}_{2,i}\) be additional exogenous variables. These are not correlated with \(\epsilon_i\), but are correlated with \({\bf y}_{2,i}\). Combining them with \({\bf x}_i\), we have the instruments \({\bf z}_i = (\begin{matrix} {\bf x}_{i} & {\bf x}_{2,i}\end{matrix})\). So the moment conditions are

\begin{equation*}

E({\bf z}_i u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)) = {\bf 0}

\end{equation*}

Suppose there are \(k\) parameters in \({\boldsymbol \beta}_1\) and \({\boldsymbol \beta}_2\) and \(q\) instruments. When \(q>k\), there are more moment conditions than parameters. The model is overidentified. Here GMM finds parameter estimates that solve weighted moment conditions. GMM minimizes

\[

Q({\boldsymbol \beta}_1,{\boldsymbol \beta}_2) = \left\{\frac{1}{N}\sum\nolimits_i {\bf z}_i u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)\right\} {\bf W} \left\{\frac{1}{N}\sum\nolimits_i {\bf z}_i u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)\right\}'

\]

for \(q\times q\) weight matrix \({\bf W}\).
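As a small illustration of the criterion (a Python/NumPy sketch with made-up data, not part of the Stata workflow), \(Q\) is just a quadratic form in the sample moment vector:

```python
import numpy as np

rng = np.random.default_rng(0)
N, q = 200, 3

# Toy instrument matrix Z (N x q) and error-function values u_i
Z = rng.normal(size=(N, q))
u = rng.normal(size=N)

# Sample moment vector: (1/N) sum_i z_i u_i
gbar = Z.T @ u / N

# GMM criterion Q = gbar' W gbar for a q x q weight matrix W
W = np.eye(q)  # identity weight, for illustration only
Q = gbar @ W @ gbar
```

With a positive-definite \({\bf W}\), \(Q \ge 0\), and minimizing \(Q\) over the parameters (which here enter through \(u_i\)) yields the GMM estimates.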

**Overidentification test**

When the model is correctly specified,

\begin{equation*}

E({\bf z}_i u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)) = {\bf 0}

\end{equation*}

In this case, if \({\bf W}\) is an optimal weight matrix, it is equal to the inverse of the covariance matrix of the moment conditions. Here we have

\[

{\bf W}^{-1} = E\{{\bf z}_i' u_{i}({\boldsymbol \beta}_1,{\boldsymbol \beta}_2) u_{i}({\boldsymbol \beta}_1,{\boldsymbol \beta}_2) {\bf z}_i\}

\]

Hansen’s test evaluates the null hypothesis that an overidentified model is correctly specified. The test statistic \(J = N Q(\hat{\boldsymbol \beta}_1, \hat{\boldsymbol \beta}_2)\) is used. If \({\bf W}\) is an optimal weight matrix, under the null hypothesis, Hansen’s *J* statistic has a \(\chi^2(q-k)\) distribution.

The two-step and iterated estimators used by **gmm** provide estimates of the optimal **W**. For overidentified models, the **estat overid** command calculates Hansen’s *J* statistic after these estimators are used.

**Moment-evaluator program**

We define a program that can be called by **gmm** in calculating the moment conditions for Poisson models with endogenous regressors. See Programming an estimation command in Stata: A map to posted entries for more information about programming in Stata. The program calculates the error function \(u_i\), and **gmm** generates the moment conditions by multiplying by the instruments \({\bf z}_i\).

To solve the weighted moment conditions, **gmm** must take the derivative of the moment conditions with respect to the parameters. Using the chain rule, these are the derivatives of the error functions multiplied by the instruments. Users may specify these derivatives themselves, or **gmm** will calculate the derivatives numerically. Users can gain speed and numeric stability by properly specifying the derivatives themselves.

When linear forms of the parameters are estimated, users may specify derivatives to **gmm** in terms of the linear form (prediction). The chain rule is then used by **gmm** to determine the derivatives of the error function \(u_i\) with respect to the parameters. Our error function \(u_i\) is a function of the linear prediction \({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i}\).

The program **gmm_ivpois** calculates the error function \(u_i\) and the derivative of \(u_i\) in terms of the linear prediction \({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i}\).

```
program gmm_ivpois
    version 14.1
    syntax varlist [if], at(name) depvar(varlist) rhs(varlist) ///
        [derivatives(varlist)]
    tempvar m
    quietly gen double `m' = 0 `if'
    local i = 1
    foreach var of varlist `rhs' {
        quietly replace `m' = `m' + `var'*`at'[1,`i'] `if'
        local i = `i' + 1
    }
    quietly replace `m' = `m' + `at'[1,`i'] `if'
    quietly replace `varlist' = `depvar' - exp(`m') `if'
    if "`derivatives'" == "" {
        exit
    }
    replace `derivatives' = -exp(`m')
end
```

Lines 3–4 of **gmm_ivpois** contain the syntax statement that parses the arguments to the program. All moment-evaluator programs must accept a **varlist**, the **if** condition, and the **at()** option. The **varlist** corresponds to variables that store the values of the error functions. The program **gmm_ivpois** will calculate the error function and store it in the specified **varlist**. The **at()** option is specified with the name of a matrix that contains the model parameters. The **if** condition specifies the observations for which estimation is performed.

The program also requires the options **depvar()** and **rhs()**. The name of the dependent variable is specified in the **depvar()** option. The regressors are specified in the **rhs()** option.

On line 4, **derivatives()** is optional. The variable name specified here corresponds to the derivative of the error function with respect to the linear prediction.

The linear prediction of the regressors is stored in the temporary variable **m** over lines 6–12. On line 13, we give the value of the error function to the specified **varlist**. Lines 14–16 allow the program to exit if **derivatives()** is not specified. Otherwise, on line 17, we store the value of the derivative of the error function with respect to the linear prediction in the variable specified in **derivatives()**.
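In array terms, the program computes \(u_i = y_i - \exp(m_i)\) for linear prediction \(m_i\) and, when requested, the derivative \(-\exp(m_i)\) with respect to \(m_i\). A NumPy sketch of the same computation (illustrative data and parameter values of our own):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))              # regressors (x, y2)
b = np.array([0.5, 0.2])                  # slope parameters
cons = 1.0                                # constant term
y = rng.poisson(5.0, size=10).astype(float)

m = X @ b + cons                          # linear prediction
u = y - np.exp(m)                         # error function u_i
du_dm = -np.exp(m)                        # derivative w.r.t. linear prediction
```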

**The data**

We simulate data from a Poisson regression with an endogenous covariate, and then we use **gmm** and the **gmm_ivpois** program to estimate the parameters of the regression. We will then use **estat overid** to check the specification of the model. We simulate a random sample of 3,000 observations.

```
. set seed 45

. set obs 3000
number of observations (_N) was 0, now 3,000

. generate x = rnormal()*.8 + .5

. generate z = rchi2(1)

. generate w = rnormal()*.5

. matrix cm = (1, .9 \ .9, 1)

. matrix sd = (.5,.8)

. drawnorm e u, corr(cm) sd(sd)
```

We generate the exogenous covariates \(x\), \(z\), and \(w\). The variable \(x\) will be a regressor, while \(z\) and \(w\) will be extra instruments. Then we use **drawnorm** to draw the errors \(e\) and \(u\). The errors are positively correlated.

```
. generate y2 = exp(.2*x + .1*z + .3*w - 1 + u)

. generate y = exp(.5*x + .2*y2 + 1) + e
```

We generate the endogenous regressor \(y2\) as a lognormal regression on the instruments. The outcome of interest \(y\) has an exponential mean on \(x\) and \(y2\), with \(e\) as an additive error. As \(e\) is correlated with \(u\), \(y2\) is correlated with \(e\).
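A NumPy analogue of this simulation (our own sketch, mirroring but not exactly reproducing the Stata draws) makes the endogeneity visible: because \(e\) and \(u\) are positively correlated, \(y2\) and \(e\) are correlated as well:

```python
import numpy as np

rng = np.random.default_rng(45)
n = 3000

x = rng.normal(size=n) * 0.8 + 0.5
z = rng.chisquare(1, size=n)
w = rng.normal(size=n) * 0.5

# Correlated errors: standard deviations (.5, .8), correlation .9
cov = np.array([[0.5**2,       0.9*0.5*0.8],
                [0.9*0.5*0.8,  0.8**2]])
e, u = rng.multivariate_normal([0, 0], cov, size=n).T

y2 = np.exp(0.2*x + 0.1*z + 0.3*w - 1 + u)   # endogenous regressor
y = np.exp(0.5*x + 0.2*y2 + 1) + e           # outcome

print(np.corrcoef(y2, e)[0, 1])  # clearly positive: y2 is endogenous
```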

**Estimating the model parameters**

Now we use **gmm** to estimate the parameters of the Poisson regression with endogenous covariates. The name of our moment-evaluator program is listed to the right of **gmm**. The instruments that **gmm** will use to form the moment conditions are listed in **instruments()**. We specify the options **depvar()** and **rhs()** with the appropriate variables. They will be passed on to **gmm_ivpois**.

The parameters are specified as the linear form **y** in the **parameters()** option, while we specify **haslfderivatives** to inform **gmm** that **gmm_ivpois** provides derivatives of this linear form. The option **nequations()** tells **gmm** how many error functions to expect.

```
. gmm gmm_ivpois, depvar(y) rhs(x y2)          ///
>      haslfderivatives instruments(x z w)     ///
>      parameters({y: x y2 _cons}) nequations(1)

Step 1
Iteration 0:   GMM criterion Q(b) =  14.960972
Iteration 1:   GMM criterion Q(b) =  3.3038486
Iteration 2:   GMM criterion Q(b) =  .59045217
Iteration 3:   GMM criterion Q(b) =  .00079862
Iteration 4:   GMM criterion Q(b) =  .00001419
Iteration 5:   GMM criterion Q(b) =  .00001418

Step 2
Iteration 0:   GMM criterion Q(b) =   .0000567
Iteration 1:   GMM criterion Q(b) =  .00005648
Iteration 2:   GMM criterion Q(b) =  .00005648

GMM estimation

Number of parameters =   3
Number of moments    =   4
Initial weight matrix: Unadjusted                 Number of obs   =      3,000
GMM weight matrix:     Robust

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5006366   .0033273   150.46   0.000     .4941151     .507158
          y2 |   .2007893   .0075153    26.72   0.000     .1860597    .2155189
       _cons |   1.000717   .0063414   157.81   0.000      .988288    1.013146
------------------------------------------------------------------------------
Instruments for equation 1: x z w _cons
```

Our coefficients are significant. However, the model could still be misspecified.

**Overidentification test**

We use **estat overid** to compute Hansen’s *J* statistic.

```
. estat overid

Test of overidentifying restriction:

Hansen's J chi2(1) = .169449  (p = 0.6806)
```

The *J* statistic equals 0.17. In addition to computing Hansen’s *J*, **estat overid** provides a test against misspecification of the model. In this case, we have one more instrument than regressor, so the *J* statistic has a \(\chi^2(1)\) distribution. The probability of obtaining a \(\chi^2(1)\) value greater than 0.17 is given in parentheses. This probability—the *p*-value of the test—is large and so we fail to reject the null hypothesis that the model is properly specified.
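The reported *p*-value is easy to verify by hand: a \(\chi^2(1)\) variable is the square of a standard normal, so \(P\{\chi^2(1) > J\} = \mathrm{erfc}(\sqrt{J/2})\). A quick check using only the Python standard library:

```python
import math

J = 0.169449   # Hansen's J statistic from estat overid

# For one degree of freedom, chi2 is a squared standard normal:
# P(chi2(1) > J) = P(|Z| > sqrt(J)) = erfc(sqrt(J / 2))
p = math.erfc(math.sqrt(J / 2))
print(round(p, 4))  # 0.6806, matching the Stata output
```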

**Conclusion**

We have demonstrated how to estimate the parameters of a Poisson regression with an endogenous regressor using the moment-evaluator program version of **gmm**. We have also demonstrated how to use **estat overid** to test for model misspecification after estimation of an overidentified model in **gmm**. See **[R] gmm** and **[R] gmm postestimation** for more information.

**References**

Cameron, A. C., and P. K. Trivedi. 2013. *Regression Analysis of Count Data*. 2nd ed. New York: Cambridge University Press.

Hansen, L. P. 1982. Large sample properties of generalized method of moments estimators. *Econometrica* 50: 1029–1054.

Mullahy, J. 1997. Instrumental-variable estimation of count data models: Applications to models of cigarette smoking behavior. *Review of Economics and Statistics* 79: 586–593.

Windmeijer, F., and J. M. C. Santos Silva. 1997. Endogeneity in count data models: An application to demand for health care. *Journal of Applied Econometrics* 12: 281–294.

Wooldridge, J. M. 2010. *Econometric Analysis of Cross Section and Panel Data*. 2nd ed. Cambridge, MA: MIT Press.