## Programming an estimation command in Stata: Certifying your command

\(\newcommand{\xb}{{\bf x}}
\newcommand{\betab}{\boldsymbol{\beta}}\)

Before you use or distribute your estimation command, you should verify that it produces correct results and write a do-file that certifies that it does so. I discuss the processes of verifying and certifying an estimation command, and I present some techniques for writing a do-file that certifies **mypoisson5**, which I discussed in previous posts.

This is the twenty-fifth post in the series **Programming an estimation command in Stata**. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

**Verification versus certification**

Verification is the process of establishing that a command produces the correct results. Verification produces true values that can be compared with the values produced by a command. Certification is the process of checking that the differences between the verified true results and the results produced by the command are sufficiently small.

Verification can be easy or difficult. If there is another command or program that you trust, you can use it to create verified values. For example, I trust the **poisson** command, so I can use it to create true test-case values for **mypoisson5**. When another command or program is not available, I use simulation techniques to obtain verified values. See Monte Carlo simulations using Stata and Efficiency comparisons by Monte Carlo simulation for discussions of Monte Carlo simulations.

I certify a command by writing do-files that check that the results of a command are close to the verified true values in many specific cases. These do-files are called certification scripts, and I run them every time I make any change to my command. The process that I present is a greatly simplified version of that used to certify Stata; see Gould (2001) for another introduction to certification and for more about Stata certification.

**Comparing numbers**

The **assert** command checks that a logical expression is true. Here I use it to check that two integer values are equal or that two noninteger values are sufficiently close.

I check for equality between integer values and closeness between noninteger values because of how computers do math. You cannot fit the entire real-number line on a computer; there are too many real numbers. Computers use finite-precision base-two approximations to the real numbers. Integers have an exact representation in this approximation, and integer calculations can be performed without approximation error. Most noninteger values do not have an exact representation in the base-two approximation used on computers, and noninteger calculations are performed with approximation error. See The Penultimate Guide to Precision for details.
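None of this is specific to Stata. A few lines of Python (purely illustrative, not part of this post) show why I demand exact equality for integers but only closeness for nonintegers:

```python
# Integer values and integer arithmetic are exact in floating point,
# so exact equality is a safe check:
assert 1.0 + 2.0 == 3.0

# Most noninteger values have no exact base-two representation, so
# noninteger arithmetic carries rounding error:
print(0.1 + 0.2 == 0.3)        # False
print(abs((0.1 + 0.2) - 0.3))  # tiny but nonzero

# A certification check therefore compares nonintegers up to a tolerance:
assert abs((0.1 + 0.2) - 0.3) < 1e-14
```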

Example 1 illustrates that **assert** produces no output or error when asked to assert that a true logical expression is true.

**Example 1: assert a true expression**

    . assert 3==3

In contrast, example 2 illustrates that **assert** produces an error when asked to assert that a false logical expression is true.

**Example 2: assert a false expression**

    . assert 3==2
    assertion is false
    r(9);

In example 3, I use **mreldif()** to compute the maximum of the element-wise relative differences between two integer-valued vectors, and I then use **assert** to check for equality.

**Example 3: assert and mreldif()**

    . matrix a = (1, 2, 3)

    . matrix b = a

    . display mreldif(a, b)
    0

    . assert mreldif(a,b)==0
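To make the comparison concrete, here is a small Python sketch (not from this post) of what **mreldif()** computes, using the \(|x - y|/(|y| + 1)\) relative-difference formula that the Stata documentation gives for it:

```python
def mreldif(a, b):
    # Maximum element-wise relative difference, |x - y| / (|y| + 1),
    # mirroring the definition Stata documents for mreldif().
    return max(abs(x - y) / (abs(y) + 1) for x, y in zip(a, b))

a = [1.0, 2.0, 3.0]
b = list(a)                 # b is a copy of a
assert mreldif(a, b) == 0   # identical vectors differ by 0

# A difference in one element is scaled by that element's magnitude:
assert mreldif([1.0, 2.0, 4.0], a) == 0.25   # |4 - 3| / (|3| + 1)
```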

Examples 1–3 illustrated **assert** by comparing integers. In certification, we usually compare noninteger values. Because of finite-precision approximation errors, a small change in how the results are computed—such as using 1 processor instead of 8 processors in Stata/MP, changing the sort order of the data, or using a Mac instead of a PC—may cause the results to change slightly. These changes can be surprising if your intuition is guided by infinite-precision math, but they should be small enough to be ignorable. Example 4 illustrates this point by comparing the point estimates obtained using 8 processors and 1 processor.

**Example 4: The effect of using 1 instead of 8 processors**

    . clear all

    . use accident3

    . gsort - cvalue

    . set processors 8
        The maximum number of processors or cores being used is changed from
        1 to 8.  It can be set to any number between 1 and 8

    . mypoisson5 accidents cvalue kids traffic
    Iteration 0:   f(p) = -851.18669
    Iteration 1:   f(p) = -556.66855
    Iteration 2:   f(p) = -555.81731
    Iteration 3:   f(p) = -555.81538
    Iteration 4:   f(p) = -555.81538
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
            kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
         traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
           _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
    ------------------------------------------------------------------------------

    . matrix b1 = e(b)

    . set processors 1
        The maximum number of processors or cores being used is changed from
        8 to 1.  It can be set to any number between 1 and 8

    . mypoisson5 accidents cvalue kids traffic
    Iteration 0:   f(p) = -851.18669
    Iteration 1:   f(p) = -556.66855
    Iteration 2:   f(p) = -555.81731
    Iteration 3:   f(p) = -555.81538
    Iteration 4:   f(p) = -555.81538
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
            kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
         traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
           _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
    ------------------------------------------------------------------------------

    . matrix b2 = e(b)

    . display mreldif(b1, b2)
    2.420e-17

Because differences like these are unimportant, I check that the computed results are close to the verified results instead of requiring that they be exactly the same. (You might not see these differences if you run this example on your 8-processor machine. The differences depend on what else your computer is doing when you run the example.)

**Writing a certification script**

I routinely use the following four techniques to write a certification script:

- I check that my command reproduces results that I have previously verified.
- I check that my command produces results that are close to those produced by a series of hand calculations.
- I check my command against itself.
- I check that my command produces results sufficiently close to another Stata command.

**Certifying my command against previously verified results**

Consider the results produced by **mypoisson5.ado** displayed in example 5.

**Example 5: mypoisson5 results**

    . clear all

    . use accident3

    . mypoisson5 accidents cvalue kids traffic
    Iteration 0:   f(p) = -851.18669
    Iteration 1:   f(p) = -556.66855
    Iteration 2:   f(p) = -555.81731
    Iteration 3:   f(p) = -555.81538
    Iteration 4:   f(p) = -555.81538
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
            kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
         traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
           _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
    ------------------------------------------------------------------------------

The results displayed in example 5 are stored in **e()**. I previously verified that these results are correct by comparing the higher-precision results displayed in example 6 with other results.

**Example 6: mypoisson5 e() results**

    . ereturn list

    scalars:
                   e(N) =  505
                e(rank) =  4

    macros:
                 e(cmd) : "mypoisson5"
             e(predict) : "mypoisson5_p"
          e(properties) : "b V"

    matrices:
                   e(b) :  1 x 4
                   e(V) :  4 x 4

    functions:
              e(sample)

    . matrix list e(b), format(%16.15g)

    e(b)[1,4]
                  cvalue             kids          traffic            _cons
    y1  -.65588706902223  -1.0090169724739    .1467114650851   .57435412474423

I could use the results from example 6 to create a certification script like **test1.do**.

**Code block 1: test1.do**

    clear all
    use accident3
    mypoisson5 accidents cvalue kids traffic
    matrix b1 = e(b)
    matrix btrue = (-.65588706902223, -1.0090169724739, ///
                    .1467114650851, .57435412474423)
    display mreldif(b1, btrue)
    assert mreldif(b1, btrue) < 1e-14

Running **test1.do** produces

**Example 7: test1**

    . do test1

    . clear all

    . use accident3

    . mypoisson5 accidents cvalue kids traffic
    Iteration 0:   f(p) = -851.18669
    Iteration 1:   f(p) = -556.66855
    Iteration 2:   f(p) = -555.81731
    Iteration 3:   f(p) = -555.81538
    Iteration 4:   f(p) = -555.81538
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
            kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
         traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
           _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
    ------------------------------------------------------------------------------

    . matrix b1 = e(b)

    . matrix btrue = (-.65588706902223, -1.0090169724739, ///
    >                 .1467114650851, .57435412474423)

    . display mreldif(b1, btrue)
    6.742e-15

    . assert mreldif(b1, btrue) < 1e-14

    .
    end of do-file

Note the process. After verifying the results produced by **mypoisson5**, I write a certification script to ensure that **mypoisson5** will always produce approximately these numbers. Following this process protects me from accidentally causing my command to produce incorrect results as I make it "better" or faster. Do not underestimate the importance of this protection. Putting bugs into your calculations as you attempt to improve your command is remarkably easy. This process also documents that I have checked this particular case. If someone claims to have a program that differs from mine in this case, I can ask that person to compute the results for this example, in which I know that my command works. This request almost always yields a discussion in which that person debugs his or her own program so that it produces my verified results.

Here I copied and pasted numbers from a log file into a do-file to create **test1.do**. This copy-and-paste method is error-prone and tedious, and it should be avoided. The **mkassert** command solves this problem: it creates **assert** commands that certify results stored in **e()**, **r()**, the dataset, or other Stata objects. I use it all the time.

I begin writing a certification script using **mkassert** with code like that in **test2.do**, whose output appears in example 8.

**Code block 2: test2.do**

    clear all
    use accident3
    mypoisson5 accidents cvalue kids traffic
    mkassert eclass, mtol(1e-12) saving(test3.do, replace)

**Example 8: Using mkassert**

    . do test2

    . clear all

    . use accident3

    . mypoisson5 accidents cvalue kids traffic
    Iteration 0:   f(p) = -851.18669
    Iteration 1:   f(p) = -556.66855
    Iteration 2:   f(p) = -555.81731
    Iteration 3:   f(p) = -555.81538
    Iteration 4:   f(p) = -555.81538
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
            kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
         traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
           _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
    ------------------------------------------------------------------------------

    . mkassert eclass, mtol(1e-12) saving(test3.do, replace)

    .
    end of do-file

**test2.do** produces results that I previously verified and uses **mkassert** to write an **assert** command for every result stored in **e()** to the file **test3.do**, which appears in code block 3.

**Code block 3: test3.do**

    assert `"`e(cmd)'"' == `"mypoisson5"'
    assert `"`e(predict)'"' == `"mypoisson5_p"'
    assert `"`e(properties)'"' == `"b V"'

    assert e(N) == 505
    assert e(rank) == 4

    qui {
    mat T_b = J(1,4,0)
    mat T_b[1,1] = -.6558870690222316
    mat T_b[1,2] = -1.009016972473914
    mat T_b[1,3] = .1467114650851019
    mat T_b[1,4] = .5743541247442324
    }
    matrix C_b = e(b)
    assert mreldif( C_b , T_b ) < 1e-12
    _assert_streq `"`: rowfullnames C_b'"' `"y1"'
    _assert_streq `"`: colfullnames C_b'"' `"cvalue kids traffic _cons"'
    mat drop C_b T_b

    qui {
    mat T_V = J(4,4,0)
    mat T_V[1,1] = .0049911902167341
    mat T_V[1,2] = .0002953642487161
    mat T_V[1,3] = -.0000506909358346
    mat T_V[1,4] = -.0089523155601508
    mat T_V[2,1] = .0002953642487161
    mat T_V[2,2] = .0065280055261688
    mat T_V[2,3] = .0002050149836939
    mat T_V[2,4] = -.0054776138886792
    mat T_V[3,1] = -.0000506909358346
    mat T_V[3,2] = .0002050149836939
    mat T_V[3,3] = .0009844631577381
    mat T_V[3,4] = -.0075052131640854
    mat T_V[4,1] = -.0089523155601508
    mat T_V[4,2] = -.0054776138886792
    mat T_V[4,3] = -.0075052131640854
    mat T_V[4,4] = .0806284655627814
    }
    matrix C_V = e(V)
    assert mreldif( C_V , T_V ) < 1e-12
    _assert_streq `"`: rowfullnames C_V'"' `"cvalue kids traffic _cons"'
    _assert_streq `"`: colfullnames C_V'"' `"cvalue kids traffic _cons"'
    mat drop C_V T_V

Each **assert** command checks that what is currently in **e()** is sufficiently close to the corresponding value that **mypoisson5** stored in **e()** when **mkassert** created the script. The first three **assert** commands check the macros, the next two check the scalars, the first **qui** block and the commands that follow it check **e(b)**, and the second **qui** block and the commands that follow it check **e(V)**.

I create the script **test4.do**, which checks this case by replacing the **mkassert** command in **test2.do** with the **assert** commands it created in **test3.do**; see code block 4.

**Code block 4: test4.do**

    // Test case 1
    clear all
    use accident3
    mypoisson5 accidents cvalue kids traffic

    assert `"`e(cmd)'"' == `"mypoisson5"'
    assert `"`e(predict)'"' == `"mypoisson5_p"'
    assert `"`e(properties)'"' == `"b V"'

    assert e(N) == 505
    assert e(rank) == 4

    qui {
    mat T_b = J(1,4,0)
    mat T_b[1,1] = -.6558870690222316
    mat T_b[1,2] = -1.009016972473914
    mat T_b[1,3] = .1467114650851019
    mat T_b[1,4] = .5743541247442324
    }
    matrix C_b = e(b)
    assert mreldif( C_b , T_b ) < 1e-12
    _assert_streq `"`: rowfullnames C_b'"' `"y1"'
    _assert_streq `"`: colfullnames C_b'"' `"cvalue kids traffic _cons"'
    mat drop C_b T_b

    qui {
    mat T_V = J(4,4,0)
    mat T_V[1,1] = .0049911902167341
    mat T_V[1,2] = .0002953642487161
    mat T_V[1,3] = -.0000506909358346
    mat T_V[1,4] = -.0089523155601508
    mat T_V[2,1] = .0002953642487161
    mat T_V[2,2] = .0065280055261688
    mat T_V[2,3] = .0002050149836939
    mat T_V[2,4] = -.0054776138886792
    mat T_V[3,1] = -.0000506909358346
    mat T_V[3,2] = .0002050149836939
    mat T_V[3,3] = .0009844631577381
    mat T_V[3,4] = -.0075052131640854
    mat T_V[4,1] = -.0089523155601508
    mat T_V[4,2] = -.0054776138886792
    mat T_V[4,3] = -.0075052131640854
    mat T_V[4,4] = .0806284655627814
    }
    matrix C_V = e(V)
    assert mreldif( C_V , T_V ) < 1e-12
    _assert_streq `"`: rowfullnames C_V'"' `"cvalue kids traffic _cons"'
    _assert_streq `"`: colfullnames C_V'"' `"cvalue kids traffic _cons"'
    mat drop C_V T_V

Every time I run **test4.do**, it checks that **mypoisson5** produces correct results for this one case. The more cases that I verify and certify, the more certain I am that my command works.

I summarize this important process below.

- I write a do-file, here called **test2.do**, that produces results for a case in which I have verified that my command produces correct results.
- At the end of **test2.do**, I use **mkassert** to create another do-file, here called **test3.do**, that contains **assert** commands for each result that my command stored in **e()**.
- I replace the **mkassert** command in **test2.do** with the commands it created in **test3.do** to create the certification script, here called **test4.do**.
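This generate-and-freeze pattern is worth internalizing because it works in any language. Here is a minimal Python analogue (the function and the names in it are hypothetical, purely to illustrate the pattern **mkassert** automates; they are not part of mkassert):

```python
def make_asserts(results, tol=1e-12):
    # Emit one assertion per stored result, freezing the *current*
    # values as the verified truth -- the pattern mkassert automates
    # for everything a command stores in e().
    lines = [f"assert abs(results[{name!r}] - {value!r}) < {tol}"
             for name, value in results.items()]
    return "\n".join(lines)

# Pretend these are verified results a command just stored:
results = {"N": 505, "rank": 4}
script = make_asserts(results)

exec(script)       # passes: current results match the frozen values

results["rank"] = 5
try:
    exec(script)   # now fails, catching the accidental change
    raise SystemExit("certification should have failed")
except AssertionError:
    pass
```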

This method assumes that I have already verified that my command produces correct results for a specific example. The common case of verification by simulation makes this method more applicable than you might think.

**Certifying my command against hand-calculated results**

I can almost always find another way to compute estimation results in Stata that should be numerically equivalent. In the Poisson-regression case at hand, I can use **gmm**. As discussed by Cameron and Trivedi (2005) and Wooldridge (2010), Poisson regression finds the \(\widehat{\betab}\) that solves the score equations,

$$\sum_{i=1}^N \left[y_i - \exp(\xb_i\widehat{\betab})\right]\xb_i = {\bf 0}$$
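As a sanity check on the logic, these score equations can also be solved directly with Newton's method. The following pure-Python sketch (simulated data, one covariate plus a constant; all names are illustrative and it is not this post's code) iterates until the score is numerically zero:

```python
import math
import random

random.seed(12345)

def rpois(lam):
    # Knuth's inversion sampler for Poisson draws.
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

# Simulate y ~ Poisson(exp(b0 + b1*x)).
n = 200
b0_true, b1_true = -0.2, 0.5
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [rpois(math.exp(b0_true + b1_true * xi)) for xi in x]

# Newton-Raphson on the score equations sum_i (y_i - exp(xb_i)) * x_i = 0.
b0, b1 = 0.0, 0.0
for _ in range(50):
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    g0 = sum(yi - mi for yi, mi in zip(y, mu))                # score wrt b0
    g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))  # score wrt b1
    # Hessian of the log likelihood (negative definite).
    h00 = -sum(mu)
    h01 = -sum(mi * xi for mi, xi in zip(mu, x))
    h11 = -sum(mi * xi * xi for mi, xi in zip(mu, x))
    det = h00 * h11 - h01 * h01
    s0 = -(h11 * g0 - h01 * g1) / det   # Newton step: -H^{-1} g
    s1 = -(h00 * g1 - h01 * g0) / det
    b0, b1 = b0 + s0, b1 + s1
    if max(abs(s0), abs(s1)) < 1e-12:
        break

# At the solution, the score equations hold to rounding error:
mu = [math.exp(b0 + b1 * xi) for xi in x]
assert abs(sum(yi - mi for yi, mi in zip(y, mu))) < 1e-8
assert abs(sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))) < 1e-8
```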

We showed how to use **gmm** for similar problems in Understanding the generalized method of moments (GMM): A simple example. In example 9, I use **gmm** to solve these score equations, and I use **assert** to check that the point estimates produced by **gmm** and **mypoisson5** are sufficiently close.

**Example 9: Using gmm to certify mypoisson5**

    . clear all

    . use accident3

    . mypoisson5 accidents cvalue kids traffic
    Iteration 0:   f(p) = -851.18669
    Iteration 1:   f(p) = -556.66855
    Iteration 2:   f(p) = -555.81731
    Iteration 3:   f(p) = -555.81538
    Iteration 4:   f(p) = -555.81538
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
            kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
         traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
           _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
    ------------------------------------------------------------------------------

    . matrix b1 = e(b)

    . gmm (accidents - exp({xb:cvalue kids traffic _cons})), ///
    >         instruments(cvalue kids traffic) onestep

    Step 1
    Iteration 0:   GMM criterion Q(b) =  .57041592
    Iteration 1:   GMM criterion Q(b) =  .01710408
    Iteration 2:   GMM criterion Q(b) =  .00015313
    Iteration 3:   GMM criterion Q(b) =  2.190e-08
    Iteration 4:   GMM criterion Q(b) =  3.362e-16

    note: model is exactly identified

    GMM estimation

    Number of parameters =   4
    Number of moments    =   4
    Initial weight matrix: Unadjusted             Number of obs    =        505

    ------------------------------------------------------------------------------
                 |               Robust
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6558871   .1094934    -5.99   0.000    -.8704901    -.441284
            kids |  -1.009017   .1884791    -5.35   0.000    -1.378429   -.6396047
         traffic |   .1467115   .0923401     1.59   0.112    -.0342718    .3276947
           _cons |   .5743542   .6039059     0.95   0.342    -.6092797    1.757988
    ------------------------------------------------------------------------------
    Instruments for equation 1: cvalue kids traffic _cons

    . matrix b2 = e(b)

    . display mreldif(b1, b2)
    5.554e-08

    . assert mreldif(b1, b2) <1e-7

I used a weak tolerance when comparing the two vectors of point estimates because the commands use different algorithms to find their solutions. If I reduced the convergence tolerance in each command, the solutions would be closer to each other.

For a real certification script, I would also check everything else stored in **e()** by **mypoisson5** against a value computed by **gmm**. I skip these details to present other methods.

**Certifying my command against itself**

Almost all estimation commands accept **if** or **in** sample restrictions, and you can usually test these restrictions by comparing two sets of results produced by the same command. Example 10 provides an example.

**Example 10: Testing a command against itself**

    . clear all

    . use accident3

    . mypoisson5 accidents cvalue kids traffic if cvalue <=3
    Iteration 0:   f(p) = -712.62548
    Iteration 1:   f(p) = -540.56297
    Iteration 2:   f(p) = -529.54572
    Iteration 3:   f(p) = -529.44627
    Iteration 4:   f(p) = -529.44618
    Iteration 5:   f(p) = -529.44618
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.3646368   .0872777    -4.18   0.000    -.5356979   -.1935756
            kids |  -.9874777   .0805708   -12.26   0.000    -1.145394   -.8295618
         traffic |   .1488243   .0317338     4.69   0.000     .0866272    .2110214
           _cons |   .1081705   .3015328     0.36   0.720    -.4828229    .6991638
    ------------------------------------------------------------------------------

    . matrix b1 = e(b)

    . keep if cvalue <=3
    (121 observations deleted)

    . mypoisson5 accidents cvalue kids traffic
    Iteration 0:   f(p) = -712.62548
    Iteration 1:   f(p) = -540.56297
    Iteration 2:   f(p) = -529.54572
    Iteration 3:   f(p) = -529.44627
    Iteration 4:   f(p) = -529.44618
    Iteration 5:   f(p) = -529.44618
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.3646368   .0872777    -4.18   0.000    -.5356979   -.1935756
            kids |  -.9874777   .0805708   -12.26   0.000    -1.145394   -.8295618
         traffic |   .1488243   .0317338     4.69   0.000     .0866272    .2110214
           _cons |   .1081705   .3015328     0.36   0.720    -.4828229    .6991638
    ------------------------------------------------------------------------------

    . matrix b2 = e(b)

    . display mreldif(b1, b2)
    0

    . assert mreldif(b1, b2) <1e-14

I begin by storing the point estimates obtained from the sample in which **cvalue<=3** in **b1**. Next, I keep only these observations in the sample and use **mypoisson5** without an **if** restriction to compute the point estimates stored in **b2**. Finally, I assert that **b1** and **b2** are sufficiently close. In this case, the results are exactly the same, but I only test that they are close because I should not rely on this equality. (I am using Stata/MP, and other jobs on my computer could change the number of processors I effectively have, which can cause the results to change slightly.)

An analogous process works for testing **in** restrictions and integer weights.
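The same invariance can be checked for any estimator in any language: applying a sample restriction inside the command must match running the command on data that was restricted beforehand. A toy Python sketch (the "estimator" here is just a sample mean, and all names are illustrative):

```python
def estimate(rows, mask=None):
    # A stand-in "estimator": the mean of y. The mask argument plays
    # the role of Stata's `if' restriction -- the estimator selects
    # its own sample.
    if mask is not None:
        rows = [r for r, keep in zip(rows, mask) if keep]
    ys = [y for _, y in rows]
    return sum(ys) / len(ys)

rows = [(1, 3.0), (2, 5.0), (4, 9.0), (3, 4.0), (5, 8.0)]  # (cvalue, y)
mask = [c <= 3 for c, _ in rows]

b1 = estimate(rows, mask)                                  # `if'-style restriction
b2 = estimate([r for r, keep in zip(rows, mask) if keep])  # data kept first
assert abs(b1 - b2) < 1e-14                                # must agree
```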

**Certifying my command against another Stata command**

Sometimes constraining a parameter in the new estimator produces the same results as another estimator already implemented in Stata. For example, a random-effects estimator may reduce to a cross-sectional estimator when the variance of the random effect is constrained to zero.

In the case at hand, I could check that my command produces the same values as **poisson**, as shown in example 11.

**Example 11: Certifying against an existing command**

    . clear all

    . use accident3

    . mypoisson5 accidents cvalue kids traffic
    Iteration 0:   f(p) = -851.18669
    Iteration 1:   f(p) = -556.66855
    Iteration 2:   f(p) = -555.81731
    Iteration 3:   f(p) = -555.81538
    Iteration 4:   f(p) = -555.81538
    ------------------------------------------------------------------------------
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
            kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
         traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
           _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
    ------------------------------------------------------------------------------

    . matrix b1 = e(b)

    . poisson accidents cvalue kids traffic

    Iteration 0:   log likelihood = -555.86605
    Iteration 1:   log likelihood =  -555.8154
    Iteration 2:   log likelihood = -555.81538

    Poisson regression                              Number of obs   =        505
                                                    LR chi2(3)      =     340.20
                                                    Prob > chi2     =     0.0000
    Log likelihood = -555.81538                     Pseudo R2       =     0.2343

    ------------------------------------------------------------------------------
       accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
            kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506594
         traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
           _cons |    .574354   .2839515     2.02   0.043     .0178193    1.130889
    ------------------------------------------------------------------------------

    . matrix b2 = e(b)

    . display mreldif(b1, b2)
    1.081e-07

    . assert mreldif(b1, b2) <1e-6

**Done and undone**

I presented some techniques that I use to write certification scripts. A real certification script would cover many more cases. In the next post, I discuss using and creating Mata libraries.

**References**

Cameron, A. C., and P. K. Trivedi. 2005. *Microeconometrics: Methods and Applications*. Cambridge: Cambridge University Press.

Gould, W. 2001. Statistical software certification. *Stata Journal* 1: 29–50.

Wooldridge, J. M. 2010. *Econometric Analysis of Cross Section and Panel Data*. 2nd ed. Cambridge, Massachusetts: MIT Press.