Programming an estimation command in Stata: Certifying your command
\(\newcommand{\xb}{{\bf x}}
\newcommand{\betab}{\boldsymbol{\beta}}\)

Before you use or distribute your estimation command, you should verify that it produces correct results and write a do-file that certifies that it does so. I discuss the processes of verifying and certifying an estimation command, and I present some techniques for writing a do-file that certifies mypoisson5, which I discussed in previous posts.
This is the twenty-fifth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.
Verification versus certification
Verification is the process of establishing that a command produces the correct results. Verification produces true values that can be compared with the values produced by a command. Certification is the process of checking that the differences between the verified true results and the results produced by a command are sufficiently small.
Verification can be easy or difficult. If there is another command or program that you trust, you can use it to create verified values. For example, I trust the poisson command, so I can use it to create true test-case values for mypoisson5. When another command or program is not available, I use simulation techniques to obtain verified values. See Monte Carlo simulations using Stata and Efficiency comparisons by Monte Carlo simulation for discussions of Monte Carlo simulations.
I certify a command by writing do-files that check that the results of a command are close to the verified true values in many specific cases. These do-files are called certification scripts, and I run them every time I make any change to my command. The process that I present is a greatly simplified version of that used to certify Stata; see Gould (2001) for another introduction to certification and for more about Stata certification.
Comparing numbers
The assert command checks that a logical expression is true. Here I use it to check that two integer values are equal or that two noninteger values are sufficiently close.
I check for equality between integer values and closeness between noninteger values because of how computers do math. You cannot fit the entire real-number line on a computer; there are too many real numbers. Computers use finite-precision base-two approximations to the real numbers. Integers have an exact representation in this approximation, and integer calculations can be performed without approximation error. Most noninteger values do not have an exact representation in the base-two approximation used on computers, and noninteger calculations are performed with approximation error. See The Penultimate Guide to Precision for details.
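For example, the following lines, which you can type at the Stata prompt, illustrate the point. The equality comparison of noninteger sums fails because 0.1, 0.2, and 0.3 have no exact base-two representation, while a closeness check based on Stata's built-in reldif() function succeeds; the displayed hexadecimal digits come from the IEEE double-precision representation.

display 0.1 + 0.2 == 0.3                 // displays 0: the two sides differ slightly
display %21x 0.1                         // shows the base-two approximation of 0.1
assert reldif(0.1 + 0.2, 0.3) < 1e-15    // a closeness check passes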
Example 1 illustrates that assert produces no output or error when asked to assert that a true logical expression is true.
Example 1: assert a true expression
. assert 3==3
In contrast, example 2 illustrates that assert produces an error when asked to assert that a false logical expression is true.
Example 2: assert a false expression
. assert 3==2
assertion is false
r(9);
In example 3, I use mreldif() to compute the maximum of the element-wise relative differences between two integer-valued vectors, and I then use assert to check for equality.
Example 3: assert and mreldif()
. matrix a = (1, 2, 3)

. matrix b = a

. display mreldif(a, b)
0

. assert mreldif(a,b)==0
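For reference, the relative difference that mreldif() computes for matrices \({\bf X}\) and \({\bf Y}\) is

$$
{\rm mreldif}({\bf X},{\bf Y}) = \max_{i,j}\frac{|x_{ij} - y_{ij}|}{|y_{ij}| + 1}
$$

The 1 in the denominator keeps the measure well defined when elements of \({\bf Y}\) are zero, and identical matrices produce a relative difference of exactly 0, as in example 3.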
Examples 1–3 illustrated assert by comparing integers. In certification, we usually compare noninteger values. Because of finite-precision approximation errors, a small change in how the results are computed—such as using 1 processor instead of 8 processors in Stata/MP, changing the sort order of the data, or using a Mac instead of a PC—may cause the results to change slightly. These changes can be surprising if your intuition is guided by infinite-precision math, but they should be small enough to be ignorable. Example 4 illustrates this point by comparing the point estimates obtained using 8 processors and 1 processor.
Example 4: The effect of using 1 instead of 8 processors
. clear all

. use accident3

. gsort - cvalue

. set processors 8
    The maximum number of processors or cores being used is changed from
    1 to 8.  It can be set to any number between 1 and 8

. mypoisson5 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66855
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
------------------------------------------------------------------------------

. matrix b1 = e(b)

. set processors 1
    The maximum number of processors or cores being used is changed from
    8 to 1.  It can be set to any number between 1 and 8

. mypoisson5 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66855
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
------------------------------------------------------------------------------

. matrix b2 = e(b)

. display mreldif(b1, b2)
2.420e-17
Because differences like these are unimportant, I check that the computed results are close to the verified results instead of requiring that they be exactly the same. (You might not see these differences if you run this example on your 8-processor machine. The differences depend on what else your computer is doing when you run the example.)
Writing a certification script
I routinely use the following four techniques to write a certification script:
- I check that my command reproduces results that I have previously verified.
- I check that my command produces results that are close to those produced by a series of hand calculations.
- I check my command against itself.
- I check that my command produces results that are sufficiently close to those produced by another Stata command.
Certifying my command against previously verified results
Consider the results produced by mypoisson5.ado displayed in example 5.
Example 5: mypoisson5 results
. clear all

. use accident3

. mypoisson5 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66855
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
------------------------------------------------------------------------------
The results displayed in example 5 are stored in e(). I previously verified that these results are correct by comparing the higher-precision results displayed in example 6 with other results.
Example 6: mypoisson5 e() results
. ereturn list

scalars:
                  e(N) =  505
               e(rank) =  4

macros:
                e(cmd) : "mypoisson5"
            e(predict) : "mypoisson5_p"
         e(properties) : "b V"

matrices:
                  e(b) :  1 x 4
                  e(V) :  4 x 4

functions:
              e(sample)

. matrix list e(b), format(%16.15g)

e(b)[1,4]
              cvalue              kids           traffic             _cons
y1  -.65588706902223  -1.0090169724739    .1467114650851   .57435412474423
I could use the results from example 6 to create a certification script like test1.do.
clear all
use accident3
mypoisson5 accidents cvalue kids traffic
matrix b1 = e(b)
matrix btrue = (-.65588706902223, -1.0090169724739,  ///
                 .1467114650851, .57435412474423)
display mreldif(b1, btrue)
assert mreldif(b1, btrue) < 1e-14
Running test1.do produces
Example 7: test1
. do test1

. clear all

. use accident3

. mypoisson5 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66855
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
------------------------------------------------------------------------------

. matrix b1 = e(b)

. matrix btrue = (-.65588706902223, -1.0090169724739,  ///
>                  .1467114650851, .57435412474423)

. display mreldif(b1, btrue)
6.742e-15

. assert mreldif(b1, btrue) < 1e-14

. 
end of do-file
Note the process. After verifying the results produced by mypoisson5, I write a certification script to ensure that mypoisson5 will always produce approximately these numbers. Following this process protects me from accidentally causing my command to produce incorrect results as I make it "better" or faster. Do not underestimate the importance of this protection. Putting bugs into your calculations as you attempt to improve your command is remarkably easy. This process also documents that I have checked this particular case. If someone claims to have a program that differs from mine in this case, I can ask that person to compute the results for this example in which I know that my command works. This request almost always yields a discussion in which that person debugs his or her own program so that it produces my verified results.
Here I copied and pasted numbers from a log file into a do-file to create test1.do. The copy-and-paste method is error-prone and tedious, and it should be avoided. The mkassert command solves this problem. mkassert creates assert commands that certify results stored in e(), r(), the dataset, or other Stata objects. I use it all the time.
I begin writing a certification script using mkassert with code like that in test2.do, whose output appears in example 8.
clear all
use accident3
mypoisson5 accidents cvalue kids traffic
mkassert eclass, mtol(1e-12) saving(test3.do, replace)
Example 8: Using mkassert
. do test2

. clear all

. use accident3

. mypoisson5 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66855
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
------------------------------------------------------------------------------

. mkassert eclass, mtol(1e-12) saving(test3.do, replace)

. 
end of do-file
test2.do produces results that I previously verified, and it uses mkassert to write an assert command for every result stored in e() to the file test3.do, shown below.
assert `"`e(cmd)'"' == `"mypoisson5"'
assert `"`e(predict)'"' == `"mypoisson5_p"'
assert `"`e(properties)'"' == `"b V"'

assert e(N) == 505
assert e(rank) == 4

qui {
mat T_b = J(1,4,0)
mat T_b[1,1] = -.6558870690222316
mat T_b[1,2] = -1.009016972473914
mat T_b[1,3] = .1467114650851019
mat T_b[1,4] = .5743541247442324
}
matrix C_b = e(b)
assert mreldif( C_b , T_b ) < 1e-12
_assert_streq `"`: rowfullnames C_b'"' `"y1"'
_assert_streq `"`: colfullnames C_b'"' `"cvalue kids traffic _cons"'
mat drop C_b T_b

qui {
mat T_V = J(4,4,0)
mat T_V[1,1] = .0049911902167341
mat T_V[1,2] = .0002953642487161
mat T_V[1,3] = -.0000506909358346
mat T_V[1,4] = -.0089523155601508
mat T_V[2,1] = .0002953642487161
mat T_V[2,2] = .0065280055261688
mat T_V[2,3] = .0002050149836939
mat T_V[2,4] = -.0054776138886792
mat T_V[3,1] = -.0000506909358346
mat T_V[3,2] = .0002050149836939
mat T_V[3,3] = .0009844631577381
mat T_V[3,4] = -.0075052131640854
mat T_V[4,1] = -.0089523155601508
mat T_V[4,2] = -.0054776138886792
mat T_V[4,3] = -.0075052131640854
mat T_V[4,4] = .0806284655627814
}
matrix C_V = e(V)
assert mreldif( C_V , T_V ) < 1e-12
_assert_streq `"`: rowfullnames C_V'"' `"cvalue kids traffic _cons"'
_assert_streq `"`: colfullnames C_V'"' `"cvalue kids traffic _cons"'
mat drop C_V T_V
Each assert command checks that what is currently in e() is sufficiently close to the corresponding value stored in e() by mypoisson5. The first three assert commands check the macros, the next two check the scalars, the block that constructs T_b and the asserts that follow it check e(b), and the block that constructs T_V and the asserts that follow it check e(V).
I create the script test4.do, which checks this case, by replacing the mkassert command in test2.do with the assert commands it created in test3.do; the result appears below.
// Test case 1
clear all
use accident3
mypoisson5 accidents cvalue kids traffic

assert `"`e(cmd)'"' == `"mypoisson5"'
assert `"`e(predict)'"' == `"mypoisson5_p"'
assert `"`e(properties)'"' == `"b V"'

assert e(N) == 505
assert e(rank) == 4

qui {
mat T_b = J(1,4,0)
mat T_b[1,1] = -.6558870690222316
mat T_b[1,2] = -1.009016972473914
mat T_b[1,3] = .1467114650851019
mat T_b[1,4] = .5743541247442324
}
matrix C_b = e(b)
assert mreldif( C_b , T_b ) < 1e-12
_assert_streq `"`: rowfullnames C_b'"' `"y1"'
_assert_streq `"`: colfullnames C_b'"' `"cvalue kids traffic _cons"'
mat drop C_b T_b

qui {
mat T_V = J(4,4,0)
mat T_V[1,1] = .0049911902167341
mat T_V[1,2] = .0002953642487161
mat T_V[1,3] = -.0000506909358346
mat T_V[1,4] = -.0089523155601508
mat T_V[2,1] = .0002953642487161
mat T_V[2,2] = .0065280055261688
mat T_V[2,3] = .0002050149836939
mat T_V[2,4] = -.0054776138886792
mat T_V[3,1] = -.0000506909358346
mat T_V[3,2] = .0002050149836939
mat T_V[3,3] = .0009844631577381
mat T_V[3,4] = -.0075052131640854
mat T_V[4,1] = -.0089523155601508
mat T_V[4,2] = -.0054776138886792
mat T_V[4,3] = -.0075052131640854
mat T_V[4,4] = .0806284655627814
}
matrix C_V = e(V)
assert mreldif( C_V , T_V ) < 1e-12
_assert_streq `"`: rowfullnames C_V'"' `"cvalue kids traffic _cons"'
_assert_streq `"`: colfullnames C_V'"' `"cvalue kids traffic _cons"'
mat drop C_V T_V
Every time I run test4.do, it checks that mypoisson5 produces correct results for this one case. The more cases that I verify and certify, the more certain I am that my command works.
I summarize this important process below.
- I write a do-file, here called test2.do, that produces results for a case in which I have verified that my command produces correct results.
- At the end of test2.do, I use mkassert to create another do-file, here called test3.do, that contains assert commands for each result that my command stored in e().
- I replace the mkassert command in test2.do with the commands it created in test3.do to create the certification script, here called test4.do.
This method assumes that I have already verified that my command produces correct results for a specific example. The common case of verification by simulation makes this method more applicable than you might think.
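For instance, here is a minimal sketch of what verification by simulation can look like for mypoisson5. The program name onedraw, the data-generating process, the number of replications, and the tolerances are all illustrative choices that you would adapt to your problem.

// a sketch of verification by simulation
clear all
set seed 12345

program define onedraw, rclass
    drop _all
    set obs 1000
    generate x = rnormal()
    generate y = rpoisson(exp(.5 + 1*x))   // true coefficients: x = 1, _cons = .5
    mypoisson5 y x
    return scalar bx = _b[x]
    return scalar b0 = _b[_cons]
end

simulate bx=r(bx) b0=r(b0), reps(2000) nodots: onedraw

// the averages of the 2,000 estimates should be close to the true values;
// the tolerances must allow for simulation noise
summarize bx
assert abs(r(mean) - 1) < .05
summarize b0
assert abs(r(mean) - .5) < .05

Once a simulation like this one has verified the command, I fix the seed, record the results it produced, and certify them with mkassert as above.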
Certifying my command against hand-calculated results
I can almost always find another way to compute estimation results in Stata that should be numerically equivalent. In the Poisson-regression case at hand, I can use gmm. As discussed by Cameron and Trivedi (2005) and Wooldridge (2010), Poisson regression finds the \(\widehat{\betab}\) that solves the score equations,
$$
\sum_{i=1}^N \left[y_i - \exp(\xb_i\widehat{\betab})\right]\xb_i = {\bf 0}
$$
We showed how to use the gmm command for similar problems in Understanding the generalized method of moments (GMM): A simple example. In example 9, I use gmm, and I use assert to check that the point estimates produced by gmm and mypoisson5 are sufficiently close.
Example 9: Using gmm to certify mypoisson5
. clear all

. use accident3

. mypoisson5 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66855
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
------------------------------------------------------------------------------

. matrix b1 = e(b)

. gmm (accidents - exp({xb:cvalue kids traffic _cons})),  ///
>         instruments(cvalue kids traffic) onestep

Step 1
Iteration 0:   GMM criterion Q(b) =  .57041592
Iteration 1:   GMM criterion Q(b) =  .01710408
Iteration 2:   GMM criterion Q(b) =  .00015313
Iteration 3:   GMM criterion Q(b) =  2.190e-08
Iteration 4:   GMM criterion Q(b) =  3.362e-16

note: model is exactly identified

GMM estimation

Number of parameters =   4
Number of moments    =   4
Initial weight matrix: Unadjusted                 Number of obs    =       505

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .1094934    -5.99   0.000    -.8704901    -.441284
        kids |  -1.009017   .1884791    -5.35   0.000    -1.378429   -.6396047
     traffic |   .1467115   .0923401     1.59   0.112    -.0342718    .3276947
       _cons |   .5743542   .6039059     0.95   0.342    -.6092797    1.757988
------------------------------------------------------------------------------
Instruments for equation 1: cvalue kids traffic _cons

. matrix b2 = e(b)

. display mreldif(b1, b2)
5.554e-08

. assert mreldif(b1, b2) <1e-7
I used a weak tolerance when comparing the two vectors of point estimates because the commands use different algorithms to find their solutions. If I reduced the convergence tolerance in each command, the solutions would be closer to each other.
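For example, gmm provides the conv_ptol() and conv_vtol() options for tightening its convergence criteria. The sketch below tightens only the gmm side, reusing b1 from example 9; the option values and the asserted bound are illustrative choices that you would calibrate against the displayed difference.

gmm (accidents - exp({xb:cvalue kids traffic _cons})),  ///
    instruments(cvalue kids traffic) onestep            ///
    conv_ptol(1e-12) conv_vtol(1e-13)    // tighter than the defaults
matrix b3 = e(b)
display mreldif(b1, b3)    // the remaining difference reflects mypoisson5's
                           // own convergence tolerance
assert mreldif(b1, b3) < 1e-7    // illustrative bound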
For a real certification script, I would also check everything else stored in e() by mypoisson5 against a value computed by gmm. I skip these details to present other methods.
Certifying my command against itself
Almost all estimation commands accept if or in sample restrictions, and these restrictions can usually be tested against other results produced by the same command. Example 10 illustrates this approach.
Example 10: Testing a command against itself
. clear all

. use accident3

. mypoisson5 accidents cvalue kids traffic if cvalue <=3
Iteration 0:   f(p) = -712.62548
Iteration 1:   f(p) = -540.56297
Iteration 2:   f(p) = -529.54572
Iteration 3:   f(p) = -529.44627
Iteration 4:   f(p) = -529.44618
Iteration 5:   f(p) = -529.44618
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.3646368   .0872777    -4.18   0.000    -.5356979   -.1935756
        kids |  -.9874777   .0805708   -12.26   0.000    -1.145394   -.8295618
     traffic |   .1488243   .0317338     4.69   0.000     .0866272    .2110214
       _cons |   .1081705   .3015328     0.36   0.720    -.4828229    .6991638
------------------------------------------------------------------------------

. matrix b1 = e(b)

. keep if cvalue <=3
(121 observations deleted)

. mypoisson5 accidents cvalue kids traffic
Iteration 0:   f(p) = -712.62548
Iteration 1:   f(p) = -540.56297
Iteration 2:   f(p) = -529.54572
Iteration 3:   f(p) = -529.44627
Iteration 4:   f(p) = -529.44618
Iteration 5:   f(p) = -529.44618
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.3646368   .0872777    -4.18   0.000    -.5356979   -.1935756
        kids |  -.9874777   .0805708   -12.26   0.000    -1.145394   -.8295618
     traffic |   .1488243   .0317338     4.69   0.000     .0866272    .2110214
       _cons |   .1081705   .3015328     0.36   0.720    -.4828229    .6991638
------------------------------------------------------------------------------

. matrix b2 = e(b)

. display mreldif(b1, b2)
0

. assert mreldif(b1, b2) <1e-14
I begin by storing the point estimates obtained from the sample in which cvalue<=3 in b1. Next, I keep only these observations in the sample and use mypoisson5 without an if restriction to compute the point estimates stored in b2. Finally, I assert that b1 and b2 are sufficiently close. In this case, the results are exactly the same, but I only test that they are close because I should not rely on this equality. (I am using Stata/MP, and other jobs on my computer could change the number of processors I effectively have, which can cause the results to change slightly.)
An analogous process works for testing in restrictions and integer weights.
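As a sketch of what those checks might look like: the in range, the weight variable w, and the tolerances below are illustrative choices, and the fweight block is a template that applies only to commands that allow fweights.

// certify an 'in' restriction against the command itself
clear all
use accident3
mypoisson5 accidents cvalue kids traffic in 1/300
matrix b1 = e(b)
keep in 1/300
mypoisson5 accidents cvalue kids traffic
matrix b2 = e(b)
assert mreldif(b1, b2) < 1e-14

// for a command that allows fweights, a weighted run should match an
// unweighted run on the corresponding expanded dataset
generate w = 1 + mod(_n, 3)
mypoisson5 accidents cvalue kids traffic [fweight=w]
matrix b3 = e(b)
expand w
mypoisson5 accidents cvalue kids traffic
matrix b4 = e(b)
assert mreldif(b3, b4) < 1e-12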
Certifying my command against another Stata command
Sometimes constraining a parameter in the new estimator produces the same results as another estimator already implemented in Stata. For example, a random-effects estimator may reduce to a cross-sectional estimator when the variance of the random effect is constrained to zero.
In the case at hand, I could check that my command produces the same values as poisson, as shown in example 11.
Example 11: Certifying against an existing command
. clear all

. use accident3

. mypoisson5 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66855
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
------------------------------------------------------------------------------

. matrix b1 = e(b)

. poisson accidents cvalue kids traffic
Iteration 0:   log likelihood = -555.86605
Iteration 1:   log likelihood =  -555.8154
Iteration 2:   log likelihood = -555.81538

Poisson regression                              Number of obs   =        505
                                                LR chi2(3)      =     340.20
                                                Prob > chi2     =     0.0000
Log likelihood = -555.81538                     Pseudo R2       =     0.2343

------------------------------------------------------------------------------
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506594
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |    .574354   .2839515     2.02   0.043     .0178193    1.130889
------------------------------------------------------------------------------

. matrix b2 = e(b)

. display mreldif(b1, b2)
1.081e-07

. assert mreldif(b1, b2) <1e-6
Done and undone
I presented some techniques that I use to write certification scripts. A real certification script would cover many more cases. In the next post, I discuss using and creating Mata libraries.
References
Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.
Gould, W. 2001. Statistical software certification. Stata Journal 1: 29–50.
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.