Home > Programming > Programming an estimation command in Stata: Certifying your command

Programming an estimation command in Stata: Certifying your command


This is the twenty-fifth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

Verification versus certification

Verification is the process of establishing that a command produces the correct results. Verfication produces true values that can be compared with the values produced by a command. Certification is the process of checking that the differences between the verified true results and the results produced by a command are sufficiently small.

Verification can be easy or difficult. If there is another command or program that you trust, you can use it to create verified values. For example, I trust the poisson command, so I can use it to create true test-case values for mypoisson5. When another command or program is not available, I use simulation techniques to obtain certified values. See Monte Carlo simulations using Stata and Efficiency comparisons by Monte Carlo simulation for discussions of Monte Carlo simulations.

I certify a command by writing do-files that check that the results of a command are close to the verified true values in many specific cases. These do-files are called certification scripts, and I run them every time I make any change to my command. The process that I present is a greatly simplified version of that used to certify Stata; see Gould (2001) for another introduction to certification and for more about Stata certification.

Comparing numbers

The assert command checks that a logical expression is true. Here I use it to check that two integer values are equal or that two noninteger values are sufficiently close.

I check for equality between integer values and closeness between noninteger values because of how computers do math. You cannot fit the entire real-number line on a computer; there are too many real numbers. Computers use finite-precision base-two approximations to the real numbers. Integers have an exact representation in this approximation, and integer calculations can be performed without approximation error. Most noninteger values do not have an exact representation in the base-two approximation used on computers, and noninteger calculations are performed with approximation error. See The Penultimate Guide to Precision for details.

Example 1 illustrates that assert produces no output or error when asked to assert that a true logical expression is true.

Example 1: assert a true expression

. assert 3==3


In contrast, example 2 illustrates that assert produces an error when asked to assert that a false logical expression is true.

Example 2: assert a false expression

. assert 3==2
assertion is false
r(9);


In example 3, I use mreldif() to compute the maximum of the element-wise relative differences between two integer-valued vectors, and I then use assert to check for equality.

Example 3: assert and mreldif()

. matrix a = (1, 2, 3)

. matrix b = a

. display mreldif(a, b)
0

. assert mreldif(a,b)==0


Examples 1–3 illustrated assert by comparing integers. In certification, we usually compare noninteger values. Because of finite-precision approximation errors, a small change in how the results are computed—such as using 1 processor instead of 8 processors in Stata/MP, changing the sort order of the data, or using a Mac instead of a PC—may cause the results to change slightly. These changes can be surprising if your intuition is guided by infinite-precision math, but they should be small enough to be ignorable. Example 4 illustrates this point by comparing the point estimates obtained using 8 processors and 1 processor.

Example 4: The effect of using 1 instead of 8 processors

. clear all

. use accident3

. gsort - cvalue

. set processors 8
The maximum number of processors or cores being used is changed from 1 to
8.  It can be set to any number between 1 and 8

. mypoisson5 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66855
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
_cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
------------------------------------------------------------------------------

. matrix b1 = e(b)

. set processors 1
The maximum number of processors or cores being used is changed from 8 to
1.  It can be set to any number between 1 and 8

. mypoisson5 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66855
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
_cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
------------------------------------------------------------------------------

. matrix b2 = e(b)

. display mreldif(b1, b2)
2.420e-17


Because differences like these are unimportant, I check that the computed results are close to the verified results instead of requiring that they be exactly the same. (You might not see these differences if you run this example on your 8-processor machine. The differences depend on what else your computer is doing when you run the example.)

Writing a certification script

I routinely use the four following techniques to write a certification script:

1. I check that my command reproduces results that I have previously verified.
2. I check that my command produces results that are close to those produced by a series of hand calculations.
3. I check my command against itself.
4. I check that my command produces results sufficiently close to another Stata command.

Certifying my command against previously verified results

Consider the results produced by mypoisson5.ado displayed in example 5.

Example 5: mypoisson5 results

. clear all

. use accident3

. mypoisson5 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66855
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
_cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
------------------------------------------------------------------------------


The results displayed in example 5 are stored in e(). I previously verified that these results are correct by comparing the higher-precision results displayed in example 6 with other results.

Example 6: mypoisson5 e() results

. ereturn list

scalars:
e(N) =  505
e(rank) =  4

macros:
e(cmd) : "mypoisson5"
e(predict) : "mypoisson5_p"
e(properties) : "b V"

matrices:
e(b) :  1 x 4
e(V) :  4 x 4

functions:
e(sample)

. matrix list e(b), format(%16.15g)

e(b)[1,4]
cvalue              kids           traffic             _cons
y1  -.65588706902223  -1.0090169724739    .1467114650851   .57435412474423


I could use the results from example 6 to create a certification script like test1.do.

Code block 1: test1.do

clear all
use accident3
mypoisson5 accidents cvalue kids traffic
matrix b1    = e(b)
matrix btrue = (-.65588706902223,  -1.0090169724739,    ///
.1467114650851,     .57435412474423)
display mreldif(b1, btrue)
assert mreldif(b1, btrue) < 1e-14


Running test1.do produces

Example 7: test1

. do test1

. clear all

. use accident3

. mypoisson5 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66855
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
_cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
------------------------------------------------------------------------------

. matrix b1    = e(b)

. matrix btrue = (-.65588706902223,  -1.0090169724739,    ///
>                  .1467114650851,     .57435412474423)

. display mreldif(b1, btrue)
6.742e-15

. assert mreldif(b1, btrue) < 1e-14

.
.
end of do-file


Note the process. After verifing the results produced by mypoisson5, I write a certification script to ensure that mypoisson5 will always produce approximately these numbers. Following this process protects me from accidentally causing my command to produce incorrect results as I make it "better" or faster. Do not underestimate the importance of this protection. Putting bugs into your calculations as you attempt to improve your command is remarkably easy. This process also documents that I have checked this particular case. If someone claims to have a program that differs from mine in this case, I can ask that person to compute the results for this example in which I know that my command works. This request almost always yields a discussion in which that person debugs his or her own program so that it produces my verified results.

Here I copied and pasted numbers from a log file into a do-file to create test1.do. The copy-paste method is error–prone, tedious, and should be avoided. The mkassert command solves this problem. mkassert creates assert commands that certify results stored in e(), r(), the dataset, or other Stata objects. I use it all the time.

I begin writing a certification script using mkassert with code like that in test2.do, whose output appears in example 8.

Code block 2: test2.do

clear all
use accident3
mypoisson5 accidents cvalue kids traffic
mkassert eclass,  mtol(1e-12) saving(test3.do, replace)


Example 8: Using mkassert

. do test2

. clear all

. use accident3

. mypoisson5 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66855
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
_cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
------------------------------------------------------------------------------

. mkassert eclass,  mtol(1e-12) saving(test3.do, replace)

.
end of do-file


test2.do produces results that I previously verified and uses mkassert to write an assert command for every result stored in e() in the file test3.do, which is in code block 2.

Code block 3: test3.do


assert "e(cmd)'"'        == "mypoisson5"'
assert "e(predict)'"'    == "mypoisson5_p"'
assert "e(properties)'"' == "b V"'

assert         e(N)    == 505
assert         e(rank) == 4

qui {
mat T_b = J(1,4,0)
mat T_b[1,1] = -.6558870690222316
mat T_b[1,2] = -1.009016972473914
mat T_b[1,3] =  .1467114650851019
mat T_b[1,4] =  .5743541247442324
}
matrix C_b = e(b)
assert mreldif( C_b , T_b ) < 1e-12
_assert_streq ": rowfullnames C_b'"' "y1"'
_assert_streq ": colfullnames C_b'"' "cvalue kids traffic _cons"'
mat drop C_b T_b

qui {
mat T_V = J(4,4,0)
mat T_V[1,1] =  .0049911902167341
mat T_V[1,2] =  .0002953642487161
mat T_V[1,3] = -.0000506909358346
mat T_V[1,4] = -.0089523155601508
mat T_V[2,1] =  .0002953642487161
mat T_V[2,2] =  .0065280055261688
mat T_V[2,3] =  .0002050149836939
mat T_V[2,4] = -.0054776138886792
mat T_V[3,1] = -.0000506909358346
mat T_V[3,2] =  .0002050149836939
mat T_V[3,3] =  .0009844631577381
mat T_V[3,4] = -.0075052131640854
mat T_V[4,1] = -.0089523155601508
mat T_V[4,2] = -.0054776138886792
mat T_V[4,3] = -.0075052131640854
mat T_V[4,4] =  .0806284655627814
}
matrix C_V = e(V)
assert mreldif( C_V , T_V ) < 1e-12
_assert_streq ": rowfullnames C_V'"' "cvalue kids traffic _cons"'
_assert_streq ": colfullnames C_V'"' "cvalue kids traffic _cons"'
mat drop C_V T_V


Each assert command checks that what is currently in e() is sufficiently close to the corresponding value stored in e() by mypoisson5. Lines 2–4 check the local macros, lines 6–7 check the scalars, lines 9–20 check e(b), and lines 22–45 check e(V).

I create the script test4.do, which checks this case by replacing the mkassert command in test2.do with the assert commands it created in test3.do; see code block 4.

Code block 4: test4.do

// Test case 1
clear all
use accident3
mypoisson5 accidents cvalue kids traffic

assert "e(cmd)'"'        == "mypoisson5"'
assert "e(predict)'"'    == "mypoisson5_p"'
assert "e(properties)'"' == "b V"'

assert         e(N)    == 505
assert         e(rank) == 4

qui {
mat T_b = J(1,4,0)
mat T_b[1,1] = -.6558870690222316
mat T_b[1,2] = -1.009016972473914
mat T_b[1,3] =  .1467114650851019
mat T_b[1,4] =  .5743541247442324
}
matrix C_b = e(b)
assert mreldif( C_b , T_b ) < 1e-12
_assert_streq ": rowfullnames C_b'"' "y1"'
_assert_streq ": colfullnames C_b'"' "cvalue kids traffic _cons"'
mat drop C_b T_b

qui {
mat T_V = J(4,4,0)
mat T_V[1,1] =  .0049911902167341
mat T_V[1,2] =  .0002953642487161
mat T_V[1,3] = -.0000506909358346
mat T_V[1,4] = -.0089523155601508
mat T_V[2,1] =  .0002953642487161
mat T_V[2,2] =  .0065280055261688
mat T_V[2,3] =  .0002050149836939
mat T_V[2,4] = -.0054776138886792
mat T_V[3,1] = -.0000506909358346
mat T_V[3,2] =  .0002050149836939
mat T_V[3,3] =  .0009844631577381
mat T_V[3,4] = -.0075052131640854
mat T_V[4,1] = -.0089523155601508
mat T_V[4,2] = -.0054776138886792
mat T_V[4,3] = -.0075052131640854
mat T_V[4,4] =  .0806284655627814
}
matrix C_V = e(V)
assert mreldif( C_V , T_V ) < 1e-12
_assert_streq ": rowfullnames C_V'"' "cvalue kids traffic _cons"'
_assert_streq ": colfullnames C_V'"' "cvalue kids traffic _cons"'
mat drop C_V T_V


Every time I run test4.do, it checks that mypoisson5 produces correct results for this one case. The more cases that I verify and certify, the more certain I am that my command works.

I summarize this important process below.

1. I write a do-file, here called test2.do, that produces results for a case in which I have verified that my command produces correct results.
2. At the end of test2.do, I use mkassert to create another do-file, here called test3.do, that contains assert commands for each result that my command stored in e().
3. I replace the mkassert command in test2.do with the commands it created in test3.do to create the certification script, here called test4.do.

This method assumes that I have already verified that my command produces correct results for a specific example. The common case of verification by simulation makes this method more applicable than you might think.

Certifying my command against hand-calculated results

I can almost always find another way to compute estimation results in Stata that should be numerically equivalent. In the Poisson–regression case at hand, I can use gmm. As discussed by Cameron and Trivedi (2005) and Wooldridge (2010), Poisson regression finds the $$\widehat{\betab}$$ that solves the score equations,

$$\sum_{i=1}^N \left[y_i - \exp(\xb_i\widehat{\betab})\right]\xb_i = {\bf 0}$$

We showed how to use the gmm for similar problems in Understanding the generalized method of moments (GMM): A simple example. In example 9, I use gmm, and I use assert to check that the point estimates produced by gmm and mypoisson5 are sufficiently close.

Example 9: Using gmm to certify mypoisson5

. clear all

. use accident3

. mypoisson5 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66855
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
_cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
------------------------------------------------------------------------------

. matrix b1 = e(b)

. gmm (accidents - exp({xb:cvalue kids traffic _cons})),   ///
>      instruments(cvalue kids traffic) onestep

Step 1
Iteration 0:   GMM criterion Q(b) =  .57041592
Iteration 1:   GMM criterion Q(b) =  .01710408
Iteration 2:   GMM criterion Q(b) =  .00015313
Iteration 3:   GMM criterion Q(b) =  2.190e-08
Iteration 4:   GMM criterion Q(b) =  3.362e-16

note: model is exactly identified

GMM estimation

Number of parameters =   4
Number of moments    =   4
Initial weight matrix: Unadjusted                 Number of obs   =        505

------------------------------------------------------------------------------
|               Robust
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cvalue |  -.6558871   .1094934    -5.99   0.000    -.8704901    -.441284
kids |  -1.009017   .1884791    -5.35   0.000    -1.378429   -.6396047
traffic |   .1467115   .0923401     1.59   0.112    -.0342718    .3276947
_cons |   .5743542   .6039059     0.95   0.342    -.6092797    1.757988
------------------------------------------------------------------------------
Instruments for equation 1: cvalue kids traffic _cons

. matrix b2 = e(b)

. display mreldif(b1, b2)
5.554e-08

. assert mreldif(b1, b2) <1e-7


I used a weak tolerance when comparing the two vectors of point estimates because the commands use different algorithms to find their solutions. If I reduced the convergence tolerance in each command, the solutions would be closer to each other.

For a real certification script, I would also check everything else stored in e() by mypoisson5 against a value computed by gmm. I skip these details to present other methods.

Certifying my command against itself

Almost all estimation commands accept if or in sample restrictions, and these restrictions can usually be tested by comparing other results produced by the same command. Example 10 provides an example.

Example 10: Testing a command against itself

. clear all

. use accident3

. mypoisson5 accidents cvalue kids traffic  if cvalue <=3
Iteration 0:   f(p) = -712.62548
Iteration 1:   f(p) = -540.56297
Iteration 2:   f(p) = -529.54572
Iteration 3:   f(p) = -529.44627
Iteration 4:   f(p) = -529.44618
Iteration 5:   f(p) = -529.44618
------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cvalue |  -.3646368   .0872777    -4.18   0.000    -.5356979   -.1935756
kids |  -.9874777   .0805708   -12.26   0.000    -1.145394   -.8295618
traffic |   .1488243   .0317338     4.69   0.000     .0866272    .2110214
_cons |   .1081705   .3015328     0.36   0.720    -.4828229    .6991638
------------------------------------------------------------------------------

. matrix b1 = e(b)

. keep if cvalue <=3
(121 observations deleted)

. mypoisson5 accidents cvalue kids traffic
Iteration 0:   f(p) = -712.62548
Iteration 1:   f(p) = -540.56297
Iteration 2:   f(p) = -529.54572
Iteration 3:   f(p) = -529.44627
Iteration 4:   f(p) = -529.44618
Iteration 5:   f(p) = -529.44618
------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cvalue |  -.3646368   .0872777    -4.18   0.000    -.5356979   -.1935756
kids |  -.9874777   .0805708   -12.26   0.000    -1.145394   -.8295618
traffic |   .1488243   .0317338     4.69   0.000     .0866272    .2110214
_cons |   .1081705   .3015328     0.36   0.720    -.4828229    .6991638
------------------------------------------------------------------------------

. matrix b2 = e(b)

. display mreldif(b1, b2)
0

. assert mreldif(b1, b2) <1e-14


I begin by storing the point estimates obtained from the sample in which cvalue<=3 in b1. Next, I keep only these observations in the sample and use mypoisson5 without an if restriction to compute the point estimates stored in b2. Finally, I assert that b1 and b2 are sufficiently close. In this case, the results are exactly the same, but I only test that they are close because I should not rely on this equality. (I am using Stata/MP, and other jobs on my computer could change the number of processors I effectively have, which can cause the results to change slightly.)

An analogous process works for testing in restrictions and integer weights.

Certifying my command against another Stata command

Sometimes constraining a parameter in the new estimator produces the same results as another estimator already implemented in Stata. For example, a random-effects estimator may reduce to a cross-sectional estimator when the variance of the random-effect is constrained to zero.

In the case at hand, I could check that my command produces the same values as poisson, as shown in example 11.

Example 11: Certifying against an existing command

. clear all

. use accident3

. mypoisson5 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66855
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
|      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
_cons |   .5743541   .2839515     2.02   0.043     .0178194    1.130889
------------------------------------------------------------------------------

. matrix b1 = e(b)

. poisson accidents cvalue kids traffic

Iteration 0:   log likelihood = -555.86605
Iteration 1:   log likelihood =  -555.8154
Iteration 2:   log likelihood = -555.81538

Poisson regression                              Number of obs     =        505
LR chi2(3)        =     340.20
Prob > chi2       =     0.0000
Log likelihood = -555.81538                     Pseudo R2         =     0.2343

------------------------------------------------------------------------------
accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506594
traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
_cons |    .574354   .2839515     2.02   0.043     .0178193    1.130889
------------------------------------------------------------------------------

. matrix b2 = e(b)

. display mreldif(b1, b2)
1.081e-07

. assert mreldif(b1, b2) <1e-6


Done and undone

I presented some techniques that I use to write certification scripts. A real certification script would cover many more cases. In the next post, I discuss using and creating Mata libraries.

References

Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.

Gould, W. 2001. Statistical software certification. Stata Journal 1: 29–50.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.

Categories: Programming Tags: