Using Stata’s random-number generators, part 4, details

For those interested in how pseudo-random-number generators work, I just wrote something on Statalist, which you can see in the Statalist archives by clicking the link below even if you do not subscribe:

http://www.stata.com/statalist/archive/2012-10/msg01129.html

To remind you, I’ve been writing about how to use random-number generators in parts 1, 2, and 3, and I still have one more posting I want to write on the subject. What I just wrote on Statalist, however, is about how random-number generators work, and I think you will find it interesting.

To find out more about Statalist, see

Statalist

How to successfully ask a question on Statalist

Using Stata’s random-number generators, part 1

I want to start a series on using Stata’s random-number functions. Stata in fact has ten random-number functions:

  1. runiform() generates rectangularly (uniformly) distributed random numbers over [0,1).
  2. rbeta(a, b) generates beta-distribution beta(a, b) random numbers.
  3. rbinomial(n, p) generates binomial(n, p) random numbers, where n is the number of trials and p the probability of a success.
  4. rchi2(df) generates χ² random numbers with df degrees of freedom.
  5. rgamma(a, b) generates Γ(a, b) random numbers, where a is the shape parameter and b, the scale parameter.
  6. rhypergeometric(N, K, n) generates hypergeometric random numbers, where N is the population size, K is the number in the population having the attribute of interest, and n is the sample size.
  7. rnbinomial(n, p) generates negative binomial — the number of failures before the nth success — random numbers, where p is the probability of a success. (n can also be noninteger.)
  8. rnormal(μ, σ) generates Gaussian normal random numbers.
  9. rpoisson(m) generates Poisson(m) random numbers.
  10. rt(df) generates Student’s t(df) random numbers.

You already know that these random-number generators do not really produce random numbers; they produce pseudo-random numbers. This series is not about that, so we’ll be relaxed about calling them random-number generators.

You should already know that you can set the random-number seed before using the generators. That is not required but it is recommended. You set the seed not to obtain better random numbers, but to obtain reproducible random numbers. In fact, setting the seed too often can actually reduce the quality of the random numbers! If you don’t know that, then read help set seed in Stata. I should probably pull out the part about setting the seed too often, expand it, and turn it into a blog entry. Anyway, this series is not about that either.
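
To make the reproducibility point concrete, here is a minimal sketch; the seed value and variable names are ones I made up for illustration.

. set obs 5
. set seed 20121001
. generate double u1 = runiform()
. generate double z1 = rnormal(0, 1)
. set seed 20121001
. generate double u2 = runiform()
. generate double z2 = rnormal(0, 1)
. assert u1 == u2 & z1 == z2        // same seed, same sequence, same numbers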

This series is about the use of random-number generators to solve problems, just as most users usually use them. The series will provide practical advice. I’ll stay away from describing how they work internally, although long-time readers know that I won’t keep the promise. At least I’ll try to make sure that any technical details are things you really need to know. As a result, I probably won’t even get to write once that if this is the kind of thing that interests you, StataCorp would be delighted to have you join our development staff.

 

runiform(), generating uniformly distributed random numbers

Mostly I’m going to write about runiform() because runiform() can solve such a variety of problems. runiform() can be used for

  • shuffling data (putting observations in random order),
  • drawing random samples without replacement (there’s a minor detail we’ll have to discuss because runiform() itself produces values drawn with replacement),
  • drawing random samples with replacement (which is easier to do than most people realize),
  • drawing stratified random samples (with or without replacement),
  • manufacturing fictional data (something teachers, textbook authors, manual writers, and blog writers often need to do).

runiform() generates uniformly (a.k.a. rectangularly) distributed random numbers over the interval, to quote the manual, “0 to nearly 1”.

Nearly 1? “Why not all the way to 1?” you should be asking. “And what exactly do you mean by nearly 1?”

The answer is that the generator is more useful if it omits 1 from the interval, and so we shaved just a little off. runiform() produces random numbers over [0, 0.999999999767169356].

Here are two useful formulas you should commit to memory.

  1. If you want to generate continuous random numbers between a and b, use

    generate double u = (b-a)*runiform() + a

    The random numbers will not actually be between a and b; they will be between a and nearly b, but the top will be so close to b, namely a + 0.999999999767169356*(b-a), that it will not matter.

    Remember to store continuous random values as doubles.

  2. If you want to generate integer random numbers between a and b, use

    generate ui = floor((b-a+1)*runiform() + a)

    In particular, do not even consider using the continuous-value formula with its result rounded to integers, which is to say, round(u) = round((b-a)*runiform() + a). If you use that formula, and if b-a>1, then a and b will be underrepresented by 50% each in the samples you generate!

    I stored ui as a default float, so I am assuming that -16,777,216 ≤ a < b ≤ 16,777,216. If you have integers outside of that range, however, store as a long or double.
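
Before I explain, here is what the two formulas look like with concrete endpoints; I picked a=1 and b=6 purely for illustration.

. set obs 1000
. generate double u  = (6-1)*runiform() + 1            // continuous on [1, 6)
. generate long   ui = floor((6-1+1)*runiform() + 1)   // integers 1, 2, ..., 6
. summarize u ui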

I’m going to spend the rest of this blog entry explaining the above.

First, I want to show you how I got the two formulas and why you must use the second formula for generating integer uniform deviates.

Then I want to explain why we shaved a little from the top of runiform(), namely (1) while it wouldn’t matter for formula 1, it made formula 2 a little easier, (2) the code would run more quickly, (3) we could more easily prove that we had implemented the random-number generator correctly, and (4) anyone digging deeper into our random numbers would not be misled into thinking they had more than 32 bits of resolution. That last point will be important in a future blog entry.

 

Continuous uniforms over [a, b)

runiform() produces random numbers over [0, 1). It therefore obviously follows that (b-a)*runiform()+a produces numbers over [a, b). Substitute 0 for runiform() and the lower limit is obtained. Substitute 1 for runiform() and the upper limit is obtained.

I can tell you that in fact, runiform() produces random numbers over [0, (2^32-1)/2^32].

Thus (b-a)*runiform()+a produces random numbers over [a, a + ((2^32-1)/2^32)*(b-a)].

(2^32-1)/2^32 approximately equals 0.999999999767169356 and exactly equals 1.fffffffeX-01, if you will allow me to use %21x format, which Stata understands and which you can understand if you see my previous blog posting on precision.

Thus, if you are concerned about results being in the interval [a, b) rather than [a, b], you can use the formula

generate double u = ((b-a)*runiform() + a) / 1.fffffffeX-01

There are seven f’s followed by e in the hexadecimal constant. Alternatively, you could type

generate double u = ((b-a)*runiform() + a) / ((2^32-1)/2^32)

but dividing by 1.fffffffeX-01 is less typing, so I’d type that. Actually, I wouldn’t type either one; the small difference between values lying in [a, b) or [a, b] is unimportant.

 

Integer uniforms over [a, b]

Whether we produce real, continuous random numbers over [a, b) or [a, b] may be unimportant, but if we want to draw random integers, the distinction is important.

runiform() produces continuous results over [0, 1).

(b-a)*runiform()+a produces continuous results over [a, b).

To produce integer results, we might round continuous results over segments of the number line:

           a    a+.5  a+1  a+1.5  a+2  a+2.5       b-1.5  b-1  b-.5    b
real line  +-----+-----+-----+-----+-----+-----------+-----+-----+-----+
int  line  |<-a->|<---a+1--->|<---a+2--->|           |<---b-1--->|<-b->|

In the diagram above, think of the numbers being produced by the continuous formula u=(b-a)*runiform()+a as being arrayed along the real line. Then imagine rounding those values, say by using Stata’s round(u) function. If you rounded in that way, then

  • Values of u between a and a+0.5 will be rounded to a.
  • Values of u between a+0.5 and a+1.5 will be rounded to a+1.
  • Values of u between a+1.5 and a+2.5 will be rounded to a+2.
  • Values of u between b-1.5 and b-0.5 will be rounded to b-1.
  • Values of u between b-0.5 and b will be rounded to b.

Note that the width of the first and last intervals is half that of the other intervals. Given that u follows the rectangular distribution, we thus expect half as many values rounded to a and to b as to a+1 or a+2 or … or b-1.

And indeed, that is exactly what we would see:

. set obs 100000
obs was 0, now 100000

. gen double u = (5-1)*runiform() + 1

. gen i = round(u)

. summarize u i 

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |    100000    3.005933    1.156486   1.000012   4.999983
           i |    100000     3.00489    1.225757          1          5

. tabulate i

          i |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     12,525       12.53       12.53
          2 |     24,785       24.79       37.31
          3 |     24,886       24.89       62.20
          4 |     25,284       25.28       87.48
          5 |     12,520       12.52      100.00
------------+-----------------------------------
      Total |    100,000      100.00

To avoid the problem we need to make the widths of all the intervals equal, and that is what the formula floor((b-a+1)*runiform() + a) does.

           a          a+1         a+2                     b-1          b          b+1
real line  +-----+-----+-----+-----+-----------------------+-----+-----+-----+-----+
int  line  |<--- a --->|<-- a+1 -->|                       |<-- b-1 -->|<--- b --->)

Our intervals are of equal width and thus we expect to see roughly the same number of observations in each:

. gen better = floor((5-1+1)*runiform() + 1)

. tabulate better

     better |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     19,808       19.81       19.81
          2 |     20,025       20.02       39.83
          3 |     19,963       19.96       59.80
          4 |     20,051       20.05       79.85
          5 |     20,153       20.15      100.00
------------+-----------------------------------
      Total |    100,000      100.00

So now you know why we shaved a little off the top when we implemented runiform(); it made the formula

floor((b-a+1)*runiform() + a)

easier. Our integer [a, b] formula did not have to concern itself that runiform() would sometimes — rarely — return 1. If runiform() did return the occasional 1, the simple formula above would produce the (correspondingly occasional) b+1.

 

How Stata calculates continuous random numbers

I’ve said that we shaved a little off the top, but the fact was that it was easier for us to do the shaving than not.

runiform() is based on the KISS random-number generator. KISS produces 32-bit integers, meaning integers in the range [0, 2^32-1], or [0, 4,294,967,295]. You might wonder how we converted that range to being continuous over [0, 1).

Start by thinking of the number KISS produces in its binary form:

b31 b30 b29 b28 b27 b26 b25 b24 b23 b22 b21 b20 b19 b18 b17 b16 b15 b14 b13 b12 b11 b10 b9 b8 b7 b6 b5 b4 b3 b2 b1 b0

The corresponding integer is b31*2^31 + b30*2^30 + … + b0*2^0. All we did was insert a binary point out front:

.b31 b30 b29 b28 b27 b26 b25 b24 b23 b22 b21 b20 b19 b18 b17 b16 b15 b14 b13 b12 b11 b10 b9 b8 b7 b6 b5 b4 b3 b2 b1 b0

making the real value b31*2^-1 + b30*2^-2 + … + b0*2^-32. Doing that is equivalent to dividing by 2^32, except insertion of the binary point is faster. Nonetheless, if we had wanted runiform() to produce numbers over [0, 1], we could have divided by 2^32-1.

Anyway, if the KISS random number generator produced 3190625931, which in binary is

10111110001011010001011010001011

we converted that to

0.10111110001011010001011010001011

which equals 0.74287549 in base 10.

The largest number the KISS random number generator can produce is, of course,

11111111111111111111111111111111

and 0.11111111111111111111111111111111 equals 0.999999999767169356 in base 10. Thus, the runiform() implementation of KISS generates random numbers in the range [0, 0.999999999767169356].
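
You can verify both of those conversions directly in Stata; this is ordinary arithmetic, nothing specific to runiform().

. display %10.8f  3190625931/2^32      // 0.74287549, the value above
. display %20.18f (2^32-1)/2^32        // 0.999999999767169356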

I could have presented all of this mathematically in base 10: KISS produces integers in the range [0, 2^32-1], and in runiform() we divide by 2^32 to thus produce continuous numbers over the range [0, (2^32-1)/2^32]. I could have said that, but it loses the flavor and intuition of my longer explanation, and it would gloss over the fact that we just inserted the binary point. If I asked you, a base-10 user, to divide 232 by 10, you wouldn’t actually divide in the same way that you would divide by, say, 9. Dividing by 9 is work. Dividing by 10 merely requires shifting the decimal point. 232 divided by 10 is obviously 23.2. You may not have realized that modern digital computers, when programmed by “advanced” programmers, follow similar procedures.

Oh gosh, I do get to say it! If this sort of thing interests you, consider a career at StataCorp. We’d love to have you.

 

Is it important that runiform() values be stored as doubles?

Sometimes it is important. It’s obviously not important when you are generating random integers using floor((b-a+1)*runiform() + a) and -16,777,216 ≤ a < b ≤ 16,777,216. Integers in that range fit into a float without rounding.

When creating continuous values, remember that runiform() produces 32 bits. floats store 23 bits and doubles store 52, so if you store the result of runiform() as a float, it will be rounded. Sometimes the rounding matters, and sometimes it does not. Next time, we will discuss drawing random samples without replacement. In that case, the rounding matters. In most other cases, including drawing random samples with replacement — something else for later — the rounding does not matter. Rather than thinking hard about the issue, I store all my non-integer random values as doubles.
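
Here is a quick way to see that rounding; the variable names are mine.

. set obs 10000
. generate double ud = runiform()
. generate float  uf = ud              // the same draws, rounded to float
. count if ud != uf                    // nearly every observation differs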

 

Tune in for the next episode

Yes, please do tune in for the next episode of everything you need to know about using random-number generators. As I already mentioned, we’ll discuss drawing random samples without replacement. In the third installment, I’m pretty sure we’ll discuss random samples with replacement. After that, I’m a little unsure about the ordering, but I want to discuss oversampling of some groups relative to others and, separately, discuss the manufacturing of fictional data.

Am I forgetting something?

The Penultimate Guide to Precision

There have recently been occasional questions on precision and storage types on Statalist despite all that I have written on the subject, much of it posted in this blog. I take that as evidence that I have yet to produce a useful, readable piece that addresses all the questions researchers have.

So I want to try again. This time I’ll try to write the ultimate piece on the subject, making it as short and snappy as possible, and addressing every popular question of which I am aware — including some I haven’t addressed before — and doing all that without making you wade with me into all the messy details, which I know I have a tendency to do.

I am hopeful that from now on, every question that appears on Statalist that even remotely touches on the subject will be answered with a link back to this page. If I succeed, I will place this in the Stata manuals and get it indexed online in Stata so that users can find it the instant they have questions.

What follows is intended to provide everything scientific researchers need to know to judge the effect of storage precision on their work, to know what can go wrong, and to prevent that. I don’t want to raise expectations too much, however, so I will entitle it …

THE PENULTIMATE GUIDE TO PRECISION

  Contents

    1. Numeric types
    2. Floating-point types
    3. Integer types
    4. Integer precision
    5. Floating-point precision
    6. Advice concerning 0.1, 0.2, …
    7. Advice concerning exact data, such as currency data
    8. Advice for programmers
    9. How to interpret %21x format (if you care)
    10. Also see

  1. Numeric types

    1.1 Stata provides five numeric types for storing variables, three of them integer types and two of them floating point.

    1.2 The floating-point types are float and double.

    1.3 The integer types are byte, int, and long.

    1.4 Stata uses these five types for the storage of data.

    1.5 Stata makes all calculations in double precision (and sometimes quad precision) regardless of the type used to store the data.

  2. Floating-point types

    2.1 Stata provides two IEEE 754-2008 floating-point types: float and double.

    2.2 float variables are stored in 4 bytes.

    2.3 double variables are stored in 8 bytes.

    2.4 The ranges of float and double variables are

         Storage
         type             minimum                maximum
         -----------------------------------------------------
         float     -3.40282346639e+ 38      1.70141173319e+ 38
         double    -1.79769313486e+308      8.98846567431e+307
         -----------------------------------------------------
         In addition, float and double can record missing values 
         ., .a, .b, ..., .z.

    The above values are approximations. For those familiar with %21x floating-point hexadecimal format, the exact values are

         Storage
         type                   minimum                maximum
         ------------------------------------------------------- 
         float   -1.fffffe0000000X+07f     +1.fffffe0000000X+07e 
         double  -1.fffffffffffffX+3ff     +1.fffffffffffffX+3fe
         -------------------------------------------------------

    Said differently, and less precisely, float values are in the open interval (-2^128, 2^127), and double values are in the open interval (-2^1024, 2^1023). This is less precise because the intervals shown in the tables are closed intervals.

  3. Integer types

    3.1 Stata provides three integer storage formats: byte, int, and long. They are 1 byte, 2 bytes, and 4 bytes, respectively.

    3.2 Integers may also be stored in Stata’s IEEE 754-2008 floating-point storage formats float and double.

    3.3 Integer values may be stored precisely over the ranges

         storage
         type                   minimum                 maximum
         ------------------------------------------------------
         byte                      -127                     100
         int                    -32,767                  32,740
         long            -2,147,483,647           2,147,483,620
         ------------------------------------------------------
         float              -16,777,216              16,777,216
         double  -9,007,199,254,740,992   9,007,199,254,740,992
         ------------------------------------------------------
         In addition, all storage types can record missing values
         ., .a, .b, ..., .z.

    The overall ranges of float and double were shown in (2.4) and are wider than the ranges for them shown here. The ranges shown here are the subsets of the overall ranges over which no rounding of integer values occurs.

  4. Integer precision

    4.1 (Automatic promotion.) For the integer storage types — for byte, int, and long — numbers outside the ranges listed in (3.3) would be stored as missing (.) except that storage types are promoted automatically. As necessary, Stata promotes bytes to ints, ints to longs, and longs to doubles. Even if a variable is a byte, the effective range is still [-9,007,199,254,740,992, 9,007,199,254,740,992] in the sense that you could change a value of a byte variable to a large value and that value would be stored correctly; the variable that was a byte would, as if by magic, change its type to int, long, or double if that were necessary.
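
    Here is a small experiment you can run to watch promotion happen; the particular values are ones I made up, and the comments state what rule (4.1) leads me to expect.

    . set obs 1
    . generate byte b = 1
    . describe b                  // stored as a byte
    . replace b = 20000           // does not fit in a byte
    . describe b                  // promoted to int
    . replace b = 5000000001      // does not fit in an int or a long
    . describe b                  // promoted to double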

    4.2 (Data input.) Automatic promotion (4.1) applies after the data are input/read/imported/copied into Stata. When first reading, importing, copying, or creating data, it is your responsibility to choose appropriate storage types. Be aware that Stata’s default storage type is float, so if you have large integers, it is usually necessary to specify explicitly the types you wish to use.

    If you are unsure of the type to specify for your integer variables, specify double. After reading the data, you can use compress to demote storage types. compress never results in a loss of precision.
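
    For instance, integers read as doubles can be demoted afterward; the sketch below assumes nothing beyond what compress is documented to do.

    . generate double id = 12345
    . describe id                 // double, 8 bytes per observation
    . compress id
    . describe id                 // now an int, holding exactly the same values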

    4.3 Note that you can use the floating-point types float and double to store integer data.

    4.3.1 Integers outside the range [-2,147,483,647, 2,147,483,620] must be stored as doubles if they are to be precisely recorded.

    4.3.2 Integers can be stored as float, but avoid doing that unless you are certain they will be inside the range [-16,777,216, 16,777,216] not just when you initially read, import, or copy them into Stata, but subsequently as you make transformations.

    4.3.3 If you read your integer data as floats, and assuming they are within the allowed range, we recommend that you change them to an integer type. You can do that simply by typing compress. We make that recommendation so that your integer variables will benefit from the automatic promotion described in (4.1).

    4.4 Let us show what can go wrong if you do not follow our advice in (4.3). For the floating-point types — for float and double — integer values outside the ranges listed in (3.3) are rounded.

    Consider a float variable, and remember that the integer range for floats is [-16,777,216, 16,777,216]. If you tried to store a value outside the range in the variable — say, 16,777,221 — and if you checked afterward, you would discover that actually stored was 16,777,220! Here are some other examples of rounding:

         desired value                            stored (rounded)
         to store            true value             float value 
         ------------------------------------------------------
         maximum             16,777,216              16,777,216 
         maximum+1           16,777,217              16,777,216
         ------------------------------------------------------
         maximum+2           16,777,218              16,777,218
         ------------------------------------------------------
         maximum+3           16,777,219              16,777,220
         maximum+4           16,777,220              16,777,220
         maximum+5           16,777,221              16,777,220
         ------------------------------------------------------
         maximum+6           16,777,222              16,777,222
         ------------------------------------------------------
         maximum+7           16,777,223              16,777,224
         maximum+8           16,777,224              16,777,224
         maximum+9           16,777,225              16,777,224
         ------------------------------------------------------
         maximum+10          16,777,226              16,777,226
         ------------------------------------------------------

    When you store large integers in float variables, values will be rounded and no mention will be made of that fact.
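
    You can reproduce a couple of lines of the table directly:

    . display %12.0f float(16777217)      // 16777216
    . display %12.0f float(16777221)      // 16777220, as shown above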

    And that is why we say that if you have integer data that must be recorded precisely and if the values might be large — outside the range ±16,777,216 — do not use float. Use long or use double; or just use the compress command and let automatic promotion handle the problem for you.

    4.5 Unlike byte, int, and long, float and double variables are not promoted to preserve integer precision.

    Float values are not promoted because, well, they are not. Actually, there is a deep reason, but it has to do with the use of float variables for their real purpose, which is to store non-integer values.

    Double values are not promoted because there is nothing to promote them to. Double is Stata’s most precise storage type. The largest integer value Stata can store precisely is 9,007,199,254,740,992 and the smallest is -9,007,199,254,740,992.

    Integer values outside the range for doubles round in the same way that float values round, except at absolutely larger values.

  5. Floating-point precision

    5.1 The smallest, nonzero value that can be stored in float and double is

         Storage
         type      value          value in %21x         value in base 10
         -----------------------------------------------------------------
         float     ±2^-127    ±1.0000000000000X-07f   ±5.877471754111e-039
         double    ±2^-1022   ±1.0000000000000X-3fe   ±2.225073858507e-308
         -----------------------------------------------------------------

    We include the value shown in the third column, the value in %21x, for those who know how to read it. It is described in (9), but it is unimportant. We are merely emphasizing that these are the smallest values for properly normalized numbers.

    5.2 The smallest value of epsilon such that 1+epsilon ≠ 1 is

         Storage
         type      epsilon       epsilon in %21x        epsilon in base 10
         -----------------------------------------------------------------
         float      ±2^-23     ±1.0000000000000X-017    ±1.19209289551e-07
         double     ±2^-52     ±1.0000000000000X-034    ±2.22044604925e-16
         -----------------------------------------------------------------

    Epsilon is the distance from 1 to the next number on the floating-point number line. The corresponding unit roundoff error is u = ±epsilon/2. The unit roundoff error is the maximum relative roundoff error that is introduced by the floating-point number storage scheme.

    The smallest value of epsilon such that x+epsilon ≠ x is approximately |x|*epsilon, and the corresponding unit roundoff error is ±|x|*epsilon/2.
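
    A quick check of these epsilons (the parentheses merely make the comparisons explicit):

    . display (1 + 2^-52) == 1            // 0: 2^-52 survives in double precision
    . display (1 + 2^-53) == 1            // 1: 2^-53 rounds away
    . display (float(1 + 2^-23)) == 1     // 0: 2^-23 survives in float precision
    . display (float(1 + 2^-24)) == 1     // 1: 2^-24 rounds away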

    5.3 The precision of the floating-point types is, depending on how you want to measure it,

         Measurement                           float              double
         ----------------------------------------------------------------
         # of binary digits                       23                  52
         # of base 10 digits (approximate)         7                  16 
    
         Relative precision                   ±2^-24              ±2^-53
         ... in base 10 (approximate)      ±5.96e-08           ±1.11e-16
         ----------------------------------------------------------------

    Relative precision is defined as

                           |x - x_as_stored|
                  ± max   ------------------    
                     x            x

    performed using infinite-precision arithmetic, with x chosen from the subset of reals between the minimum and maximum values that can be stored. It is worth appreciating that relative precision is a worst-case relative error over all possible numbers that can be stored. Relative precision is identical to the unit roundoff error defined in (5.2), but perhaps this definition is easier to appreciate.

    5.4 Stata never makes calculations in float precision, even if the data are stored as float.

    Stata makes double-precision calculations regardless of how the numeric data are stored. In some cases, Stata internally uses quad precision, which provides approximately 32 decimal digits of precision. If the result of the calculation is being stored back into a variable in the dataset, then the double (or quad) result is rounded as necessary to be stored.

    5.5 (False precision.) Double precision is 536,870,912 times more accurate than float precision. You may worry that float precision is inadequate to accurately record your data.

    Little in this world is measured to a relative accuracy of ±2^-24, the accuracy provided by float precision.

    Ms. Smith, it is reported, made $112,293 this year. Do you believe that is recorded to an accuracy of ±2^-24*112,293, or approximately ±0.7 cents?

    David was born on 21jan1952, so on 27mar2012 he was 21,981 days old, or 60.18 years old. Recorded in float precision, the precision is ±60.18*2^-24, or roughly ±1.89 minutes.

    Joe reported that he drives 12,234 miles per year. Do you believe that Joe’s report is accurate to ±12,234*2^-24, equivalent to ±3.85 feet?

    A sample of 102,400 people reported that they drove, in total, 1,252,761,600 miles last year. Is that accurate to ±74.7 miles (float precision)? If it is, each of them is reporting with an accuracy of roughly ±3.85 feet.

    The distance from the Earth to the moon is often reported as 384,401 kilometers. Recorded as a float, the precision is ±384,401*2^-24, or ±23 meters, or ±0.023 kilometers. Because the number was not reported as 384,401.000, one would assume float precision would be adequate to record that result. In fact, float precision is more than sufficiently accurate to record the distance because the distance from the Earth to the moon varies from 356,400 to 406,700 kilometers, some 50,300 kilometers. The distance would have been better reported as 384,401 ±25,150 kilometers. At best, the measurement 384,401 has relative accuracy of ±0.033 (it is accurate to roughly two digits).

    Nonetheless, a few things have been measured with more than float accuracy, and they stand out as crowning accomplishments of mankind. Use double as required.

  6. Advice concerning 0.1, 0.2, …

    6.1 Stata uses base 2, binary. Popular numbers such as 0.1, 0.2, 100.21, and so on, have no exact binary representation in a finite number of binary digits. There are a few exceptions, such as 0.5 and 0.25, but not many.

    6.2 If you create a float variable containing 1.1 and list it, it will list as 1.1 but that is only because Stata’s default display format is %9.0g. If you changed that format to %16.0g, the result would appear as 1.1000000238419.

    This scares some users. If this scares you, go back and read (5.5) False Precision. The relative error is still a modest ±2^-24. The number 1.1000000238419 is likely a perfectly acceptable approximation to 1.1 because the 1.1 was never measured to an accuracy of less than ±2^-24 anyway.

    6.3 One reason perfectly acceptable approximations to 1.1 such as 1.1000000238419 may bother you is that you cannot select observations containing 1.1 by typing if x==1.1 if x is a float variable. You cannot because the 1.1 on the right is interpreted as double precision 1.1. To select the observations, you have to type if x==float(1.1).

    6.4 If this bothers you, record the data as doubles. It is best to do this at the point when you read the original data or when you make the original calculation. The number will then appear to be 1.1. It will not really be 1.1, but it will have less relative error, namely, ±2^-53.

    6.5 If you originally read the data and stored them as floats, it is still sometimes possible to recover the double-precision accuracy just as if you had originally read the data into doubles. You can do this if you know how many decimal digits were recorded after the decimal point and if the values are within a certain range.

    If there was one digit after the decimal point and if the data are in the range [-1,048,576, 1,048,576], which means the values could be -1,048,576, -1,048,575.9, …, -1, 0, 1, …, 1,048,575.9, 1,048,576, then typing

    . gen double y = round(x*10)/10

    will recover the full double-precision result. Stored in y will be the number in double precision just as if you had originally read it that way.

    It is not possible, however, to recover the original result if x is outside the range ±1,048,576 because the float variable contains too little information.

    You can do something similar when there are two, three, or more decimal digits:

         # digits to
         right of 
         decimal pt.   range     command
         -----------------------------------------------------------------
             1      ±1,048,576   gen double y = round(x*10)/10
             2      ±  131,072   gen double y = round(x*100)/100
             3      ±   16,384   gen double y = round(x*1000)/1000
             4      ±    1,024   gen double y = round(x*10000)/10000
             5      ±      128   gen double y = round(x*100000)/100000
             6      ±       16   gen double y = round(x*1000000)/1000000
             7      ±        1   gen double y = round(x*10000000)/10000000
         -----------------------------------------------------------------

    Range is the range of x over which command will produce correct results. For instance, range = ±16 in the next-to-the-last line means that the values recorded in x must be -16 ≤ x ≤ 16.
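
    For example, suppose x was read as a float and contains 1.1, one digit after the decimal point and well inside ±1,048,576:

    . set obs 10
    . generate float  x = 1.1
    . generate double y = round(x*10)/10
    . assert y == 1.1                     // the double-precision 1.1 has been recovered
    . assert x != 1.1                     // the float version, by itself, is not equal to 1.1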

  7. Advice concerning exact data, such as currency data

    7.1 Yes, there are exact data in this world. Such data are usually counts of something or are currency data, which you can think of as counts of pennies ($0.01) or the smallest unit in whatever currency you are using.

    7.2 Just because the data are exact does not mean you need exact answers. It may still be that calculated answers are adequate if the data are recorded to a relative accuracy of ±2^-24 (float). For most analyses — even of currency data — this is often adequate. The U.S. deficit in 2011 was $1.5 trillion. Stored as a float, this amount has a (maximum) error of ±2^-24*1.5e+12 = ±$89,406.97. It would be difficult to imagine that ±$89,406.97 would affect any government decision maker dealing with the full $1.5 trillion.

    7.3 That said, you sometimes do need to make exact calculations. Banks tracking their accounts need exact amounts. It is not enough to say to account holders that we have your money within a few pennies, dollars, or hundreds of dollars.

    In that case, the currency data should be converted to integers (pennies) and stored as integers, and then processed as described in (4). Assuming the dollar-and-cent amounts were read into doubles, you can convert them into pennies by typing

    . replace x = x*100

    7.4 If you mistakenly read the currency data as a float, you do not have to re-read the data if the dollar amounts are between ±$131,072. You can type

    . gen double x_in_pennies = round(x*100)

    This works only if x is between ±131,072.
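
    For example, with a hypothetical float variable price holding dollar-and-cent amounts:

    . generate float  price = 19.99
    . generate double pennies = round(price*100)    // 1999, an exact integer
    . assert pennies == 1999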

  8. Advice for programmers

    8.1 Stata does all calculations in double (and sometimes quad) precision.

    Float precision may be adequate for recording most data, but float precision is inadequate for performing calculations. That is why Stata does all calculations in double precision. Float precision is also inadequate for storing the results of intermediate calculations.

    There is only one situation in which you need to exercise caution — if you create variables in the data containing intermediate results. Be sure to create all such variables as doubles.

    8.2 The same quad-precision routines StataCorp uses are available to you in Mata; see the manual entries [M-5] mean, [M-5] sum, [M-5] runningsum, and [M-5] quadcross. Use them as you judge necessary.
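
    For instance, here is a sketch contrasting sum() with quadsum() on a sum that cancels badly; the data are made up, and the comments assume sum() accumulates in double precision as documented.

    . mata
    : x = (1e16, 1, -1e16)
    : sum(x)          // in double precision the 1 is lost, so this is 0
    : quadsum(x)      // in quad precision the answer is 1
    : end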

  9. How to interpret %21x format (if you care)

    9.1 Stata has a display format that will display IEEE 754-2008 floating-point numbers in their full binary glory but in a readable way. You probably do not care; if so, skip this section.

    9.2 IEEE 754-2008 floating-point numbers are stored as a pair of numbers (a, b) that are given the interpretation

    z = a * 2^b

    where -2 < a < 2. In double precision, a is recorded with 52 binary digits. In float precision, a is recorded with 23 binary digits. For example, the number 2 is recorded in double precision as

    a = +1.0000000000000000000000000000000000000000000000000000
    b = +1

    The value of pi is recorded as

    a = +1.1001001000011111101101010100010001000010110100011000
    b = +1

    9.3 %21x presents a and b in base 16. The double-precision value of 2 is shown in %21x format as

    +1.0000000000000X+001

    and the value of pi is shown as

    +1.921fb54442d18X+001

    In the case of pi, the interpretation is

    a = +1.921fb54442d18 (base 16)
    b = +001             (base 16)

    Reading this requires practice. It helps to remember that one-half corresponds to 0.8 (base 16). Thus, we can see that a is slightly larger than 1.5 (base 10) and b = 1 (base 10), so _pi is something over 1.5*2^1 = 3.

    The number 100,000 in %21x is

    +1.86a0000000000X+010

    which is to say

    a = +1.86a0000000000 (base 16)
    b = +010             (base 16)

    We see that a is slightly over 1.5 (base 10), and b is 16 (base 10), so 100,000 is something over 1.5*2^16 = 98,304.

    9.4 %21x faithfully presents how the computer thinks of the number. For instance, we can easily see that the nice number 1.1 (base 10) is, in binary, a number with many digits to the right of the binary point:

    . display %21x 1.1
    +1.199999999999aX+000

    We can also see why 1.1 stored as a float is different from 1.1 stored as a double:

    . display %21x float(1.1)
    +1.19999a0000000X+000

    Float precision assigns fewer digits to the mantissa than does double precision, and 1.1 (base 10) in base 16 is a repeating hexadecimal.

    9.5 %21x can be used as an input format as well as an output format. For instance, Stata understands

    . gen x = 1.86ax+10

    Stored in x will be 100,000 (base 10).

    9.6 StataCorp has seen too many competent scientific programmers who, needing a perturbance for later use in their program, code something like

    epsilon = 1e-8

    It is worth examining that number:

    . display %21x 1e-8
    +1.5798ee2308c3aX-01b

    That is an ugly number that can only lead to the introduction of roundoff error in their program. A far better number would be

    epsilon = 1.0x-1b

    Stata and Mata understand the above statement because %21x may be used as input as well as output. Naturally, 1.0x-1b looks just like what it is,

    . display %21x 1.0x-1b
    +1.0000000000000X-01b

    and all those pretty zeros will reduce numerical roundoff error.

    In base 10, the pretty 1.0x-1b looks like

    . display %20.0g 1.0x-1b
    7.4505805969238e-09

    and that number may not look pretty to you, but you are not a base-2 digital computer.

    Perhaps the programmer feels that epsilon really needs to be closer to 1e-8. In %21x, we see that 1e-8 is +1.5798ee2308c3aX-01b, so if we want to get closer, perhaps we use

    epsilon = 1.6x-1b

    9.7 %21x was invented by StataCorp.

  10. Also see

    If you wish to learn more, see

    How to read the %21x format

    How to read the %21x format, part 2

    Precision (yet again), Part I

    Precision (yet again), Part II

Precision (yet again), Part II

In part I, I wrote about precision issues in English. If you enjoyed that, you may want to stop reading now, because I’m about to go into the technical details. Actually, these details are pretty interesting.

For instance, I offered the following formula for calculating error due to float precision:

maximum_error = 2^-24 * X

I later mentioned that the formula is an approximation, and said that the true formula is,

maximum_error = 2^-24 * 2^floor(log2 X)

I didn’t explain how I got either formula.

I need to be more precise today than I was in my previous posting. For instance, I previously used x for two concepts, the true value and the rounded-after-storage value. Today I need to distinguish those concepts.

X is the true value.

x is the value after rounding due to storage.

The issue is the difference between x and X when X is stored in 24-binary-digit float precision.

Base 10

Although I harp on the value of learning to think in binary and hexadecimal, I admit that I, too, find it easier to think in base 10. So let’s start that way.

Say we record numbers to two digits of accuracy, which I will call d=2. Examples of d=2 numbers include

 1.0
 1.6
12
47
52*10^1 (i.e., 520, but with only two significant digits)

To say that we record numbers to two digits of accuracy is to say that, coming upon the recorded number 1, we know only that the number lies between 0.95 and 1.05; or coming upon 12, that the true number lies between 11.5 and 12.5, and so on. I assume that numbers are rounded efficiently, which is to say, stored values record midpoints of intervals.

Before we get into the math, let me note that most of us would be willing to say that numbers recorded this way are accurate to 1 part in 10 or, if d=3, to 1 part in 100. If numbers are accurate to 1 part in 10^(d-1), then couldn’t we just multiply the number by 1/(10^(d-1)) to obtain the width of the interval? Let’s try:

Assume X=520 and d=2. Then 520/(10^(2-1)) = 52. The true interval, however, is (515, 525] and it has width 10. So the simple formula does not work.

The simple formula does not work, yet I presented its base-2 equivalent in Part 1, and I even recommended its use! We will get to that. It turns out the smaller the base, the more accurately the simple formula approximates the true formula, but before I can show that, I need the true formula.

Let’s start by thinking about d=1.

  1. The recorded number 0 will contain all numbers between [-0.5, 0.5). The recorded number 1 will contain all numbers between [0.5, 1.5), and so on. For 0, 1, …, 9, the width of the intervals is 1.

  2. The recorded number 10 will contain all numbers between [5, 15). The recorded number 20 will contain all numbers between [15, 25), and so on. For 10, 20, …, 90, the width of the intervals is 10.

The derivation for the width of interval goes like this:

  1. If we recorded the value of X to one decimal digit, the recorded digit will be b, the recorded value will be x = b*10^p, and the power of ten will be p = floor(log10 X). More importantly, W1 = 10^p will be the width of the interval containing X.

  2. It therefore follows that if we recorded the value of X to two decimal digits, the interval length would be W2 = W1/10. Whatever the width with one digit, adding another digit must reduce the width to one-tenth of what it was.

  3. If we recorded the value of X to three decimal digits, the interval length would be W3 = W2/10.

  4. Thus, if d is the number of digits to which numbers are recorded, the width of the interval is 10^p, where p = floor(log10 X) - (d-1).

The above formula is exact.

Base 2

Converting the formula

interval_width = 10^(floor(log10 X) - (d-1))

from base 10 to base 2 is easy enough:

interval_width = 2^(floor(log2 X) - (d-1))

In Part 1, I presented this formula for d=24 as

maximum_error = 2^(floor(log2 X) - 24) = 2^-24 * 2^floor(log2 X)

In interval_width, it is d-1 and not d that appears in the formula. You might think I made an error and should have put -23 where I put -24 in the maximum_error formula. There is no mistake. In Part 1, the maximum error was defined as a plus-or-minus quantity and is thus half the width of the overall interval. So I divided by 2, and in effect, I did put -23 into the maximum_error formula, at least before I subtracted one more from it, making it -24 again.

I started out this posting by considering and dismissing the base-10 approximation formula

interval_width = 10^-(d-1) * X

which in maximum-error units is

maximum_error = 10^-d * X

and yet in Part 1, I presented — and even recommended — its base-2, d=24 equivalent,

maximum_error = 2^-24 * X

It turns out that the approximation formula is not as inaccurate in base 2 as it would be in base 10. The correct formula,

maximum_error = 2^(floor(log2 X) - d)

can be written

maximum_error = 2^-d * 2^floor(log2 X)

so the question becomes about the accuracy of substituting X for 2^floor(log2X). We know by examination that X ≥ 2^floor(log2X), so making the substitution will overstate the error and, in that sense, is a safe thing to do. The question becomes how much the error is overstated.

X can be written 2^(log2X) and thus we need to compare 2^(log2X) with 2^floor(log2X). The floor() function cannot reduce its argument by more than 1, and thus 2^(log2X) cannot differ from 2^floor(log2X) by more than a factor of 2. Under the circumstances, this seems a reasonable approximation.

In the case of base 10, the floor() function reducing its argument by up to 1 results in a decrease of up to a factor of 10. That, it seems to me, is not a reasonable amount of error.
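
To put numbers on that comparison, here is the approximate formula next to the exact one for the $1.5 trillion deficit example from Part I; log2 is written as ln()/ln(2).

. display 2^-24 * 1.5e12                        // approximate formula:  89406.967
. display 2^(floor(ln(1.5e12)/ln(2)) - 24)      // exact formula:        65536

The approximation overstates the maximum error by a factor of about 1.4 here, comfortably within the factor-of-2 bound just derived.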


Precision (yet again), Part I

I wrote about precision here and here, but they were pretty technical.

“Great,” coworkers inside StataCorp said to me, “but couldn’t you explain these issues in a way that doesn’t get lost in the details of how computers store binary and maybe, just maybe, write about floats and doubles from a user’s perspective instead of programmer’s perspective?”

“Mmmm,” I said clearly.

Later, when I tried, I liked the result. It contains new material, too. What follows is what I now wish I had written first. I would have still written the other two postings, but as technical appendices.

In Part 2 (forthcoming), I provide the mathematical derivations underlying what follows; they raise a few interesting issues of their own.

Please excuse the manualish style of what follows, but I suspect that what follows will eventually work its way into Stata’s help files or manuals, so I wrote it that way.


Syntax

Problem:

. generate x = 1.1

. list
  (Stata displays output showing x is 1.1 in all observations)

. count if x==1.1
     0

Solution 1:

. count if x==float(1.1)
   100

Solution 2:

. generate double x = 1.1

. count if x==1.1
  100

Solution 3:

. set type double

. generate x = 1.1

. count if x==1.1
  100

Description

Stata works in binary. Stata stores data in float precision by default. Stata performs all calculations in double precision. Sometimes the combination results in surprises until you think more carefully about what happened.

Remarks

Remarks are presented under the headings

Summary
Why count x==1.1 produces 0
How count x==float(1.1) solves the problem
How storing data as double appears to solve the problem (and does)
Float is plenty accurate to store most data
Why don't I have these problems using Excel?

Summary

Justifications for all statements made appear in the sections below. In summary,

  1. It sometimes appears that Stata is inaccurate. That is not true and, in fact, the appearance of inaccuracy happens in part because Stata is so accurate.

  2. You can cover up this appearance of inaccuracy by storing all your data in double precision. This will double (or more) the size of your dataset, and so I do not recommend the double-precision solution unless your dataset is small relative to the amount of memory on your computer. In that case, there is nothing wrong with storing all your data in double precision.

    The easiest way to implement the double-precision solution is by typing set type double. After that, Stata will default to creating all new variables as doubles, at least for the remainder of the session. If all your datasets are small relative to the amount of memory on your computer, you can set type double, permanently.

  3. The double-precision solution is needlessly wasteful of memory. It is difficult to imagine data that are accurate to more than float precision. Regardless of how your data are stored, Stata does all calculations in double precision, and sometimes in quad precision.

  4. The issue of 1.1 not being equal to 1.1 arises only with “nice” decimal numbers. You just have to remember to use Stata’s float() function when dealing with such numbers.

Why count x==1.1 produces 0

Let’s trace through what happens when you type the commands

. generate x = 1.1

. count if x==1.1
     0

Here is how it works:

  1. Some numbers have no exact finite-digit binary representation just as some numbers have no exact finite-digit decimal representation. One-third, 0.3333… (base 10), is an example of a number with no exact finite-digit decimal representation. In base 12, one-third does have an exact finite-digit representation, namely 0.4 (base 12). In base 2 (binary), base 10 numbers such as 0.1, 0.2, 0.3, 0.4, 0.6, … have no exact finite-digit representation.

  2. Computers store numbers with a finite number of binary digits. In float precision, numbers have 24 binary digits. In double precision, they have 53 binary digits.

    The decimal number 1.1 in binary is 1.000110011001… (base 2). The 1001 on the end repeats forever. Thus, 1.1 (base 10) is stored by a computer as

     1.00011001100110011001101
    

    in float, or as

     1.0001100110011001100110011001100110011001100110011010
    

    in double. There are 24 and 53 digits in the numbers above.

  3. Typing generate x = 1.1 results in 1.1 being interpreted as the longer binary number because Stata performs all calculations in double precision. New variable x is created as a float by default. When the more precise number is stored in x, it is rounded to the shorter number.

  4. Thus when you count if x==1.1 the result is 0 because 1.1 is again interpreted as the longer binary number, and the longer number is compared to the shorter number stored in x, and they are not equal.

How count x==float(1.1) solves the problem

One way to fix the problem is to change count if x==1.1 to read count if x==float(1.1):

. generate x = 1.1

. count if x==float(1.1)
   100

Function float() rounds results to float precision. When you type float(1.1), the 1.1 is converted to binary, double precision, namely,

 1.0001100110011001100110011001100110011001100110011010 (base 2)

and float() then rounds that long binary number to

 1.00011001100110011001101 (base 2)

or more correctly, to

 1.0001100110011001100110100000000000000000000000000000 (base 2)

because the number is still stored in double precision. Regardless, this new value is equal to the value stored in x, and so count reports that 100 observations contain float(1.1).

As an aside, when you typed generate x = 1.1, Stata acted as if you typed generate x = float(1.1). Whenever you type generate x = … and x is a float, Stata acts as if you typed generate x = float(…).

How storing data as double appears to solve the problem (and does)

When you type

. generate double x = 1.1

. count if x==1.1
   100

it should be pretty obvious how the problem was solved. Stata stores

1.0001100110011001100110011001100110011001100110011010 (base 2)

in x, and then compares the stored result to

1.0001100110011001100110011001100110011001100110011010 (base 2)

and of course they are equal.

In the Summary above, I referred to this as a cover up. It is a cover up because 1.1 (base 10) is not what is stored in x. What is stored in x is the binary number just shown, and to be equal to 1.1 (base 10), the binary number would need to be suffixed with 1001, and then another 1001, and then another, and so on without end.

Stata tells you that x is equal to 1.1 because Stata converted the 1.1 in count to the same inexact binary representation as Stata previously stored in x, and those two values are equal, but neither is equal to 1.1 (base 10). This leads to an important property of digital computers:

If storage and calculation are done to the same precision, it will appear to the user as if all numbers that the user types are stored without error.

That is, it appears to you as if there is no inaccuracy in storing 1.1 in x when x is a double because Stata performs calculations in double. And it is equally true that it would appear to you as if there were no accuracy issues storing 1.1 when x is stored in float precision if Stata, observing that x is float, performed calculations involving x in float. The fact is that there are accuracy issues in both cases.

“Wait,” you are probably thinking. “I understand your argument, but I’ve always heard that float is inaccurate and double is accurate. I understand from your argument that it is only a matter of degree but, in this case, those two degrees are on opposite sides of an important line.”

“No,” I respond.

What you have heard is right with respect to calculation. What you have heard might apply to data storage too, but that is unlikely. It turns out that float provides plenty of precision to store most real measurements.

Float is plenty accurate to store most data

The misconception that float precision is inaccurate comes from the true statement that float precision is not accurate enough when it comes to making calculations with stored values. Whether float precision is accurate enough for storing values depends solely on the accuracy with which the values are measured.

Float precision provides 24 base-2 (binary) digits, and thus values stored in float precision have a maximum relative error of plus-or-minus 2^(-24) = 5.96e-08, or less than +/-1 part in 15 million.

  1. The U.S. deficit in 2011 is projected to be $1.5 trillion. Stored as a float, the number has a (maximum) error of 2^(-24) * 1.5e+12 = $89,407. That is, if the true number is 1.5 trillion, the number recorded in float precision is guaranteed to be somewhere in the range [(1.5e+12)-89,407, (1.5e+12)+89,407]. The projected U.S. deficit is not known to an accuracy of +/-$89,407.

  2. People in the US work about 40 hours per week, or roughly 0.238 of the hours in the week. 2^(-24) * 0.238 = 1.42e-08 of a week, or about 8.6 milliseconds. Time worked in a week is not known to an accuracy of +/-8.6 milliseconds.

  3. A cancer survivor might live 350 days. 2^(-24) * 350 = .00002086, or 1.8 seconds. Time of death is rarely recorded to an accuracy of +/-1.8 seconds. Time of diagnosis is never recorded to such accuracy, nor could it be.

  4. The moon is said to be 384,401 kilometers from the Earth. 2^(-24) * 384,401 = 0.023 kilometers, or 23 meters. At its closest and farthest, the moon is 356,400 and 406,700 kilometers from Earth.

  5. Most fundamental constants of the universe are known to a few parts in a million, which is to say, less than 1 part in 15 million, the accuracy float precision can provide. An exception is the speed of light, measured to be 299,792.458 kilometers per second. Record that as a float and you will be off by 0.01 km/s.

In all the examples except the last, what is quoted are worst-case scenarios. The actual errors depend on the exact numbers and are more tedious to calculate (not shown):

  1. For the U.S. deficit, the exact error for 1.5 trillion is -$26,624, which is within the plus or minus $89,407 quoted.

  2. For fraction of the week, at 0.238 the actual error is about 3.3 milliseconds, which is within the +/-8.6 milliseconds quoted.

  3. For cancer survival time, at 350 days the actual error is 0, which is within the +/-1.8 seconds quoted.

  4. For the distance between the Earth and moon, the actual error is 0, which is within the +/-23 meters quoted.

The actual errors may be interesting, but the maximum errors are more useful. Remember the multiplier 2^(-24). All you have to do is multiply a measurement by 2^(-24) and compare the result with the inherent error in the measurement. If 2^(-24) multiplied by the measurement is less than the inherent error, you can use float precision to store your data. Otherwise, you need to use double.
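
As a sketch of that rule applied to the distance-to-the-moon example above:

. display 2^-24 * 384401        // about 0.023 km of float roundoff

That roundoff is tiny compared with the tens of thousands of kilometers by which the distance itself varies, so float precision is adequate here.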

By the way, the formula

maximum_error = 2^(-24) * x

is an approximation. The true formula is

maximum_error = 2^(-24) * 2^(floor(log2(x)))

It can be readily proven that x ≥ 2^(floor(log2(x))) and thus the approximation formula overstates the maximum error. The approximation formula can overstate the maximum error by as much as a factor of 2.

Float precision is adequate for most data. There is one kind of data, however, where float precision may not be adequate, and that is financial data such as sales data, general ledgers, and the like. People working with dollar-and-cent data, or euro-and-eurocent data, or pound-sterling-and-penny data, or any other currency data, usually find it best to use doubles. To avoid rounding issues, it is preferable to store the data as pennies. Float precision binary cannot store 0.01, 0.02, and the like, exactly. Integer values, however, can be stored exactly, at least up to 16,777,215.

Floats can store integers up to 16,777,215 exactly. If you stored your data in pennies, that would correspond to $167,772.15.

Doubles can store integers up to 9,007,199,254,740,991 exactly. If you stored your data in pennies, that would correspond to $90,071,992,547,409.91, or just over $90 trillion.

Why don’t I have these problems using Excel?

You do not have these problems when you use Excel because Excel stores numeric values in double precision. As I explained in How count x==float(1.1) solves the problem above,

If storage and calculation are done to the same precision, it will appear to the user as if all numbers that the user types are stored without error.

You can adopt the Excel solution in Stata by typing

. set type double, permanently

You will double (or more) the amount of memory Stata uses to store your data, but if that is not of concern to you, there are no other disadvantages to adopting this solution. If you adopt this solution and later wish to change your mind, type

. set type float, permanently
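
A hedged aside: set type affects only variables created from then on. If you have variables already stored as floats that you want to promote, recast will do it; the variable names here are hypothetical:

. recast double price weight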

That’s all for today

If you enjoyed the above, you may want to see Part II (forthcoming). As I said, there are a few technical issues underlying what is written above that may interest readers curious about computer science as it applies to statistical computing.

Categories: Numerical Analysis

How to read the %21x format, part 2

In my previous posting last week, I explained how computers store binary floating-point numbers, how Stata’s %21x display format displays those binary floating-point numbers with fidelity, how %21x can help you uncover bugs, and how %21x can help you understand behaviors that are not bugs even though they are surprising to us base-10 thinkers. The point is, it is sometimes useful to think in binary, and with %21x, thinking in binary is not difficult.

This week, I want to discuss double versus float precision.

Double (8-byte) precision provides 53 binary digits. Float (4-byte) precision provides 24. Let me show you what float precision looks like.

. display %21x sqrt(2) _newline %21x float(sqrt(2))
+1.6a09e667f3bcdX+000
+1.6a09e60000000X+000

All those zeros in the floating-point result are not really there;
%21x merely padded them on. The display would be more honest if it were

+1.6a09e6       X+000

Of course, +1.6a09e60000000X+000 is a perfectly valid way of writing +1.6a09e6X+000 — just as 1.000 is a valid way of writing 1 — but you must remember that float has fewer digits than double.

Hexadecimal 1.6a09e6 is a rounded version of 1.6a09e667f3bcd, and you can think of this in one of two ways:

     double     =  float   + extra precision
1.6a09e667f3bcd = 1.6a09e6 + 0.00000067f3bcd

or

  float   =      double     -  lost precision
1.6a09e6  = 1.6a09e667f3bcd - 0.00000067f3bcd

Note that more digits are lost than appear in the float result! The float result provides six hexadecimal digits (ignoring the 1), and seven digits appear under the heading lost precision. Double precision is more than twice float precision. To be precise, double precision provides 53 binary digits and float provides 24, so double precision is really 53/24 ≈ 2.2 times float precision.

The “double” in double precision refers to the total number of binary digits used to store the mantissa and the exponent in z=a*2^b, which is 64 versus 32. Precision is 53 binary digits versus 24.

In this case, we obtained the float-precision result from float(sqrt(2)), meaning that we rounded a more accurate double-precision result. One usually rounds when producing a less precise representation. One of the rounding rules is to round up if the digits being omitted (with a decimal point in front) exceed 1/2, meaning 0.5 in decimal. The equivalent rule in base-16 is to round up if the digits being omitted (with a hexadecimal point in front) exceed 1/2, meaning 0.8 (base-16). The lost digits were .67f3bcd, which are less than 0.8, and therefore, the last digit of the rounded result was not adjusted.

Actually, rounding to float precision is more difficult than I make out, and seeing that numbers are rounded correctly when displayed in %21x can be difficult. These difficulties have to do with the relationship between base-2 — the base in which the computer works — and base-16 — a base similar but not identical to base-2 that we humans find more readable. The fact is that %21x was designed for double precision, so it only does an adequate job of showing single precision. When %21x displays a float-precision number, it shows you the exactly equal double-precision number, and that turns out to matter.

We use base-16 because it is easier to read. But why do we use base-16 and not base-15 or base-17? We use base-16 because it is an integer power of 2, the base the computer uses. One advantage of bases being powers of each other is that base conversion can be done more easily. In fact, conversion can be done almost digit by digit. Doing base conversion is usually a tedious process. Try converting 2394 (base-10) to base-11. Well, you say, 11^3=1331, and 2*1331 = 2662 > 2394, so the first digit is 1 and the remainder is 2394-1331 = 1063. Now, repeating the process with 1063, I observe that 11^2 = 121 and that 1063 is bounded by 8*121=968 and 9*121=1089, so the second digit is 8 and I have a remainder of …. And eventually you produce the answer 1887 (base-11).

Converting between bases when one is the power of another not only is easier but also is so easy you can do it in your head. To convert from base-2 to base-16, group the binary digits into groups of four (because 2^4=16) and then translate each group individually.

For instance, to convert 011110100010, proceed as follows:

0111 1010 0010
--------------
   7    a    2

I’ve performed this process often enough that I hardly have to think. But here is how you should think: Divide the binary number into four-digit groups. The four columns of the binary number stand for 8, 4, 2, and 1. When you look at 0111, say to yourself 4+2+1 = 7. When you look at 1010, say to yourself 8+2 = 10, and remember that the digit for 10 in base-16 is a.

Converting back is nearly as easy:

   7    a    2
--------------
0111 1010 0010

Look at 7 and remember the binary columns 8-4-2-1. Though 7 does not contain an 8, it does contain a 4 (leaving 3), and 3 contains a 2 and a 1.

I admit that converting base-16 to base-2 is more tedious than converting base-2 to base-16, but eventually, you’ll have the four-digit binary table memorized; there are only 16 lines. Say 7 to me, and 0111 just pops into my head. Well, I’ve been doing this a long time, and anyway, I’m a geek. I suspect I carry the as-yet-undiscovered binary gene, which means I came into this world with the base-2-to-base-16 conversion table hardwired:

base-2 base-16
0000 0
0001 1
0010 2
0011 3
0100 4
0101 5
0110 6
0111 7
1000 8
1001 9
1010 a
1011 b
1100 c
1101 d
1110 e
1111 f
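
If you would rather have Stata check your work, Mata’s frombase() and inbase() functions convert between arbitrary bases — assuming I am remembering the argument order (base first) correctly:

. mata: frombase(2, "011110100010")               // the value in base-10: 1954
. mata: inbase(16, frombase(2, "011110100010"))   // the same value in base-16: 7a2
. mata: inbase(2, 1954)                           // and back to binary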

Now that you can convert base-2 to base-16 — convert from binary to hexadecimal — and you can convert back again, let’s return to floating-point numbers.

Remember how floating-point numbers are stored:

z = a * 2^b, 1<=a<2 or a==0

For example,

    0.0 = 0.0000000000000000000000000000000000000000000000000000 * 2^-big
    0.5 = 1.0000000000000000000000000000000000000000000000000000 * 2^-1
    1.0 = 1.0000000000000000000000000000000000000000000000000000 * 2^0
sqrt(2) = 1.0110101000001001111001100110011111110011101111001101 * 2^0
    1.5 = 1.1000000000000000000000000000000000000000000000000000 * 2^0
    2.0 = 1.0000000000000000000000000000000000000000000000000000 * 2^1
    2.5 = 1.0100000000000000000000000000000000000000000000000000 * 2^1
    3.0 = 1.1000000000000000000000000000000000000000000000000000 * 2^1
    _pi = 1.1001001000011111101101010100010001000010110100011000 * 2^1
    etc.

In double precision, there are 53 binary digits of precision. One of the digits is written to the left of the binary point, and the remaining 52 are written to the right. Next observe that the 52 binary digits to the right of the binary point can be written in 52/4=13 hexadecimal digits. That is exactly what %21x does:

    0.0 = +0.0000000000000X-3ff
    0.5 = +1.0000000000000X-001
    1.0 = +1.0000000000000X+000
sqrt(2) = +1.6a09e667f3bcdX+000
    1.5 = +1.8000000000000X+000
    2.0 = +1.0000000000000X+001
    2.5 = +1.4000000000000X+001
    3.0 = +1.8000000000000X+001
    _pi = +1.921fb54442d18X+001

You could perform the binary-to-hexadecimal translation for yourself. Consider _pi. The first group of four binary digits after the binary point is 1001, and 9 is the first digit after the point in the %21x result. The second group of four is 0010, and 2 appears next in the %21x result.
The %21x result is an exact representation of the underlying binary, and thus you are equally entitled to think in either base.

In single precision, the rule is the same:

z = a * 2^b, 1<=a<2 or a==0

But this time, only 24 binary digits are provided for a, and so we have

    0.0 = 0.00000000000000000000000 * 2^-big
    0.5 = 1.00000000000000000000000 * 2^-1
    1.0 = 1.00000000000000000000000 * 2^0
sqrt(2) = 1.01101010000010011110011 * 2^0
    1.5 = 1.10000000000000000000000 * 2^0
    2.0 = 1.00000000000000000000000 * 2^1
    2.5 = 1.01000000000000000000000 * 2^1
    3.0 = 1.10000000000000000000000 * 2^1
    _pi = 1.10010010000111111011011 * 2^1
    etc.

In single precision, there are 24-1=23 binary digits of precision to the right of the binary point, and 23 is not divisible by 4. If we try to convert to base-16, we end up with

sqrt(2) = 1.0110 1010 0000 1001 1110 011   * 2^0
          1.   6    a    0    9    e    ?  * 2^0

To fill in the last digit, we could recognize that we can pad on an extra 0 because we are to the right of the binary point. For example, 1.101 == 1.1010. If we padded on the extra 0, we have

sqrt(2) = 1.0110 1010 0000 1001 1110 0110  * 2^0
          1.   6    a    0    9    e    6  * 2^0

That is precisely the result %21x shows us:

. display %21x float(sqrt(2))
+1.6a09e60000000X+000

although we might wish that %21x would omit the 0s that aren’t really there, and instead display this as +1.6a09e6X+000.

The problem with this solution is that it can be misleading because the last digit looks like it contains four binary digits when in fact it contains only three. To show how easily you can be misled, look at _pi in double and float precisions:

. display %21x _pi _newline %21x float(_pi)
+1.921fb54442d18X+001
+1.921fb60000000X+001
        ^
  digit incorrectly rounded?

The computer rounded the last digit up from 5 to 6. The digits after the rounded-up digit in the full-precision result, however, are 0.4442d18, and are clearly less than 0.8 (1/2). Shouldn’t the rounded result be 1.921fb5X+001? The answer is that yes, 1.921fb5X+001 would be a better result if we had 6*4=24 binary digits to the right of the binary point. But we have only 23 digits; correctly rounding to 23 binary digits and then translating into base-16 results in 1.921fb6X+001. Because of the missing binary digit, the last base-16 digit can only take on the values 0, 2, 4, 6, 8, a, c, and e.

The computer performs the rounding in binary. Look at the relevant piece of this double-precision number in binary:

+1.921f   b    5    4    4    42d18X+001      number
       1011 0101 0100 0100 0100               expansion into binary
       1011 01?x xxxx xxxx xxxxxxxx           thinking about rounding
       1011 011x xxxx xxxx xxxxxxxx           performing rounding
+1.921f   b    6                   X+001      convert to base-16

The part I have expanded into binary in the second line surrounds the digit to be rounded. In the third line, I’ve put x’s under the part we will have to discard to round this double into a float. The x’d out part — 10100… — is clearly greater than 1/2, so the last retained digit (where I put a question mark) must be rounded up. Thus, _pi in float precision rounds to 1.921fb6X+001, just as the computer said.

Float precision does not play much of a role in Stata despite the fact that most users store their data as floats. Regardless of how data are stored, Stata makes all calculations in double precision, and float provides more than enough precision for most data applications. The U.S. deficit in 2011 is projected to be $1.5 trillion. One hopes that a grand total of $26,624 — the error that would be introduced by storing this projected deficit in float precision — would not be a significant factor in any lawmaker’s decision concerning the issue. People in the U.S. are said to work about 40 hours per week, or roughly 0.238 of the hours in a week. I doubt that number is accurate to 0.4 milliseconds, the error that float would introduce in recording the fraction. A cancer survivor might live 350.1 days after a treatment, but we would introduce an error of roughly 1/2 second if we record the number as a float. One might question whether the instant of death could even conceptually be determined that accurately. The moon is said to be 384.401 thousand kilometers from the Earth. Record in 1,000s of kilometers in float, and the error is almost 1 meter. At its closest and farthest, the moon is 356,400 and 406,700 kilometers away. Most fundamental constants of the universe are known only to a few parts in a million, which is to say, to less than float precision, although we do know the speed of light in a vacuum to one decimal digit beyond float accuracy; it’s 299,792.458 kilometers per second. Round that to float and you’ll be off by 0.01 km/s.
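
You can reproduce a couple of those float-precision errors directly; the values in the comments are roughly what I expect to see:

. display float(299792.458) - 299792.458      // about .01 km/s
. display 1e+6*(float(384.401) - 384.401)     // about .98 meters -- almost 1 meter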

The largest integer that can be recorded without rounding in float precision is 16,777,215. The largest integer that can be recorded without rounding in double precision is 9,007,199,254,740,991.

People working with dollar-and-cent data in Stata usually find it best to use doubles both to avoid rounding issues and in case the total exceeds $167,772.15. Rounding issues of 0.01, 0.02, etc., are inherent when working with binary floating point, regardless of precision. To avoid all problems, these people should use doubles and record amounts in pennies. That will have no difficulty with sums up to $90,071,992,547,409.91, which is to say, about $90 trillion. That’s nine quadrillion pennies. In my childhood, I thought a quadrillion just meant a lot, but it has a formal definition.

All of which is a long way from where I started, but now you are an expert in understanding binary floating-point numbers the way a scientific programmer needs to understand them: z=a*2^b. You are nearly all the way to understanding the IEEE 754-2008 standard. That standard merely states how a and b are packed into 32 and 64 bits, and the entire point of %21x is to avoid those details because, packed together, the numbers are unreadable by humans.

References

Cox, N. J. 2006. Tip 33: Sweet sixteen: Hexadecimal formats and precision problems. Stata Journal 6: 282-283.

Gould, W. 2006. Mata matters: Precision. Stata Journal 6: 550-560.

Linhart, J. M. 2008. Mata matters: Overflow and IEEE floating-point format. Stata Journal 8: 255-268.

How to read the %21x format

%21x is a Stata display format, just as are %f, %g, %9.2f, %td, and so on. You could put %21x on any variable in your dataset, but that is not its purpose. Rather, %21x is for use with Stata’s display command for those wanting to better understand the accuracy of the calculations they make. We use %21x frequently in developing Stata.

%21x produces output that looks like this:

. display %21x 1
+1.0000000000000X+000

. display %21x 2
+1.0000000000000X+001

. display %21x 10
+1.4000000000000X+003

. display %21x sqrt(2)
+1.6a09e667f3bcdX+000

All right, I admit that the result is pretty unreadable to the uninitiated. The purpose of %21x is to show floating-point numbers exactly as the computer stores them and thinks about them. In %21x’s defense, it is more readable than how the computer really records floating-point numbers, yet it loses none of the mathematical essence. Computers really record floating-point numbers like this:

      1 = 3ff0000000000000
      2 = 4000000000000000
     10 = 4024000000000000
sqrt(2) = 3ff6a09e667f3bcd

Or more correctly, they record floating-point numbers in binary, like this:

      1 = 0011111111110000000000000000000000000000000000000000000000000000
      2 = 0100000000000000000000000000000000000000000000000000000000000000
     10 = 0100000000100100000000000000000000000000000000000000000000000000
sqrt(2) = 0011111111110110101000001001111001100110011111110011101111001101

By comparison, %21x is a model of clarity.

The above numbers are 8-byte floating point, also known as double precision, encoded in the IEEE 754-2008 binary64 format and written here with the bytes ordered, left to right, from most significant to least significant — that is, in big endian order. Computers that store floating-point numbers in little endian format — bytes ordered from least significant to most significant — lay the same numbers out in memory like this:

      1 = 000000000000f03f
      2 = 0000000000000040
     10 = 0000000000002440
sqrt(2) = cd3b7f669ea0f63f

or:

      1 = 0000000000000000000000000000000000000000000000001111000000111111
      2 = 0000000000000000000000000000000000000000000000000000000001000000
     10 = 0000000000000000000000000000000000000000000000000010010001000000
sqrt(2) = 1100110100111011011111110110011010011110101000001111011000111111

Regardless of that, %21x produces the same output:

. display %21x 1
+1.0000000000000X+000

. display %21x 2
+1.0000000000000X+001

. display %21x 10
+1.4000000000000X+003

. display %21x sqrt(2)
+1.6a09e667f3bcdX+000

Binary computers store floating-point numbers as a number pair, (a, b); the desired number z is encoded

z = a * 2^b

For example,

 1 = 1.00 * 2^0
 2 = 1.00 * 2^1
10 = 1.25 * 2^3

The number pairs are encoded in the bit patterns, such as 00111111…01, above.

I’ve written the components a and b in decimal, but for reasons that will become clear, we need to preserve the essential binaryness of the computer’s number. We could write the numbers in binary, but they will be more readable if we represent them in base-16:

        base-10             base-16
    floating point      floating point
    --------------      --------------
     1 = 1.00 * 2^0      1 = 1.00 * 2^0
     2 = 1.00 * 2^1      2 = 1.00 * 2^1
    10 = 1.25 * 2^3     10 = 1.40 * 2^3

“1.40?”, you ask, looking at the last row, which indicates 1.40*2^3 for decimal 10.

The period in 1.40 is not a decimal point; it is a hexadecimal point. The first digit after the hexadecimal point is the number for 1/16ths, the next is for 1/(16^2)=1/256ths, and so on. Thus, 1.40 hexadecimal equals 1 + 4*(1/16) + 0*(1/256) = 1.25 in decimal.
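
If you would rather verify that in Stata than in your head, the arithmetic and the %21x display agree:

. display 1 + 4/16 + 0/256     // 1.25, the decimal value of hexadecimal 1.40
. display %21x 1.25            // +1.4000000000000X+000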

And that is how you read %21x values +1.0000000000000X+000, +1.0000000000000X+001, and +1.4000000000000X+003. To wit,

        base-10             base-16
    floating point      floating point              %21x
    --------------      --------------     ---------------------
     1 = 1.00 * 2^0      1 = 1.00 * 2^0    +1.0000000000000X+000
     2 = 1.00 * 2^1      2 = 1.00 * 2^1    +1.0000000000000X+001
    10 = 1.25 * 2^3     10 = 1.40 * 2^3    +1.4000000000000X+003

The mantissa is shown to the left of the X, and, to the right of the X, the exponent for the 2. %21x is nothing more than a binary variation of the %e format with which we are all familiar, for example, 12 = 1.20000e+01 = 1.2*10^1. It’s such an obvious generalization that one would guess it has existed for a long time, so excuse me when I mention that we invented it at StataCorp. If I weren’t so humble, I would emphasize that this human-readable way of representing binary floating-point numbers preserves nearly every aspect of the IEEE floating-point number. Being humble, I will merely observe that 1.40x+003 is more readable than 4024000000000000.

Now that you know how to read %21x, let me show you how you might use it. %21x is particularly useful for examining precision issues.

For instance, the cube root of 8 is 2; 2*2*2 = 8. And yet, in Stata, 8^(1/3) is not equal to 2:

. display 8^(1/3)
2

. assert 8^(1/3) == 2
assertion is false
r(9);

. display %20.0g 8^(1/3)
1.99999999999999978

I blogged about that previously; see How Stata calculates powers. The error is not much:

. display 8^(1/3)-2
-2.220e-16

In %21x format, however, we can see that the error is only one bit:

. display %21x 8^(1/3)
+1.fffffffffffffX+000

. display %21x 2    
+1.0000000000000X+001

I wish the answer for 8^(1/3) had been +1.0000000000001X+001, because then the one-bit error would have been obvious to you. Instead, rather than being a bit too large, the actual answer is a bit too small — one bit too small to be exact — so we end up with +1.fffffffffffffX+000.

One bit off means being off by 2^(-52), which is 2.220e-16, and which is the number we saw when we displayed 8^(1/3)-2 in base-10. So %21x did not reveal anything we could not have figured out in other ways. The nature of the error, however, is more obvious in %21x format than it is in a base-10 format.
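
You can see that last claim for yourself; the only hedge is that the default display format may show a digit more or fewer than I remember:

. display 2^(-52)              // 2.220e-16
. display %21x 1 + 2^(-52)     // +1.0000000000001X+000, exactly one bit above 1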

On Statalist, the point often comes up that 0.1, 0.2, …, 0.4, 0.6, …, 0.9, 0.11, 0.12, … have no exact representation in the binary base that computers use. That becomes obvious with %21x format:

. display %21x 0.1
+1.999999999999aX-004

. display %21x 0.2
+1.999999999999aX-003

. ...

0.5 does have an exact representation, of course, as do all the negative powers of 2:

. display %21x 0.5           //  1/2
+1.0000000000000X-001

. display %21x 0.25          //  1/4
+1.0000000000000X-002

. display %21x 0.125         //  1/8
+1.0000000000000X-003

. display %21x 0.0625        // 1/16
+1.0000000000000X-004

. ...

Integers have exact representations, too:

. display %21x 1
+1.0000000000000X+000

. display %21x 2
+1.0000000000000X+001

. display %21x 3
+1.8000000000000X+001

. ...

. display %21x 10
+1.4000000000000X+003

. ...

. display %21x 10786204
+1.492b380000000X+017

. ...

%21x is a great way of becoming familiar with base-16 (equivalently, base-2), which is worth doing if you program base-16 (equivalently, base-2) computers.

Let me show you something useful that can be done with %21x.

A programmer at StataCorp has implemented a new statistical command. In four examples, the program produces the following results:

41.8479499816895
 6.7744922637939
 0.1928647905588
 1.6006311178207

Without any additional information, I can tell you that the program has a bug, and that StataCorp will not be publishing the code until the bug is fixed!

How can I know that this program has a bug without even knowing what is being calculated? Let me show you the above results in %21x format:

+1.4ec89a0000000X+005
+1.b191480000000X+002
+1.8afcb20000000X-003
+1.99c2f60000000X+000

Do you see what I see? It’s all those zeros. In randomly drawn problems, it would be unlikely that there would be all zeros at the end of each result. What is likely is that the results were somehow rounded, and indeed they were. The rounding in this case was due to using float (4-byte) precision inadvertently. The programmer forgot to include a double in the ado-file.

And that’s one way %21x is used.

I am continually harping on programmers at StataCorp that if they are going to program binary computers, they need to think in binary. I go ballistic when I see a comparison that’s coded as “if (abs(x-y)<1e-8) …” in an attempt to deal with numerical inaccuracy. What kind of number is 1e-8? Well, it’s this kind of number:

. display %21x 1e-8
+1.5798ee2308c3aX-01b

Why put the computer to all that work, and exactly how many digits are you, the programmer, trying to ignore? Rather than 1e-8, why not use the “nice” numbers 7.451e-09 or 3.725e-09, which is to say, 1.0x-1b or 1.0x-1c? If you do that, then I can see exactly how many digits you are ignoring. If you code 1.0x-1b, I can see you are ignoring 1b=27 binary digits. If you code 1.0x-1c, I can see you are ignoring 1c=28 binary digits. Now, how many digits do you need to ignore? How imprecise do you really think your calculation is? By the way, Stata understands numbers such as 1.0x-1b and 1.0x-1c as input, so you can type the precise number you want.
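
For instance, the hex constant and the corresponding power of 2 are the same number, and you can check both lines yourself:

. display %21x 1.0x-1b         // +1.0000000000000X-01b
. display 1.0x-1b == 2^(-27)   // 1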

As another example of thinking in binary, a StataCorp programmer once described a calculation he was making. At one point, the programmer needed to normalize a number in a particular way, and so calculated x/10^trunc(log10(x)), and held onto the 10^trunc(log10(x)) for denormalization later. Dividing by 10, 100, etc., may be easy for us humans, but it’s not easy in binary, and it can result in very small amounts of dreaded round-off error. And why even bother to calculate the log, which is an expensive operation? “Remember,” I said, “how floating-point numbers are recorded on a computer: z = a*2^b, where 0 <= |a| < 2. Writing in C, it’s easy to extract components. In fact, isn’t a number normalized to be between 0 and 2 even better for your purposes?” Yes, it turned out it was.

Even I sometimes forget to think in binary. Just last week I was working on a problem and Alan Riley suggested a solution. I thought a while. “Very clever,” I said. “Recasting the problem in powers of 2 will get rid of that divide that caused half the problem. Even so, there’s still the pesky subtraction.” Alan looked at me, imitating a look I so often give others. “In binary,” Alan patiently explained to me, “the difference you need is the last 19 bits of the original number. Just mask out the other digits.”
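
Alan’s actual problem is not spelled out above, so take the following as a generic sketch of the masking idea rather than his fix: for a nonnegative integer stored exactly, mod() extracts the low bits without any base-10 arithmetic.

. local x = 123456789
. display mod(`x', 2^19)            // the last 19 bits of x
. display `x' - mod(`x', 2^19)      // x with those 19 bits cleared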

At this point, many of you may want to stop reading and go off and play with %21x. If you play with %21x long enough, you’ll eventually examine the relationship between numbers recorded as Stata floats and as Stata doubles, and you may discover something you think to be an error. I will discuss that next week in my next blog posting.