How to read the %21x format
%21x is a Stata display format, just as are %f, %g, %9.2f, %td, and so on. You could put %21x on any variable in your dataset, but that is not its purpose. Rather, %21x is for use with Stata’s display command for those wanting to better understand the accuracy of the calculations they make. We use %21x frequently in developing Stata.
%21x produces output that looks like this:
. display %21x 1
+1.0000000000000X+000

. display %21x 2
+1.0000000000000X+001

. display %21x 10
+1.4000000000000X+003

. display %21x sqrt(2)
+1.6a09e667f3bcdX+000
All right, I admit that the result is pretty unreadable to the uninitiated. The purpose of %21x is to show floating-point numbers exactly as the computer stores them and thinks about them. In %21x’s defense, it is more readable than how the computer really records floating-point numbers, yet it loses none of the mathematical essence. Computers really record floating-point numbers like this:
      1  = 3ff0000000000000
      2  = 4000000000000000
     10  = 4024000000000000
sqrt(2)  = 3ff6a09e667f3bcd
Or more correctly, they record floating-point numbers in binary, like this:
      1  = 0011111111110000000000000000000000000000000000000000000000000000
      2  = 0100000000000000000000000000000000000000000000000000000000000000
     10  = 0100000000100100000000000000000000000000000000000000000000000000
sqrt(2)  = 0011111111110110101000001001111001100110011111110011101111001101
By comparison, %21x is a model of clarity.
The above numbers are 8-byte floating point, also known as double precision, encoded in the binary64 format of IEEE 754-2008, written in big endian order. Big endian means that the bytes are ordered, left to right, from most significant to least significant. Some computers store floating-point numbers in little endian order, with bytes ordered from least significant to most significant, and then the same numbers look like this:
      1  = 000000000000f03f
      2  = 0000000000000040
     10  = 0000000000004024
sqrt(2)  = cd3b7f669ea0f63f
or:
      1  = 0000000000000000000000000000000000000000000000001111000000111111
      2  = 0000000000000000000000000000000000000000000000000000000001000000
     10  = 0000000000000000000000000000000000000000000000000100000000100100
sqrt(2)  = 1100110100111011011111110110011010011110101000001111011000111111
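If you would like to reproduce these byte patterns outside Stata, Python's struct module can pack a double in either byte order. This is an aside in Python, not Stata; the format characters '&gt;d' and '&lt;d' mean big endian and little endian double, respectively.

```python
import struct
import math

# Pack each double in big endian ('>d') and little endian ('<d') byte order
# and show the raw bytes as hex, matching the tables above.
for label, x in [("1", 1.0), ("2", 2.0), ("10", 10.0), ("sqrt(2)", math.sqrt(2))]:
    big = struct.pack('>d', x).hex()
    little = struct.pack('<d', x).hex()
    print(f"{label:>7}: big endian = {big}   little endian = {little}")
```

Either way, the bits are the same; only the order in which the bytes are written to memory differs.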
Regardless of that, %21x produces the same output:
. display %21x 1
+1.0000000000000X+000

. display %21x 2
+1.0000000000000X+001

. display %21x 10
+1.4000000000000X+003

. display %21x sqrt(2)
+1.6a09e667f3bcdX+000
Binary computers store floating-point numbers as a pair of numbers (a, b); the desired number z is encoded as
z = a * 2^b
For example,
 1 = 1.00 * 2^0
 2 = 1.00 * 2^1
10 = 1.25 * 2^3
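You can extract the (a, b) pair yourself in Python (not Stata; this is just an illustration). Python's math.frexp returns the mantissa scaled into [0.5, 1), so the helper below, a name I made up, rescales it to the IEEE convention of 1 &lt;= a &lt; 2 used throughout this post.

```python
import math

def mantissa_exponent(x):
    """Split x into (a, b) with x == a * 2**b and 1 <= a < 2.
    math.frexp returns m in [0.5, 1); rescale to the IEEE convention."""
    m, e = math.frexp(x)
    return m * 2, e - 1

print(mantissa_exponent(1.0))   # (1.0, 0)
print(mantissa_exponent(2.0))   # (1.0, 1)
print(mantissa_exponent(10.0))  # (1.25, 3)
```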
The number pairs are encoded in the bit patterns, such as 00111111…01, shown above.
I’ve written the components a and b in decimal, but for reasons that will become clear, we need to preserve the essential binaryness of the computer’s number. We could write the numbers in binary, but they will be more readable if we represent them in base-16:
base-10       base-16 floating point
      1   =   1.00 * 2^0
      2   =   1.00 * 2^1
     10   =   1.40 * 2^3
“1.40?”, you ask, looking at the last row, which indicates 1.40*2^3 for decimal 10.
The period in 1.40 is not a decimal point; it is a hexadecimal point. The first digit after the hexadecimal point is the number for 1/16ths, the next is for 1/(16^2)=1/256ths, and so on. Thus, 1.40 hexadecimal equals 1 + 4*(1/16) + 0*(1/256) = 1.25 in decimal.
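You can check the arithmetic in Python, whose float.fromhex understands essentially the same notation; the only cosmetic difference is that Python marks the power-of-2 exponent with p rather than X.

```python
# The hexadecimal point: digits after it count 1/16ths, 1/256ths, and so on.
a = 1 + 4/16 + 0/256
print(a)  # 1.25

# float.fromhex parses the notation directly; 'p+3' means 'times 2**3':
print(float.fromhex('0x1.40p+3'))  # 10.0
```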
And that is how you read %21x values +1.0000000000000X+000, +1.0000000000000X+001, and +1.4000000000000X+003. To wit,
base-10       base-16 floating point          %21x
      1   =   1.00 * 2^0             =   +1.0000000000000X+000
      2   =   1.00 * 2^1             =   +1.0000000000000X+001
     10   =   1.40 * 2^3             =   +1.4000000000000X+003
The mantissa is shown to the left of the X and, to the right of the X, the exponent for the 2. Both parts are written in hexadecimal: X+003 means 2^3, and X+017 means 2^23, because hexadecimal 17 is decimal 23. %21x is nothing more than a binary variation of the %e format with which we are all familiar, for example, 12 = 1.20000e+01 = 1.2*10^1. It’s such an obvious generalization that one would guess it has existed for a long time, so excuse me when I mention that we invented it at StataCorp. If I weren’t so humble, I would emphasize that this human-readable way of representing binary floating-point numbers preserves nearly every aspect of the IEEE floating-point number. Being humble, I will merely observe that 1.40x+003 is more readable than 4024000000000000.
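For readers who want to experiment outside Stata, Python's float.hex() is the closest widely available analog of %21x; note that Python writes the exponent in decimal (p+3) where %21x writes it in hexadecimal (X+003).

```python
import math

# float.hex() shows the same mantissa digits that %21x shows,
# e.g. 10.0 -> 0x1.4000000000000p+3, matching +1.4000000000000X+003.
for x in (1.0, 2.0, 10.0, math.sqrt(2)):
    print(x, '=', x.hex())
```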
Now that you know how to read %21x, let me show you how you might use it. %21x is particularly useful for examining precision issues.
For instance, the cube root of 8 is 2; 2*2*2 = 8. And yet, in Stata, 8^(1/3) is not equal to 2:
. display 8^(1/3)
2

. assert 8^(1/3) == 2
assertion is false
r(9);

. display %20.0g 8^(1/3)
1.99999999999999978
I blogged about that previously; see How Stata calculates powers. The error is not much:
. display 8^(1/3)-2
-2.220e-16
In %21x format, however, we can see that the error is only one bit:
. display %21x 8^(1/3)
+1.fffffffffffffX+000

. display %21x 2
+1.0000000000000X+001
I wish the answer for 8^(1/3) had been +1.0000000000001X+001, because then the one-bit error would have been obvious to you. Instead, rather than being one bit too large, the actual answer is one bit too small to be exact, so we end up with +1.fffffffffffffX+000.
One bit off means being off by 2^(-52), which is 2.220e-16, the number we saw when we displayed 8^(1/3)-2 in base 10. So %21x did not reveal anything we could not have figured out in other ways. The nature of the error, however, is more obvious in %21x format than in a base-10 format.
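The one-ulp gap just below 2 can be made concrete in Python (an aside, not Stata; math.nextafter requires Python 3.9 or later). The largest double smaller than 2 is exactly the all-f's pattern shown above.

```python
import math

# The double just below 2.0 is 2 - 2**-52: all f's in the mantissa,
# exactly the +1.fffffffffffffX+000 pattern from the %21x output.
below = math.nextafter(2.0, 0.0)

print(below.hex())            # 0x1.fffffffffffffp+0
print(2.0 - below)            # 2.220446049250313e-16
print(2.0 - below == 2**-52)  # True
```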
On Statalist, the point often comes up that 0.1, 0.2, …, 0.4, 0.6, …, 0.9, 0.11, 0.12, … have no exact representation in the binary base that computers use. That becomes obvious with %21x format:
. display %21x 0.1
+1.999999999999aX-004

. display %21x 0.2
+1.999999999999aX-003
...
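The same ...999a rounding shows up in Python's float.hex(), and it is the reason sums of tenths drift; this Python aside makes the point without any Stata at all.

```python
print((0.1).hex())   # 0x1.999999999999ap-4
print((0.2).hex())   # 0x1.999999999999ap-3

# The rounding is why sums of tenths drift:
print(0.1 + 0.2 == 0.3)   # False
print((0.1 + 0.2).hex())  # 0x1.3333333333334p-2
print((0.3).hex())        # 0x1.3333333333333p-2
```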
0.5 does have an exact representation, of course, as do all the negative powers of 2:
. display %21x 0.5     // 1/2
+1.0000000000000X-001

. display %21x 0.25    // 1/4
+1.0000000000000X-002

. display %21x 0.125   // 1/8
+1.0000000000000X-003

. display %21x 0.0625  // 1/16
+1.0000000000000X-004
...
Integers have exact representations, too:
. display %21x 1
+1.0000000000000X+000

. display %21x 2
+1.0000000000000X+001

. display %21x 3
+1.8000000000000X+001
...
. display %21x 10
+1.4000000000000X+003
...
. display %21x 10786204
+1.492b380000000X+017
...
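To be precise, every integer with at most 53 significant bits is exactly representable in a double; beyond 2^53 the gaps between doubles exceed 1. Here is a Python sketch of both facts (and of the 10786204 example; note that Python's p+23 is the decimal spelling of %21x's hexadecimal X+017).

```python
# Integers up to 2**53 are exact in a double:
print(float(10786204).hex())       # 0x1.492b380000000p+23
print(float(2**53) == 2**53)       # True

# Past 2**53, odd integers cannot be represented:
print(float(2**53 + 1) == 2**53)   # True: 2**53 + 1 rounds back down
```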
%21x is a great way of becoming familiar with base-16 (equivalently, base-2), which is worth doing if you program base-16 (equivalently, base-2) computers.
Let me show you something useful that can be done with %21x.
A programmer at StataCorp has implemented a new statistical command. In four examples, the program produces the following results:
41.8479499816895
6.7744922637939
0.1928647905588
1.6006311178207
Without any additional information, I can tell you that the program has a bug, and that StataCorp will not be publishing the code until the bug is fixed!
How can I know that this program has a bug without even knowing what is being calculated? Let me show you the above results in %21x format:
+1.4ec89a0000000X+005
+1.b191480000000X+002
+1.8afcb20000000X-003
+1.99c2f60000000X+000
Do you see what I see? It’s all those zeros. In randomly drawn problems, it would be unlikely that there would be all zeros at the end of each result. What is likely is that the results were somehow rounded, and indeed they were. The rounding in this case was due to using float (4-byte) precision inadvertently. The programmer forgot to include a double in the ado-file.
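The telltale zeros are easy to reproduce in Python (a hypothetical illustration using pi rather than the program's actual output): round a full-precision double through a 4-byte float and the low 29 bits of the mantissa become zeros, which appear as a run of trailing zeros in the hex display.

```python
import struct
import math

x = math.pi                                        # full double precision
y = struct.unpack('<f', struct.pack('<f', x))[0]   # rounded through 4-byte float

print(x.hex())  # 0x1.921fb54442d18p+1 : all 52 mantissa bits in use
print(y.hex())  # 0x1.921fb60000000p+1 : trailing zeros betray float precision
```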
And that’s one way %21x is used.
I am continually harping on programmers at StataCorp that if they are going to program binary computers, they need to think in binary. I go ballistic when I see a comparison that’s coded as “if (abs(x-y)<1e-8) …” in an attempt to deal with numerical inaccuracy. What kind of number is 1e-8? Well, it’s this kind of number:
. display %21x 1e-8
+1.5798ee2308c3aX-01b
Why put the computer to all that work, and exactly how many digits are you, the programmer, trying to ignore? Rather than 1e-8, why not use the “nice” numbers 7.451e-09 or 3.725e-09, which is to say, 1.0x-1b or 1.0x-1c? If you do that, then I can see exactly how many digits you are ignoring. If you code 1.0x-1b, I can see you are ignoring 1b=27 binary digits. If you code 1.0x-1c, I can see you are ignoring 1c=28 binary digits. Now, how many digits do you need to ignore? How imprecise do you really think your calculation is? By the way, Stata understands numbers such as 1.0x-1b and 1.0x-1c as input, so you can type the precise number you want.
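For what it is worth, the same "nice" thresholds can be written outside Stata too; in Python the equivalent of hex input like 1.0x-1b is float.fromhex with a p exponent, and the values match the 7.451e-09 and 3.725e-09 above.

```python
# Exact powers of 2 as comparison tolerances:
tol27 = float.fromhex('0x1.0p-27')   # ignore 27 binary digits
tol28 = float.fromhex('0x1.0p-28')   # ignore 28 binary digits

print(tol27)             # 7.450580596923828e-09
print(tol27 == 2**-27)   # True
print(tol28 == 2**-28)   # True
```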
As another example of thinking in binary, a StataCorp programmer once described a calculation he was making. At one point, the programmer needed to normalize a number in a particular way, and so calculated x/10^trunc(log10(x)), and held onto the 10^trunc(log10(x)) for denormalization later. Dividing by 10, 100, etc., may be easy for us humans, but it’s not easy in binary, and it can result in very small amounts of dreaded round-off error. And why even bother to calculate the log, which is an expensive operation? “Remember,” I said, “how floating-point numbers are recorded on a computer: z = a*2^b, where 1 <= |a| < 2. Writing in C, it’s easy to extract the components. In fact, isn’t a number normalized to be between 1 and 2 even better for your purposes?” Yes, it turned out it was.
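The power-of-2 version of that normalization can be sketched in Python as well (the programmer worked in C, so this is an analogy, not his code): splitting off a power of 2 is exact because it adjusts only the exponent, never the mantissa.

```python
import math

# Splitting off a power of 2 is exact: frexp and ldexp adjust only the
# exponent b in z = a * 2**b, never the mantissa a.
x = 12345.678
a, b = math.frexp(x)            # a in [0.5, 1), with x == a * 2**b exactly
assert math.ldexp(a, b) == x    # exact round trip, no round-off

# By contrast, x / 10**math.trunc(math.log10(x)) costs a log and a divide
# and can introduce round-off, because powers of 10 are not exact in binary.
```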
Even I sometimes forget to think in binary. Just last week I was working on a problem and Alan Riley suggested a solution. I thought a while. “Very clever,” I said. “Recasting the problem in powers of 2 will get rid of that divide that caused half the problem. Even so, there’s still the pesky subtraction.” Alan looked at me, imitating a look I so often give others. “In binary,” Alan patiently explained to me, “the difference you need is the last 19 bits of the original number. Just mask out the other digits.”
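Alan's trick is easy to sketch; here it is in Python with a hypothetical 24-bit value (the 19-bit width comes from the story above; the value itself is made up for illustration). Masking the low bits replaces a divide and a subtraction with a single bitwise AND.

```python
# "The difference you need is the last 19 bits": an integer mask, no subtraction.
MASK = (1 << 19) - 1       # 0x7ffff, nineteen 1-bits

n = 0b1011_0000_0000_0000_0000_0101  # hypothetical 24-bit value
low19 = n & MASK           # keep only the low 19 bits

print(bin(low19))              # 0b101
print(low19 == n % (1 << 19))  # True: masking is a power-of-2 remainder
```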
At this point, many of you may want to stop reading and go off and play with %21x. If you play with %21x long enough, you’ll eventually examine the relationship between numbers recorded as Stata floats and as Stata doubles, and you may discover something you think to be an error. I will discuss that next week in my next blog posting.