How to read the %21x format
%21x is a Stata display format, just as are %f, %g, %9.2f, %td, and so on. You could put %21x on any variable in your dataset, but that is not its purpose. Rather, %21x is for use with Stata’s display command for those wanting to better understand the accuracy of the calculations they make. We use %21x frequently in developing Stata.
%21x produces output that looks like this:
. display %21x 1
+1.0000000000000X+000

. display %21x 2
+1.0000000000000X+001

. display %21x 10
+1.4000000000000X+003

. display %21x sqrt(2)
+1.6a09e667f3bcdX+000
All right, I admit that the result is pretty unreadable to the uninitiated. The purpose of %21x is to show floating-point numbers exactly as the computer stores them and thinks about them. In %21x’s defense, it is more readable than how the computer really records floating-point numbers, yet it loses none of the mathematical essence. Computers really record floating-point numbers like this:
      1  =  3ff0000000000000
      2  =  4000000000000000
     10  =  4024000000000000
sqrt(2)  =  3ff6a09e667f3bcd
Or more correctly, they record floating-point numbers in binary, like this:
      1  =  0011111111110000000000000000000000000000000000000000000000000000
      2  =  0100000000000000000000000000000000000000000000000000000000000000
     10  =  0100000000100100000000000000000000000000000000000000000000000000
sqrt(2)  =  0011111111110110101000001001111001100110011111110011101111001101
By comparison, %21x is a model of clarity.
The above numbers are 8-byte floating point, also known as double precision, encoded in binary64 IEEE 754-2008 big-endian format. Big endian means that the bytes are ordered, left to right, from most significant to least significant. Some computers store floating-point numbers in little-endian format, with bytes ordered from least significant to most significant, and then the numbers look like this:
      1  =  000000000000f03f
      2  =  0000000000000040
     10  =  0000000000004024
sqrt(2)  =  cd3b7f669ea0f63f

or:

      1  =  0000000000000000000000000000000000000000000000001111000000111111
      2  =  0000000000000000000000000000000000000000000000000000000001000000
     10  =  0000000000000000000000000000000000000000000000000100000000100100
sqrt(2)  =  1100110100111011011111110110011010011110000011111111011000111111
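The byte orderings above can be reproduced in any language that exposes raw bytes. Here is a minimal Python sketch (Python’s struct module, not anything Stata provides) that packs a double in both orders:

```python
import math
import struct

# Show the raw bytes of an IEEE 754 double in both byte orders.
# ">d" packs most significant byte first; "<d" least significant first.
def double_bytes(x):
    return struct.pack(">d", x).hex(), struct.pack("<d", x).hex()

for value in (1.0, 2.0, 10.0, math.sqrt(2.0)):
    msb_first, lsb_first = double_bytes(value)
    print(f"{value!r:>20}  {msb_first}  {lsb_first}")
```

For 1.0, this prints 3ff0000000000000 in one order and 000000000000f03f in the other, matching the listings above.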
Regardless of that, %21x produces the same output:
. display %21x 1
+1.0000000000000X+000

. display %21x 2
+1.0000000000000X+001

. display %21x 10
+1.4000000000000X+003

. display %21x sqrt(2)
+1.6a09e667f3bcdX+000
Binary computers store floating-point numbers as a number pair, (a, b); the desired number z is encoded as
z = a * 2^b
For example,
 1 = 1.00 * 2^0
 2 = 1.00 * 2^1
10 = 1.25 * 2^3
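If you want to verify such (a, b) pairs yourself, Python’s math.ldexp computes a * 2^b directly in floating point (a quick check of my own, not something from the blog’s Stata session):

```python
import math

# ldexp(a, b) computes a * 2**b, exactly, in floating point.
assert math.ldexp(1.00, 0) == 1
assert math.ldexp(1.00, 1) == 2
assert math.ldexp(1.25, 3) == 10
print("all pairs check out")
```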
The number pairs are encoded in the bit patterns, such as 00111111…01, above.
I’ve written the components a and b in decimal, but for reasons that will become clear, we need to preserve the essential binaryness of the computer’s number. We could write the numbers in binary, but they will be more readable if we represent them in base-16:
base-10        base-16 floating point
-------        ----------------------
    1    =     1.00 * 2^0
    2    =     1.00 * 2^1
   10    =     1.40 * 2^3
“1.40?”, you ask, looking at the last row, which indicates 1.40*2^3 for decimal 10.
The period in 1.40 is not a decimal point; it is a hexadecimal point. The first digit after the hexadecimal point is the number for 1/16ths, the next is for 1/(16^2)=1/256ths, and so on. Thus, 1.40 hexadecimal equals 1 + 4*(1/16) + 0*(1/256) = 1.25 in decimal.
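You can check that hexadecimal-point arithmetic mechanically. A small Python sketch (the helper parse_hex_fraction is mine, written just for this check):

```python
def parse_hex_fraction(digits):
    """Sum hex digits after the hexadecimal point: first digit counts
    1/16ths, the next 1/256ths, and so on."""
    total = 0.0
    for position, digit in enumerate(digits, start=1):
        total += int(digit, 16) / 16 ** position
    return total

# 1.40 hexadecimal = 1 + 4/16 + 0/256 = 1.25 decimal
print(1 + parse_hex_fraction("40"))   # 1.25
```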
And that is how you read %21x values +1.0000000000000X+000, +1.0000000000000X+001, and +1.4000000000000X+003. To wit,
base-10        base-16 floating point        %21x
-------        ----------------------        ---------------------
    1    =     1.00 * 2^0     =     +1.0000000000000X+000
    2    =     1.00 * 2^1     =     +1.0000000000000X+001
   10    =     1.40 * 2^3     =     +1.4000000000000X+003
The mantissa is shown to the left of the X and, to the right of the X, the exponent for the 2. %21x is nothing more than a binary variation of the %e format with which we are all familiar, for example, 12 = 1.20000e+01 = 1.2*10^1. It’s such an obvious generalization that one would guess it has existed for a long time, so excuse me when I mention that we invented it at StataCorp. If I weren’t so humble, I would emphasize that this human-readable way of representing binary floating-point numbers preserves nearly every aspect of the IEEE floating-point number. Being humble, I will merely observe that +1.4000000000000X+003 is more readable than 4024000000000000.
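As an aside, Python happens to offer a close cousin of %21x: the float.hex method writes a double as a hexadecimal mantissa and a power-of-2 exponent, with “p” and a decimal exponent where %21x has “X” and a hex exponent. A sketch:

```python
import math

# float.hex writes a double as 0x1.<13 hex digits>p<decimal exponent>,
# the same information %21x displays.
print((1.0).hex())         # 0x1.0000000000000p+0
print((2.0).hex())         # 0x1.0000000000000p+1
print((10.0).hex())        # 0x1.4000000000000p+3
print(math.sqrt(2).hex())  # 0x1.6a09e667f3bcdp+0
```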
Now that you know how to read %21x, let me show you how you might use it. %21x is particularly useful for examining precision issues.
For instance, the cube root of 8 is 2; 2*2*2 = 8. And yet, in Stata, 8^(1/3) is not equal to 2:
. display 8^(1/3)
2

. assert 8^(1/3) == 2
assertion is false
r(9);

. display %20.0g 8^(1/3)
1.99999999999999978
I blogged about that previously; see How Stata calculates powers. The error is not much:
. display 8^(1/3)-2
-2.220e-16
In %21x format, however, we can see that the error is only one bit:
. display %21x 8^(1/3)
+1.fffffffffffffX+000

. display %21x 2
+1.0000000000000X+001
I wish the answer for 8^(1/3) had been +1.0000000000001X+001, because then the one-bit error would have been obvious to you. Instead, rather than being a bit too large, the actual answer is a bit too small, one bit too small to be exact, so we end up with +1.fffffffffffffX+000.
One bit off means being off by 2^(-52), which is 2.220e-16, and which is the number we saw when we displayed, in base-10, 8^(1/3)-2. So %21x did not reveal anything we could not have figured out in other ways. The nature of the error, however, is more obvious in %21x format than it is in a base-10 format.
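The same one-bit story can be told in Python, which shares IEEE doubles with Stata. The exact value of 8 ** (1/3) depends on the platform’s math library, so treat this as a sketch rather than a guarantee:

```python
# 8 ** (1/3) goes through the platform's pow; on typical platforms the
# result lands one unit in the last place below 2.0.
error = 2.0 - 8 ** (1 / 3)
print((8 ** (1 / 3)).hex())  # typically 0x1.fffffffffffffp+0
print(error)                 # typically 2**-52, about 2.220e-16
print(2 ** -52)              # 2.220446049250313e-16
```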
On Statalist, the point often comes up that 0.1, 0.2, …, 0.4, 0.6, …, 0.9, 0.11, 0.12, … have no exact representation in the binary base that computers use. That becomes obvious with %21x format:
. display %21x 0.1
+1.999999999999aX-004

. display %21x 0.2
+1.999999999999aX-003

...
0.5 does have an exact representation, of course, as do all the negative powers of 2:
. display %21x 0.5        // 1/2
+1.0000000000000X-001

. display %21x 0.25       // 1/4
+1.0000000000000X-002

. display %21x 0.125      // 1/8
+1.0000000000000X-003

. display %21x 0.0625     // 1/16
+1.0000000000000X-004

...
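The same exactness check can be made with Python’s float.hex (my sketch; the doubles involved are identical to Stata’s):

```python
# Negative powers of 2 are stored exactly...
print((0.5).hex())     # 0x1.0000000000000p-1
print((0.0625).hex())  # 0x1.0000000000000p-4

# ...while 0.1 is not: note the repeating 9s and the rounded final "a".
print((0.1).hex())     # 0x1.999999999999ap-4

# One visible consequence of that rounding:
print(0.1 + 0.2 == 0.3)  # False
```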
Integers have exact representations, too:
. display %21x 1
+1.0000000000000X+000

. display %21x 2
+1.0000000000000X+001

. display %21x 3
+1.8000000000000X+001

...

. display %21x 10
+1.4000000000000X+003

...

. display %21x 10786204
+1.492b380000000X+017

...
%21x is a great way of becoming familiar with base-16 (equivalently, base-2), which is worth doing if you program base-16 (equivalently, base-2) computers.
Let me show you something useful that can be done with %21x.
A programmer at StataCorp has implemented a new statistical command. In four examples, the program produces the following results:
41.8479499816895
 6.7744922637939
 0.1928647905588
 1.6006311178207
Without any additional information, I can tell you that the program has a bug, and that StataCorp will not be publishing the code until the bug is fixed!
How can I know that this program has a bug without even knowing what is being calculated? Let me show you the above results in %21x format:
+1.4ec89a0000000X+005
+1.b191480000000X+002
+1.8afcb20000000X-003
+1.99c2f60000000X+000
Do you see what I see? It’s all those zeros. In randomly drawn problems, it would be unlikely that there would be all zeros at the end of each result. What is likely is that the results were somehow rounded, and indeed they were. The rounding in this case was due to using float (4-byte) precision inadvertently. The programmer forgot to include a double in the ado-file.
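You can watch that kind of rounding happen by pushing a double through 4-byte float precision; a Python sketch using struct (the round_to_float helper and the sample value are mine, for illustration):

```python
import struct

def round_to_float(x):
    """Round a double to the nearest 4-byte IEEE float, returned as a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

rounded = round_to_float(0.1)
print((0.1).hex())    # 0x1.999999999999ap-4  -- full double precision
print(rounded.hex())  # 0x1.99999a0000000p-4  -- trailing zeros betray the rounding
```

A float’s 24-bit significand fills only the first 23 bits of a double’s 52-bit fraction, so the last 29 bits, about 7 hex digits, come out zero, exactly the pattern in the four results above.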
And that’s one way %21x is used.
I am continually harping on programmers at StataCorp that if they are going to program binary computers, they need to think in binary. I go ballistic when I see a comparison that’s coded as “if (abs(x-y)<1e-8) …” in an attempt to deal with numerical inaccuracy. What kind of number is 1e-8? Well, it’s this kind of number:
. display %21x 1e-8
+1.5798ee2308c3aX-01b
Why put the computer to all that work, and exactly how many digits are you, the programmer, trying to ignore? Rather than 1e-8, why not use the “nice” numbers 7.451e-09 or 3.725e-09, which is to say, 1.0x-1b or 1.0x-1c? If you do that, then I can see exactly how many digits you are ignoring. If you code 1.0x-1b, I can see you are ignoring 1b=27 binary digits. If you code 1.0x-1c, I can see you are ignoring 1c=28 binary digits. Now, how many digits do you need to ignore? How imprecise do you really think your calculation is? By the way, Stata understands numbers such as 1.0x-1b and 1.0x-1c as input, so you can type the precise number you want.
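In Python, that advice might look like this (the tolerance choice and sample values are mine, purely for illustration):

```python
import math

# A power-of-2 tolerance: exactly 27 binary digits ignored, no more, no less.
TOL = math.ldexp(1.0, -27)   # 2**-27, what Stata writes as 1.0x-1b

x, y = 1.0, 1.0 + 2 ** -30
print(TOL)                   # 7.450580596923828e-09
print(abs(x - y) < TOL)      # True: the difference is below the 27-bit cutoff
```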
As another example of thinking in binary, a StataCorp programmer once described a calculation he was making. At one point, the programmer needed to normalize a number in a particular way, and so calculated x/10^trunc(log10(x)), and held onto the 10^trunc(log10(x)) for denormalization later. Dividing by 10, 100, etc., may be easy for us humans, but it’s not easy in binary, and it can result in very small amounts of dreaded roundoff error. And why even bother to calculate the log, which is an expensive operation? “Remember,” I said, “how floating-point numbers are recorded on a computer: z = a*2^b, where 0 <= a < 2. Writing in C, it’s easy to extract the components. In fact, isn’t a number normalized to be between 0 and 2 even better for your purposes?” Yes, it turned out it was.
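The component extraction I had in mind is C’s frexp/ldexp pair; Python exposes the same thing through math.frexp. One wrinkle: Python (like C) normalizes the mantissa to [0.5, 1) rather than [1, 2), but the idea is identical. A sketch with a made-up sample value:

```python
import math

x = 1234.5678
mantissa, exponent = math.frexp(x)  # x == mantissa * 2**exponent, 0.5 <= mantissa < 1
print(mantissa, exponent)

# Denormalizing later is exact: no logs, no division by powers of 10,
# and no roundoff error, because only the exponent changed.
assert math.ldexp(mantissa, exponent) == x
```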
Even I sometimes forget to think in binary. Just last week I was working on a problem and Alan Riley suggested a solution. I thought a while. “Very clever,” I said. “Recasting the problem in powers of 2 will get rid of that divide that caused half the problem. Even so, there’s still the pesky subtraction.” Alan looked at me, imitating a look I so often give others. “In binary,” Alan patiently explained to me, “the difference you need is the last 19 bits of the original number. Just mask out the other digits.”
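I won’t reconstruct Alan’s actual problem here, but the masking trick itself is easy to sketch in Python on an integer (the sample value is mine):

```python
# Keep only the last 19 bits of an integer: AND with a mask of 19 one-bits.
MASK_19 = (1 << 19) - 1           # 0x7ffff

n = 12345678
low_bits = n & MASK_19            # the remainder of n modulo 2**19
print(low_bits == n % (1 << 19))  # True: masking == mod by a power of 2
```

No divide, no subtraction: one AND instruction does the work.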
At this point, many of you may want to stop reading and go off and play with %21x. If you play with %21x long enough, you’ll eventually examine the relationship between numbers recorded as Stata floats and as Stata doubles, and you may discover something you think to be an error. I will discuss that next week in my next blog posting.