## The Penultimate Guide to Precision

There have recently been occasional questions on precision and storage types on Statalist despite all that I have written on the subject, much of it posted in this blog. I take that as evidence that I have yet to produce a useful, readable piece that addresses all the questions researchers have.

So I want to try again. This time I’ll try to write the ultimate piece on the subject, making it as short and snappy as possible, and addressing every popular question of which I am aware—including some I haven’t addressed before—and doing all that without making you wade with me into all the messy details, which I know I have a tendency to do.

I am hopeful that from now on, every question that appears on Statalist that even remotely touches on the subject will be answered with a link back to this page. If I succeed, I will place this in the Stata manuals and get it indexed online in Stata so that users can find it the instant they have questions.

What follows is intended to provide everything scientific researchers need to know to judge the effect of storage precision on their work, to know what can go wrong, and to prevent that. I don’t want to raise expectations too much, however, so I will entitle it …

THE PENULTIMATE GUIDE TO PRECISION

1. Numeric types

1.1 Stata provides five numeric types for storing variables, three of them integer types and two of them floating point.

1.2 The floating-point types are float and double.

1.3 The integer types are byte, int, and long.

1.4 Stata uses these five types for the storage of data.

1.5 Stata makes all calculations in double precision (and sometimes quad precision) regardless of the type used to store the data.

2. Floating-point types

2.1 Stata provides two IEEE 754-2008 floating-point types: float and double.

2.2 float variables are stored in 4 bytes.

2.3 double variables are stored in 8 bytes.

2.4 The ranges of float and double variables are

```
Storage
type             minimum                maximum
-----------------------------------------------------
float     -3.40282346639e+ 38      1.70141173319e+ 38
double    -1.79769313486e+308      8.98846567431e+307
-----------------------------------------------------
In addition, float and double can record missing values
., .a, .b, ..., .z.
```

The above values are approximations. For those familiar with %21x floating-point hexadecimal format, the exact values are

```
Storage
type                   minimum                maximum
-------------------------------------------------------
float   -1.fffffe0000000X+07f     +1.fffffe0000000X+07e
double  -1.fffffffffffffX+3ff     +1.fffffffffffffX+3fe
-------------------------------------------------------
```

Said differently, and less precisely, float values are in the open interval (-2^128, 2^127), and double values are in the open interval (-2^1024, 2^1023). This is less precise because the intervals shown in the tables are closed intervals.
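
As a rough check, the limits in (2.4) can be displayed from within Stata. This is a minimal sketch that assumes the extreme-value functions minfloat(), maxfloat(), mindouble(), and maxdouble() are available in your release; the two assert lines restate the open-interval description for float.

```stata
* Show the storage limits in %21x so they can be compared with the table.
display %21x minfloat()
display %21x maxfloat()
display %21x mindouble()
display %21x maxdouble()

* Every float lies strictly inside (-2^128, 2^127).
assert minfloat() > -2^128
assert maxfloat() <  2^127
```

The double limits are not checked the same way because 2^1024 is itself too large to be computed in double precision.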

3. Integer types

3.1 Stata provides three integer storage formats: byte, int, and long. They are 1 byte, 2 bytes, and 4 bytes, respectively.

3.2 Integers may also be stored in Stata’s IEEE 754-2008 floating-point storage formats float and double.

3.3 Integer values may be stored precisely over the ranges

```
Storage
type                   minimum                 maximum
------------------------------------------------------
byte                      -127                     100
int                    -32,767                  32,740
long            -2,147,483,647           2,147,483,620
------------------------------------------------------
float              -16,777,216              16,777,216
double  -9,007,199,254,740,992   9,007,199,254,740,992
------------------------------------------------------
In addition, all storage types can record missing values
., .a, .b, ..., .z.
```

The overall ranges of float and double were shown in (2.4) and are wider than the ranges for them shown here. The ranges shown here are the subsets of the overall ranges over which no rounding of integer values occurs.
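
The integer limits in (3.3) can be confirmed with a few assertions. This sketch assumes the functions minbyte(), maxbyte(), minint(), maxint(), minlong(), and maxlong() exist in your release of Stata; the last two lines use float() to show where a float stops holding every integer.

```stata
* Limits of the three integer storage types.
assert minbyte() == -127
assert maxbyte() == 100
assert minint()  == -32767
assert maxint()  == 32740
assert minlong() == -2147483647
assert maxlong() == 2147483620

* 16,777,216 survives a trip through float precision; 16,777,217 does not.
assert float(16777216) == 16777216
assert float(16777217) != 16777217
```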

4. Integer precision

4.1 (Automatic promotion.) For the integer storage types—for byte, int, and long—numbers outside the ranges listed in (3.3) would be stored as missing (.) except that storage types are promoted automatically. As necessary, Stata promotes bytes to ints, ints to longs, and longs to doubles. Even if a variable is a byte, the effective range is still [-9,007,199,254,740,992, 9,007,199,254,740,992] in the sense that you could change a value of a byte variable to a large value and that value would be stored correctly; the variable that was a byte would, as if by magic, change its type to int, long, or double if that were necessary.
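
Automatic promotion can be watched in a small do-file. This is only a sketch on a made-up three-observation dataset; confirm is used to check the storage type after each replace.

```stata
* Start with a byte variable holding 1, 2, 3.
clear
set obs 3
generate byte x = _n
confirm byte variable x

* 1,000 does not fit in a byte, so replace promotes x to int.
replace x = 1000 in 1
confirm int variable x

* 3,000,000,000 does not fit in a long, so x becomes a double.
replace x = 3000000000 in 2
confirm double variable x
assert x[2] == 3000000000
```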

4.2 (Data input.) Automatic promotion (4.1) applies after the data are input/read/imported/copied into Stata. When first reading, importing, copying, or creating data, it is your responsibility to choose appropriate storage types. Be aware that Stata’s default storage type is float, so if you have large integers, it is usually necessary to specify explicitly the types you wish to use.

If you are unsure of the type to specify for your integer variables, specify double. After reading the data, you can use compress to demote storage types. compress never results in a loss of precision.
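
The advice in (4.2) might look like this in practice. The variable names id and id_f are invented for the illustration, and generate stands in for whatever command you use to read your data.

```stata
clear
set obs 1

* Safe choice when unsure: hold large integers in a double.
generate double id = 123456789
assert id == 123456789

* The same value forced into a float is rounded without comment.
generate float id_f = 123456789
assert id_f != 123456789

* compress demotes id to the smallest type that holds it exactly.
compress
confirm long variable id
assert id == 123456789
```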

4.3 Note that you can use the floating-point types float and double to store integer data.

4.3.1 Integers outside the range [-2,147,483,647, 2,147,483,620] must be stored as doubles if they are to be precisely recorded.

4.3.2 Integers can be stored as float, but avoid doing that unless you are certain they will be inside the range [-16,777,216, 16,777,216] not just when you initially read, import, or copy them into Stata, but subsequently as you make transformations.

4.3.3 If you read your integer data as floats, and assuming they are within the allowed range, we recommend that you change them to an integer type. You can do that simply by typing compress. We make that recommendation so that your integer variables will benefit from the automatic promotion described in (4.1).

4.4 Let us show what can go wrong if you do not follow our advice in (4.3). For the floating-point types—for float and double—integer values outside the ranges listed in (3.3) are rounded.

Consider a float variable, and remember that the integer range for floats is [-16,777,216, 16,777,216]. If you tried to store a value outside the range in the variable—say, 16,777,221—and if you checked afterward, you would discover that actually stored was 16,777,220! Here are some other examples of rounding:

```
desired value                            stored (rounded)
to store            true value             float value
------------------------------------------------------
maximum             16,777,216              16,777,216
maximum+1           16,777,217              16,777,216
------------------------------------------------------
maximum+2           16,777,218              16,777,218
------------------------------------------------------
maximum+3           16,777,219              16,777,220
maximum+4           16,777,220              16,777,220
maximum+5           16,777,221              16,777,220
------------------------------------------------------
maximum+6           16,777,222              16,777,222
------------------------------------------------------
maximum+7           16,777,223              16,777,224
maximum+8           16,777,224              16,777,224
maximum+9           16,777,225              16,777,224
------------------------------------------------------
maximum+10          16,777,226              16,777,226
------------------------------------------------------
```

When you store large integers in float variables, values will be rounded and no mention will be made of that fact.
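
The rounding table can be regenerated with a short do-file. This is a sketch; the variable names wanted and stored are arbitrary, and the two assert lines spot-check the rows for maximum+1 and maximum+5.

```stata
clear
set obs 11

* wanted = maximum, maximum+1, ..., maximum+10, held exactly in a double
generate double wanted = 16777216 + _n - 1

* stored = the same values squeezed into a float
generate float stored = wanted

format wanted stored %12.0fc
list, noobs

* Spot checks: maximum+1 and maximum+5
assert stored[2] == 16777216
assert stored[6] == 16777220
```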

And that is why we say that if you have integer data that must be recorded precisely and if the values might be large—outside the range ±16,777,216—do not use float. Use long or use double; or just use the compress command and let automatic promotion handle the problem for you.

4.5 Unlike byte, int, and long, float and double variables are not promoted to preserve integer precision.

Float values are not promoted because, well, they are not. Actually, there is a deep reason, but it has to do with the use of float variables for their real purpose, which is to store non-integer values.

Double values are not promoted because there is nothing to promote them to. Double is Stata’s most precise storage type. The largest integer value Stata can store precisely is 9,007,199,254,740,992 and the smallest is -9,007,199,254,740,992.

Integer values outside the range for doubles round in the same way that float values round, except at absolutely larger values.
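
The same experiment can be run at the edge of the double range. This is a sketch with arbitrary variable names; it shows that adding 1 to 9,007,199,254,740,992 changes nothing, while adding 2 lands on the next value a double can hold.

```stata
clear
set obs 1
generate double big    = 9007199254740992
generate double bigger = big + 1
generate double next   = big + 2

* big + 1 cannot be represented, so it rounds back to big.
assert bigger == big
assert next   == 9007199254740994

format big bigger next %26.0fc
list, noobs
```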

5. Floating-point precision

5.1 The smallest, nonzero value that can be stored in float and double is

```
Storage
type      value          value in %21x         value in base 10
-----------------------------------------------------------------
float     ±2^-127    ±1.0000000000000X-07f   ±5.877471754111e-039
double    ±2^-1022   ±1.0000000000000X-3fe   ±2.225073858507e-308
-----------------------------------------------------------------
```

We include the value shown in the third column, the value in %21x, for those who know how to read it. It is described in (9), but it is unimportant. We are merely emphasizing that these are the smallest values for properly normalized numbers.

5.2 The smallest value of epsilon such that 1+epsilon ≠ 1 is

```
Storage
type      epsilon       epsilon in %21x        epsilon in base 10
-----------------------------------------------------------------
float      ±2^-23     ±1.0000000000000X-017    ±1.19209289551e-07
double     ±2^-52     ±1.0000000000000X-034    ±2.22044604925e-16
-----------------------------------------------------------------
```

Epsilon is the distance from 1 to the next number on the floating-point number line. The corresponding unit roundoff error is u = ±epsilon/2. The unit roundoff error is the maximum relative roundoff error that is introduced by the floating-point number storage scheme.

The smallest value of epsilon such that x+epsilon ≠ x is approximately |x|*epsilon, and the corresponding unit roundoff error is ±|x|*epsilon/2.
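
The definition of epsilon in (5.2) can be tested directly. In this sketch the scalar names are made up; scalars are used because they hold double-precision values, and float() is used to round a result to float precision.

```stata
scalar eps_d = 2^-52
scalar eps_f = 2^-23

* Double: adding epsilon changes 1, adding half of epsilon does not.
scalar d1 = 1 + eps_d
scalar d2 = 1 + eps_d/2
assert d1 != 1
assert d2 == 1

* Float: the same holds at 2^-23 once the result is rounded to float.
assert float(1 + eps_f)   != 1
assert float(1 + eps_f/2) == 1
```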

5.3 The precision of the floating-point types is, depending on how you want to measure it,

```
Measurement                           float              double
----------------------------------------------------------------
# of binary digits                       23                  52
# of base 10 digits (approximate)         7                  16

Relative precision                   ±2^-24              ±2^-53
... in base 10 (approximate)      ±5.96e-08           ±1.11e-16
----------------------------------------------------------------
```

Relative precision is defined as

```
            |x - x_as_stored|
  ± max    ------------------
     x             x
```

performed using infinite precision arithmetic, x chosen from the subset of reals between the minimum and maximum values that can be stored. It is worth appreciating that relative precision is a worst-case relative error over all possible numbers that can be stored. Relative precision is identical to roundoff error, but perhaps this definition is easier to appreciate.

5.4 Stata never makes calculations in float precision, even if the data are stored as float.

Stata makes double-precision calculations regardless of how the numeric data are stored. In some cases, Stata internally uses quad precision, which provides approximately 32 decimal digits of precision. If the result of the calculation is being stored back into a variable in the dataset, then the double (or quad) result is rounded as necessary to be stored.
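
A small sketch of (5.4), with arbitrary variable names: a is stored as a float, yet the product a*3 is evaluated in double precision and is rounded to float only when it is stored in a float variable.

```stata
clear
set obs 1
generate float a = 0.1

* a holds float(0.1); the product a*3 is computed in double.
generate double b = a*3
assert b == float(0.1)*3

* The double result carries more digits than a float can hold.
assert b != float(b)

* Stored back into a float, the result is rounded.
generate float c = a*3
assert c == float(b)
```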

5.5 (False precision.) Double precision is 536,870,912 times more accurate than float precision. You may worry that float precision is inadequate to accurately record your data.

Little in this world is measured to a relative accuracy of ±2^-24, the accuracy provided by float precision.

Ms. Smith, it is reported, made \$112,293 this year. Do you believe that is recorded to an accuracy of ±2^-24*112,293, or approximately ±0.7 cents?

David was born on 21jan1952, so on 27mar2012 he was 21,981 days old, or 60.18 years old. Recorded in float precision, the precision is ±60.18*2^-24, or roughly ±1.89 minutes.

Joe reported that he drives 12,234 miles per year. Do you believe that Joe’s report is accurate to ±12,234*2^-24, equivalent to ±3.85 feet?

A sample of 102,400 people reported that they drove, in total, 1,252,761,600 miles last year. Is that accurate to ±74.7 miles (float precision)? If it is, each of them is reporting with an accuracy of roughly ±3.85 feet.

The distance from the Earth to the moon is often reported as 384,401 kilometers. Recorded as a float, the precision is ±384,401*2^-24, or ±23 meters, or ±0.023 kilometers. Because the number was not reported as 384,401.000, one would assume float precision would be accurate enough to record that result. In fact, float precision is more than sufficiently accurate to record the distance because the distance from the Earth to the moon varies from 356,400 to 406,700 kilometers, some 50,300 kilometers. The distance would have been better reported as 384,401 ±25,150 kilometers. At best, the measurement 384,401 has relative accuracy of ±25,150/384,401, or about ±0.065 (it is accurate to only one or two digits).

Nonetheless, a few things have been measured with more than float accuracy, and they stand out as crowning accomplishments of mankind. Use double as required.
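
The arithmetic behind the examples in (5.5) is easy to redo with display. This is a sketch for a do-file; the unit conversions (100 cents per dollar, 365.25 days per year, 5,280 feet per mile) are assumptions supplied here, and the results should come out close to the figures quoted above.

```stata
* Ms. Smith's income: float error in cents
display 112293 * 2^-24 * 100

* David's age: float error in minutes
display 60.18 * 2^-24 * 365.25 * 24 * 60

* Joe's mileage: float error in feet
display 12234 * 2^-24 * 5280

* Earth-moon distance: float error in meters
display 384401 * 2^-24 * 1000

* Ratio of float to double relative precision
display %12.0fc 2^-24 / 2^-53
```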

6. Advice concerning 0.1, 0.2, …

6.1 Stata uses base 2, binary. Popular numbers such as 0.1, 0.2, 100.21, and so on, have no exact binary representation in a finite number of binary digits. There are a few exceptions, such as 0.5 and 0.25, but not many.

6.2 If you create a float variable containing 1.1 and list it, it will list as 1.1 but that is only because Stata’s default display format is %9.0g. If you changed that format to %16.0g, the result would appear as 1.1000000238419.

This scares some users. If this scares you, go back and read (5.5) False Precision. The relative error is still a modest ±2^-24. The number 1.1000000238419 is likely a perfectly acceptable approximation to 1.1 because the 1.1 was never measured to an accuracy of less than ±2^-24 anyway.

6.3 One reason perfectly acceptable approximations to 1.1 such as 1.1000000238419 may bother you is that you cannot select observations containing 1.1 by typing if x==1.1 if x is a float variable. You cannot because the 1.1 on the right is interpreted as double precision 1.1. To select the observations, you have to type if x==float(1.1).
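
The point of (6.3) can be seen on a one-observation dataset. This is a sketch; count leaves the number of matching observations in r(N).

```stata
clear
set obs 1
generate float x = 1.1

* The literal 1.1 is a double, so it never equals the float value.
count if x == 1.1
assert r(N) == 0

* Rounding the literal to float precision makes the comparison work.
count if x == float(1.1)
assert r(N) == 1
```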

6.4 If this bothers you, record the data as doubles. It is best to do this at the point when you read the original data or when you make the original calculation. The number will then appear to be 1.1. It will not really be 1.1, but it will have less relative error, namely, ±2^-53.

6.5 If you originally read the data and stored them as floats, it is still sometimes possible to recover the double-precision accuracy just as if you had originally read the data into doubles. You can do this if you know how many decimal digits were recorded after the decimal point and if the values are within a certain range.

If there was one digit after the decimal point and if the data are in the range [-1,048,576, 1,048,576], which means the values could be -1,048,576, -1,048,575.9, …, -1, 0, 1, …, 1,048,575.9, 1,048,576, then typing

. gen double y = round(x*10)/10

will recover the full double-precision result. Stored in y will be the number in double precision just as if you had originally read it that way.

It is not possible, however, to recover the original result if x is outside the range ±1,048,576 because the float variable contains too little information.

You can do something similar when there are two, three, or more decimal digits:

```
# digits to
right of
decimal pt.      range      command
-----------------------------------------------------------------
     1        ±1,048,576    gen double y = round(x*10)/10
     2        ±  131,072    gen double y = round(x*100)/100
     3        ±   16,384    gen double y = round(x*1000)/1000
     4        ±    1,024    gen double y = round(x*10000)/10000
     5        ±      128    gen double y = round(x*100000)/100000
     6        ±       16    gen double y = round(x*1000000)/1000000
     7        ±        1    gen double y = round(x*10000000)/10000000
-----------------------------------------------------------------
```

Range is the range of x over which command will produce correct results. For instance, range = ±16 in the next-to-the-last line means that the values recorded in x must be -16 ≤ x ≤ 16.
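
Here is a sketch of the one-decimal-digit recovery on three made-up values, one of them near the edge of the allowed range. The assertions compare the recovered doubles with the double-precision literals.

```stata
clear
set obs 3
generate float x = 1.1
replace x = 1048575.9 in 2
replace x = -0.3 in 3

* Recover double precision, assuming one digit after the decimal point.
generate double y = round(x*10)/10

* The float is only an approximation; the recovered double is not.
assert x[1] != 1.1
assert y[1] == 1.1
assert y[2] == 1048575.9
assert y[3] == -0.3
```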

7. Advice concerning exact data, such as currency data

7.1 Yes, there are exact data in this world. Such data are usually counts of something or are currency data, which you can think of as counts of pennies (\$0.01) or the smallest unit in whatever currency you are using.

7.2 Just because the data are exact does not mean you need exact answers. It may still be that calculated answers are adequate if the data are recorded to a relative accuracy of ±2^-24 (float). For most analyses, even of currency data, this is often adequate. The U.S. deficit in 2011 was \$1.5 trillion. Stored as a float, this amount has a (maximum) error of ±2^-24*1.5e+12 = ±\$89,406.97. It would be difficult to imagine that ±\$89,406.97 would affect any government decision maker dealing with the full \$1.5 trillion.

7.3 That said, you sometimes do need to make exact calculations. Banks tracking their accounts need exact amounts. It is not enough to say to account holders that we have your money within a few pennies, dollars, or hundreds of dollars.

In that case, the currency data should be converted to integers (pennies) and stored as integers, and then processed as described in (4). Assuming the dollar-and-cent amounts were read into doubles, you can convert them into pennies by typing the command below. The round() is needed because, in binary, a product such as 1.15*100 falls just short of 115.

. replace x = round(x*100)

7.4 If you mistakenly read the currency data as a float, you do not have to re-read the data if the dollar amounts are between ±\$131,072. You can type

. gen double x_in_pennies = round(x*100)

This works only if x is between ±131,072.
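
A sketch of the recovery in (7.4) on two made-up dollar amounts that were mistakenly stored as float, one of them just inside the ±131,072 limit:

```stata
clear
set obs 2
generate float x = 1234.56
replace x = 131071.99 in 2

* Convert to pennies; round() removes the float approximation error.
generate double x_in_pennies = round(x*100)
assert x_in_pennies[1] == 123456
assert x_in_pennies[2] == 13107199

* The pennies are integers, so compress can store them as a long.
compress x_in_pennies
confirm long variable x_in_pennies
```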

8. Advice for programmers

8.1 Stata does all calculations in double (and sometimes quad) precision.

Float precision may be adequate for recording most data, but float precision is inadequate for performing calculations. That is why Stata does all calculations in double precision. Float precision is also inadequate for storing the results of intermediate calculations.

There is only one situation in which you need to exercise caution—if you create variables in the data containing intermediate results. Be sure to create all such variables as doubles.
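
To illustrate the caution in (8.1), this sketch stores the same intermediate result, x/3, once as a float and once as a double, and then uses it in a later calculation. The variable names are arbitrary.

```stata
clear
set obs 1
generate double x = 1

* The same intermediate result, stored two ways.
generate float  third_f = x/3
generate double third_d = x/3
assert third_f != third_d

* Error carried into the next step: large for float, negligible for double.
assert abs(3*third_f - x) > 1e-8
assert abs(3*third_d - x) < 1e-15
```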

8.2 The same quad-precision routines StataCorp uses are available to you in Mata; see the manual entries [M-5] mean, [M-5] sum, [M-5] runningsum, and [M-5] quadcross. Use them as you judge necessary.
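
As a hedged illustration, the following compares Mata's sum() with quadsum(), which is assumed here to be the quad-precision counterpart documented in [M-5] sum(). The vector is contrived so that the 1 is smaller than the spacing between doubles near 1e16; one would expect the double-precision sum to lose it and the quad-precision sum to keep it, but run it yourself rather than taking that on trust.

```stata
mata:
// A contrived vector: the middle element is tiny relative to its neighbors.
x = (1e16, 1, -1e16)

// Accumulated in double precision
sum(x)

// Accumulated in quad precision
quadsum(x)
end
```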

9. How to interpret %21x format (if you care)

9.1 Stata has a display format that will display IEEE 754-2008 floating-point numbers in their full binary glory but in a readable way. You probably do not care; if so, skip this section.

9.2 IEEE 754-2008 floating-point numbers are stored as a pair of numbers (a, b) that are given the interpretation

z = a * 2^b

where -2 < a < 2. In double precision, a is recorded with 52 binary digits. In float precision, a is recorded with 23 binary digits. For example, the number 2 is recorded in double precision as

a = +1.0000000000000000000000000000000000000000000000000000
b = +1

The value of pi is recorded as

a = +1.1001001000011111101101010100010001000010110100011000
b = +1

9.3 %21x presents a and b in base 16. The double-precision value of 2 is shown in %21x format as

+1.0000000000000X+001

and the value of pi is shown as

+1.921fb54442d18X+001

In the case of pi, the interpretation is

a = +1.921fb54442d18 (base 16)
b = +001             (base 16)

Reading this requires practice. It helps to remember that one-half corresponds to 0.8 (base 16). Thus, we can see that a is slightly larger than 1.5 (base 10) and b = 1 (base 10), so _pi is something over 1.5*2^1 = 3.

The number 100,000 in %21x is

+1.86a0000000000X+010

which is to say

a = +1.86a0000000000 (base 16)
b = +010             (base 16)

We see that a is slightly over 1.5 (base 10), and b is 16 (base 10), so 100,000 is something over 1.5*2^16 = 98,304.

9.4 %21x faithfully presents how the computer thinks of the number. For instance, we can easily see that the nice number 1.1 (base 10) is, in binary, a number with many digits to the right of the binary point:

. display %21x 1.1
+1.199999999999aX+000

We can also see why 1.1 stored as a float is different from 1.1 stored as a double:

. display %21x float(1.1)
+1.19999a0000000X+000

Float precision assigns fewer digits to the mantissa than does double precision, and 1.1 (base 10) in base 16 is a repeating hexadecimal.

9.5 %21x can be used as an input format as well as an output format. For instance, Stata understands

. gen x = 1.86ax+10

Stored in x will be 100,000 (base 10).
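
Because %21x works as input, the examples in (9.2) through (9.4) can be checked by assertion. This is a sketch that assumes hexadecimal literals are accepted inside expressions in the same short form used above.

```stata
clear
set obs 1
generate double x = 1.86ax+10
assert x == 100000

* The values of 2 and pi
assert 1.0x+1 == 2
assert 1.921fb54442d18x+1 == _pi

* 1.1 as a double and as a float
assert 1.199999999999ax+0 == 1.1
assert 1.19999ax+0 == float(1.1)

display %21x x
```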

9.6 StataCorp has seen too many competent scientific programmers who, needing a perturbance for later use in their program, code something like

epsilon = 1e-8

It is worth examining that number:

. display %21x 1e-8
+1.5798ee2308c3aX-01b

That is an ugly number that can only lead to the introduction of roundoff error in their program. A far better number would be

epsilon = 1.0x-1b

Stata and Mata understand the above statement because %21x may be used as input as well as output. Naturally, 1.0x-1b looks just like what it is,

. display %21x 1.0x-1b
+1.0000000000000X-01b

and all those pretty zeros will reduce numerical roundoff error.

In base 10, the pretty 1.0x-1b looks like

. display %20.0g 1.0x-1b
7.4505805969238e-09

and that number may not look pretty to you, but you are not a base-2 digital computer.

Perhaps the programmer feels that epsilon really needs to be closer to 1e-8. In %21x, we see that 1e-8 is +1.5798ee2308c3aX-01b, so if we want to get closer, perhaps we use

epsilon = 1.6x-1b
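
The candidate values of epsilon can be compared side by side. This sketch only inspects them; which one suits a given program is for the programmer to judge.

```stata
* Each candidate in %21x and in base 10
display %21x 1e-8     "   " %20.0g 1e-8
display %21x 1.0x-1b  "   " %20.0g 1.0x-1b
display %21x 1.6x-1b  "   " %20.0g 1.6x-1b

* The pretty values are short binary fractions times a power of 2.
assert 1.0x-1b == 2^-27
assert 1.6x-1b == 1.375 * 2^-27
```

Here 1.6 (base 16) is 1 + 6/16 = 1.375 (base 10), which is why the last assertion holds exactly.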

9.7 %21x was invented by StataCorp.

10. Also see