How Stata calculates powers
Excuse me, but I’m going to toot Stata’s horn.
I got an email from Nicholas Cox (an Editor of the Stata Journal) yesterday. He said he was writing something for the Stata Journal and wanted the details on how we calculated a^b. He was focusing on examples such as (-8)^(1/3), where Stata produces a missing value rather than -2, and he wanted to know if our calculation of that was exp((1/3)*ln(-8)). He didn’t say where he was going, but I answered his question.
I have rather a lot to say about this.
Nick’s supposition was correct, in this particular case, and for most values of a and b, Stata calculates a^b as exp(b*ln(a)). In the case of a=-8 and b=1/3, ln(-8)==., and thus (-8)^(1/3)==..
You might be tempted to say Stata (or other software) should just get this right and return -2. The problem is not only that 1/3 has no exact decimal representation, but it has no exact binary representation, either. Watching for 1/3 and doing something special is problematic.
One solution to this problem, if a solution is necessary, would be to introduce a new function, say, invpower(a,b), that returns a^(1/b). Thus, cube roots would be invpower(a,3). Integers have exact numerical representations even in digital computers, so computers can watch for integers and take special actions.
Whether one should watch integers for such special values is an interesting question. Stata does watch for special values of b in calculating a^b. In particular, it watches for b as an integer, 1<=b<=64 or -64<=b<=-1. When Stata finds such integers, it calculates a^b by repeatedly multiplying a or 1/a. Thus Stata calculates ..., a^(-2), a^(-1), ..., a^2, a^3, ... more accurately than it calculates ..., a^(-2+epsilon), a^(-1+epsilon), ..., a^(2+epsilon), a^(3+epsilon), .... So what could be wrong with that? Let's imagine we wish to make a calculation of F(a,b) and that we have two ways to calculate it. The first way is via an approximation formula A(a,b) = F(a,b) + e, where e is error. Do not assume that error has good properties that a statistical model would have. Typically, e/F() has roughly average 0 (I write sloppily) because otherwise we would add values to achieve that. In addition, the error tends to be serially correlated because approximation formulas are usually continuous. In this case, serial correlation is actually a desirable property! Okay, that's one way we have for calculating F(). The second way is by an exact formula E(a,b) = F(a,b), but E() is only valid for certain values of b. Great, you say, we'll use E() for the special values and use A() for the rest. We'll call that function G(). Now consider someone calculating a numerical derivative from results of G(). Say that they wish to calculate (F(a,b+h)-F(a,b))/h for a suitably small value of h, which they do by calculating (G(a,b+h)+G(a,b))/h. Then they obtain
- (A(a,b+h)-E(b))/h for some values, and
- (A(a,b+h)-A(b))/h for others.
There are other possibilities, for instance, they might obtain (E(a,b-h)-A(b))/h, but the two above will be sufficient for my purposes.
Note that in calculation (2), the serial correlation in the error actually reduces the error in (2) relative to (1)!
This can be an issue.
The error in the case of Stata’s a^b around the integers -64, -63, …, -1, 1, 2, …, 64 is small enough that we just ignored it. For instance, in 2^3, A()-E() = -1.776e-15. Had the error been large enough, we would have combined A() and E() in a different way to produce a more accurate approximation formula. To wit, you know that at certain values of b, E() is exact, so one develops an adjust for A() that uses that information to adjust not just the specific value, but values around the specific value, too.
In this example, A(a,b) = exp(b*ln(a)), a>0, and you may be asking yourself in what sense exp(b*ln(a)) is an approximation. The answer is that exp() and ln() are approximations to the true functions, and in fact, so is * an approximation for the underlying idea of multiplication.
That Stata calculates a^b by repeated multiplication for -64<=b<=-1 and 1<=b<=64 is not something we have ever mentioned. People do not realize the extreme caution we go to on what might seem the minor issues. It is because we do this that things work as you expect. In this case, a^3 is exactly equal to a*a*a. This is ironic because when numerical issues arise that do not have a similarly easy solution, users are disappointed. Why don't you fix that? In the old Fortran days in which I grew up, one would never expect a^3 to equal a*a*a. One's nose was constantly being rubbed into numerical issues, which reminded us not to overlook them. By the way, if issues like this interest you, consider applying to StataCorp for employment. We can fill your days with discussions like this.