Stata Press is pleased to announce the release of An Introduction to Stata for Health Researchers, Fifth Edition, by Svend Juul and Morten Frydenberg. This book debuted at #1 on Kindle’s new release list for Probability & Statistics and in the top ten of Kindle’s new release list for Mathematics. Read more…
There have recently been occasional questions on precision and storage types on Statalist despite all that I have written on the subject, much of it posted in this blog. I take that as evidence that I have yet to produce a useful, readable piece that addresses all the questions researchers have.
So I want to try again. This time I’ll try to write the ultimate piece on the subject, making it as short and snappy as possible, and addressing every popular question of which I am aware—including some I haven’t addressed before—and doing all that without making you wade with me into all the messy details, which I know I have a tendency to do. Read more…
In part I, I wrote about precision issues in English. If you enjoyed that, you may want to stop reading now, because I’m about to go into the technical details. Actually, these details are pretty interesting.
For instance, I offered the following formula for calculating error due to float precision: Read more…
I wrote about precision here and here, but they were pretty technical.
“Great,” coworkers inside StataCorp said to me, “but couldn’t you explain these issues in a way that doesn’t get lost in the details of how computers store binary and maybe, just maybe, write about floats and doubles from a user’s perspective instead of a programmer’s perspective?”
“Mmmm,” I said clearly.
Later, when I tried, I liked the result. It contains new material, too. What follows is what I now wish I had written first. I would still have written the other two postings, but as technical appendices. Read more…
In my previous posting last week, I explained how computers store binary floating-point numbers, how Stata’s %21x display format displays with fidelity those binary floating-point numbers, how %21x can help you uncover bugs, and how %21x can help you understand behaviors that are not bugs even though they are surprising to us base-10 thinkers. The point is, it is sometimes useful to think in binary, and with %21x, thinking in binary is not difficult.
This week, I want to discuss double versus float precision. Read more…
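For readers who want to see the float-versus-double difference without opening Stata, here is a minimal sketch in Python rather than Stata. It assumes that a 4-byte IEEE 754 value behaves like a float and an 8-byte one like a double; the helper name `to_float32` is made up for this illustration.

```python
import struct

def to_float32(x: float) -> float:
    """Round an 8-byte double to the nearest 4-byte float, then widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

x = 0.1                  # a Python float is an 8-byte double
x32 = to_float32(x)      # the same number after a trip through 4 bytes

print(x32 == x)                 # the two stored values differ
print(abs(x32 - x) / x)         # relative error introduced by the narrower type
print(to_float32(0.5) == 0.5)   # 0.5 is exact in binary, so nothing is lost
```

The relative error for 0.1 stays below 2^-24, which is what you would expect from a type that keeps 24 significant binary digits.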
%21x is a Stata display format, just as are %f, %g, %9.2f, %td, and so on. You could put %21x on any variable in your dataset, but that is not its purpose. Rather, %21x is for use with Stata’s display command for those wanting to better understand the accuracy of the calculations they make. We use %21x frequently in developing Stata. Read more…
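As a rough analogue outside Stata, Python's `float.hex()` also shows the exact binary value of a double in hexadecimal. The sketch below is Python, not Stata, and its notation is not the same as what %21x prints; it only illustrates the idea of looking at the stored bits instead of a rounded decimal.

```python
# Show the exact stored value of a few doubles in hexadecimal notation.
for value in (1.0, 0.5, 0.1):
    print(value, value.hex())

# 0.1 + 0.2 and 0.3 print alike at modest precision but are stored differently.
total = 0.1 + 0.2
print(total.hex())
print((0.3).hex())
print(total == 0.3)

# The hexadecimal form round-trips without any loss.
print(float.fromhex((0.1).hex()) == 0.1)
```

The last hexadecimal digit of the two sums differs by one, which is the kind of surprise that is a property of binary storage and not a bug.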
Excuse me, but I’m going to toot Stata’s horn.
I got an email from Nicholas Cox (an Editor of the Stata Journal) yesterday. He said he was writing something for the Stata Journal and wanted the details on how we calculated a^b. He was focusing on examples such as (-8)^(1/3), where Stata produces a missing value rather than -2, and he wanted to know if our calculation of that was exp((1/3)*ln(-8)). He didn’t say where he was going, but I answered his question.
I have rather a lot to say about this.
Nick’s supposition was correct: in this particular case, and for most values of a and b, Stata calculates a^b as exp(b*ln(a)). In the case of a=-8 and b=1/3, ln(-8)==., and thus (-8)^(1/3)==.. Read more…
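To make the exp(b*ln(a)) route concrete, here is a minimal sketch in Python rather than Stata. The function name `power_via_logs` is hypothetical, NaN stands in for Stata's missing value, and the sketch treats every a <= 0 as undefined. That is an assumption of the sketch only: it says nothing about how Stata handles the values of a and b that fall outside "most".

```python
import math

def power_via_logs(a: float, b: float) -> float:
    """Naive a^b computed as exp(b*ln(a)); NaN where ln(a) is undefined."""
    if a <= 0:
        return math.nan  # stand-in for a missing value; ln(a) does not exist here
    return math.exp(b * math.log(a))

print(power_via_logs(8, 1 / 3))    # close to 2, up to rounding
print(power_via_logs(-8, 1 / 3))   # nan, mirroring (-8)^(1/3)==.
```

Because the logarithm is taken first, a negative base never reaches the exponentiation step, so the real cube root -2 cannot come out of this formula.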