Author Archive

Advanced Mata: Pointers

I’m still recycling my talk called “Mata, The Missing Manual” at user meetings, a talk designed to make Mata more approachable. One of the things I say late in the talk is, “Unless you already know what pointers are and know you need them, ignore them. You don’t need them.” And here I am writing about, of all things, pointers. Well, I exaggerated a little in my talk, but just a little.

Before you take my previous advice and stop reading, let me explain: Mata serves a number of purposes and one of them is as the primary langugage we at StataCorp use to implement new features in Stata. I’m not referring to mock ups, toys, and experiments, I’m talking about ready-to-ship code. Stata 12’s Structural Equation Modeling features are written in Mata, so is Multiple Imputation, so is Stata’s optimizer that is used by nearly all estimation commands, and so are most features. Mata has a side to it that is exceedingly serious and intended for use by serious developers, and every one of those features are available to users just as they are to StataCorp developers. This is one of the reasons there are so many user-written commands are available for Stata. Even if you don’t use the serious features, you benefit. Read more…

Categories: Mata Tags: , ,

Use poisson rather than regress; tell a friend

Do you ever fit regressions of the form

ln(yj) = b0 + b1x1j + b2x2j + … + bkxkj + εj

by typing

. generate lny = ln(y)

. regress lny x1 x2 … xk

The above is just an ordinary linear regression except that ln(y) appears on the left-hand side in place of y. Read more…

Precision (yet again), Part II

In part I, I wrote about precision issues in English. If you enjoyed that, you may want to stop reading now, because I’m about to go into the technical details. Actually, these details are pretty interesting.

For instance, I offered the following formula for calculating error due to float precision: Read more…

Categories: Numerical Analysis Tags: ,

Precision (yet again), Part I

I wrote about precision here and here, but they were pretty technical.

“Great,” coworkers inside StataCorp said to me, “but couldn’t you explain these issues in a way that doesn’t get lost in the details of how computers store binary and maybe, just maybe, write about floats and doubles from a user’s perspective instead of programmer’s perspective?”

“Mmmm,” I said clearly.

Later, when I tried, I liked the result. It contains new material, too. What follows is what I now wish I had written first. I’d would have still written the other two postings, but as technical appendices. Read more…

Categories: Numerical Analysis Tags: ,

Merging data, part 2: Multiple-key merges

Multiple-key merges arise when more than one variable is required to uniquely identify the observations in your data. In Merging data, part 1, I discussed single-key merges such as

        . merge 1:1 personid using ...

In that discussion, each observation in the dataset could be uniquely identified on the basis of a single variable. In panel or longitudinal datasets, there are multiple observations on each person or thing and to uniquely identify the observations, we need at least two key variables, such as Read more…

Categories: Data Management Tags: ,

Merging data, part 1: Merges gone bad

Merging concerns combining datasets on the same observations to produce a result with more variables. We will call the datasets one.dta and two.dta.

When it comes to combining datasets, the alternative to merging is appending, which is combining datasets on the same variables to produce a result with more observations. Appending datasets is not the subject for today. But just to fix ideas, appending looks like this: Read more…

Categories: Data Management Tags: ,

Multiprocessor (core) software (think Stata/MP) and percent parallelization

When most people first think about software designed to run on multiple cores such as Stata/MP, they think to themselves, two cores, twice as fast; four cores, four times as fast. They appreciate that reality will somehow intrude so that two cores won’t really be twice as fast as one, but they imagine the intrusion is something like friction and nothing that an intelligently placed drop of oil can’t improve.

In fact, something inherent intrudes. In any process to accomplish something—even physical processes—some parts may be able to to be performed in parallel, but there are invariably parts that just have to be performed one after the other. Anyone who cooks knows that you sometimes add some ingredients, cook a bit, and then add others, and cook some more. So it is, too, with calculating xt = f(xt-1) for t=1 to 100 and t0=1. Depending on the form of f(), sometimes there’s no alternative to calculating x1 = f(x0), then calculating x2 = f(x1), and so on. Read more…

Graphs, maps, and geocoding

Jim Hufford, Esq. had his first Stata lesson: “This is going to be awesome when I understand what all those little letters and things mean.”

Along those lines—awesome—Jim may want to see these nice Stata scatterplots from the “wannabe economists of the Graduate Institute of International and Development Studies in Geneva” at Rigotnomics.

If you want to graph data onto maps using Stata—and see another awesome graph—see Mitch Abdon’s “Fun with maps in Stata” over at the Stata Daily.

And if you’re interested in geocoding to obtain latitudes and longitudes from human-readable addresses or locations, see Adam Ozimek’s “Computers are taking our jobs: Stata nerds only edition” over at Modeled Behavior and see the related Stata Journal article “Stata utilities for geocoding and generating travel time and travel distance information” by Adam Ozimek and Daniel Miles.

Pi is (still) wrong

See this video, by Vi Hart:

This link was passed on to me by my friend Marcello. I’ve been bold enough to make up words such as eigenaxis and eigenpoint, but it takes real courage to suggest redefining π, even when you’re right!

After seeing the video, you can go here and here to learn more about what is being proposed.

Don’t click on comments until you’ve seen the video. Ms. Hart does a better job presenting the proposal than any of us can.

Categories: Mathematics Tags: ,

Understanding matrices intuitively, part 2, eigenvalues and eigenvectors

Last time, I showed you a way to graph and to think about matrices. This time, I want to apply the technique to eigenvalues and eigenvectors. The point is to give you a picture that will guide your intuition, just as it was previously.

Before I go on, several people asked after reading part 1 for the code I used to generate the graphs. Here it is, both for part 1 and part 2: Read more…