Author Archive

How to automate common tasks

Automating common tasks is crucial to effective data analysis. Automation saves you from spending time repeating the same sets of operations, and it reduces errors by reducing how much you have to repeat by hand.

Let’s automate something using Stata. The task we are automating doesn’t much matter. What matters is that we get comfortable with how to automate tasks.

We will automate the simple task of normalizing a variable, that is, subtracting the variable’s mean and dividing by its standard deviation.
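The calculation itself takes only a few lines of Stata. What follows is a minimal sketch, not the code this series goes on to write: it assumes the auto dataset that ships with Stata, and the choice of mpg and the new variable name mpg_std are arbitrary.

```stata
* Minimal sketch: normalize one variable by hand.
* Assumes the auto dataset shipped with Stata; mpg is an arbitrary choice.
sysuse auto, clear

* summarize leaves the mean in r(mean) and the standard deviation in r(sd)
quietly summarize mpg

* Subtract the mean, divide by the standard deviation; store as double
generate double mpg_std = (mpg - r(mean)) / r(sd)
```

Running summarize on mpg_std afterward should report a mean of (numerically) zero and a standard deviation of one.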

Just so you know, there are already community-contributed commands to do this and to do it more flexibly than we will. Type search normalize variable in Stata, and you will see one of those commands. (You will see things about other types of normalization that have nothing to do with normalizing a variable, but the command of interest is easy to pick out.) You can also normalize a single variable using Stata’s egen command, but we are going to do more than that.
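For comparison, here is a sketch of the one-variable egen route next to a simple loop that repeats the by-hand steps over several variables. It illustrates why repetition invites automation; it is not the approach the series goes on to build. The auto dataset, the variable list, and the _std and _z suffixes are assumptions chosen for illustration.

```stata
* Minimal sketch, again assuming the auto dataset shipped with Stata.
sysuse auto, clear

* egen's std() function normalizes a single variable in one line
egen double mpg_std = std(mpg)

* Normalizing several variables means repeating the same two steps,
* which a foreach loop does for us; the variable list is illustrative
foreach v of varlist mpg weight length {
    quietly summarize `v'
    generate double `v'_z = (`v' - r(mean)) / r(sd)
}
```

The loop creates mpg_z, weight_z, and length_z, and mpg_z should match mpg_std up to rounding.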

As with all the articles in this series, I assume the reader is new to automating tasks in Stata. So, if you are already an expert, these articles may hold little interest for you. Or perhaps you will still find something novel.

Multilevel random effects in xtmixed and sem — the long and wide of it

xtmixed was built from the ground up for dealing with multilevel random effects — that is its raison d’être. sem was built for multivariate outcomes, for handling latent variables, and for estimating structural equations (also called simultaneous systems or models with endogeneity). Can sem also handle multilevel random effects (REs)? Do we care?

This would be a short entry if either answer were “no”, so let’s get after the first question.

Competing risks in the Stata News

The fourth-quarter Stata News came out today. Among other things, it contains an article by Bobby Gutierrez, StataCorp’s Director of Statistics, about competing-risks survival analysis. If you are like me, conversant in survival analysis but not an expert, I think you will enjoy Bobby’s article. In a mere page and a half, I learned the primary differences between competing-risks analysis and the Cox proportional hazards model, and why I will sometimes prefer competing risks. Bobby’s article can be read at

Stata/MP — having fun with millions

I was reviewing some timings from the Stata/MP Performance Report this morning. (For those who don’t know, Stata/MP is the version of Stata that has been programmed to take advantage of multiprocessor and multicore computers. It is functionally equivalent to the largest version of Stata, Stata/SE, and it is faster on multicore computers.)

What was unusual this morning is that I was running Stata/MP interactively. We usually run MP for large batch jobs that run thousands of timings on large datasets, either to tune performance or to produce reports like the Performance Report. That is the type of work Stata/MP was designed for: big jobs on big datasets.