Archive for the ‘Data Management’ Category

Merging data, part 1: Merges gone bad

Merging concerns combining datasets on the same observations to produce a result with more variables. We will call the datasets one.dta and two.dta.

When it comes to combining datasets, the alternative to merging is appending, which is combining datasets on the same variables to produce a result with more observations. Appending datasets is not the subject for today. But just to fix ideas, appending looks like this: Read more…

Categories: Data Management Tags: ,

Graphs, maps, and geocoding

Jim Hufford, Esq. had his first Stata lesson: “This is going to be awesome when I understand what all those little letters and things mean.”

Along those lines—awesome—Jim may want to see these nice Stata scatterplots from the “wannabe economists of the Graduate Institute of International and Development Studies in Geneva” at Rigotnomics.

If you want to graph data onto maps using Stata—and see another awesome graph—see Mitch Abdon’s “Fun with maps in Stata” over at the Stata Daily.

And if you’re interested in geocoding to obtain latitudes and longitudes from human-readable addresses or locations, see Adam Ozimek’s “Computers are taking our jobs: Stata nerds only edition” over at Modeled Behavior and see the related Stata Journal article “Stata utilities for geocoding and generating travel time and travel distance information” by Adam Ozimek and Daniel Miles.

Using dates and times from other software

Most software stores dates and times numerically, as durations from some sentinel date, but they differ on the sentinel date and on the units in which the duration is stored. Stata stores dates as the number of days since 01jan1960, and datetimes as the number of milliseconds since 01jan1960 00:00:00.000. January 3, 2011 is stored as 18,630, and 2pm on January 3 is stored as 1,609,682,400,000. Other packages use different choices for bases and units.

It sometimes happens that you need to process in Stata data imported from other software and end up with a numerical variable recording a date or datetime in the other software’s encoding. It is usually possible to adjust the numeric date or datetime values to the sentinel date and units that Stata uses. Below are conversion rules for SAS, SPSS, R, Excel, and Open Office. Read more…

Categories: Data Management Tags: ,

Connection string support added to odbc command

Stata’s odbc command allows you to import data from and export data to any ODBC data source on your computer. ODBC is a standardized way for applications to read data from and write data to different data sources such as databases and spreadsheets.

Until now, before you could use the odbc command, you had to add a named data source (DSN) to the computer via the ODBC Data Source Administrator. If you did not have administrator privileges on your computer, you could not do this. Read more…