Archive for the ‘Data Management’ Category

Importing WRDS data into Stata

Wharton Research Data Services (WRDS) is a leading research platform and business intelligence tool for 400+ corporate, academic, and government researchers. If your institution subscribes to WRDS, you can now easily access WRDS data remotely via Stata’s odbc command. Read more…

Importing data with import fred

Introduction

Federal Reserve Economic Data (FRED), a database maintained by the Federal Reserve Bank of St. Louis, makes available hundreds of thousands of time series measuring economic and social outcomes. The new Stata 15 command import fred imports data from this repository.

In this post, I show how to use import fred to import data from FRED. I also discuss some of the metadata that import fred provides that can be useful in data management. I then demonstrate how to use an advanced feature: importing multiple revisions of series whose observations are updated over time. Read more…
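As a rough sketch of what such a call might look like (not the post's own example): the lines below assume you have obtained a FRED API key, shown here as the hypothetical placeholder YOUR_API_KEY, and they use the series IDs UNRATE and GDPC1 and the dates purely for illustration.

```stata
* Assumption: YOUR_API_KEY is a placeholder for your own FRED API key
set fredkey YOUR_API_KEY

* Import two illustrative series over a fixed date range,
* aggregated to quarterly averages
import fred UNRATE GDPC1, daterange(2000-01-01 2016-12-31) aggregate(quarterly, avg) clear

* Inspect what was imported, including variable labels
describe

* Request the same series as it stood on two illustrative vintage dates
import fred GDPC1, vintage(2016-01-29 2016-02-26) clear
describe
```

These commands need network access and a valid key, so treat the block as a template to adapt rather than something to paste as is.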

Categories: Data Management

Importing Twitter data into Stata

In the past, we’ve had users ask if Stata could import Twitter data. So we asked one of our interns, Dawson Deere (currently working on his computer science degree at Texas A&M University) to see if he could write a new command to do this. He used Stata 15’s improved Java plugins feature to write a new twitter2stata command. To install twitter2stata, type

ssc install twitter2stata, replace

Read more…

Handling gaps in time series using business calendars

Time-series data, such as financial data, often have known gaps because there are no observations on days such as weekends or holidays. Using regular Stata datetime formats with time-series data that have gaps can result in misleading analysis. Rather than treating these gaps as missing values, we should adjust our calculations appropriately. I illustrate a convenient way to work with irregularly spaced dates by using Stata’s business calendars.

In nasdaq.dta, I have daily data on Read more…
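To make the idea concrete, here is a minimal sketch that uses four hypothetical observations rather than nasdaq.dta. It builds a business calendar from the dates present in the data with bcal create; the calendar name demo and the variable names are my own assumptions.

```stata
* Hypothetical daily prices with a weekend gap (Jan 4-5, 2020)
clear
input str10 sdate price
"01/02/2020" 10
"01/03/2020" 11
"01/06/2020" 12
"01/07/2020" 13
end

* Convert the string to a regular Stata daily date
generate date = date(sdate, "MDY")
format date %td

* Infer a business calendar from the dates observed in the data;
* this writes demo.stbcal to the current working directory
bcal create demo, from(date) generate(bdate) replace

* Declare the data as time series on the business-date variable
tsset bdate

* Friday and Monday are now adjacent, so the difference is not missing
generate dprice = D.price
list sdate bdate price dprice
```

Had the data been tsset on the regular %td variable instead, the first difference for Monday would be missing because Sunday has no observation.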

A tour of datetime in Stata

Converting a string date

Stata has a wide array of tools to work with dates. You can have dates in years, months, or even milliseconds. In this post, I will provide a brief tour of working with dates that will help you get started using all of Stata’s tools.

When you load a dataset, you will notice that every variable has a display format. For date variables, the display format is %td for daily dates, %tm for monthly dates, etc. Let’s load the wpi1 dataset as Read more…
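As a small self-contained illustration of string conversion and display formats (a sketch with made-up dates, not the wpi1 example from the post), the variable names sdate, d, and m below are hypothetical:

```stata
* Three made-up dates stored as strings
clear
input str10 sdate
"01/15/2020"
"02/29/2020"
"12/31/2020"
end

* Convert the string to a numeric daily date; "MDY" gives the order
* of month, day, and year in the string
generate d = date(sdate, "MDY")
format d %td

* Derive a monthly date from the daily date and give it a monthly format
generate m = mofd(d)
format m %tm

list
```

Without the format commands, d and m would display as plain integers, because Stata stores dates as numbers and relies on the display format to show them as dates.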

Using import excel with real world data

Stata 12’s new import excel command can help you easily import real-world Excel files into Stata. Excel files often contain header and footer information in the first few and last few rows of a sheet, and you may not want that information loaded. Also, the column labels used in the sheet are invalid Stata variable names and therefore cannot be loaded. Both of these issues can be easily solved using import excel. Read more…
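One way to skip leading rows is the cellrange() option. The sketch below is an assumption-laden illustration rather than the post's example: it first writes a small workbook, under the hypothetical name example.xlsx, whose data begin on row 3, and then reads back only that range. It assumes a Stata version in which export excel supports the cell() option.

```stata
* Write a small workbook whose variable names sit on row 3;
* rows 1-2 are left blank, standing in for header text
sysuse auto, clear
keep make price mpg
keep in 1/5
export excel using "example.xlsx", cell(A3) firstrow(variables) replace

* Read back only the block A3:C8, treating its first row as variable names
import excel using "example.xlsx", cellrange(A3:C8) firstrow clear

describe
list
```

Footer rows can be excluded the same way, by ending the cell range before they start.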

The next leap second will be on June 30th, maybe

Leap seconds are the extra seconds inserted every so often to keep precise atomic clocks better synchronized with the rotation of the Earth. Scheduled for June 30th is the extra second 23:59:60 inserted between 23:59:59 and 00:00:00. Or maybe not.

Tomorrow or Friday a vote may be held at the International Telecommunication Union (ITU) meeting in Geneva to abolish the leap second from the definition of UTC (Coordinated Universal Time). Which would mean StataCorp would not have to post an update to Stata to keep the %tC format working correctly. Read more…

Merging data, part 2: Multiple-key merges

Multiple-key merges arise when more than one variable is required to uniquely identify the observations in your data. In Merging data, part 1, I discussed single-key merges such as

        . merge 1:1 personid using ...

In that discussion, each observation in the dataset could be uniquely identified on the basis of a single variable. In panel or longitudinal datasets, there are multiple observations on each person or thing, and to uniquely identify the observations, we need at least two key variables, such as Read more…
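A minimal sketch of a two-key merge, using hypothetical data: the key variables personid and year, and the variables wage and hours, are assumptions chosen for illustration and are not taken from the post.

```stata
* Hypothetical panel: wages by person and year, saved to a temporary file
clear
input personid year wage
1 2019 20
1 2020 22
2 2019 30
2 2020 31
end
tempfile wages
save `wages'

* Hypothetical panel: hours by person and year
clear
input personid year hours
1 2019 1800
1 2020 1750
2 2019 2000
2 2020 1900
end

* Stop with an error unless the pair of keys uniquely identifies each observation
isid personid year

* Match observations on both keys at once
merge 1:1 personid year using `wages'

* Stop with an error unless every observation matched
assert _merge == 3
list, sepby(personid)
```

Merging on personid alone would fail here with 1:1, because personid by itself does not uniquely identify the observations in either dataset.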

Merging data, part 1: Merges gone bad

Merging concerns combining datasets on the same observations to produce a result with more variables. We will call the datasets one.dta and two.dta.
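As a minimal sketch, with made-up data standing in for one.dta and two.dta (the variables personid, age, and income are hypothetical), a merge might look like this:

```stata
* Made-up stand-in for two.dta, saved to a temporary file
clear
input personid income
1 50000
2 62000
4 38000
end
tempfile two
save `two'

* Made-up stand-in for one.dta, held in memory as the master data
clear
input personid age
1 30
2 41
3 25
end

* Combine the datasets on the same observations; the result has more variables
merge 1:1 personid using `two'

* _merge records where each observation came from
tabulate _merge
list, sepby(_merge)
```

Persons 1 and 2 appear in both datasets, person 3 only in the master data, and person 4 only in the using data, so checking _merge after every merge is a cheap way to catch a merge gone bad.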

When it comes to combining datasets, the alternative to merging is appending, which is combining datasets on the same variables to produce a result with more observations. Appending datasets is not the subject for today. But just to fix ideas, appending looks like this: Read more…

Graphs, maps, and geocoding

Jim Hufford, Esq. had his first Stata lesson: “This is going to be awesome when I understand what all those little letters and things mean.”

Along those lines—awesome—Jim may want to see these nice Stata scatterplots from the “wannabe economists of the Graduate Institute of International and Development Studies in Geneva” at Rigotnomics.

If you want to graph data onto maps using Stata—and see another awesome graph—see Mitch Abdon’s “Fun with maps in Stata” over at the Stata Daily.

And if you’re interested in geocoding to obtain latitudes and longitudes from human-readable addresses or locations, see Adam Ozimek’s “Computers are taking our jobs: Stata nerds only edition” over at Modeled Behavior and see the related Stata Journal article “Stata utilities for geocoding and generating travel time and travel distance information” by Adam Ozimek and Daniel Miles.