Archive for the ‘Data Management’ Category

COVID-19 time-series data from Johns Hopkins University

In my last post, we learned how to import the raw COVID-19 data from the Johns Hopkins GitHub repository. This post will demonstrate how to convert the raw data to time-series data. We’ll also create some tables and graphs along the way. Read more…

Update to Import COVID-19 post

In my last post, I mentioned that I did not want to distribute my covid19.ado file because “it could be rendered useless if or when Johns Hopkins changes its data”. I wrote that on March 19, 2020, and the data changed on March 23, 2020. This will likely happen again (and again, and again …). I may post updates in the future as the data change, but you may need to adapt sooner than I can post. So let’s see how we can update our code to adapt to the changing data. Read more…

Import COVID-19 data from Johns Hopkins University

Like many of you, I am working from home and checking the latest news on COVID-19 frequently. I see a lot of numbers and graphs, so I looked around for the “official data”. One of the best data sources I have found is at the GitHub website for Johns Hopkins Whiting School of Engineering Center for Systems Science and Engineering. The data for each day are stored in a separate file, so I wrote a little Stata command called covid19 to download, combine, save, and graph these data. Read more…

Web scraping NBA data into Stata

As of November 2019, this command no longer works because of restrictions.

Since our intern, Chris Hassell, finished nfl2stata earlier than expected, he went ahead and created another command to web scrape for data on the NBA. The command is nba2stata. To install the command, type

net install, replace

Read more…

Categories: Data Management

Web scraping NFL data into Stata

Football season is around the corner, and I could not be more excited. We have a pretty competitive StataCorp fantasy football league. I’m always looking for an edge in our league, so I challenged one of our interns, Chris Hassell, to write a command to web scrape for data on the NFL. The new command is nfl2stata. To install the command, type

net install, replace

Read more…

Categories: Data Management

Export tabulation results to Excel—Update

It’s summertime, which means we have interns working at StataCorp again. Our newest intern, Chris Hassell, was tasked with updating my community-contributed command tab2xl with most of the suggestions that blog readers left in the comments. Chris updated tab2xl and wrote tab2docx, which writes a tabulation table to a Word file using the putdocx command.
Read more…

Importing Facebook data into Stata

As of 2018, this command no longer works due to Facebook API restrictions.

In a previous post, we released a new command to import Twitter data into Stata. We have now added another new command, facebook2stata, that imports Facebook data. To install facebook2stata, type

net install, replace

Read more…

Categories: Data Management

Data management made easy

Data management and data cleaning are critically important steps in any data analysis. Many of us learned this lesson the hard way. Have you ever fit a model that includes age as a covariate and forgotten to convert the missing value codes of -99 to missing values? I have. Or maybe you overlooked a data entry error that resulted in an age of 354 that should have been 54. I’ve done that too. Careful data management and cleaning can help us avoid these kinds of embarrassing mistakes.
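
As a rough illustration of the two mistakes above, here is a minimal sketch using a small made-up dataset. The variable names id and age and all of the values are hypothetical. mvdecode and replace are standard Stata commands, but how you flag and correct suspicious values will depend on your own data.

```stata
* Toy data: hypothetical values for illustration only
clear
input id age
1 34
2 -99
3 354
4 61
end

* Recode the numeric missing-value code -99 to Stata's system missing (.)
mvdecode age, mv(-99)

* List implausible ages for review; missing values sort above every number,
* so exclude them explicitly
list id age if age > 120 & !missing(age)

* Correct the entry only after checking it against the source records
replace age = 54 if id == 3 & age == 354

summarize age
```

Listing the suspicious values before changing them leaves a record of what was corrected and why.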

I recently recorded a series of data management videos for the Stata YouTube channel. You can click on the links below to watch the videos. I included topics that I think are important, but the list is far from exhaustive. If you would like to see videos on additional topics, please leave your suggestions in the comments below.

Data management playlist

You can learn more about these topics and many others in the Data Management Reference Manual.

Importing WRDS data into Stata

Wharton Research Data Services (WRDS) is a leading research platform and business intelligence tool for 400+ corporate, academic, and government researchers. If your institution subscribes to WRDS, you can now easily access WRDS data remotely via Stata’s odbc command. Read more…

Importing data with import fred


Federal Reserve Economic Data (FRED), maintained by the Federal Reserve Bank of St. Louis, makes available hundreds of thousands of time series measuring economic and social outcomes. The new Stata 15 command import fred imports data from this repository.

In this post, I show how to use import fred to import data from FRED. I also discuss some of the metadata that import fred provides that can be useful in data management. I then demonstrate how to use an advanced feature: importing multiple revisions of series whose observations are updated over time. Read more…

Categories: Data Management