
Automating web downloads and file unzipping

Andrew J. Dyck wrote a nice post on his blog, "Download and unzip data files from Stata". He writes:

Recently, I’ve been using Stata’s -shp2dta- command to convert some shapefiles to stata format, grabbing Lat/Lon data and merging into another dataset. There were several compressed shapefiles I wanted to download contained in a directory from the web. I could manually download each file and uncompress each one but that would be time consuming. Also, when the maps are updated, I’d have to do the download/uncompress all over again. I’ve found that the process can be automated from within Stata by using a combination of -shell- and some handy terminal commands. …

You should read the rest of his post. He goes on to show how you can script with Stata to automate shelling out to download and unzip a series of files from a website, and he introduces you to some cool Unix-like utilities for Windows.

We here at StataCorp use Stata for tasks like this all the time. In fact, we have built some tools into Stata that allow you to do much of what Andrew described without ever having to leave Stata or shell out.

For example, Stata can access files over the Internet. Stata has a copy command, which can copy a file from a URL to your local disk. And, as of Stata 11, Stata can directly zip and unzip files and directories with the zipfile and unzipfile commands.

Putting all of those capabilities to use, you can accomplish Andrew’s goal by writing code directly in Stata such as

copy http://example.com/download1.zip download1.zip
copy http://example.com/download2.zip download2.zip
unzipfile download1.zip
unzipfile download2.zip
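
One thing to keep in mind if you rerun these commands, say, when the files on the website are updated: copy stops with an error if the destination file already exists unless you specify the replace option, and unzipfile accepts a replace option for overwriting files it has already extracted. A minimal sketch, using the same placeholder example.com URL as above:

```stata
* Placeholder URL from the example above; substitute the real location.
* replace lets each command overwrite files left by an earlier run.
copy http://example.com/download1.zip download1.zip, replace
unzipfile download1.zip, replace
```

Leave replace off if you would rather be stopped before overwriting an existing file.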

If there were a large number of files you wished to download and unzip, and they were all named in a regular manner (say, “download1.zip” through “download100.zip”), you could bring them all down and unzip them directly in Stata with a four-line loop:

forvalues i = 1/100 {
    copy http://example.com/download`i'.zip download`i'.zip
    unzipfile download`i'.zip
}
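
As written, the loop stops at the first file that cannot be copied. If some of the numbered files might be missing, one possible variation is to wrap copy in capture and check the return code before unzipping. This is only a sketch: the URL is still the placeholder from above, and skipping a failed download is just one way you might choose to handle it.

```stata
forvalues i = 1/100 {
    * capture keeps a failed download from stopping the whole loop
    capture copy http://example.com/download`i'.zip download`i'.zip, replace
    if _rc {
        * report the failure and move on to the next file
        display as error "download`i'.zip could not be copied (return code " _rc ")"
        continue
    }
    * only unzip files that were downloaded successfully
    unzipfile download`i'.zip, replace
}
```

Here _rc holds the return code of the captured copy command; it is zero when the copy succeeded.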