Home > Data Management > Wharton Research Data Services, Stata 17, and JDBC

Wharton Research Data Services, Stata 17, and JDBC

Working with Wharton Research Data Services (WRDS) data in Stata is now even easier. I previously wrote about accessing WRDS data via ODBC. With Stata 17, using JDBC makes configuring WRDS and Stata even easier—and the steps to configure are the same across all operating systems. Whether you download WRDS data to your local machine or work in the cloud, the command to use in Stata for JDBC is jdbc.

To set up jdbc, you need to download a JDBC driver .jar file. To get the correct .jar file, go to your database vendor’s website. The WRDS platform uses a Postgres database, so download the driver from https://jdbc.postgresql.org. This .jar file will work with jdbc on Unix, Mac, or Windows as long as you save the file along Stata’s ado-path. I moved the downloaded .jar file to the personal ado-path directory on my Windows machine, which is

C:\Users\kevin\ado\personal/

For my Mac, it’s

/Users/kevin/Documents/Stata/ado/personal

and for Unix, it’s

/home/kevin/ado/personal

Whether working in the WRDS cloud or locally on your machine, you will also need the correct Java driver class name, URL, username, and password. For WRDS, your connection setting will be the same as mine, except for the username and password, which WRDS will supply. The settings are

JDBC Settings WRDS Settings
Driver File postgresql-42.3.0.jar
Java Class Name org.postgresql.Drive
URL jdbc:postgresql://wrds-pgdata.wharton.upenn.edu:9737
Username stata
Password secret

Note that the typical URL syntax for JDBC is

jdbc:DBVenderName://IP_or_URL:Port/DBName?ConnectionOptions

Because these settings are challenging to remember, it’s best to create a do-file that stores them. My do-file is

local jar "postgresql-42.3.0.jar"
local classname "org.postgresql.Driver"
local url "jdbc:postgresql://wrds-pgdata.wharton.upenn.edu:9737/wrds?ssl=require&sslfactory=org.postgresql.ssl.NonValidatingFactory"
local user "stata"
local pass "secret"

To store these settings for the current Stata session, add the following jdbc connect command to your do-file, and run the do-file.

jdbc connect, jar("`jar'") driverclass("`classname'") url("`url'")   ///
     user("`user'") password("`pass'")

jdbc remembers your last settings for the entire Stata session. If you want these stored between sessions, you can add them to a profile.do file.

To list all the tables in the database, type jdbc showtables.

Typically, the list displayed from this command is several hundred lines long. The way to search for specific tables in WRDS is through using SQL wildcards. For example, the % wildcard matches zero or more characters.

. jdbc showtables dj_equities_201%

Database: wrds
----------------------------------------------------------------------------
Tables
----------------------------------------------------------------------------
dj_equities_2010
dj_equities_2011
dj_equities_2012
dj_equities_2013
dj_equities_2014
dj_equities_2015
dj_equities_2016
dj_equities_2017
dj_equities_2018
dj_equities_2019

If you know the name of the table, you can load it by typing jdbc load, table(“djdaily”).

If a table is large, it’s best to load the database columns you need to analyze rather than the entire table. Doing this saves memory. To list the columns of a table, use jdbc describe.

. jdbc describe djdaily
(28073 observations loaded)

Table: djdaily
----------------------------------------------------------------------------
Column name                                Column type
----------------------------------------------------------------------------
date                                       date
djc                                        float8
djct                                       float8
dji                                        float8
djit                                       float8
djt                                        float8
djtt                                       float8
dju                                        float8
djut                                       float8
date                                       date
djc                                        float8
djct                                       float8
dji                                        float8
djit                                       float8
djt                                        float8
djtt                                       float8
dju                                        float8
djut                                       float8

You can load specific columns of a table using a SELECT SQL statement.

. jdbc load, exec("SELECT date, djc, dji, djit FROM djdaily")
(28073 observations loaded)

Here I’ve shown you a few ways you can work with WRDS data using Stata 17’s new jdbc command. If you are working in another cloud environment or using a different database vendor, the steps to configure JDBC and Stata should be similar. In addition, jdbc has many other features I have not demonstrated. jdbc has other subcommands to help manage your database connections, insert data, and execute SQL statements. You can read about these subcommands in the [D] jdbc entry in the Stata Data Management Reference Manual.