Importing Twitter data into Stata
In the past, we’ve had users ask if Stata could import Twitter data. So we asked one of our interns, Dawson Deere (currently working on his computer science degree at Texas A&M University) to see if he could write a new command to do this. He used Stata 15’s improved Java plugins feature to write a new twitter2stata command. To install twitter2stata, type
ssc install twitter2stata, replace
Once installed, you can do the following
- Import tweets based on a search string
. twitter2stata searchtweets "search_string"
. twitter2stata searchusers "search_string"
. twitter2stata getuser "userId_or_userName"
. twitter2stata likes "userId_or_userName"
. twitter2stata following "userId_or_userName"
. twitter2stata followers "userId_or_userName"
. twitter2stata lists "userId_or_userName"
. twitter2stata listusers "listId_or_listName"
. twitter2stata listtweets "listId_or_listName"
The main purpose of this post is to show you how to get the twitter2stata command working in Stata. Below are the steps you must take.
- To use this command, you must have a Twitter account. If you don’t have one, you can create one on the Twitter website.
- Twitter limits the amount of data you can download. For the best rate limits, you must create a Twitter app. To do this, log in to the Twitter website and go to https://apps.twitter.com.
- Click on the Create New App button.
- There are three fields you must fill in: Name, Description, and Website. The name of the application must be unique. You might have to try a few times to find a unique Twitter application name. Once you have filled in the form, click on Create your Twitter Application. Next, click on the Key and Access Tokens tab.
- Click on the Create my access token button to generate your access token and access token secret.
- You will need to copy the
  - Consumer Key (API Key)
  - Consumer Secret (API Secret)
  - Access Token
  - Access Token Secret
  and paste them into a do-file, for example,
local consumer_key "xWNlx*N9vESv0ZZBtGdm7fVB"
local consumer_secret "7D25oVzWeDCHrUlQcp9929@GOcnqWCuUKhDel"
local access_token "74741598400768-3hAYpZbiDvABPizx5lk57B8CTVyfa"
local access_token_secret "7HjDf25oVzDWAeDCHrUlQcpfNGOTzcnqWCuUKhDel"
Be sure not to share these with anybody else.
In the same do-file, add the command
twitter2stata setaccess "`consumer_key'" "`consumer_secret'" ///
    "`access_token'" "`access_token_secret'"
to initialize these settings for twitter2stata. If you don’t use twitter2stata setaccess … before each twitter2stata session, you will receive the error below.
. twitter2stata searchtweets "star wars", numtweets(10)
access token and access token secret not set.
Run twitter2stata setaccess to set your access token and access token secret.
r(198);
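One way to keep the keys private and still run setaccess at the start of every session is to put both in a separate do-file that you never share. The sketch below assumes a hypothetical file named twitter_keys.do and uses placeholder values instead of real keys. Stata's run command executes a do-file without echoing its commands, so the keys are not displayed in the Results window or written to a log.

```stata
* --- twitter_keys.do (hypothetical file name; keep it out of shared folders) ---
* Placeholder values: paste your own four strings from the Twitter app page
local consumer_key        "YOUR_CONSUMER_KEY"
local consumer_secret     "YOUR_CONSUMER_SECRET"
local access_token        "YOUR_ACCESS_TOKEN"
local access_token_secret "YOUR_ACCESS_TOKEN_SECRET"

* Register the credentials for the current twitter2stata session
twitter2stata setaccess "`consumer_key'" "`consumer_secret'" ///
    "`access_token'" "`access_token_secret'"

* --- In the do-file you do share, start with this line instead of the keys ---
* run twitter_keys.do
```

The local macros disappear when twitter_keys.do finishes, which is fine here because setaccess has already been called inside that file.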
My do-file is now
local consumer_key "xWNlx*N9vESv0ZZBtGdm7fVB"
local consumer_secret "7D25oVzWeDCHrUlQcp9929@GOcnqWCuUKhDel"
local access_token "74741598400768-3hAYpZbiDvABPizx5lk57B8CTVyfa"
local access_token_secret "7HjDf25oVzDWAeDCHrUlQcpfNGOTzcnqWCuUKhDel"
twitter2stata setaccess "`consumer_key'" "`consumer_secret'" ///
    "`access_token'" "`access_token_secret'"
twitter2stata searchtweets "star wars", numtweets(10)
list user_screen_name user_follower_count user_friend_count, abbreviate(20)
When I run the do-file, I get
. twitter2stata searchtweets "star wars", numtweets(10)
(45 vars, 10 obs)

. list user_screen_name user_follower_count user_friend_count, abbreviate(20)

     +------------------------------------------------------------+
     | user_screen_name   user_follower_count   user_friend_count |
     |------------------------------------------------------------|
  1. |            024AB                  1297                1077 |
  2. |     StarWarsTime                  2040                1213 |
  3. |      LockerGnome                 24577                 976 |
  4. |           CMG_HD                     8                  35 |
  5. |   dilnyminic1986                     4                  30 |
     |------------------------------------------------------------|
  6. |     StarWarsTime                  2040                1213 |
  7. |   emimsohood1975                     3                   8 |
  8. |  Dan_NinjaRabbit                   712                2252 |
  9. |  KatelynLunsford                    13                  38 |
 10. |        PudseyMac                   335                 444 |
     +------------------------------------------------------------+
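Once the tweets are in memory, they are an ordinary Stata dataset. In the listing above, StarWarsTime appears twice because the account posted two of the ten tweets. As a minimal sketch, assuming you want one observation per account rather than one per tweet, you could continue the do-file as follows. The file name starwars_tweets is hypothetical.

```stata
* Assumes the data from twitter2stata searchtweets are still in memory

* Keep one observation per account; this discards that account's other tweets
duplicates drop user_screen_name, force

* Sort accounts from most to fewest followers and list them
gsort -user_follower_count
list user_screen_name user_follower_count user_friend_count, abbreviate(20)

* Save a copy so the tweets need not be downloaded again (hypothetical file name)
save starwars_tweets, replace
```

Saving the result is worthwhile because every new call to Twitter counts against your rate limit.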
Again, there are limits to the amount of data Twitter will let you import. These limits are subcommand specific: they cap the number of calls you can make to Twitter’s REST API in each 15-minute window. Twitter’s developer documentation has a chart of all the rate limits.
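If you need several searches in one run, spacing the calls out can help you stay under these limits. The following is only a sketch: the search terms and the 5-second pause are arbitrary choices for illustration, not values taken from Twitter's chart. It assumes twitter2stata setaccess has already been run in the session.

```stata
* Collect the results of several searches into one dataset, pausing between calls
tempfile combined
local n = 0
foreach term in "star wars" "star trek" {
    clear                                   // start each search with empty memory
    twitter2stata searchtweets "`term'", numtweets(10)
    if `n' > 0 {
        append using `combined'             // add the results gathered so far
    }
    save `combined', replace
    local ++n
    sleep 5000                              // wait 5,000 ms (arbitrary) before the next call
}
```

After the loop, the dataset in memory holds the tweets from every search term.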
You can read the full details of twitter2stata’s functionality in its help file after installing it. Dawson was able to write this command using Stata 15’s improved Java API together with the Java library Twitter4J. In a later post, I will discuss how he went about developing this command and show you how easy it is to write Java code for Stata.