Working with Java plugins (Part 2)
In my previous post, I talked about how to combine the Java library Twitter4J and Stata’s Java Function Interface (SFI) using Eclipse to create a helloWorld plugin. Now, I want to talk about how to call Twitter4J member functions to connect to Twitter’s REST API, return Twitter data, and load that data into Stata using the Stata SFI.
Adding Twitter4J imports and a global
The current code is
package com.stata.kcrow;

import com.stata.sfi.*;

public class StTwitter {
    public static int HelloWorld(String args[]) {
        SFIToolkit.error("Hello World!");
        return(0);
    }
}
To use the Twitter4J function calls, I need to add the following code to the top of our StTwitter.java file:
import twitter4j.*;
import twitter4j.conf.ConfigurationBuilder;
Also, I need to add a global (static) Twitter object to our StTwitter class. My code now reads
package com.stata.kcrow;

import com.stata.sfi.*;
import twitter4j.*;
import twitter4j.conf.ConfigurationBuilder;

public class StTwitter {
    static Twitter twitter;

    public static int HelloWorld(String args[]) {
        SFIToolkit.error("Hello World!");
        return(0);
    }
}
Instance member function
Most website APIs use some form of OAuth for authentication with their servers. Twitter uses OAuth2. twitter2stata uses Twitter’s application-based authentication model, but there are other ways to connect. For this post, I will also use the application-based authentication model.
The first function I need to write is a function to authenticate to the Twitter website API. To do this, you need to get your Consumer Key (API Key), Consumer Secret (API Secret), Access Token, and Access Token Secret strings. If you don’t remember how to get these, see my previous blog post here.
Once you have these tokens, you can write a simple function that passes this information to Twitter using the ConfigurationBuilder class:
private static void getInstance() {
    ConfigurationBuilder cb;
    TwitterFactory tf;

    cb = new ConfigurationBuilder();
    cb.setDebugEnabled(true)
        .setOAuthConsumerKey(CONSUMER_KEY)
        .setOAuthConsumerSecret(CONSUMER_SECRET)
        .setOAuthAccessToken(ACCESS_TOKEN)
        .setOAuthAccessTokenSecret(ACCESS_TOKEN_SECRET);
    tf = new TwitterFactory(cb.build());
    twitter = tf.getInstance();
}
This function creates a ConfigurationBuilder instance, sets the login settings, creates a Twitter factory instance, and then initializes the global Twitter class instance. My class now reads
package com.stata.kcrow;

import com.stata.sfi.*;
import twitter4j.*;
import twitter4j.conf.ConfigurationBuilder;

public class StTwitter {
    static Twitter twitter;

    private static void getInstance() {
        ConfigurationBuilder cb;
        TwitterFactory tf;

        cb = new ConfigurationBuilder();
        cb.setDebugEnabled(true)
            .setOAuthConsumerKey("xWNlx*N9vESv0ZZBtGdm7fVB")
            .setOAuthConsumerSecret("7D25oVzWeDCHrUlQcp9929@GOcnqWCuUKhDel")
            .setOAuthAccessToken("74741598400768-3hAYpZbiDvABPizx5lk57B8CTVyfa")
            .setOAuthAccessTokenSecret("7HjDf25oVzDWAeDCHrUlQcpfNGOTzcnqWCuUKhDel");
        tf = new TwitterFactory(cb.build());
        twitter = tf.getInstance();
    }

    public static int HelloWorld(String args[]) {
        SFIToolkit.error("Hello World!");
        return(0);
    }
}
I made this function private because it does not need to be called from Stata.
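Hard-coding the four credential strings is fine for a demonstration, but it also compiles them into any .jar file you later redistribute. Here is a minimal sketch of one alternative, assuming the values are supplied through environment variables that the Java process started by Stata can see. The class name TwitterCredentials and the four variable names are hypothetical; they are not part of Twitter4J or the SFI.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TwitterCredentials {
    // Hypothetical environment variable names; pick whatever suits your setup.
    static final String[] KEYS = {
        "TWITTER_CONSUMER_KEY",
        "TWITTER_CONSUMER_SECRET",
        "TWITTER_ACCESS_TOKEN",
        "TWITTER_ACCESS_TOKEN_SECRET"
    };

    // Copies the four credentials out of env and fails early
    // if any of them is missing or blank.
    static Map<String, String> load(Map<String, String> env) {
        Map<String, String> creds = new LinkedHashMap<>();
        for (String key : KEYS) {
            String value = env.get(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalStateException("missing credential: " + key);
            }
            creds.put(key, value.trim());
        }
        return creds;
    }

    public static void main(String[] args) {
        try {
            Map<String, String> creds = load(System.getenv());
            System.out.println("loaded " + creds.size() + " credentials");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Inside getInstance(), the looked-up values would then replace the string literals passed to setOAuthConsumerKey() and the other three setters.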
Search member function
Now that I have written our instance function, I can write our search function. The Twitter4J class and functions I need are Query, search(), getRateLimitStatus(), and getTweets().
I will use Query to set our search settings, search() to fetch the search results, and getRateLimitStatus() to ensure I don’t go over the Twitter rate limits. Let’s write our function to fetch search results 100 at a time until we hit the rate limit. Last, I will use getTweets() to fetch the Status objects (really tweet objects). I can loop over the Status objects in the search results using a for loop and use a do/while loop to keep fetching while there are results. I coded
public static int searchTweets(String[] args) {
    int rc, limit;
    String search_query;
    Query query;
    QueryResult result;

    getInstance();              //Call static member function above
    search_query = args[0];     // Argument passed from Stata.
    query = new Query(search_query);
    query.setCount(100);
    result = null;
    rc = 0;                     // 0 means success

    do {
        // search() throws TwitterException; it is handled in the next version
        result = twitter.search(query);
        limit = result.getRateLimitStatus().getRemaining();
        for (Status tweet_object: result.getTweets()) {
            //process data
        }
        query = result.nextQuery();   // next page of results; null when none remain
    } while (query != null && limit > 0);
    return(rc);
}
In Java, it is recommended that you put code that could result in an error in a try/catch block. This makes error handling easier. You can use the TwitterException class to help with error handling. This class can generate more specific error messages, but I will not use it. For this example, I will issue a generic “could not search tweets” error for any TwitterException.
public static int searchTweets(String[] args) {
    int rc, limit;
    String search_query;
    Query query;
    QueryResult result;

    getInstance();              //Call static member function above
    search_query = args[0];     // Argument passed from Stata.
    query = new Query(search_query);
    query.setCount(100);
    result = null;
    rc = 0;                     // 0 means success

    try {
        do {
            result = twitter.search(query);
            limit = result.getRateLimitStatus().getRemaining();
            for (Status tweet_object: result.getTweets()) {
                //process data
            }
            query = result.nextQuery();   // next page of results; null when none remain
        } while (query != null && limit > 0);
    }
    catch (TwitterException te) {
        SFIToolkit.errorln("could not search tweets");
        rc = 606;
    }
    return(rc);
}
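The stopping logic of a paged fetch like this can be tried out without a Twitter account. The sketch below uses stand-in types (Page, Source, and fakeSource are hypothetical and not part of Twitter4J) and stops when either the source has no further page or the remaining call budget reaches zero. In Twitter4J, those two roles are played by QueryResult.nextQuery() and getRateLimitStatus().getRemaining().

```java
import java.util.ArrayList;
import java.util.List;

public class PagedFetchSketch {
    // Stand-in for one page of search results (not a Twitter4J type).
    static class Page {
        final List<String> items;
        final int remainingCalls;   // calls left in the current rate-limit window
        final Integer nextPage;     // null when there is no further page

        Page(List<String> items, int remainingCalls, Integer nextPage) {
            this.items = items;
            this.remainingCalls = remainingCalls;
            this.nextPage = nextPage;
        }
    }

    // Stand-in for the remote service.
    interface Source {
        Page fetch(int pageNumber);
    }

    // Fetches pages until the source runs out of pages
    // or the call budget is used up, whichever comes first.
    static List<String> fetchAll(Source source) {
        List<String> all = new ArrayList<>();
        Integer next = 0;
        int remaining;
        do {
            Page page = source.fetch(next);
            all.addAll(page.items);
            remaining = page.remainingCalls;
            next = page.nextPage;
        } while (next != null && remaining > 0);
        return all;
    }

    // Fake source: totalPages pages of pageSize items, with a budget of calls.
    static Source fakeSource(final int totalPages, final int pageSize, final int budget) {
        final int[] used = {0};
        return new Source() {
            @Override
            public Page fetch(int pageNumber) {
                used[0]++;
                List<String> items = new ArrayList<>();
                for (int i = 0; i < pageSize; i++) {
                    items.add("item-" + pageNumber + "-" + i);
                }
                Integer nextPage = (pageNumber + 1 < totalPages)
                    ? Integer.valueOf(pageNumber + 1) : null;
                return new Page(items, budget - used[0], nextPage);
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(fakeSource(3, 100, 180)).size());   // prints 300
        System.out.println(fetchAll(fakeSource(500, 100, 180)).size()); // prints 18000
    }
}
```

The first call ends because the pages run out; the second ends because the budget of 180 calls is exhausted.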
So far, I have used two Stata SFI member functions: SFIToolkit.error() and SFIToolkit.errorln().
Both of these functions display an error in the Stata Results window. The only difference between the two is that SFIToolkit.errorln() adds a line terminator at the end of the error. Now, let’s look at the Stata SFI Data class used to process the data returned from Twitter.
Write data into Stata
To process the data from Twitter, I must first add both variables and observations to Stata. To add observations to Stata, you use the SFI Data class function setObsTotal(). There are several functions to add variables, depending on the type.
I will use addVarStr() and addVarDouble() for this example. The tweet data Twitter returns is organized into two types of data: the tweet object and the user object. For this example, we are going to process a few metadata fields from both objects. From the tweet object, I will process:
- text
- retweet_count
- favorite_count
From the user object, I will process:
- screen_name
- followers_count
- friends_count
In our searchTweets function, I need to add the following code to create our variables and add observations to Stata:
public static int searchTweets(String[] args) {
    int rc, limit;
    String search_query;
    Query query;
    QueryResult result;
    long obs;

    getInstance();              //Call static member function above
    search_query = args[0];     // Argument passed from Stata.
    query = new Query(search_query);
    query.setCount(100);
    result = null;

    //Create variables
    rc = Data.addVarStr("text", 240);
    if (rc!=0) return(rc);
    rc = Data.addVarDouble("retweet_count");
    if (rc!=0) return(rc);
    rc = Data.addVarDouble("favorite_count");
    if (rc!=0) return(rc);
    rc = Data.addVarStr("screen_name", 30);
    if (rc!=0) return(rc);
    rc = Data.addVarDouble("followers_count");
    if (rc!=0) return(rc);
    rc = Data.addVarDouble("friends_count");
    if (rc!=0) return(rc);

    obs = 0;
    try {
        do {
            result = twitter.search(query);
            limit = result.getRateLimitStatus().getRemaining();
            for (Status tweet_object: result.getTweets()) {
                //Add observations
                obs++;
                rc = Data.setObsTotal(obs);
                if (rc!=0) return(rc);
                //process data
            }
            query = result.nextQuery();   // next page of results; null when none remain
        } while (query != null && limit > 0);
    }
    catch (TwitterException te) {
        SFIToolkit.errorln("could not search tweets");
        rc = 606;
    }
    return(rc);
}
For our searchTweets member function, I need to write a private member function that copies the results returned in Twitter’s objects into Stata’s memory. I added the call below inside the for loop.
...
for (Status tweet_object: result.getTweets()) {
    //Add observations
    obs++;
    rc = Data.setObsTotal(obs);
    if (rc!=0) return(rc);
    //process data
    rc = processData(obs, tweet_object, tweet_object.getUser());
    if (rc!=0) return(rc);
}
...
The function is
private static int processData(long obs, Status tweet_object, User user_object) {
    int rc;

    // The first argument is the variable index, in the order the variables were created
    rc = Data.storeStr(1, obs, tweet_object.getText());
    if (rc!=0) return(rc);
    rc = Data.storeNum(2, obs, tweet_object.getRetweetCount());
    if (rc!=0) return(rc);
    rc = Data.storeNum(3, obs, tweet_object.getFavoriteCount());
    if (rc!=0) return(rc);
    // getScreenName() is the @handle; getName() would return the display name
    rc = Data.storeStr(4, obs, user_object.getScreenName());
    if (rc!=0) return(rc);
    rc = Data.storeNum(5, obs, user_object.getFollowersCount());
    if (rc!=0) return(rc);
    rc = Data.storeNum(6, obs, user_object.getFriendsCount());
    if (rc!=0) return(rc);
    return(rc);
}
Note that the first argument of the Data.store* functions is just the variable’s index in the dataset currently in memory. You can look at each function call to get the metadata for the tweet object here and for the user object here.
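The indices 1 through 6 are correct only when no variables are in memory before the plugin adds its own. Here is a small sketch of deriving the index from the variable name and the number of variables that already existed. VarIndex and its existingVars argument are hypothetical helpers for illustration, not part of the SFI; in a real plugin, the count would have to be obtained from Stata before the variables are added.

```java
import java.util.Arrays;
import java.util.List;

public class VarIndex {
    // Variable names in the order the plugin adds them.
    static final List<String> NAMES = Arrays.asList(
        "text", "retweet_count", "favorite_count",
        "screen_name", "followers_count", "friends_count");

    // Returns the 1-based variable index for name, given how many
    // variables were in memory before the plugin added its own.
    static int indexOf(String name, int existingVars) {
        int pos = NAMES.indexOf(name);
        if (pos < 0) {
            throw new IllegalArgumentException("unknown variable: " + name);
        }
        return existingVars + pos + 1;
    }

    public static void main(String[] args) {
        System.out.println(indexOf("text", 0));          // prints 1
        System.out.println(indexOf("friends_count", 0)); // prints 6
        System.out.println(indexOf("screen_name", 12));  // prints 16
    }
}
```

With an empty dataset, the helper reproduces the hard-coded indices used above; with 12 variables already in memory, every index shifts by 12.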
Final class
The final code for our StTwitter class is
package com.stata.kcrow;

import com.stata.sfi.*;
import twitter4j.*;
import twitter4j.conf.ConfigurationBuilder;

public class StTwitter {
    static Twitter twitter;

    private static void getInstance() {
        ConfigurationBuilder cb;
        TwitterFactory tf;

        cb = new ConfigurationBuilder();
        cb.setDebugEnabled(true)
            .setOAuthConsumerKey("xWNlx*N9vESv0ZZBtGdm7fVB")
            .setOAuthConsumerSecret("7D25oVzWeDCHrUlQcp9929@GOcnqWCuUKhDel")
            .setOAuthAccessToken("74741598400768-3hAYpZbiDvABPizx5lk57B8CTVyfa")
            .setOAuthAccessTokenSecret("7HjDf25oVzDWAeDCHrUlQcpfNGOTzcnqWCuUKhDel");
        tf = new TwitterFactory(cb.build());
        twitter = tf.getInstance();
    }

    public static int searchTweets(String[] args) {
        int rc, limit;
        String search_query;
        Query query;
        QueryResult result;
        long obs;

        getInstance();              //Call static member function above
        search_query = args[0];     // Argument passed from Stata.
        query = new Query(search_query);
        query.setCount(100);
        result = null;

        //Create variables
        rc = Data.addVarStr("text", 240);
        if (rc!=0) return(rc);
        rc = Data.addVarDouble("retweet_count");
        if (rc!=0) return(rc);
        rc = Data.addVarDouble("favorite_count");
        if (rc!=0) return(rc);
        rc = Data.addVarStr("screen_name", 30);
        if (rc!=0) return(rc);
        rc = Data.addVarDouble("followers_count");
        if (rc!=0) return(rc);
        rc = Data.addVarDouble("friends_count");
        if (rc!=0) return(rc);

        obs = 0;
        try {
            do {
                result = twitter.search(query);
                limit = result.getRateLimitStatus().getRemaining();
                for (Status tweet_object: result.getTweets()) {
                    //Add observations
                    obs++;
                    rc = Data.setObsTotal(obs);
                    if (rc!=0) return(rc);
                    //process data
                    rc = processData(obs, tweet_object, tweet_object.getUser());
                    if (rc!=0) return(rc);
                }
                query = result.nextQuery();   // next page of results; null when none remain
            } while (query != null && limit > 0);
        }
        catch (TwitterException te) {
            if (!SFIToolkit.errorDebug(SFIToolkit.stackTraceToString(te)+"\n")) {
                SFIToolkit.errorln("could not search tweets");
            }
            rc = 606;
        }
        return(rc);
    }

    private static int processData(long obs, Status tweet_object, User user_object) {
        int rc;

        rc = Data.storeStr(1, obs, tweet_object.getText());
        if (rc!=0) return(rc);
        rc = Data.storeNum(2, obs, tweet_object.getRetweetCount());
        if (rc!=0) return(rc);
        rc = Data.storeNum(3, obs, tweet_object.getFavoriteCount());
        if (rc!=0) return(rc);
        // getScreenName() is the @handle; getName() would return the display name
        rc = Data.storeStr(4, obs, user_object.getScreenName());
        if (rc!=0) return(rc);
        rc = Data.storeNum(5, obs, user_object.getFollowersCount());
        if (rc!=0) return(rc);
        rc = Data.storeNum(6, obs, user_object.getFriendsCount());
        if (rc!=0) return(rc);
        return(rc);
    }

    public static int HelloWorld(String args[]) {
        SFIToolkit.error("Hello World!");
        return(0);
    }
}
Bundling and redistributing the JAR file
There are two main ways to make the StTwitter class work in Stata.
- Copy the Twitter4J .jar files to somewhere along your adopath. You can then use the jars() option of javacall to specify which files to use. For example,
. which twitter4j-core-4.0.4.jar
c:\ado\personal\twitter4j-core-4.0.4.jar

. javacall com.stata.kcrow.StTwitter searchTweets, args("star wars") ///
        jars(test_twitter.jar;twitter4j-core-4.0.4.jar)
I recommend this method if you are developing a Java library for your own use.
- Export the project as a Runnable JAR file
If you are developing a Java library for redistribution to somewhere like the Statistical Software Components archive, you may want to combine all .jar files into one .jar file.
Click the Next button, and type the path/file where you would like the .jar file saved. Last, click the Finish button.
Also, make sure you have the correct software license type to redistribute any library .jar file.
Parsing in Stata
With our StTwitter class coded and properly placed, I can now add parsing to the ado program:
program define twitter_test
        version 15

        args search_string junk

        if ("`junk'" != "") {
                display as error "invalid syntax"
                exit 198
        }

        javacall com.stata.kcrow.StTwitter searchTweets,        ///
                args(`"`search_string'"')                       ///
                jars(test_twitter.jar;twitter4j-core-4.0.4.jar)
end
Save the above file as twitter_test.ado along your adopath and, in Stata, type
. twitter_test "star wars"

. describe

Contains data
  obs:        18,000
 vars:             6
 size:     5,436,000
-------------------------------------------------------------------------------
                 storage   display    value
variable name    type      format     label      variable label
-------------------------------------------------------------------------------
text             str240    %240s
retweet_count    double    %10.0g
favorite_count   double    %10.0g
screen_name      str30     %30s
followers_count  double    %10.0g
friends_count    double    %10.0g
-------------------------------------------------------------------------------
Sorted by:
     Note: Dataset has changed since last saved.
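The size reported by describe can be checked by hand. The sketch below assumes that a str240 value occupies 240 bytes, a str30 value 30 bytes, and each double 8 bytes, so one observation takes 302 bytes. The class name DatasetSize is hypothetical and exists only for this check.

```java
public class DatasetSize {
    // Bytes per observation: str240 + str30 + four doubles of 8 bytes each.
    static long rowWidth() {
        return 240 + 30 + 4 * 8;
    }

    // Total bytes for the given number of observations.
    static long size(long obs) {
        return obs * rowWidth();
    }

    public static void main(String[] args) {
        System.out.println(rowWidth());    // prints 302
        System.out.println(size(18000));   // prints 5436000
    }
}
```

The result matches the 5,436,000 bytes shown above, and 18,000 observations is what 180 searches of 100 tweets each would produce.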
Conclusion
As you can see, it did not take much code to connect to Twitter’s API, return tweet data, and load that data into Stata. Twitter does sell enterprise licenses for its data, which have no limits on the amount of data you can download. With them, you can also fetch data as far back as 2006. There is a different API for this data. The Twitter4J library supports this API as well, but the twitter2stata command does not.