Working with Java plugins (Part 2)

In my previous post, I talked about how to combine the Java library Twitter4J and Stata’s Java Function Interface (SFI) using Eclipse to create a helloWorld plugin. Now, I want to talk about how to call Twitter4J member functions to connect to the Twitter REST API, return Twitter data, and load that data into Stata using the SFI.

Adding Twitter4J imports and a global

The current code is

package com.stata.kcrow;

import com.stata.sfi.*;

public class StTwitter {
        public static int HelloWorld(String args[]) {
                SFIToolkit.error("Hello World!");
                return(0);
        }
}

To use the Twitter4J function calls, I need to add the following imports to the top of our StTwitter.java file:

import twitter4j.*;
import twitter4j.conf.ConfigurationBuilder;

Also, I need to add a global (static) Twitter object to our StTwitter class. My code now reads

package com.stata.kcrow;

import com.stata.sfi.*;

import twitter4j.*;
import twitter4j.conf.ConfigurationBuilder;

public class StTwitter {
        static Twitter twitter;

        public static int HelloWorld(String args[]) {
                SFIToolkit.error("Hello World!");
                return(0);
        }
}

Instance member function

Most website APIs use some form of OAuth for authentication with their servers. Twitter uses OAuth2. twitter2stata uses Twitter’s application-based authentication model, but there are other ways to connect. For this post, I will also use the application-based authentication model.

The first function I need to write is a function to authenticate to the Twitter website API. To do this, you need to get your Consumer Key (API Key), Consumer Secret (API Secret), Access Token, and Access Token Secret strings. If you don’t remember how to get these, see my previous blog post.

Once you have these tokens, you can then write a simple function to pass this information using the ConfigurationBuilder class:

private static void getInstance() {
        ConfigurationBuilder    cb;
        TwitterFactory          tf;

        cb = new ConfigurationBuilder();

        cb.setDebugEnabled(true)
                .setOAuthConsumerKey(CONSUMER_KEY)
                .setOAuthConsumerSecret(CONSUMER_SECRET)
                .setOAuthAccessToken(ACCESS_TOKEN)
                .setOAuthAccessTokenSecret(ACCESS_TOKEN_SECRET);

        tf = new TwitterFactory(cb.build());
        twitter = tf.getInstance();
}

This function creates a ConfigurationBuilder instance, sets the login settings, creates a Twitter factory instance, and then initializes the global Twitter class instance. My class now reads

package com.stata.kcrow;

import com.stata.sfi.*;

import twitter4j.*;
import twitter4j.conf.ConfigurationBuilder;

public class StTwitter {
        static Twitter twitter ;

        private static void getInstance() {
                ConfigurationBuilder    cb;
                TwitterFactory          tf;

                cb = new ConfigurationBuilder();

                cb.setDebugEnabled(true)
                        .setOAuthConsumerKey("xWNlx*N9vESv0ZZBtGdm7fVB")
                        .setOAuthConsumerSecret("7D25oVzWeDCHrUlQcp9929@GOcnqWCuUKhDel")
                        .setOAuthAccessToken("74741598400768-3hAYpZbiDvABPizx5lk57B8CTVyfa")
                        .setOAuthAccessTokenSecret("7HjDf25oVzDWAeDCHrUlQcpfNGOTzcnqWCuUKhDel");

                tf = new TwitterFactory(cb.build());
                twitter = tf.getInstance();
        }

        public static int HelloWorld(String args[]) {
                SFIToolkit.error("Hello World!");
                return(0);
        }
}

I made this function private because it does not need to be called from Stata.
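The listing above hardcodes the four OAuth strings in the source file. If you plan to share the code, one alternative is to read them from the environment at run time. The following is only a minimal sketch under my own assumptions: the TwitterCredentials class and the four TWITTER_* variable names are hypothetical and are not part of Twitter4J or the SFI.

```java
import java.util.Map;

/** Hypothetical holder for the four OAuth strings; not part of Twitter4J or the SFI. */
final class TwitterCredentials {
    final String consumerKey;
    final String consumerSecret;
    final String accessToken;
    final String accessTokenSecret;

    private TwitterCredentials(String ck, String cs, String at, String ats) {
        consumerKey = ck;
        consumerSecret = cs;
        accessToken = at;
        accessTokenSecret = ats;
    }

    /** Reads the four values from a map such as System.getenv(). */
    static TwitterCredentials from(Map<String, String> env) {
        return new TwitterCredentials(
                require(env, "TWITTER_CONSUMER_KEY"),
                require(env, "TWITTER_CONSUMER_SECRET"),
                require(env, "TWITTER_ACCESS_TOKEN"),
                require(env, "TWITTER_ACCESS_TOKEN_SECRET"));
    }

    // Fails fast and names the missing setting.
    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isEmpty()) {
            throw new IllegalStateException("missing setting: " + name);
        }
        return value;
    }

    public static void main(String[] args) {
        try {
            TwitterCredentials c = from(System.getenv());
            System.out.println("loaded consumer key of length " + c.consumerKey.length());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With something like this in place, getInstance() could pass c.consumerKey and the other three fields to the ConfigurationBuilder setters instead of string literals.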

Search member function

Now that I have written our instance function, I can write our search function. The Twitter4J classes and functions I need are

  • Query
  • search()
  • getRateLimitStatus()
  • getTweets()

I will use Query to set our search settings, search() to fetch the search results, and getRateLimitStatus() to ensure I don’t go over the Twitter rate limits. Let’s write our function to fetch search results 100 at a time until we hit the rate limit. Last, I will use getTweets() to fetch the Status objects (really tweet objects). I can loop over the Status objects returned by search() using a for loop and use a do/while loop to make sure I have results. I coded

public static int searchTweets(String[] args) {
        int                     rc, limit;
        String                  search_query;
        Query                   query;
        QueryResult             result;

        getInstance();          //Call static member function above

        search_query = args[0]; // Argument passed from Stata.

        query = new Query(search_query);
        query.setCount(100);
        result = null;
        rc = 0;                 // No error so far

        do {
                result = twitter.search(query);
                limit = result.getRateLimitStatus().getRemaining();

                for (Status tweet_object: result.getTweets()) {
                        //process data
                }

        } while (limit > 0);

        return(rc);
}
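The loop above sends the same Query on every pass and stops only when the remaining rate limit reaches zero. If you want each request to return a new page of results, the loop also needs to advance to the next page and stop when there are none left. The sketch below shows that control flow with stand-in types of my own (Page and Searcher are hypothetical and are not Twitter4J classes). To my knowledge, Twitter4J’s QueryResult offers hasNext() and nextQuery() for the same purpose, but check the Javadoc for your version before relying on that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PagedSearchSketch {
    /** Hypothetical stand-in for one page of results (plays the role of QueryResult). */
    static final class Page {
        final List<String> tweets;
        final int remainingCalls;   // plays the role of getRateLimitStatus().getRemaining()
        final Integer nextPage;     // null when there is no further page

        Page(List<String> tweets, int remainingCalls, Integer nextPage) {
            this.tweets = tweets;
            this.remainingCalls = remainingCalls;
            this.nextPage = nextPage;
        }
    }

    /** Hypothetical stand-in for the search call. */
    interface Searcher {
        Page search(int pageNumber);
    }

    /** Collects tweets page by page until pages or rate limit run out. */
    static List<String> collect(Searcher searcher) {
        List<String> all = new ArrayList<>();
        Integer page = 0;
        int remaining;
        do {
            Page result = searcher.search(page);
            all.addAll(result.tweets);
            remaining = result.remainingCalls;
            page = result.nextPage;     // advance so the next call returns new data
        } while (page != null && remaining > 0);
        return all;
    }

    public static void main(String[] args) {
        // Fake searcher: three pages of one tweet each, with rate limit to spare.
        Searcher fake = p -> new Page(
                Arrays.asList("tweet " + p), 100, p < 2 ? Integer.valueOf(p + 1) : null);
        System.out.println(collect(fake));
    }
}
```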

In Java, it is recommended that you put code that could result in an error in a try/catch block. This makes error handling easier. In this case it is also required, because search() is declared to throw the checked exception TwitterException. You can use the TwitterException class to help with error handling. This class can generate more specific error messages, but I will not use it. For this example, I will issue a generic “could not search tweets” error for any TwitterException.

public static int searchTweets(String[] args) {
        int                     rc, limit;
        String                  search_query;
        Query                   query;
        QueryResult             result;

        getInstance();          //Call static member function above

        search_query = args[0]; // Argument passed from Stata.

        query = new Query(search_query);
        query.setCount(100);
        result = null;
        rc = 0;                 // No error so far

        try {
                do {
                        result = twitter.search(query);
                        limit = result.getRateLimitStatus().getRemaining();

                        for (Status tweet_object: result.getTweets()) {
                                //process data
                        }
                } while (limit > 0);
        }
        catch (TwitterException te) {
                SFIToolkit.errorln("could not search tweets");
                rc = 606;
        }
        return(rc);
}
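The pattern in this function, running a call that can throw and turning any failure into a nonzero return code plus a message, can be shown on its own. This is only a sketch with stand-in names: ApiException and ApiCall are hypothetical, and the StringBuilder takes the place of the Stata Results window. TwitterException does provide getStatusCode(), which is the sort of detail you could add to the message if you wanted something more specific than the generic error.

```java
final class ReturnCodeSketch {
    /** Hypothetical checked exception standing in for TwitterException. */
    static final class ApiException extends Exception {
        private static final long serialVersionUID = 1L;
        final int statusCode;

        ApiException(String message, int statusCode) {
            super(message);
            this.statusCode = statusCode;
        }
    }

    /** Hypothetical unit of work that may fail. */
    interface ApiCall {
        void run() throws ApiException;
    }

    static final int RC_OK = 0;
    static final int RC_SEARCH_FAILED = 606;    // same code the post returns

    /** Runs the call; returns 0 on success, 606 plus a logged message on failure. */
    static int runWithReturnCode(ApiCall call, StringBuilder errorLog) {
        try {
            call.run();
            return RC_OK;
        } catch (ApiException e) {
            errorLog.append("could not search tweets (HTTP status ")
                    .append(e.statusCode)
                    .append(")\n");
            return RC_SEARCH_FAILED;
        }
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        int rc = runWithReturnCode(() -> {
            throw new ApiException("rate limited", 429);
        }, log);
        System.out.print(log);
        System.out.println("rc = " + rc);
    }
}
```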

So far, I have used two Stata SFI member functions:

  • SFIToolkit.error()
  • SFIToolkit.errorln()

Both of these functions display an error in the Stata Results window. The only difference between the two is that SFIToolkit.errorln adds a line terminator at the end of the error. Now, let’s look at the Stata SFI Data class used to process the data returned from Twitter.

Write data into Stata

To process the data from Twitter, I must first add both variables and observations to Stata. To add observations to Stata, you use the SFI Data class function setObsTotal(). There are several functions to add variables, depending on the type.

I will use addVarStr() and addVarDouble() for this example. The tweet data Twitter returns is organized into two types of data: the tweet object and the user object. For this example, I am going to process a few pieces of metadata from both objects. From the tweet object, I will process:

  • text
  • retweet_count
  • favorite_count

From the user object, I will process:

  • screen_name
  • followers_count
  • friends_count

In our searchTweets function, I need to add the following code to create our variables and add observations to Stata:

public static int searchTweets(String[] args) {
        int                     rc, limit;
        String                  search_query;
        Query                   query;
        QueryResult             result;
        long                    obs ;

        getInstance();          //Call static member function above

        search_query = args[0]; // Argument passed from Stata.

        query = new Query(search_query);
        query.setCount(100);
        result = null;

        //Create variables
        rc = Data.addVarStr("text", 240);
        if (rc!=0) return(rc);
        rc = Data.addVarDouble("retweet_count");
        if (rc!=0) return(rc);
        rc = Data.addVarDouble("favorite_count");
        if (rc!=0) return(rc);
        rc = Data.addVarStr("screen_name", 30);
        if (rc!=0) return(rc);
        rc = Data.addVarDouble("followers_count");
        if (rc!=0) return(rc);
        rc = Data.addVarDouble("friends_count");
        if (rc!=0) return(rc);

        obs = 0;
        try {

                do {
                        result = twitter.search(query);
                        limit = result.getRateLimitStatus().getRemaining();

                        for (Status tweet_object: result.getTweets()) {
                                //Add observations
                                obs++;
                                rc = Data.setObsTotal(obs);
                                if (rc!=0) return(rc);
                                //process data
                        }
                } while (limit > 0);
        }
        catch (TwitterException te) {
                SFIToolkit.errorln("could not search tweets");
                rc = 606;
        }
        return(rc);
}

Next, I need to write a private member function that copies the results returned from Twitter’s objects into Stata’s memory and call it from our searchTweets member function. I added the call below inside the for loop.

...
        for (Status tweet_object: result.getTweets()) {
                //Add observations
                obs++;
                rc = Data.setObsTotal(obs);
                if (rc!=0) return(rc);
                //process data
                rc = processData(obs, tweet_object,
                        tweet_object.getUser());
                if (rc!=0) return(rc);
        }

...

The function is

private static int processData(long obs, Status tweet_object, User user_object){
        int                     rc;

        rc = Data.storeStr(1, obs, tweet_object.getText());
        if (rc!=0) return(rc);
        rc = Data.storeNum(2, obs, tweet_object.getRetweetCount());
        if (rc!=0) return(rc);
        rc = Data.storeNum(3, obs, tweet_object.getFavoriteCount());
        if (rc!=0) return(rc);

        // getScreenName() matches the screen_name variable created above
        rc = Data.storeStr(4, obs, user_object.getScreenName());
        if (rc!=0) return(rc);
        rc = Data.storeNum(5, obs, user_object.getFollowersCount());
        if (rc!=0) return(rc);
        rc = Data.storeNum(6, obs, user_object.getFriendsCount());
        if (rc!=0) return(rc);

        return(rc);
}

Note that the first argument of the Data.store* functions is just the variable’s index in the dataset currently in memory. To see which function call returns each piece of metadata, look at the Twitter4J documentation for the Status interface (the tweet object) and the User interface (the user object).
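Hardcoding the indices 1 through 6 works only when the new variables are the first ones in the dataset. If the dataset in memory already holds variables, the new ones land after them and the numbers shift. One way to avoid counting by hand is to record each variable’s position as it is created and look it up by name later. The class below is a plain-Java sketch of that bookkeeping; VarIndexSketch is a hypothetical helper of mine, not an SFI class, and the number of existing variables is something you would have to supply.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bookkeeping for 1-based variable positions; not part of the SFI. */
final class VarIndexSketch {
    private final Map<String, Integer> indexByName = new LinkedHashMap<>();
    private final int existingVars;

    /** existingVars is the number of variables already in memory before any are added. */
    VarIndexSketch(int existingVars) {
        this.existingVars = existingVars;
    }

    /** Records a newly created variable and returns its 1-based position. */
    int add(String name) {
        if (indexByName.containsKey(name)) {
            throw new IllegalArgumentException("duplicate variable: " + name);
        }
        int index = existingVars + indexByName.size() + 1;
        indexByName.put(name, index);
        return index;
    }

    /** Looks up the position recorded for a variable name. */
    int indexOf(String name) {
        Integer index = indexByName.get(name);
        if (index == null) {
            throw new IllegalArgumentException("unknown variable: " + name);
        }
        return index;
    }

    public static void main(String[] args) {
        VarIndexSketch vars = new VarIndexSketch(0);
        vars.add("text");
        vars.add("retweet_count");
        System.out.println("retweet_count is variable " + vars.indexOf("retweet_count"));
    }
}
```

In processData, a call such as Data.storeStr(1, obs, ...) would then become Data.storeStr(vars.indexOf("text"), obs, ...), assuming the vars object is shared between the two functions.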

Final class

Our final code for the StTwitter class is

package com.stata.kcrow;


import com.stata.sfi.*;

import twitter4j.*;
import twitter4j.conf.ConfigurationBuilder;

public class StTwitter {
        static Twitter twitter;

        private static void getInstance() {
                ConfigurationBuilder    cb;
                TwitterFactory          tf;

                cb = new ConfigurationBuilder();

                cb.setDebugEnabled(true)
                        .setOAuthConsumerKey("xWNlx*N9vESv0ZZBtGdm7fVB")
                        .setOAuthConsumerSecret("7D25oVzWeDCHrUlQcp9929@GOcnqWCuUKhDel")
                        .setOAuthAccessToken("74741598400768-3hAYpZbiDvABPizx5lk57B8CTVyfa")
                        .setOAuthAccessTokenSecret("7HjDf25oVzDWAeDCHrUlQcpfNGOTzcnqWCuUKhDel");

                tf = new TwitterFactory(cb.build());
                twitter = tf.getInstance();
        }
        public static int searchTweets(String[] args) {
                int                     rc, limit;
                String                  search_query;
                Query                   query;
                QueryResult             result;
                long                    obs;

                getInstance();          //Call static member function above

                search_query = args[0]; // Argument passed from Stata.

                query = new Query(search_query);
                query.setCount(100);
                result = null;
                //Create variables
                rc = Data.addVarStr("text", 240);
                if (rc!=0) return(rc);
                rc = Data.addVarDouble("retweet_count");
                if (rc!=0) return(rc);
                rc = Data.addVarDouble("favorite_count");
                if (rc!=0) return(rc);
                rc = Data.addVarStr("screen_name", 30);
                if (rc!=0) return(rc);
                rc = Data.addVarDouble("followers_count");
                if (rc!=0) return(rc);
                rc = Data.addVarDouble("friends_count");
                if (rc!=0) return(rc);

                obs = 0 ;
                try {
                        do {
                                result = twitter.search(query);
                                limit = result.getRateLimitStatus().getRemaining();

                                for (Status tweet_object: result.getTweets()) {
                                        //Add observations
                                        obs++;
                                        rc = Data.setObsTotal(obs);
                                        if (rc!=0) return(rc);
                                        //process data
                                        rc = processData(obs, tweet_object,
                                                tweet_object.getUser());
                                        if (rc!=0) return(rc);
                                }
                        } while (limit > 0);
                }
                catch (TwitterException te) {
                        if (!SFIToolkit.errorDebug(SFIToolkit.
                                stackTraceToString(te)+"\n")) {
                                SFIToolkit.errorln("could not search tweets");
                        }
                        rc = 606 ;
                }
                return(rc);
        }

        private static int processData(long obs, Status tweet_object, User user_object){
                int                     rc;

                rc = Data.storeStr(1, obs, tweet_object.getText());
                if (rc!=0) return(rc);
                rc = Data.storeNum(2, obs, tweet_object.getRetweetCount());
                if (rc!=0) return(rc);
                rc = Data.storeNum(3, obs, tweet_object.getFavoriteCount());
                if (rc!=0) return(rc);

                rc = Data.storeStr(4, obs, user_object.getScreenName());
                if (rc!=0) return(rc);
                rc = Data.storeNum(5, obs, user_object.getFollowersCount());
                if (rc!=0) return(rc);
                rc = Data.storeNum(6, obs, user_object.getFriendsCount());
                if (rc!=0) return(rc);

                return(rc);
        }

        public static int HelloWorld(String args[]) {
                SFIToolkit.error("Hello World!");
                return(0);
        }
}

Bundling and redistributing the JAR file

There are two main ways to make the StTwitter class work in Stata.

  1. Copy the Twitter4J .jar files to somewhere along your adopath. You can then use the jars() option for javacall to specify which files to use. For example,

    . which twitter4j-core-4.0.4.jar
    c:\ado\personal\twitter4j-core-4.0.4.jar
    
    . javacall com.stata.kcrow.StTwitter searchTweets, args("star wars") ///
    jars(test_twitter.jar;twitter4j-core-4.0.4.jar)
    

    I recommend this method if you are developing a Java library for your own use.

  2. Export the project as a Runnable JAR file

    If you are developing a Java library for redistribution to somewhere like the Statistical Software Components archive, you may want to combine all .jar files into one .jar file. In Eclipse, select File > Export..., and then choose Java > Runnable JAR file.

    Click the Next button, and type the path/file where you would like the .jar file saved.

    Last, click the Finish button.

    Also, make sure you have the correct software license type to re-distribute any library .jar file.
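Whichever method you choose, a missing library .jar tends to show up only when the first Twitter4J class is needed. A small check like the one below can make that failure easier to diagnose. It is a sketch using only the standard library; ClasspathCheckSketch and isLoadable are names I made up, and the check uses the class loader of the sketch class itself, which I am assuming is the one that sees the files listed in jars().

```java
final class ClasspathCheckSketch {
    /** Returns true if the named class can be found without initializing it. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheckSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // twitter4j.Twitter is the interface imported by StTwitter.
        System.out.println("twitter4j.Twitter loadable: " + isLoadable("twitter4j.Twitter"));
    }
}
```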

Parsing in Stata

With our StTwitter class coded and properly placed, I can now add parsing to the ado program:

program define twitter_test
        version 15
        args search_string junk
        if ("`junk'" != "") {
                display as error "invalid syntax"
                exit 198
        }

        javacall com.stata.kcrow.StTwitter searchTweets,                ///
                args(`"`search_string'"')                               ///
                jars(test_twitter.jar;twitter4j-core-4.0.4.jar)
end

Save the above file as twitter_test.ado along your adopath and, in Stata, type

. twitter_test "star wars"

. describe

Contains data
  obs:        18,000
 vars:             6
 size:     5,436,000
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
text            str240  %240s
retweet_count   double  %10.0g
favorite_count  double  %10.0g
screen_name     str30   %30s
followers_count double  %10.0g
friends_count   double  %10.0g
-------------------------------------------------------------------------------
Sorted by:
     Note: Dataset has changed since last saved.
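The ado-file checks its syntax before calling Java, but searchTweets itself reads args[0] without checking, so calling it directly through javacall without a usable argument would fail with a Java exception rather than a Stata error. Below is a sketch of a guard you could place at the top of the Java function. ArgsSketch and its method names are hypothetical; the return code 198 is the same one the ado-file uses for invalid syntax.

```java
final class ArgsSketch {
    static final int RC_SYNTAX = 198;   // same code the ado-file uses for invalid syntax

    /** Returns the trimmed search string, or null unless args holds exactly one nonblank value. */
    static String searchQueryOrNull(String[] args) {
        if (args == null || args.length != 1 || args[0] == null) {
            return null;
        }
        String query = args[0].trim();
        return query.isEmpty() ? null : query;
    }

    /** Returns 0 when the arguments are usable and 198 otherwise. */
    static int checkArgs(String[] args) {
        return searchQueryOrNull(args) == null ? RC_SYNTAX : 0;
    }

    public static void main(String[] args) {
        System.out.println("rc = " + checkArgs(args));
    }
}
```

In searchTweets, the guard would run before getInstance(), returning the nonzero code after printing a message with SFIToolkit.errorln().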

Conclusion

As you can see, it did not take much code to connect to Twitter’s API, return tweet data, and load that data into Stata. Twitter does sell enterprise licenses for their data, which have no limits on the amount of data you can download. You can also fetch data as far back as 2006. There is a different API for this data. The Twitter4J library supports this API as well, but the twitter2stata command does not.
