Programming an estimation command in Stata: Writing a C++ plugin

22 February 2018 David M. Drukker, Executive Director of Econometrics

This post is the third in a series that illustrates how to plug code written in another language (like C, C++, or Java) into Stata. This technique is known as writing a plugin or as writing a dynamic-link library (DLL) for Stata.

In this post, I write a plugin in C++ that implements the calculations performed by mymean_work() in mymean11.ado, discussed in Programming an estimation command in Stata: Preparing to write a plugin. I assume that you are familiar with the material in that post.

This post is analogous to Programming an estimation command in Stata: Writing a C plugin. The differences are due to the plugin code being in C++ instead of C. I do not assume that you are familiar with the material in that post, and you will find much of it repeated here.

This is the 31st post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

Writing a hello-world C++ plugin

Before I do any computations, I illustrate how to write and compile a C++ plugin that communicates with Stata. Code block 1 contains the code for myhellocpp.ado that calls the C++ plugin hellocpp, which just displays “Hello from C++” in Stata.

Code block 1: myhellocpp.ado

*! version 1.0.0  13Feb2018
program define myhellocpp

    version 15.1

    plugin call hellocpp

end

program hellocpp, plugin

Line 6 executes the plugin whose handle is hellocpp. Line 10 loads the plugin implemented in hellocpp.plugin into the handle hellocpp. That the execute statement comes before the load statement seems odd at first. Stata ado-files are read in their entirety, and each ado-program, Mata function, or plugin handle is loaded before the lines of the main ado-program are executed. So line 10 is actually executed before line 6.

The name of the handle for the plugin, hellocpp in this case, must differ from the name of the main ado-program, myhellocpp in this case, and from any other ado-program defined in this .ado file.

The code for hello.cpp is in code block 2.

Code block 2: hello.cpp

// version 1.0.0 14Feb2018
#include "stplugin.h"
#include <string.h>
#include <string>

STDLL stata_call(int argc, char *argv[])
{
    char  msg[81] ;

    std::string mystring = "Hello from C++\n";
    strcpy(msg, mystring.c_str()) ;
    SF_display(msg) ;

    return((ST_retcode) 0) ;
}

Line 2 includes the Stata plugin header file stplugin.h. Line 6 is the standard declaration for the entry function for a C++ plugin for Stata. You should copy it. Inside stata_call(), argc will contain the number of arguments passed to the plugin, and argv, an array of C-style strings, will contain the arguments themselves.

Line 8 declares and allocates space for the C++ character array msg. Using a character array might seem odd, because C++ programs generally use C++ strings to manipulate strings. We will need character arrays because functions in the Stata program interface (SPI) that accept “string” arguments use C/C++ character arrays.

For example, on line 12, the SPI function SF_display(), which makes Stata display “Hello from C++”, accepts the C++ character array msg. SF_display() would not accept the C++ string mystring, created on line 10. For this reason, I copied what mystring contains into the character array msg on line 11.

Line 14 returns zero as the return code. Note that I cast the literal 0 to the expected type ST_retcode.
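The copy on line 11 is safe because the literal is much shorter than the 81-byte buffer. For messages whose length is not known in advance, a bounded copy is safer. The following is a minimal sketch, not part of the original plugin; the helper name copy_to_buffer() is hypothetical and is not an SPI function. The filled buffer could then be passed to SF_display().

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not part of the SPI): copy a C++ string into a
// fixed-size character array, truncating if necessary, and always
// null-terminate the result. Returns true if the whole string fit.
bool copy_to_buffer(const std::string &src, char *dst, std::size_t dstsize)
{
    if (dstsize == 0) return false;          // no room even for the terminator
    std::size_t n = src.length();
    bool fits = (n < dstsize);               // need one byte for '\0'
    if (!fits) n = dstsize - 1;              // truncate to what the buffer holds
    std::memcpy(dst, src.c_str(), n);
    dst[n] = '\0';
    return fits;
}
```

With char msg[81], calling copy_to_buffer(mystring, msg, sizeof(msg)) never writes past the end of msg, and the return value tells the caller whether the message was truncated.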

I now discuss how to create the plugin hellocpp.plugin from hello.cpp. In the directory that contains myhellocpp.ado and hello.cpp, I also have stplugin.cpp. stplugin.cpp is a copy of stplugin.c. I had to copy stplugin.c to stplugin.cpp because some C++ compilers do not treat .c files as C++ source files. stplugin.c defines a function needed to make the stata_call() function available to Stata.

Do not change the contents of stplugin.h or stplugin.c. In fact, you do not even need to look at them.

On my OS X Mac that has the command-line developer tools installed, I use g++ to create hellocpp.plugin from stplugin.cpp and hello.cpp by typing

g++ -bundle -DSYSTEM=APPLEMAC stplugin.cpp hello.cpp -o hellocpp.plugin

The above g++ command compiles the two .cpp files and links them to create the DLL hellocpp.plugin, which myhellocpp.ado can call.

In the appendix to this post, I provide instructions for creating hellocpp.plugin on other platforms. https://www.stata.com/plugins/ provides complete documentation for writing and compiling C++ plugins.

Having created hellocpp.plugin, I can execute myhellocpp in Stata.

Example 1: myhellocpp

. myhellocpp
Hello from C++

For simplicity, I put all the .ado files, header files, C++ source files, and created .plugin files for the examples in this post into a single directory. For larger projects, I would put the .ado and .plugin files in directories on Stata’s ADOPATH and use my compiler’s environment to manage where I put my header and C++ source files.

Getting access to the Stata data in your plugin

hellocpp.plugin makes Stata display something created inside the plugin. The next step is giving the plugin access to the data in Stata. To illustrate this process, I discuss mylistcpp.ado, which uses a plugin to list out observations of the specified variables.

Let’s look at the ado-code first.

Code block 3: mylistcpp.ado

*! version 1.0.0  13Feb2018
program define mylistcpp, eclass

        version 15.1

        syntax varlist(numeric max=3) [if] [in]
        marksample touse

        display "Variables listed:  `varlist'"
        plugin call mylistwcpp `varlist' if `touse' `in'

end

program mylistwcpp, plugin

In line 6, syntax creates three local macros. It puts the variables specified by the user into the local macro varlist. It puts any if condition specified by the user into the local macro if. It puts any in range specified by the user into the local macro in. I specified max=3 to syntax to limit the number of variables to 3. This limitation is silly, and I would not need it for an example Stata/Mata program, but it simplifies the example C++ plugin.

In line 7, marksample creates a sample-inclusion variable and puts its name in the local macro touse. The sample-inclusion variable is zero for each excluded observation, and it is one for each included observation. marksample uses the variables in the local macro varlist, the if condition in the local macro if, and the range in the local macro in to create the sample-inclusion variable. (All three local macros were created by syntax.) An observation is excluded if any of the variables in the local macro varlist contain a missing value, if it was excluded by the condition in the local macro if, or if it was excluded by the range in the local macro in. The sample-inclusion variable is one for observations that were not excluded.

In line 9, I further simplified the C++ plugin by displaying the names of the variables whose values are listed out by the plugin.

In line 10, plugin calls mylistwcpp.plugin. Because `varlist' is specified, the SPI function SF_vdata() will be able to access the variables contained in the local macro varlist. Because if `touse' is specified, the SPI function SF_ifobs() will return zero if the sample-inclusion variable in touse is zero, and the function will return one if the sample-inclusion variable is one. Because `in' is specified, the SPI functions SF_in1() and SF_in2() respectively return the first and last observations in any user-specified in range.

Specifying `in' is not necessary to identify the sample specified by the user, because if `touse' already specifies this sample-inclusion information. However, specifying `in' can dramatically reduce the range of observations in the loop over the data, thereby speeding up the code.
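The interaction between the if restriction and the in range can be sketched outside of Stata. In the sketch below, the vector touse and the arguments first and last are stand-ins that I assume for illustration: touse[i-1] plays the role of SF_ifobs(i), and first and last play the roles of SF_in1() and SF_in2(). None of this is SPI code.

```cpp
#include <vector>

// Stand-ins for the SPI (assumptions for illustration only):
//   touse[i-1]  plays the role of SF_ifobs(i)
//   first, last play the roles of SF_in1() and SF_in2()
// Returns the number of observations in the sample.
int count_in_sample(const std::vector<int> &touse, int first, int last)
{
    int nobs = 0;
    for (int i = first; i <= last; i++) {   // one-based observation numbers
        if (!touse[i - 1]) {
            continue;                       // excluded by the if restriction
        }
        ++nobs;
    }
    return nobs;
}
```

Looping from 2 to 10 and looping from 1 to 74 count the same observations when the sample-inclusion variable is zero outside the range, but the first loop visits 9 observations instead of 74.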

In a directory that contains mylistw.cpp, stplugin.cpp, and stplugin.h, I created mylistwcpp.plugin on my Mac by typing

g++ -bundle -DSYSTEM=APPLEMAC stplugin.cpp mylistw.cpp -o mylistwcpp.plugin

Code block 4: mylistw.cpp

// version 1.0.0  14Feb2018
#include "stplugin.h"
#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <string.h>

STDLL stata_call(int argc, char *argv[])
{
    ST_int       first, last, nVars, i, j, nObs  ;
    ST_double    value ;
    ST_retcode   rc ;
    int          nchar ;
    char         line[81], msg[81], svalue[26] ;
    std::string  line2 ;

    rc    = (ST_retcode) 0 ;
// Put the first observation in sample into first
    first = SF_in1 ();
// Put the last observation in sample into last
    last = SF_in2 ();
// Put number of variables in varlist passed in to plugin into nVars
    nVars  = SF_nvars() ;
// Initiate number of observations counter to 0
    nObs  = 0 ;

// Loop over observations
    for(i=first; i<=last; i++) {
        line2 = "";
// Only display observations for which if restriction is true
        if (!SF_ifobs(i)) {
            continue ;
        }
// Increment number of observations counter
        ++nObs ;
// Loop over variables
        for(j=1; j<=nVars; j++) {
// Put value of observation i on variable j into value 
            rc = SF_vdata(j, i, &value);
// Return with error if problem getting value
            if(rc>0) {
                 sprintf(msg, "Problem accessing Stata data\n") ;
                 SF_error(msg) ;
                 return(rc) ;
            }
// Return with error if missing value
            if (SF_is_missing(value)) {
                 sprintf(msg, "missing values encountered\n") ;
                 SF_error(msg) ;
                 return( (ST_retcode) 416 ) ;
            }
            snprintf(svalue,25,"%f   ",value) ;
            line2 = line2 + svalue ;
        }
// Display line (leaving room for the newline) or error message in Stata
        if (line2.length()<80) {
            line2 = line2 + "\n" ;
            strcpy(line, line2.c_str()) ;
            SF_display(line) ;
        }
        else {
             sprintf(msg, "More than 80 bytes in line\n") ;
             SF_error(msg) ;
             return( (ST_retcode) 498 ) ;
        }
    }
    sprintf(line, "First observation was             %d\n", first) ;
    SF_display(line) ;
    sprintf(line, "Last observation was              %d\n", last) ;
    SF_display(line) ;
    sprintf(line, "Number of observations listed was %d\n", nObs) ;
    SF_display(line) ;

    return(rc) ;
}

If you are reading this post, you can read standard C++. I discuss how mylistw.cpp illustrates the structure of C++ plugins for Stata, and I explain the types and the functions defined by the Stata plugin interface (SPI) used in the code. Complete details about the SPI are available at https://www.stata.com/plugins/.

mylistw.cpp returns zero to Stata if all went well, and it returns a nonzero error code if something went wrong. Every time I call a function in mylistw.cpp that could fail, I check its return code. If that function failed, I make Stata display an error message, and I return a nonzero error code to Stata. This logic provides the overall structure to mylistw.cpp. Most of the code deals with error conditions or takes care not to put more characters into a string buffer than it can hold.

C++ plugins read from or write to Stata objects using functions defined in the SPI. mylistw.cpp does not return any results, so it has a simple structure.

It uses SPI functions to read from the specified sample of the data in Stata.
It uses standard C++ and SPI functions to list observations for the specified sample, and it keeps a counter of how many observations are in the specified sample.
It uses standard C++ and SPI functions to display which was the first observation in the sample, which was the last observation in the sample, and how many observations were in the specified sample.

Now, I discuss specific parts of mylistw.cpp.

In lines 10–12, I use the SPI defined types ST_int, ST_double, and ST_retcode for variables that the SPI functions return or that are arguments to the Stata plugin interface functions. Using these defined types is essential, because their mappings to primitive C++ types vary over time.

rc holds the return code that the plugin will return to Stata. In line 17, I initialize rc to zero. If an SPI function that might fail does what was requested, it returns a return code of zero. If an SPI function cannot do what was requested, it returns a nonzero return code. Each time I call an SPI function that might fail, I store the code it returns in rc. If rc is not zero, I make Stata display an error message and make the plugin return the nonzero value stored in rc.

Lines 19, 21, and 23 use SPI functions. SF_in1() puts the first observation specified by an in range into first. SF_in2() puts the last observation specified by an in range into last. If an in range was not specified to plugin, first will contain one, and last will contain the number of observations in the dataset. SF_nvars() puts the number of variables specified in the varlist into nVars.

Lines 31–33 ensure that we skip over observations that were excluded by the if restriction specified to plugin in line 10 of mylistcpp.ado. To illustrate some details, consider example 2.

Example 2: mylistcpp

. sysuse auto, clear
(1978 Automobile Data)

. mylistcpp mpg trunk rep78 if trunk < 21 in 2/10
Variables listed:  mpg trunk rep78
17.000000   11.000000   3.000000
20.000000   16.000000   3.000000
15.000000   20.000000   4.000000
20.000000   16.000000   3.000000
16.000000   17.000000   3.000000
19.000000   13.000000   3.000000
First observation was             2
Last observation was              10
Number of observations listed was 6

In line 31, SF_ifobs(i) returns one when the if restriction specified to plugin is one for observation i, and zero otherwise. In line 10 of mylistcpp.ado, we see that the if restriction passed to plugin is if `touse'. As discussed above, the sample-inclusion variable in the local macro touse is zero for excluded observations, and it is one for the included observations.

The in range on line 10 of mylistcpp.ado was included so that the loop over the observations in line 28 of mylistw.cpp would only go from the beginning to the end of any specified in range. In example 2, instead of looping over all 74 observations in the auto dataset, the loop on line 28 of mylistw.cpp only goes from 2 to 10.

In example 2, the sample-inclusion variable is 1 for 6 observations, and it is 0 for the other 68 observations. The in 2/10 range excludes observation 1 and observations 11–74. Of the 9 observations in the range, 2 are excluded because rep78 is missing, and 1 is excluded because trunk is 21.

For comparison, all 9 observations between 2 and 10 are listed in example 3.

Example 3: list

. list mpg trunk rep78 in 2/10

     +---------------------+
     | mpg   trunk   rep78 |
     |---------------------|
  2. |  17      11       3 |
  3. |  22      12       . |
  4. |  20      16       3 |
  5. |  15      20       4 |
  6. |  18      21       3 |
     |---------------------|
  7. |  26      10       . |
  8. |  20      16       3 |
  9. |  16      17       3 |
 10. |  19      13       3 |
     +---------------------+

Returning to line 39 of mylistw.cpp, rc = SF_vdata(j, i, &value) puts the value of observation i on variable j into value, and it puts the code returned by SF_vdata() into rc. If all goes well, rc contains 0, and the error block in lines 42–44 is not entered. If SF_vdata() cannot store the data into value, the error block in lines 42–44 is entered. The error block makes Stata display an error message, and it causes mylistwcpp.plugin to exit with the error code that rc contains. In the error block, SF_error() makes Stata display the contents of a C++ character array in red.

SF_vdata() can only access variables of one of the numerical Stata data types (byte, int, long, float, or double). (Use SF_sdata() for string data.) Regardless of which Stata numerical type the variable is, SF_vdata() stores the result as an ST_double. In example 2, mpg, trunk, and rep78 are all of type int in Stata, but each was stored into value as an ST_double.

In line 47, SF_is_missing(value) returns 1 if value is a missing value, and it returns 0 otherwise. Lines 47–51 cause mylistwcpp.plugin to exit with error 416 if any observation in one of the variables contains a missing value. These lines are redundant, because the sample-inclusion variable passed to mylistwcpp.plugin excluded observations containing missing values. I included these lines to illustrate how I would safely exclude missing values from inside the plugin and to reiterate that C++ code must carefully deal with missing values. Stata missing values are valid double precision numbers in C++. You will get wrong results if you include Stata missing values in calculations.
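To see why missing values cannot be treated as ordinary numbers, consider the following sketch, which runs outside of Stata. It assumes that Stata's system missing value is stored as the double 2^1023 and that larger doubles encode the extended missing values; the constant kStataMissing and the function is_missing() are my stand-ins for illustration. In a real plugin, use SF_is_missing() rather than relying on this assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Assumption for illustration: Stata's system missing value "." is the
// double 2^1023, and larger doubles encode the extended missing values.
// In a real plugin, call SF_is_missing() instead of this stand-in.
const double kStataMissing = std::ldexp(1.0, 1023);

bool is_missing(double x) { return x >= kStataMissing; }

// Mean over the nonmissing values; returns kStataMissing if there are none.
double mean_skip_missing(const std::vector<double> &x)
{
    double sum = 0.0;
    int    n   = 0;
    for (std::size_t i = 0; i < x.size(); i++) {
        if (is_missing(x[i])) continue;
        sum += x[i];
        ++n;
    }
    return (n > 0) ? sum / n : kStataMissing;
}

// Naive mean that wrongly treats every stored double as data.
double mean_naive(const std::vector<double> &x)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); i++) sum += x[i];
    return x.empty() ? kStataMissing : sum / static_cast<double>(x.size());
}
```

For the values 3, ., 4, 3, the first function returns 10/3, while the naive version returns a number on the order of 1e307, because the missing value dominates the sum.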

The remaining lines construct the C++ character array line that is passed to Stata to display each observation, and they display the summary information about the sample.

Estimating the mean in a C++ plugin

I now discuss the ado-command mymeancpp that uses mycalcscpp.plugin to implement the calculations performed by mymean_work() in mymean11.ado, discussed in Programming an estimation command in Stata: Preparing to write a plugin.

The code for mymeancpp is in mymeancpp.ado, which is in code block 5.

Code block 5: mymeancpp.ado

*! version 1.0.0  13Feb2018
program define mymeancpp, eclass

        version 15.1

        syntax varlist(numeric) [if] [in]
        marksample touse
        tempname b V N

        local k : word count `varlist'
        matrix `b' = J(1, `k', .)
        matrix `V' = J(`k', `k', .)

        plugin call mycalcscpp `varlist' if `touse' `in', `b' `V' `N'

        matrix colnames `b'  = `varlist'
        matrix colnames `V'  = `varlist'
        matrix rownames `V'  = `varlist'
        ereturn post `b' `V', esample(`touse')
        ereturn scalar   N   = `N'
        ereturn scalar df_r  = `N'-1
        ereturn display

end

program mycalcscpp, plugin

The general structure of this program is the same as mymean10.ado and mymean11.ado, discussed in Programming an estimation command in Stata: Preparing to write a plugin. From a bird's-eye view, mymeancpp.ado

parses the user input;
creates some names and objects to hold the results;
calls a work program to do the calculations;
stores the results returned by the work program in e(); and
displays the results.

The main difference between mymeancpp.ado and mymean11.ado is that the work program is a C++ plugin instead of a Mata function.

Lines 6 and 7 are identical to those in mylistcpp.ado. For a description of how these lines create the local macro varlist, the sample-inclusion variable contained in the local macro touse, and the local macro in that contains any user-specified in range, see the discussion of mylistcpp.ado in Getting access to the Stata data in your plugin.

Line 8 puts temporary names into the local macros b, V, and N. We use these names for results computed by the C++ plugin, and we know that we will not overwrite any results that a user has stored in global Stata memory. (Recall that Stata matrices and scalars are global objects in Stata; see Using temporary names for global objects in Programming an estimation command in Stata: A first ado-command for a discussion of this topic.) In addition, Stata will drop the objects in the temporary names created by tempname when mymeancpp terminates.

Lines 10–12 create Stata matrices to hold the results. We use the temporary names created by tempname for these matrices.

Line 14 in mymeancpp.ado is similar to its counterpart of line 10 in mylistcpp.ado. In this case, plugin calls mycalcscpp.plugin to do the work. The details of `varlist', if `touse', and `in' were discussed above. What is new is that we specify the arguments `b' `V' `N' to pass the temporary names to mycalcscpp.plugin.

mycalcscpp.plugin

does the calculations;
puts the estimated means into the Stata matrix whose name is in the local macro b;
puts the VCE into the Stata matrix whose name is in the local macro V; and
puts the number of observations in the sample into the Stata scalar whose name is in the local macro N.

Lines 16–18 put the variable names on the column stripe of the vector of estimated means and on the row and column stripes of the VCE matrix. Lines 19–21 store the results in e(). Line 22 displays the results.

I now discuss the code that creates mycalcscpp.plugin. Before discussing details, let's create the plugin and run an example.

In a directory that contains mycalcs.cpp, mymatrix.cpp, mycalcsv.cpp, mymatrix.h, mycalcsv.h, stplugin.cpp, and stplugin.h, I created mycalcscpp.plugin on my Mac by typing

g++ -bundle -DSYSTEM=APPLEMAC mycalcs.cpp mymatrix.cpp mycalcsv.cpp stplugin.cpp -o mycalcscpp.plugin

Having created mycalcscpp.plugin, I ran example 4.

Example 4: mymeancpp

. mymeancpp mpg trunk rep78 in 1/60
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |     20.125   .6659933    30.22   0.000     18.79032    21.45968
       trunk |   14.42857   .5969931    24.17   0.000     13.23217    15.62497
       rep78 |   3.160714    .118915    26.58   0.000     2.922403    3.399025
------------------------------------------------------------------------------

I now discuss some aspects of the C++ code used to create mycalcscpp.plugin. I begin with mycalcs.cpp in code block 6, which contains the code for the entry function stata_call().

Code block 6: mycalcs.cpp

// version 1.0.0 14Feb2018
// C++ mycalcs plugin for Stata
#include "stplugin.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <new>
#include "mymatrix.h"
#include "mycalcsv.h"

//Unicode characters can make stata names up to 32*4+1=129 bytes
STDLL stata_call(int argc, char *argv[])
{
    ST_int       first, last, nvars, nobs ;
    ST_int       i  ;
    ST_retcode   rc ;
    char         bname[130], vname[130], nname[130], msg[81] ;

    rc = static_cast<ST_retcode>(0) ;

// Check that arguments are not too long for buffers
    for(i=0; i<3; i++) {
       if (strlen(argv[i])>129) {
            sprintf(msg, "Argument %d is more than 129 bytes long\n",i+1);
            SF_error(msg) ;
            return(static_cast<ST_retcode>(198)) ;
       }
    }
// Store arguments into strings 
// NB: No more checking required
//     functions will return nonzero codes if arguments specify bad names
    strcpy(bname,argv[0]) ;
    strcpy(vname,argv[1]) ;
    strcpy(nname,argv[2]) ;

    nvars = SF_nvars() ;
    first = SF_in1();
    last  = SF_in2();

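// Create bmat to hold vector of sample averages
    MyMatrix bmat(1, nvars) ;
// Return with error if memory for bmat could not be allocated
    if(bmat.rc>0) return(bmat.rc) ;
// Create vmat to hold VCE
    MyMatrix vmat(nvars, nvars) ;
// Return with error if memory for vmat could not be allocated
    if(vmat.rc>0) return(vmat.rc) ;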

// Put sample averages into bmat and # of obs in nobs
    rc = MyAve(bmat, first, last, nvars, &nobs) ;
    if(rc>0) return(rc) ;
// Put VCE into vmat, using the averages in bmat and the # of obs in nobs
    rc = MyV(bmat, vmat, first, last, nvars, nobs) ;
    if(rc>0) return(rc) ;

// Copy sample averages from bmat to Stata matrix bname
    rc = bmat.CopyCtoStataMatrix(bname)  ;
    if(rc>0) return(rc) ;

// Copy VCE from vmat to Stata matrix vname
    rc = vmat.CopyCtoStataMatrix(vname)  ;
    if(rc>0) return(rc) ;

// Copy number of obs from nobs to Stata scalar nname
    rc = SF_scal_save(nname, (ST_double) nobs);
    if(rc>0) return(rc) ;

    return(rc) ;
}

In summary, the code in mycalcs.cpp performs the following tasks.

1. It puts the names of Stata objects passed in as arguments into C++ character arrays that can be passed to SPI functions.
2. It creates the bmat and vmat instances of the MyMatrix class, which will hold results.
3. It uses the work functions MyAve() and MyV() to compute the results that are stored in bmat, vmat, and nobs.
4. It uses the member function CopyCtoStataMatrix() of the MyMatrix class and the SPI function SF_scal_save() to copy the results from bmat, vmat, and nobs to the Stata objects whose names were parsed in step 1.
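The argument handling in the first task can be sketched on its own. The code in mycalcs.cpp checks the length of each argument before copying it; the sketch below also checks that enough arguments were passed, which mycalcs.cpp leaves to the calling ado-file. The function name check_args() is hypothetical, and I reuse the return code 198 that mycalcs.cpp returns for an argument that is too long.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of the SPI): verify that at least `needed`
// arguments were passed and that each of the first `needed` arguments is at
// most `maxlen` bytes long, so that it fits in a buffer of maxlen+1 bytes.
// Returns 0 if all is well and 198 otherwise.
int check_args(int argc, char *argv[], int needed, std::size_t maxlen)
{
    if (argc < needed) {
        return 198;                     // too few arguments
    }
    for (int i = 0; i < needed; i++) {
        if (std::strlen(argv[i]) > maxlen) {
            return 198;                 // argument would overflow its buffer
        }
    }
    return 0;
}
```

After check_args(argc, argv, 3, 129) returns zero, copying argv[0], argv[1], and argv[2] into 130-byte buffers with strcpy() cannot overflow them.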

mycalcs.cpp is easy to read, because I put all the details into the MyMatrix class and into work functions. The MyMatrix class is defined in mymatrix.cpp, and the work functions are defined in mycalcsv.cpp, which I discuss below.

Like mylistw.cpp, mycalcs.cpp uses the return code rc to handle error conditions. Each function returns zero if all went well, and each function returns a nonzero error code if it could not perform the requested job. If the code returned is not zero, stata_call() immediately returns that nonzero code to Stata. The error messages associated with the error conditions are displayed by the work functions.

In point (2), I noted that bmat and vmat are instances of the MyMatrix class. The sample averages and the VCE are best stored in matrices. To keep things simple and self-contained, I defined a bare-bones matrix class MyMatrix that uses row-major storage and only the member functions I needed. Except for the member function CopyCtoStataMatrix(), the code for MyMatrix is standard C++, as can be seen in code block 7.

Code block 7: mymatrix.cpp

// version 1.0.0  14Feb2018
// source file for MyMatrix class
#include <stdio.h>
#include <new>
#include "mymatrix.h"
#include "stplugin.h"

#define M(i,j)  *(mat+(i)*c+j) 

// Constructor
//   creates matrix and initializes elements to zero
// Notes: matrices are long vectors with row-major storage
//    The (i,j) element of an r x c matrix, in one-based indexing, is
//    the (i-1)*c + (j-1) element of the vector
//    under C-style zero-based indexing
//   mat starts as a null pointer so the destructor is safe if new fails
MyMatrix::MyMatrix(ST_int rows, ST_int cols) : r(rows), c(cols), mat(0)
{
    ST_int i, j, TotalSize ;

    TotalSize = r * c ;
    try {
        mat = new ST_double[TotalSize];
        for(i=0; i<r; i++) {
            for(j=0; j<c; j++) {
                M(i,j) = 0.0 ;
            }
        }
        rc  = 0 ;
    }
    catch (const std::bad_alloc &) {
        rc = 909 ;
    }
}

// Destructor deallocates matrix
MyMatrix::~MyMatrix()
{
    if (mat) {
        delete [] mat ;
    }
}

// Divide each element by val
void MyMatrix::DivideByScalar(ST_double val)
{
    ST_int  i, j ;

    for(i=0; i<r; i++) {
        for(j=0; j<c; j++) {
            M(i,j) /= val ;
        }
    }
}

// Copy matrix object values to Stata matrix smname
ST_retcode MyMatrix::CopyCtoStataMatrix(char *smname)
{
    ST_int      i, j;
    ST_retcode  rc_st ;
    ST_double   val ;
    char        msg[80] ;

    rc_st = 0 ;
    for(i=0; i<r; i++) {
        for(j=0; j<c; j++) {
            val = GetValue(i, j) ;
            rc_st = SF_mat_store(smname, (i+1) , (j+1), val ) ;
            if(rc_st>0) {
                sprintf(msg, "cannot access Stata matrix %s\n", smname) ;
                SF_error(msg) ;
                return(rc_st) ;
            }
        }
    }
    return(rc_st) ;
}

#undef M

On line 8, I used a preprocessor macro to simplify the code that refers to an element of a matrix. I undefine the macro on line 79.

Lines 57–77 contain the code for CopyCtoStataMatrix(). Line 68 uses an SPI function that I have not yet discussed. SF_mat_store(char *sname, ST_int i, ST_int j, ST_double val) stores the value val in row i and column j of the Stata matrix whose name is contained in character array sname. The row i and column j are given in one-based indexing.
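The mapping between the zero-based, row-major storage inside MyMatrix and the one-based row and column indices expected by SF_mat_store() can be sketched without Stata. In the sketch below, the struct MockStataMatrix is a stand-in that I assume for illustration; its store() member mimics only the one-based indexing and the zero or nonzero return code of SF_mat_store(), and the nonzero value 1 is arbitrary.

```cpp
#include <vector>

// Mock of a Stata matrix (assumption for illustration only; a real plugin
// calls SF_mat_store()). Rows and columns are one-based, as in the SPI.
struct MockStataMatrix {
    int rows, cols;
    std::vector<double> cell;

    MockStataMatrix(int r, int c) : rows(r), cols(c), cell(r * c, 0.0) {}

    // Returns 0 on success and an arbitrary nonzero code if (row, col)
    // is outside the matrix.
    int store(int row, int col, double val) {
        if (row < 1 || row > rows || col < 1 || col > cols) return 1;
        cell[(row - 1) * cols + (col - 1)] = val;
        return 0;
    }
};

// Copy a zero-based, row-major r x c array into the mock matrix,
// stopping at the first nonzero return code.
int copy_row_major(const double *mat, int r, int c, MockStataMatrix &dest)
{
    for (int i = 0; i < r; i++) {
        for (int j = 0; j < c; j++) {
            // element (i,j) lives at offset i*c + j; Stata wants (i+1, j+1)
            int rc = dest.store(i + 1, j + 1, mat[i * c + j]);
            if (rc > 0) return rc;
        }
    }
    return 0;
}
```

Copying into a destination that is too small fails on the first out-of-range element, which mirrors how CopyCtoStataMatrix() exits as soon as SF_mat_store() returns a nonzero code.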

Line 67 uses the member function GetValue(i,j), which returns the (i,j) element of the matrix in an instance of the MyMatrix class. GetValue() is defined in mymatrix.h, which is the header file that contains the declaration for the MyMatrix class. Code block 8 contains the code in mymatrix.h.

Code block 8: mymatrix.h

// version 1.0.0  14Feb2018
// header file for MyMatrix class
#include "stplugin.h"

#define M(i,j)  *(mat+(i)*c+j) 

class MyMatrix {
private:
        ST_int    r, c;
        ST_double *mat ;
        ST_int    TotalSize ;

public:
        ST_retcode  rc ;
// constructor
        MyMatrix(ST_int rows, ST_int cols) ;
// destructor
        ~MyMatrix() ;
// Return (i,j)th element 
        inline ST_double GetValue(ST_int i, ST_int j) {
                return( M(i,j) ) ;
        }
// Store val into (i,j)th element 
        inline void StoreValue(ST_int i, ST_int j, ST_double val) {
                M(i,j) = val ;
        }
// Increment (i,j)th element  by val
        inline void IncrementByValue(ST_int i, ST_int j, ST_double val) {
                M(i,j) += val ;
        }
// Divide each element by val
        void DivideByScalar(ST_double val) ;
// Copy from class to Stata matrix smname
        ST_retcode CopyCtoStataMatrix(char *smname) ;
// Return rows of matrix
        inline ST_int Rows() { return r; }
// Return cols of matrix
        inline ST_int Cols() { return c ; }
};

#undef M

This code only uses standard C++ and coding techniques that I have already discussed.

In point (3), I noted that mycalcs.cpp uses the work functions MyAve() and MyV() to compute the results. These functions are defined in mycalcsv.cpp in code block 9.

Code block 9: mycalcsv.cpp

// version 1.0.0  14Feb2018
// Functions used in mycalcs.cpp
//
#include <stdio.h>
#include "stplugin.h"
#include "mymatrix.h"
#include "mycalcsv.h"

ST_retcode MyAve(MyMatrix &bmat, ST_int first, ST_int last,
    ST_int nvars, ST_int *nobs)
{
    ST_int     i, j ;
    ST_double  value ;
    ST_retcode rc = (ST_retcode) 0 ;
    char       msg[80] ;

    *nobs = (ST_int) 0 ;
    for(i=first-1; i<last; i++) {
        if (!SF_ifobs(i+1)) {
            continue ;
        }
        (*nobs)++ ;
        for(j=0; j<nvars; j++) {
            rc = SF_vdata(j+1, i+1, &value);
            if(rc>0) {
                sprintf(msg, "Problem accessing Stata data\n") ;
                SF_error(msg) ;
                return(rc) ;
            }
            if (SF_is_missing(value)) {
                sprintf(msg, "missing values encountered\n") ;
                SF_error(msg) ;
                return(static_cast<ST_retcode>(416)) ;
            }
            bmat.IncrementByValue(0, j, value) ;
        }
    }

    bmat.DivideByScalar( static_cast<ST_double>(*nobs) ) ;
    return(rc) ;
}

ST_retcode MyV(MyMatrix &bmat, MyMatrix &vmat, ST_int first,
    ST_int last, ST_int nvars, ST_int nobs)
{

    ST_int      i, j, j2 ;
    ST_double  value ;
    char       msg[80] ;
    ST_retcode rc ;

    rc = (ST_retcode) 0 ;
    MyMatrix emat2(1, nvars) ;

    for(i=first-1; i<last; i++) {
        if (!SF_ifobs(i+1)) {
            continue ;
        }
        for(j=0; j<nvars; j++) {
            rc = SF_vdata(j+1, i+1, &value);
            if(rc>0) {
                sprintf(msg, "Problem accessing Stata data\n") ;
                SF_error(msg) ;
                return(rc) ;
            }
            if (SF_is_missing(value)) {
                sprintf(msg, "missing values encountered\n") ;
                SF_error(msg) ;
                return(static_cast<ST_retcode>(416)) ;
            }
            emat2.StoreValue(0, j, (bmat.GetValue(0,j)-value) ) ;
        }
        for(j=0; j<nvars; j++) {
// matrix is symmetric only fill in lower diag in loop
//     over observations
            for(j2=0; j2<=j; j2++) {
                vmat.IncrementByValue(j, j2,
                    (emat2.GetValue(0,j)*emat2.GetValue(0,j2)) ) ;
            }
        }
    }


// Fill in above the diagonal
    for(j=0; j<nvars; j++) {
        for(j2=j+1; j2<nvars; j2++) {
            vmat.StoreValue(j, j2, vmat.GetValue(j2, j) ) ;
        }
    }

// Divide V by n*(n-1), computed in double precision to avoid integer overflow
    vmat.DivideByScalar( static_cast<ST_double>(nobs) * static_cast<ST_double>(nobs-1) ) ;

    return(rc) ;
}

The work function MyAve() is a C++ implementation of the Mata function MyAve() discussed in Programming an estimation command in Stata: Preparing to write a plugin. It puts the sample averages into the bmat instance of the MyMatrix class, and it puts the number of observations in the sample into nobs. Most of the code for this function is either standard C++ or uses SPI functions that I have already discussed. Lines 35 and 39 deserve comment.

Line 35 uses the IncrementByValue() member function of MyMatrix. When calculating the sample average and storing it in the jth element of a vector named b, one needs to store b[j] + value into b[j]. In other words, one increments the jth element of b by value. bmat.IncrementByValue(0, j, value) increments element j in bmat by value.

Line 39 uses the DivideByScalar() member function of MyMatrix. bmat.DivideByScalar(z) replaces each element of bmat with that element divided by the amount z.
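The accumulate-then-divide pattern described above can be sketched in standalone C++. This is only an illustration: SketchMatrix and ColumnMeans() are hypothetical names that I made up for this example, and the real MyMatrix class in mymatrix.h may store its data and check its indices differently. The data also come from a std::vector here instead of from Stata through SF_vdata().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for MyMatrix; not the class from mymatrix.h.
class SketchMatrix {
public:
    SketchMatrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols, 0.0) {}

    // Add value to element (i, j), that is, m(i,j) = m(i,j) + value.
    void IncrementByValue(std::size_t i, std::size_t j, double value) {
        data_[Index(i, j)] += value;
    }

    // Replace each element with that element divided by z.
    void DivideByScalar(double z) {
        for (double &x : data_) {
            x /= z;
        }
    }

    double GetValue(std::size_t i, std::size_t j) const {
        return data_[Index(i, j)];
    }

private:
    // Map (i, j) to a position in row-major storage.
    std::size_t Index(std::size_t i, std::size_t j) const {
        assert(i < rows_ && j < cols_);
        return i * cols_ + j;
    }

    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> data_;
};

// Hypothetical helper: column means of data, one observation per row.
SketchMatrix ColumnMeans(const std::vector<std::vector<double>> &data) {
    assert(!data.empty());
    const std::size_t nvars = data[0].size();
    SketchMatrix b(1, nvars);

    // First pass: accumulate the column sums in b.
    for (const std::vector<double> &row : data) {
        assert(row.size() == nvars);
        for (std::size_t j = 0; j < nvars; j++) {
            b.IncrementByValue(0, j, row[j]);
        }
    }

    // Then turn the sums into averages.
    b.DivideByScalar(static_cast<double>(data.size()));
    return b;
}
```

Calling ColumnMeans() on three observations of two variables, (1, 10), (2, 20), and (3, 30), leaves 2 and 20 in the two elements of the returned row vector.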

MyV() is a C++ implementation of the Mata function MyV() discussed in Programming an estimation command in Stata: Preparing to write a plugin. It puts the VCE into the vmat instance of the MyMatrix class. Most of the code for this function is either standard C++ or uses techniques that I have already discussed. Lines 71, 78, and 87 use the MyMatrix member functions StoreValue() and GetValue(), which are defined in mymatrix.h. vmat.StoreValue(i, j, z) stores the value z into element (i, j) of the vmat instance of MyMatrix. vmat.GetValue(i, j) returns the value stored in element (i, j) of the vmat instance of MyMatrix.
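The structure of the VCE calculation can be sketched without the SPI or MyMatrix. The sketch below is an assumption-laden illustration, not the plugin code: MeanVce() and the Matrix alias are hypothetical names, and the data are passed in as std::vector objects instead of being read from Stata. It accumulates the outer products of the deviations in the lower triangle, copies the lower triangle above the diagonal, and divides by n(n-1).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical alias used only in this sketch.
using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: VCE of the sample means, given the data
// (one observation per row) and the previously computed means.
Matrix MeanVce(const Matrix &data, const std::vector<double> &means) {
    const std::size_t n = data.size();
    const std::size_t k = means.size();
    assert(n > 1);

    Matrix v(k, std::vector<double>(k, 0.0));
    std::vector<double> e(k);

    for (const std::vector<double> &row : data) {
        assert(row.size() == k);
        // Deviations of this observation from the means.
        for (std::size_t j = 0; j < k; j++) {
            e[j] = means[j] - row[j];
        }
        // The matrix is symmetric, so accumulate only the lower triangle.
        for (std::size_t j = 0; j < k; j++) {
            for (std::size_t j2 = 0; j2 <= j; j2++) {
                v[j][j2] += e[j] * e[j2];
            }
        }
    }

    // Fill in above the diagonal from the lower triangle.
    for (std::size_t j = 0; j < k; j++) {
        for (std::size_t j2 = j + 1; j2 < k; j2++) {
            v[j][j2] = v[j2][j];
        }
    }

    // Divide by n*(n-1), forming the divisor in floating point.
    const double denom = static_cast<double>(n) * static_cast<double>(n - 1);
    for (std::vector<double> &r : v) {
        for (double &x : r) {
            x /= denom;
        }
    }
    return v;
}
```

With the two observations (1, 4) and (3, 8) and means (2, 6), the summed outer products are 2, 4, and 8, and dividing by 2*1 gives a VCE with 1 and 4 on the diagonal and 2 off the diagonal. Forming the divisor from two floating-point factors keeps the product from overflowing an integer type when n is large.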

Done and undone

I showed how to implement a C++ plugin that does the calculations performed by Mata work functions in mymean10.ado and mymean11.ado, as discussed in Programming an estimation command in Stata: Preparing to write a plugin. In the next post, I show how to implement these calculations in a Java plugin.

Appendix

In the text, I showed how to compile and link a plugin on an OS 10 Mac using the command-line developer tools. Here I give the commands for the gcc compiler on Windows 10 and on RedHat Linux.

Windows 10

This subsection provides the commands to compile and link the plugins in a Cygwin environment on a 64-bit Windows 10 system. Unlike the other platforms, we cannot just use gcc. In Cygwin, gcc compiles applications to run in the Cygwin POSIX/Unix environment. We want to use Cygwin to compile a library that will link to, and run in, a native Windows application. Cygwin has minimalist GNU compilers for Windows (MinGW) that will do what we want. The name of the appropriate compiler is platform dependent. On my 64-bit, x86-Intel machine, I used the x86_64-w64-mingw32-g++ compiler.

hellocpp.plugin

In a directory containing stplugin.h, stplugin.cpp, and hello.cpp, create hellocpp.plugin by typing

x86_64-w64-mingw32-g++ -shared -static stplugin.cpp hello.cpp -o hellocpp.plugin

mylistwcpp.plugin

In a directory that contains stplugin.h, stplugin.cpp, and mylistw.cpp, create mylistwcpp.plugin by typing

x86_64-w64-mingw32-g++ -shared -static stplugin.cpp mylistw.cpp -o mylistwcpp.plugin

mycalcscpp.plugin

In a directory that contains stplugin.cpp, stplugin.h, mymatrix.cpp, mycalcsv.cpp, mycalcs.cpp, mymatrix.h, and mycalcsv.h, create mycalcscpp.plugin by typing

x86_64-w64-mingw32-g++ -shared -static stplugin.cpp mymatrix.cpp mycalcsv.cpp mycalcs.cpp -o mycalcscpp.plugin

RedHat Linux

This subsection provides the g++ commands to compile and link the plugins on RedHat Linux.

hellocpp.plugin

In a directory containing stplugin.h, stplugin.cpp, and hello.cpp, create hellocpp.plugin by typing

g++ -shared -fPIC -DSYSTEM=OPUNIX stplugin.cpp hello.cpp -o hellocpp.plugin

mylistwcpp.plugin

In a directory that contains stplugin.h, stplugin.cpp, and mylistw.cpp, create mylistwcpp.plugin by typing

g++ -shared -fPIC -DSYSTEM=OPUNIX stplugin.cpp mylistw.cpp -o mylistwcpp.plugin

mycalcscpp.plugin

In a directory that contains stplugin.cpp, stplugin.h, mymatrix.cpp, mycalcsv.cpp, mycalcs.cpp, mymatrix.h, and mycalcsv.h, create mycalcscpp.plugin by typing

g++ -shared -fPIC -DSYSTEM=OPUNIX stplugin.cpp mymatrix.cpp mycalcsv.cpp mycalcs.cpp -o mycalcscpp.plugin
