Home > Programming > Programming an estimation command in Stata: Writing a C plugin

Programming an estimation command in Stata: Writing a C plugin

This post is the second in a series that illustrates how to plug code written in another language (like C, C++, or Java) into Stata. This technique is known as writing a plugin or as writing a dynamic-link library (DLL) for Stata.

In this post, I write a plugin in C that implements the calculations performed by mymean_work() in mymean11.ado, discussed in Programming an estimation command in Stata: Preparing to write a plugin. I assume that you are familiar with the material in that post.

This is the 30th post in the series Programming an estimation command in Stata. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

Writing a hello-world C plugin

Before I do any computations, I illustrate how to write and compile a C plugin that communicates with Stata. Code block 1 contains the code for myhello.ado that calls the C plugin hello, which just displays “Hello from C” in Stata.

Code block 1: myhello.ado

*! version 1.0.0  13Feb2018
program define myhello

    version 15.1

    plugin call hello

end

program hello, plugin

Line 6 executes the plugin whose handle is hello. Line 10 loads the plugin implemented in hello.plugin into the handle hello. That the execute statement comes before the load statement seems odd at first. Stata ado-files are read in their entirety, and each ado-program, Mata function, or plugin handle is loaded before the lines of the main ado-program are executed. So line 10 is actually executed before line 6.

The name of the handle for the plugin, hello in this case, must differ from the name of the main ado-program, myhello in this case, and from any other ado-program defined in this .ado file.

The code for hello.c is in code block 2.

Code block 2: hello.c

// version 1.0.0 14Feb2018
#include "stplugin.h"
#include <stdio.h>
#include <string.h>

STDLL stata_call(int argc, char *argv[])
{
    char  msg[81] ;

    strcpy(msg, "Hello from C\n");
    SF_display(msg) ;
    return((ST_retcode) 0) ;
}

Line 2 includes the Stata plugin header file stplugin.h. Line 6 is the standard declaration for the entry function for a C plugin for Stata. You should copy it. Inside stata_call(), argc will contain the number of arguments passed to the plugin, and string vector argv will contain the arguments themselves.

Line 8 declares and allocates space for the C string msg. Line 10 puts “Hello from C” with a new line into msg. Line 11 has Stata display what msg contains. Line 12 returns zero as the return code. Note that I casted the literal 0 to be the expected type ST_retcode.

I now discuss how to create the plugin hello.plugin from hello.c. In the directory that contains myhello.ado and hello.c, I also have stplugin.c. stplugin.c defines a function needed to make the stata_call() function available to Stata.

Do not change the contents of stplugin.h or stplugin.c. In fact, you do not even need to look at them.

On my OS X Mac that has the command–line developer tools installed, I use gcc to create hello.plugin from stplugin.c and hello.c by typing

gcc -bundle -DSYSTEM=APPLEMAC stplugin.c hello.c -o hello.plugin

The above gcc command compiles the two .c files and links them to create the DLL hello.plugin, which myhello.ado can call.

In an appendix to this post, I provide instructions for creating hello.plugin on other platforms. https://www.stata.com/plugins/ provides complete documentation for writing and compiling C plugins.

Having created hello.plugin, I can execute myhello in Stata.

Example 1: myhello

. myhello
Hello from C

For simplicity, I have stplugin.h, stplugin.c, hello.c, myhello.ado, and hello.plugin in the same directory. For larger projects, I would put the .ado and .plugin files in directories on Stata’s ADOPATH and use my compiler’s environment to manage where I put my header and C source files. For the examples in this post, I put all my .ado files, header files, C source files, and created .plugin files into a single directory.

Getting access to the Stata data in your plugin

hello.plugin makes Stata display something created inside the plugin. The next step is giving the plugin access to the data in Stata. To illustrate this process, I discuss mylistc.ado, which uses a plugin to list out observations of the specified variables.

Let’s look at the ado-code first.

Code block 3: mylistc.ado

*! version 1.0.0  13Feb2018
program define mylistc, eclass

        version 15.1

        syntax varlist(numeric max=3) [if] [in]
        marksample touse

        display "Variables listed:  `varlist'"
        plugin call mylistw `varlist' if `touse' `in'

end

program mylistw, plugin

In line 6, syntax creates three local macros. It puts the variables specified by the user into the local macro varlist. It puts any if condition specified by the user into the local macro if. It puts any in condition specified by the user into the local macro in. I specified max=3 to syntax to limit the number of variables to 3. This limitation is silly, and I would not need it for an example Stata/Mata program, but it simplifies the example C plugin.

In line 7, marksample creates a sample-inclusion variable and puts its name in the local macro touse. The sample-inclusion variable is zero for each excluded observation and one for each included observation. marksample uses the variables in the local macro varlist, the condition in the local macro if, and the range in the local macro in to create the sample-inclusion variable. (All three local macros were created by syntax.) An observation is excluded if any of the variables in the local macro varlist contain a missing value, if it was excluded by the condition in the local macro if, or if it was excluded by the range in the local macro in. The sample-inclusion variable is one for observations that were not excluded.

In line 9, I further simplified the C plugin by displaying the names of the variables whose values are listed out by the plugin.

In line 10, plugin calls mylistw.plugin. Because `varlist’ is specified, the Stata plugin interface (SPI) function SF_vdata() will be able to access the variables contained in the local macro varlist. Because if `touse’ is specified, the SPI function SF_ifobs() will return zero if the sample-inclusion variable in `touse’ is zero, and the function will return one if the sample-inclusion variable is one. Because `in’ is specified, the SPI functions SF_in1() and SF_in2() respectively return the first and last observations in any user-specified in range.

Specifying `in’ is not necessary to identify the sample specified by the user, because if `touse’ already specifies this sample-inclusion information. However, specifying `in’ can dramatically reduce the range of observations in the loop over the data, thereby speeding up the code.

In a directory that contains stplugin.h, stplugin.c, and mylistw.c, I created mylistw.plugin on my Mac by typing

gcc -bundle -DSYSTEM=APPLEMAC stplugin.c mylistw.c -o mylistw.plugin

Code block 4: mylistw.c

// version 1.0.0 14Feb2018
#include "stplugin.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

STDLL stata_call(int argc, char *argv[])
{
    ST_int       i, j, nObs, nVars  ;
    ST_int       first, last ;
    ST_double    value ;
    ST_retcode   rc ;
    int          nchar ;
    char         line[82], strval[27], msg[81] ;

    rc    = (ST_retcode) 0 ;
// Put the first observation in sample into first
    first = SF_in1();
// Put the last observation in sample into last
    last  = SF_in2();
// Put number of variables in varlist passed in to plugin into nVars
    nVars = SF_nvars() ;
// Initiate number of observations counter to 0
    nObs  = 0 ;

// Loop over observations
    for(i=first; i<=last; i++) {
        line[0] = '\0' ;
// Only display observations for which if restriction is true
        if (!SF_ifobs(i)) {
            continue ;
        }
// Increment number of observations counter
        ++nObs ;
// Loop over variables
        for(j=1; j<=nVars; j++) {
// Put value of observation i on variable j into value 
            rc = SF_vdata(j, i, &value);
// Return with error if problem getting value
            if(rc>0) {
                 sprintf(msg, "Problem accessing Stata data\n") ;
                 SF_error(msg) ;
                 return(rc) ;
            }
// Return with error if missing value
            if (SF_is_missing(value)) {
                 sprintf(msg, "missing values encountered\n") ;
                 SF_error(msg) ;
                 return( (ST_retcode) 416 ) ;
            }
            nchar = snprintf(strval,25,"%f ",value) ;
// Return with error if number is too large or cannot be
//     formated into string as float
            if (nchar<=0 || nchar>25) {
                 sprintf(msg, "number is too large or badly formated\n") ;
                 SF_error(msg) ;
                 return( (ST_retcode) 498 ) ;
            }
// If new version of line is not too big, concatenate string value onto line
            if ( (strlen(strval) + strlen(line))<=80) {
                strcat(line,strval) ;
            }
            else {
                 sprintf(msg, "More than 80 bytes in line\n") ;
                 SF_error(msg) ;
                 return( (ST_retcode) 498 ) ;
            }
        }
// We know that line has 80 bytes or less, so next line is safe
        strcat(line,"\n") ;
// Display line in Stata
        SF_display(line) ;
    }
    sprintf(line, "First observation was             %d\n", first) ;
    SF_display(line) ;
    sprintf(line, "Last observation was              %d\n", last) ;
    SF_display(line) ;
    sprintf(line, "Number of observations listed was %d\n", nObs) ;
    SF_display(line) ;

    return(rc) ;
}

If you are reading this post, you can read standard C. I discuss how mylistw.c illustrates the structure of a C plugin for Stata, and I explain the types and the functions defined by the SPI used in the code. Complete details about the SPI are available at https://www.stata.com/plugins/.

mylistw.c returns zero to Stata if all went well, and it returns a nonzero error code if something went wrong. Every time I call a function in mylistw.c that could fail, I check its return code. If that function failed, I make Stata display an error message, and I return a nonzero error code to Stata. This logic provides the overall structure to mylisw.c. Most of the code deals with error conditions or takes care not to put more characters into a string buffer than it can hold.

C plugins read from or write to Stata objects using functions defined in the SPI. mylistw.c does not return any results, so it has a simple structure.

  • It uses SPI functions to read from the specified sample of the data in Stata.
  • It uses standard C and SPI functions to list observations for the specified sample, and it keeps a counter of how many observations are in the specified sample.
  • It uses standard C and SPI functions to display which was the first observation in the sample, which was the last observation in the sample, and how many observations were in the specified sample.

Now, I discuss specific parts of mylistw.c.

In lines 9–12, I use the SPI defined types ST_int, ST_double, and ST_retcode for variables that the SPI functions return or that are arguments to the SPI functions. Using these defined types is essential, because their mappings to primitive C types vary over time.

rc holds the return code that the plugin will return to Stata. In line 16, I initialize rc to zero. If an SPI function that might fail does what was requested, it returns a return code of zero. If an SPI function cannot do what was requested, it returns a nonzero return code. Each time I call an SPI function that could fail, I store the code it returns in rc. If rc is not zero, I make Stata display an error message and make the plugin return the nonzero value stored in rc.

Lines 18, 20, and 22 use SPI functions. SF_in1() puts the first observation specified by an in range into first. SF_in2() puts the last observation specified by an in range into last. If an in range was not specified to plugin, first will contain one, and last will contain the number of observations in the dataset. SF_nvars() puts the number of variables specified in the varlist in to nVars.

Lines 30–32 ensure that we skip over observations that were excluded by the if restriction specified to plugin in line 10 of mylistc.ado. To illustrate some details, consider example 2.

Example 2: mylistc

. sysuse auto, clear
(1978 Automobile Data)

. mylistc mpg trunk rep78 if trunk < 21 in 2/10
Variables listed:  mpg trunk rep78
17.000000 11.000000 3.000000
20.000000 16.000000 3.000000
15.000000 20.000000 4.000000
20.000000 16.000000 3.000000
16.000000 17.000000 3.000000
19.000000 13.000000 3.000000
First observation was             2
Last observation was              10
Number of observations listed was 6

In line 30, SF_ifobs(i) returns one when the if restriction specified to plugin is one for observation i and zero otherwise. In line 10 of mylist.ado, we see that the if restriction passed into plugin is if `touse'. As discussed above, the sample-inclusion variable in the local macro touse is zero for excluded observations and one for the included observations.

The in range on line 10 of mylistc.ado was included so that the loop over the observations in line 27 of mylistw.c would only go from the beginning to the end of any specified in range. In example 2, instead of looping over all 74 observations in the auto dataset, the loop on line 27 of mylistw.c only goes from 2 to 10.

In example 2, the sample-inclusion variable is 1 for 6 observations and 0 for the other 68 observations. The in 2/10 range excludes observation one and the observations from 11–74. Of the first 10 observations, 2 are excluded because rep78 is missing. One observation is excluded because trunk is 21.

For comparison, all 9 observations between 2 and 10 are listed in example 3.

Example 3: list

. list mpg trunk rep78 in 2/10

     +---------------------+
     | mpg   trunk   rep78 |
     |---------------------|
  2. |  17      11       3 |
  3. |  22      12       . |
  4. |  20      16       3 |
  5. |  15      20       4 |
  6. |  18      21       3 |
     |---------------------|
  7. |  26      10       . |
  8. |  20      16       3 |
  9. |  16      17       3 |
 10. |  19      13       3 |
     +---------------------+

Returning to line 38 of mylistw.c, rc = SF_vdata(j, i, &value) puts the value of observation i on variable j into value, and it puts the code returned by SF_vdata() into rc. If all goes well, rc contains 0, and the error block in lines 41–43 is not entered. If SF_vdata() cannot store the data into value, the error block in lines 41–43 is entered, and it makes Stata display an error message and causes mylistw.plugin to exit with the error code that rc contains. In the error block, SF_error() makes Stata display the contents of a C string in red.

SF_vdata() can only access variables of one the numerical Stata data types (byte, int, long, float, or double). (Use SF_sdata() for string data.) Regardless of which Stata numerical type the variable is, SF_vdata() stores the result as an ST_double. In example 2, mpg, trunk, rep78 are all of type int in Stata, but each was stored into value as an ST_double.

In line 46, SF_is_missing(value) returns 1 if value is a missing value and 0 otherwise. Lines 46–50 cause mlistw.plugin to exit with error 416 if any observation in one of the variables contains a missing value. These lines are redundant, because the sample-inclusion variable passed into mylistw.plugin excluded observations containing missing values. I included these lines to illustrate how I would safely exclude missing values from inside the plugin and to reiterate that C code must carefully deal with missing values. Stata missing values are valid double precision numbers in C. You will get wrong results if you include Stata missing values in calculations.

The remaining lines construct the C string line that is passed to Stata to display for each observation and finally display the summary information about the sample.

Estimating the mean in a C plugin

I now discuss the ado-command mymeanc, which uses mycalcs.plugin to implement the calculations performed by mymean_work(), in mymean11.ado discussed in Programming an estimation command in Stata: Preparing to write a plugin.

The code for mymeanc is in mymeanc.ado, which is in code block 5.

Code block 5: mymeanc.ado

*! version 1.0.0  13Feb2018
program define mymeanc, eclass

    version 15.1

    syntax varlist(numeric) [if] [in]
    marksample touse
    tempname b V N

    local k : word count `varlist'
    matrix `b' = J(1, `k', .)
    matrix `V' = J(`k', `k', .)

    plugin call mycalcs `varlist' if `touse' `in', `b' `V' `N'

    matrix colnames `b'  = `varlist'
    matrix colnames `V'  = `varlist'
    matrix rownames `V'  = `varlist'
    ereturn post `b' `V', esample(`touse')
    ereturn scalar   N   = `N'
    ereturn scalar df_r  = `N'-1
    ereturn display

end

program mycalcs, plugin

The general structure of this program is the same as mymean10.ado and mymean11, discussed in Programming an estimation command in Stata: Preparing to write a plugin. From a bird's-eye view, mymeanc.ado,

  • parses the user input;
  • creates some names and objects to hold the results;
  • calls a work program to do the calculations;
  • stores the results returned by the work program in e(); and
  • displays the results.

The main difference between mymeanc.ado and mymean11.ado is that the work program is a C plugin instead of a Mata function.

Lines 6 and 7 are identical to those in mylistc.ado. For a description of how these lines create the local macro varlist, the sample-inclusion variable contained in the local macro touse, and the local macro in that contains any user-specified in range, see the discussion of mylistc.ado in Getting access to the Stata data in your plugin.

Line 8 puts temporary names into the local macros b, V, and N. We use these names for results computed by the C plugin and know that we will not overwrite any results that a user has stored in global Stata memory. (Recall that Stata matrices and scalars are global objects in Stata; see Using temporary names for global objects in Programming an estimation command in Stata: A first ado-command for a discussion of this topic.) In addition, Stata will drop the objects in the temporary names created by tempname, when mymeanc
terminates.

Lines 10–12 create Stata matrices to hold the results. We use the temporary names created by tempname for these matrices.

Line 14 in mymeanc.ado is similar to its counterpart of line 10 in mylistc.ado. In this case, plugin calls mycalcs.plugin to do the work. The details of varlist, if `touse' and `in' were discussed above. What is new is that we pass the argument `b' `V' `N' to pass the temporary names to mycalcs.plugin.

mycalcs.plugin

  • does the calculations;
  • puts the estimated means into the Stata matrix whose name is in the local macro b;
  • puts the estimated variance of the estimator (VCE) into the Stata matrix whose name is in the local macro V; and
  • puts the number of observations in the sample into the Stata scalar whose name is in the local macro N.

Lines 16–18 put the variable names on the column stripe of the vector of estimated means and on the row and column stripes of the VCE matrix. Lines 19–21 store the results in e(). Line 22 displays the results.

I now discuss the code that creates mycalcs.plugin. Before discussing details, let's create the plugin and run an example.

In a directory that contains mycalcs.c, mycalcsw.h, mycalcsw.c, stplugin.c, and stplugin.h, I created mycalcs.plugin on my Mac by typing

gcc -bundle -DSYSTEM=APPLEMAC stplugin.c mycalcsw.c mycalcs.c -o mycalcs.plugin

Having created mycalcs.plugin, I ran example 3.

Example 3: mymeanc

. mymeanc mpg trunk rep78 in 1/60
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |     20.125   .6659933    30.22   0.000     18.79032    21.45968
       trunk |   14.42857   .5969931    24.17   0.000     13.23217    15.62497
       rep78 |   3.160714    .118915    26.58   0.000     2.922403    3.399025
------------------------------------------------------------------------------

I now discuss some aspects of the C code used to create mycalcs.plugin. I begin with mycalcs.c in code block 6, which contains the code for the entry function stata_call().

Code block 6: mycalcs.c

// version 1.0.0 14Feb2018
#include "stplugin.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mycalcsw.h"

//Unicode characters can make Stata names up to 32*4+1 bytes
STDLL stata_call(int argc, char *argv[])
{
    ST_int       first, last, nVars, nObs  ;
    ST_int       i  ;
    ST_retcode   rc ;
    char         bname[130], vname[130], nname[130], msg[81] ;
    ST_double    *bmat, *vmat  ;

    bmat  = NULL ;
    vmat  = NULL ;

// Put number of variables in varlist in to nVars
    nVars = SF_nvars() ;
// Put the first observation in sample into first
    first = SF_in1();
// Put the last observation in sample into last
    last  = SF_in2();

// Check that arguments are not too long for buffers
    for(i=0; i<3; i++) {
        if (strlen(argv[i])>129) {
            sprintf(msg, "Argument %d is more than 129 bytes long\n",i+1);
            SF_error(msg) ;
            return((ST_retcode) 198) ;
        }
    }
// Store arguments into strings 
// NB: No more checking required
//     SPI functions will return nonzero codes if arguments specify bad names
    strcpy(bname,argv[0]) ;
    strcpy(vname,argv[1]) ;
    strcpy(nname,argv[2]) ;

    // Allocate space for bmat and initialize to 1 x c matrix of zeros
    rc = InitCmat(&bmat, (ST_int) 1, nVars ) ;
    if (rc>0) {
        return( rc ) ;
    }

    // Allocate space for vmat and initialize to nVars x nVars matrix of zeros
    rc = InitCmat(&vmat, nVars, nVars ) ;
    if (rc>0) {
        free(bmat) ;
        return( rc ) ;
    }

    // Put sample averages in bmat and number of obs in n
    rc = MyAve(bmat, first, last, nVars, &nObs) ;
    if(rc>0) {
        free(bmat) ;
        free(vmat) ;
        return(rc) ;
    }

    // Put VCE in vmat
    rc = MyV(bmat, vmat, first, last, nVars, nObs) ;
    if(rc>0) {
        free(bmat) ;
        free(vmat) ;
        return(rc) ;
    }

    // Copy sample averages from bmat to Stata matrix bname
    rc = CopyCtoStataMatrix(bmat, bname, (ST_int) 1, nVars) ;
    if(rc>0) {
        free(bmat) ;
        free(vmat) ;
        return(rc) ;
    }

    // Copy VCE from vmat to Stat matrix vname
    rc = CopyCtoStataMatrix(vmat, vname,  nVars, nVars) ;
    if(rc>0) {
        free(bmat) ;
        free(vmat) ;
        return(rc) ;
    }

    // Copy number of obs from nObs to nname
    rc = SF_scal_save(nname, (ST_double) nObs);
    if(rc>0) {
        free(bmat) ;
        free(vmat) ;
        return(rc) ;
    }

    free(bmat) ;
    free(vmat) ;
    return(rc) ;
}

In summary, the code in mycalcs.c performs the following tasks.

  1. It puts the names of Stata objects passed in as arguments into C strings that can be passed to work functions.
  2. It uses the work function InitCmat() to allocate space for the C arrays bmat and vmat that will hold matrix results.
  3. It uses the work functions MyAve() and MyV() to compute the results that are stored in bmat, vmat, and nObs.
  4. It uses the work function CopyCtoStataMatrix() and the SPI function SF_scal_save() to copy the results from bmat, vmat, and nObs to the Stata objects whose names were parsed in step 1.
  5. It frees the allocated C arrays and returns a return code.

mycalcs.c is easy to read, because I put all the details into the work functions. These functions are defined in mycalcsw.c, and I discuss them below.

Like mylistw.c, mycalcs.c uses the return code rc to handle error conditions. Each work function returns zero if all went well, and it returns a nonzero error code if it could not perform the requested job. If the return code is not zero, mycalcs.c enters into a block of code to handle the error. Each error block makes Stata display an error message, it frees any allocated C arrays, and finally, it causes stata_call() to return the nonzero code.

I now discuss the work functions in mycalcsw.c in code block 7.

Code block 7: mycalcsw.c

// version 1.0.0 14Feb2018
#include <stdlib.h>
#include <stdio.h>
#include "stplugin.h"
#include "mycalcsw.h"

// Note: Matrices are long vectors with row-major storage
//    The i,j element of an r x c matrix is 
//    the (i-1)*r + (j-1) element of the of the vector
//    under C-style zero-base indexing
//
//    Define preprocesor macros to facilitate readability

#define  M(i, j)   *(*mat + (i)*c + j)
#define  B(j)      *(bmat+j)
#define  E(j)      *(emat+j)
#define  V(i, j)   *(vmat + (i)*c + j)
#define  C(i, j)   *(cmat + (i)*c + j)

ST_retcode InitCmat(ST_double **mat, ST_int r, ST_int c)
{
    ST_int  i, j ;
    char    msg[80] ;

    *mat = (ST_double *) malloc((size_t) r*c*sizeof(ST_double)) ;
    if (*mat == NULL ) {
        sprintf(msg, "Insufficient memory\n") ;
        SF_error(msg) ;
        return( (ST_retcode) 909) ;
    }

    for(i=0; i<r; i++) {
        for(j=0; j<c; j++) {
            M(i, j) = (ST_double) 0 ;
        }
    }

    return((ST_int) 0) ;
}

ST_retcode MyAve(ST_double *bmat, ST_int first, ST_int last, ST_int nVars,
    ST_int *nObs)
{
    ST_int     i, j ;
    ST_double  value ;
    ST_retcode rc ;
    char       msg[80] ;

    rc    = (ST_retcode) 0 ;
    *nObs = (ST_int) 0 ;
    for(i=first-1; i<last; i++) {
        if (SF_ifobs(i+1)) {
            ++(*nObs) ;
            for(j=0; j<nVars; j++) {
                rc = SF_vdata(j+1, i+1, &value);
                if(rc>0) {
                    sprintf(msg, "Problem accessing Stata data\n") ;
                    SF_error(msg) ;
                    return(rc) ;
                }
                if (SF_is_missing(value)) {
                    sprintf(msg, "missing values encountered\n") ;
                    SF_error(msg) ;
                    return( (ST_retcode) 416 ) ;
                }
                B(j) += value ;
            }
        }
    }

    DivideByScalar(bmat, (ST_int) 1, nVars, (ST_double) *nObs) ;
    return(rc) ;
}

ST_retcode MyV(ST_double *bmat, ST_double *vmat, ST_int first, ST_int last,
    ST_int nVars, ST_int nObs)
{
    ST_int     i, j, j2, c ;
    ST_double  *emat, value  ;
    char       msg[80] ;
    ST_retcode rc ;

// used in macros for matrices
    c     = nVars ;
    emat  = NULL;

    rc = InitCmat(&emat, 1, nVars ) ;
    if (rc>0) {
        return( rc ) ;
    }

    for(i=first-1; i<last; i++) {
        if (SF_ifobs(i+1)) {
            for(j=0; j<nVars; j++) {
                rc = SF_vdata(j+1, i+1, &value);
                if(rc>0) {
                    free(emat) ;
                    sprintf(msg, "Problem accessing Stata data\n") ;
                    SF_error(msg) ;
                    return(rc) ;
                }
                if (SF_is_missing(value)) {
                    free(emat) ;
                    sprintf(msg, "missing values encountered\n") ;
                    SF_error(msg) ;
                    return( (ST_retcode) 416 ) ;
                }
                E(j) = value - B(j) ;
            }
            for(j=0; j<nVars; j++) {
// matrix is symmetric: only fill in lower diag in loop over observations
                for(j2=0; j2<=j; j2++) {
                    V(j,j2) += (E(j))*(E(j2)) ;
                }
            }
        }
    }

// Fill in above the diagonal
    for(j=0; j<nVars; j++) {
        for(j2=j+1; j2<nVars; j2++) {
            V(j,j2) = V(j2,j) ;
        }
    }

    DivideByScalar(vmat, nVars, nVars, (ST_double) (nObs*(nObs-1)) )  ;

    free(emat) ;
    return(rc) ;
}

ST_retcode CopyCtoStataMatrix(ST_double *cmat, char *smat, ST_int r, ST_int c)
{
    ST_int      i, j;
    ST_retcode  rc ;
    char        msg[80] ;

    rc = (ST_retcode) 0 ;
    for(i=0; i<r; i++) {
        for(j=0; j<c; j++) {
            rc = SF_mat_store(smat, (i+1) , (j+1), C(i,j) ) ;
            if(rc>0) {
                sprintf(msg, "cannot access Stata matrix %s\n", smat) ;
                SF_error(msg) ;
                return(rc) ;
            }
        }
    }
    return(rc) ;
}


// Replace each element in r x c matrix mat with that element 
// divided by val
void DivideByScalar(ST_double *vmat, ST_int r, ST_int c, ST_double val)
{
    ST_int  i, j ;

    for(i=0; i<r; i++) {
        for(j=0; j<c; j++) {
            V(i,j) /= val ;
        }
    }
}

#undef M
#undef B
#undef E
#undef V

Two aspects of how I implemented matrices in C arrays deserve some comment. First, I stored the matrices as vectors with row-major storage, as I mentioned in the comments on lines 7–10. Second, I used the preprocessor macros defined on lines 14–18 to make the code easier to read. Note that I undefined these macros on lines 166–169.

Aside from its use of SF_error() to make Stata display an error message if malloc() cannot allocate memory, the work function InitCmat() uses standard C to implement a matrix allocation and initialization function.

The work function MyAve() is a C implementation of the MyAve() implemented in Mata in Programming an estimation command in Stata: Preparing to write a plugin. MyAve() handles Stata data and missing values as I described above, when I discussed mylistw.c. The work function DivideByScalar(), called on line 71, divides each element in bmat by the number of sample observations stored in n. (Casts ensure that floating point instead of integer division is performed.)

The work function MyV() is a C implementation of the MyV() implemented in Mata in Programming an estimation command in Stata: Preparing to write a plugin. MyV() uses most of the coding techniques and functions discussed so far. This function is longer than the others, but everything in it is either standard C or something that I have already discussed.

The work function CopyCtoStataMatrix() copies results from a C array to a Stata matrix. It uses SF_mat_store( smat, (i+1) , (j+1), C(i,j) ) to copy the element from row i and column j of a C array to to the corresponding element in the Stata matrix. The Stata matrix elements are specified as (i+1) and (j+1) because the C matrices in my code use zero-based indexing while SF_mat_store() uses one-based indexing for the Stata matrix elements.

The work function DivideByScalar() divides each element in a C array by a scalar.

For completeness, I now discuss mycalcsw.h. mycalcsw.h, given in code block 8, contains function prototypes of the work functions defined in mycalcsw.c.

Code block 8: mycalcsw.h

// version 1.0.0 14Feb2018
// header file for mycalcs.c and mycalcw.c
ST_retcode InitCmat(ST_double **mat, ST_int r, ST_int c) ;
ST_retcode MyAve(ST_double *bmat, ST_int first, ST_int last,
    ST_int nVars, ST_int *nObs) ;
ST_retcode MyV(ST_double *bmat, ST_double *vmat, ST_int first, ST_int last,
    ST_int nVars, ST_int nObs)  ;
ST_retcode CopyCtoStataMatrix(ST_double *cmat, char *smat, ST_int r, ST_int c) ;
void       DivideByScalar(ST_double *mat, ST_int r, ST_int c, ST_double val)  ;

Done and undone

I showed how to implement a C plugin that does the calculations performed by Mata work functions in mymean10.ado and mymean11.ado, as discussed in program 29 post.

In the next post, I show how to implement these calculations in a C++ plugin.

Appendix

In the text, I showed how to compile and link a plugin on an OS 10 Mac using the
command-line developer tools. Here I give the commands for the gcc compiler on Windows 10 and on RedHat Linux.

Windows 10

This subsection provides the commands to compile and link the plugins in a Cygwin environment on a 64-bit Windows 10 system. Unlike the other platforms, we cannot just use gcc. In Cygwin, gcc compiles applications to run in the Cygwin POSIX/Unix environment. We want to use Cygwin to compile a library that will link to, and run in, a native Windows application. Cygwin has minimalist GNU compilers for Windows (MinGW) that will do what we want. The name of the appropriate compiler is platform dependent. On my 64-bit, x86-Intel machine, I used the x86_64-w64-mingw32-gcc compiler.

hello.plugin

In a directory containing stplugin.h, stplugin.c, and hello.c, create hello.plugin by typing

x86_64-w64-mingw32-gcc -shared -mno-clwb stplugin.c hello.c -o hello.plugin

mylistw.plugin

In a directory that contains stplugin.h, stplugin.c, and mylistw.c, create mylistw.plugin by typing

x86_64-w64-mingw32-gcc -shared -mno-clwb stplugin.c mylistw.c -o mylistw.plugin

mycalcs.plugin

In a directory that contains stplugin.c, stplugin.h, mycalcs.c, mycalcsw.h, and mycalcsw.c, create mycalcs.plugin by typing

x86_64-w64-mingw32-gcc -shared -mno-clwb stplugin.c mycalcsw.c mycalcs.c -o mycalcs.plugin

RedHat Linux

This subsection provides the gcc commands to compile and link plugins on RedHat Linux.

hello.plugin

In a directory containing stplugin.h, stplugin.c, and hello.c, create hello.plugin by typing

gcc -shared -fPIC -DSYSTEM=OPUNIX stplugin.c hello.c -o hello.plugin

mylistw.plugin

In a directory that contains stplugin.h, stplugin.c, and mylistw.c, create mylistw.plugin by typing

gcc -shared -fPIC -DSYSTEM=OPUNIX stplugin.c mylistw.c -o mylistw.plugin

mycalcs.plugin

In a directory that contains stplugin.c, stplugin.h, mycalcs.c, mycalcsw.h, and mycalcsw.c, create mycalcs.plugin by typing

gcc -shared -fPIC -DSYSTEM=OPUNIX stplugin.c mycalcsw.c mycalcs.c -o mycalcs.plugin