## Programming an estimation command in Stata: Consolidating your code

\(

\newcommand{\xb}{{\bf x}}

\newcommand{\gb}{{\bf g}}

\newcommand{\Hb}{{\bf H}}

\newcommand{\Gb}{{\bf G}}

\newcommand{\Eb}{{\bf E}}

\newcommand{\betab}{\boldsymbol{\beta}}

\)I write ado-commands that estimate the parameters of an exponential conditional mean (ECM) model and a probit conditional mean (PCM) model by nonlinear least squares, using the methods that I discussed in the post Programming an estimation command in Stata: Nonlinear least-squares estimators. These commands will either share lots of code or repeat lots of code, because they are so similar. It is almost always better to share code than to repeat code. Shared code only needs to be changed in one place to add a feature or to fix a problem; repeated code must be changed everywhere. I introduce Mata libraries to share Mata functions across ado-commands, and I introduce wrapper commands to share ado-code.

This is the 27th post in the series **Programming an estimation command in Stata**. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

**Ado-commands for ECM and PCM models**

I now convert the examples of NLS for ECM and PCM models discussed in Programming an estimation command in Stata: Nonlinear least-squares estimators to ado-commands. To keep things simple, the ado-commands discussed here omit many standard features such as factor variables, time-series variables, robust estimators of the VCE, and cluster–robust estimators of the VCE.

**mynlexp1** implements an NLS estimator for the parameters of the ECM model.

**Code block 1: mynlexp1.ado**

*! version 1.0.0 09May2016 program define mynlexp1, eclass sortpreserve version 14.1 syntax varlist [if] [in] [, noCONStant ] marksample touse gettoken depvar indeps : varlist tempname b V N rank mata: mywork("`depvar'", "`indeps'", "`touse'", "`constant'", /// "`b'", "`V'", "`N'", "`rank'" ) if "`constant'" == "" { local indeps "`indeps' _cons" } matrix colnames `b' = `indeps' matrix colnames `V' = `indeps' matrix rownames `V' = `indeps' ereturn post `b' `V', esample(`touse') ereturn scalar N = `N' ereturn scalar rank = `rank' ereturn local cmd "mynlexp" ereturn display end mata: void MYNLExp(real scalar todo, real vector b, /// real vector y, real matrix X, /// val, grad, hess) { real vector r, f, xb real matrix df xb = X*b' f = exp(xb) r = y - f val = -(r:^2) df = f:*X if (todo>=1) { grad = r:*df } if (todo==2) { hess = -1*quadcross(df, df) } } void mywork( string scalar depvar, string scalar indeps, string scalar touse, string scalar constant, string scalar bname, string scalar Vname, string scalar nname, string scalar rname) { real vector y, b real matrix X, V real scalar n, p, ssr transmorphic S y = st_data(., depvar, touse) n = rows(y) X = st_data(., indeps, touse) if (constant == "") { X = X,J(n, 1, 1) } p = cols(X) S = optimize_init() optimize_init_argument(S, 1, y) optimize_init_argument(S, 2, X) optimize_init_evaluator(S, &MYNLExp()) optimize_init_params(S, J(1, p, .01)) optimize_init_evaluatortype(S, "gf2") optimize_init_conv_vtol(S, 1e-10) b = optimize(S) V = invsym(-1*optimize_result_Hessian(S)) ssr = (-1/(n-p))*optimize_result_value(S) V = ssr*V st_matrix(bname, b) st_matrix(Vname, V) st_numscalar(nname, n) st_numscalar(rname, p) } end

Lines 2–29 define the ado-command, which uses the Mata work function **mywork()** defined on lines 54–89. Lines 33–52 define the evaluator function **MYNLExp()** used by **optimize()** in **mywork()**. This structure should be familiar from the Poisson regression examples previously

discussed.

**mynlprobit1** implements an NLS estimator for the parameters of the PCM model.

**Code block 2: mynlprobit1.ado**

*! version 1.0.0 09May2016 program define mynlprobit1, eclass sortpreserve version 14.1 syntax varlist [if] [in] [, noCONStant ] marksample touse gettoken depvar indeps : varlist tempname b V N rank mata: mywork("`depvar'", "`indeps'", "`touse'", "`constant'", /// "`b'", "`V'", "`N'", "`rank'" ) if "`constant'" == "" { local indeps "`indeps' _cons" } matrix colnames `b' = `indeps' matrix colnames `V' = `indeps' matrix rownames `V' = `indeps' ereturn post `b' `V', esample(`touse') ereturn scalar N = `N' ereturn scalar rank = `rank' ereturn local cmd "mynlexp" ereturn display end mata: void MYNLProbit(real scalar todo, real vector b, /// real vector y, real matrix X, /// val, grad, hess) { real vector r, f, xb real matrix df xb = X*b' f = normal(xb) r = y - f val = -(r:^2) df = normalden(xb):*X if (todo>=1) { grad = r:*df } if (todo==2) { hess = -1*quadcross(df, df) } } void mywork( string scalar depvar, string scalar indeps, string scalar touse, string scalar constant, string scalar bname, string scalar Vname, string scalar nname, string scalar rname) { real vector y, b real matrix X, V real scalar n, p, ssr transmorphic S y = st_data(., depvar, touse) n = rows(y) X = st_data(., indeps, touse) if (constant == "") { X = X,J(n, 1, 1) } p = cols(X) S = optimize_init() optimize_init_argument(S, 1, y) optimize_init_argument(S, 2, X) optimize_init_evaluator(S, &MYNLProbit()) optimize_init_params(S, J(1, p, .01)) optimize_init_evaluatortype(S, "gf2") optimize_init_conv_vtol(S, 1e-10) b = optimize(S) V = invsym(-1*optimize_result_Hessian(S)) ssr = (-1/(n-p))*optimize_result_value(S) V = ssr*V st_matrix(bname, b) st_matrix(Vname, V) st_numscalar(nname, n) st_numscalar(rname, p) } end

The code for **mynlprobit1** is nearly identical to that for **mynlexp1**.

Duplicated code is dangerous. Anytime you want to add a feature or fix a problem, you must do it twice. I highly recommend that you avoid duplicated code, and I illustrate how by rewriting these commands to have a single Mata code base and then a single ado-code base.

**Libraries of Mata code**

The **mywork()** functions used in **mynlexp1** and **mynlprobit1** differ only in the evaluator function they call; see line 76 in code blocks 1 and 2. I would like to have one **mywork()** function that is called by **mynlexp1** and by **mynlprobit1**.

Mata functions defined at the bottom of an ado-file are local to that ado-file, so I cannot use this method for defining the **mywork()** function that will be used by both **mynlexp1** and **mynlprobit1**. What I need is a file containing compiled Mata functions that are callable from any other Mata function or from within any ado-file or do-file. This type of file is known as a library.

I use **mynllib.mata** in code block 3 to make the library **lmynllib.mlib** containing the compiled Mata functions **MYNLWork()**, **MYNLProbit()**, **MYNLExp()**.

**Code block 3: mynllib.mata**

mata: mata clear void MYNLExp(real scalar todo, real vector b, /// real vector y, real matrix X, /// val, grad, hess) { real vector r, f real matrix df f = exp(X*b') r = y - f val = -(r:^2) df = f:*X if (todo>=1) { grad = r:*df } if (todo==2) { hess = -1*quadcross(df, df) } } void MYNLProbit(real scalar todo, real vector b, /// real vector y, real matrix X, /// val, grad, hess) { real vector r, f, xb real matrix df xb = X*b' f = normal(xb) r = y - f val = -(r:^2) df = normalden(xb):*X if (todo>=1) { grad = r:*df } if (todo==2) { hess = -1*quadcross(df, df) } } void MYNLWork( string scalar depvar, string scalar indeps, string scalar touse, string scalar constant, string scalar bname, string scalar Vname, string scalar nname, string scalar rname, string scalar model) { real vector y, b real matrix X, V real scalar n, p, ssr string scalar emsg pointer(function) f transmorphic S if (model=="expm") { f = &MYNLExp() } else if (model=="probit") { f = &MYNLProbit() } else { emsg = "{red}model " + model + " invalid\n" printf(emsg) exit(error(498)) } y = st_data(., depvar, touse) n = rows(y) X = st_data(., indeps, touse) if (constant == "") { X = X,J(n, 1, 1) } p = cols(X) S = optimize_init() optimize_init_argument(S, 1, y) optimize_init_argument(S, 2, X) optimize_init_evaluator(S, f) optimize_init_params(S, J(1, p, .01)) optimize_init_evaluatortype(S, "gf2") optimize_init_conv_vtol(S, 1e-10) b = optimize(S) V = invsym(-1*optimize_result_Hessian(S)) ssr = (-1/(n-p))*optimize_result_value(S) V = ssr*V st_matrix(bname, b) st_matrix(Vname, V) st_numscalar(nname, n) st_numscalar(rname, p) } mata mlib create lmynllib, replace mata mlib add lmynllib MYNLWork() MYNLProbit() MYNLExp() end

Lines 1 and 99 open and close the Mata session in which I define the functions, create the library, and add the functions to the library. Lines 4–22 define **MYNLExp()**, which I have already discussed. Lines 24–43 define **MYNLProbit()**, which I have already discussed. Lines 45–94 define **MYNLWork()**, which is the work function that both **mynlexp2.ado** and **mynlprobit2.ado** will use. Note that I have used uppercase letters in the names **MYNLExp()**, **MYNLProbit()**, **MYNLWork()**. Functions in Mata libraries are global: they can be called from anywhere, and their names must be unique in the space of names for Mata functions. I try to avoid using function names that other programmers might use by prefixing the names of my functions with an uppercase name of the library and beginning the function name with an uppercase letter.

Line 49 specifies that the ninth argument **MYNLWork()** is a **string scalar** known as **model** inside the function. The ado-commands will pass either “expm” or “probit” in this argument.

If **model** contains “expm”, line 60 stores the address of the function **MYNLExp()** in **f**. The variable type that holds the address of a function is known as a pointer to a function. For this reason, line 56 declares **f** to a pointer to a function. If **model** contains “probit”, line 63 stores the address of the function **MYNLProbit()** in **f**. If **model** does not contain “expm” or “probit”, lines 66–68 display an error message and exit.

Pointers hold the address of an object. All I need here is a box that holds the address of the evaluator function corresponding to the model fit by the ado-command that will call **MYNLWork()**. Line 56 declares **f** to be this type of box, lines 60 and 63 store the memory of the correct function in **f**, and line 81 puts the address stored in **f** into the optimize object **S**. Type **help M-2 pointers** to learn more about pointers.

The remaining lines of **MYNLWork()** are the same as the lines in the **mywork()** functions in **mynlexp1** and **mynlprobit1**.

There is still a lot of duplicated code in the evaluator functions **MYNLExp()** and **MYNLProbit()**. Instead of using a pointer to the evaluator function, I could have consolidated the evaluator functions **MYNLExp()** and **MYNLProbit()** into a single function that used an additional argument to decide which case to evaluate. I chose the presented method because it is faster. Consolidating the evaluator functions would have slowed down the function that I most want to speed up. (The evaluator function is called many times by **optimize()**.) So, in this case, I accepted the risk of duplicated code for the advantage of speed.

Line 96 creates the Mata library **lmynllib.mlib** in the current directory, replacing any previously defined version of this library. Line 97 puts the compiled versions of **MYNLWork()**, **MYNLProbit()**, and **MYNLExp()** into **lmynllib.mnlib**. At this point, the file **lmynllib.mlib** in the current directory contains the compiled functions **MYNLWork()**, **MYNLProbit()**, and **MYNLExp()**.

**mynllib.mata** is a do-file that makes a Mata library, hence the **.mata** suffix instead of the **.do** suffix. I can execute it by typing **do mynllib.mata**. Example 1 makes the library **lmynllib.mlib**.

**Example 1: Making a Mata library**

. program drop _all . mata: mata clear . quietly do mynllib.mata . mata: mata mlib index .mlib libraries to be searched are now lmatabase;lmatapss;lmataado;lmatapostest;lmatafc;lmatasem;lmatapath; > lmatamcmc;lmatagsem;lmataopt;lmynllib;lfreduse;lpoparms;lspmat

After dropping all the ado-commands in memory, and clearing Mata, I used **quietly do mynllib.mata** to make the library, because I do not want to see the code again. **mata: mata mlib index** updates the list libraries known to Mata; this step adds **lmynllib.mlib** to the list of known libraries so that I can use the functions therein defined.

Having made **lmynllib.mlib** and added it to the list of known libraries, I can use the functions therein defined in one-line Mata calls in my ado-commands. Consider **mynlexp2**.

**Code block 4: mynlexp2.ado**

*! version 2.0.0 10May2016 program define mynlexp2, eclass sortpreserve version 14.1 syntax varlist [if] [in] [, noCONStant ] marksample touse gettoken depvar indeps : varlist tempname b V N rank mata: MYNLWork("`depvar'", "`indeps'", "`touse'", "`constant'", /// "`b'", "`V'", "`N'", "`rank'", "expm" ) if "`constant'" == "" { local indeps "`indeps' _cons" } matrix colnames `b' = `indeps' matrix colnames `V' = `indeps' matrix rownames `V' = `indeps' ereturn post `b' `V', esample(`touse') ereturn scalar N = `N' ereturn scalar rank = `rank' ereturn local cmd "mynlexp" ereturn display end

The code for **mynlexp2** is almost the same as the ado-code for **mynlexp1** in code block 1. The only differences are that line 12 calls **MYNLWork()** instead of **mywork()** and that **MYNLWork()** accepts a ninth argument, specified on line 13 to be “expm”.

Now consider **mynlprobit2** in code block 5.

**Code block 5: mynlprobit2.ado**

*! version 2.0.0 10May2016 program define mynlprobit2, eclass sortpreserve version 14.1 syntax varlist [if] [in] [, noCONStant ] marksample touse gettoken depvar indeps : varlist tempname b V N rank mata: MYNLWork("`depvar'", "`indeps'", "`touse'", "`constant'", /// "`b'", "`V'", "`N'", "`rank'", "probit" ) if "`constant'" == "" { local indeps "`indeps' _cons" } matrix colnames `b' = `indeps' matrix colnames `V' = `indeps' matrix rownames `V' = `indeps' ereturn post `b' `V', esample(`touse') ereturn scalar N = `N' ereturn scalar rank = `rank' ereturn local cmd "mynlprobit" ereturn display end

The analogous changes are made to **mynlprobit2** that were made to **mynlexp2**. In particular, note that line 13 passes “probit” in the ninth argument to **MYNLWork()**.

**Writing a work ado-command**

The Mata library allowed me to consolidate the duplicated Mata code. I still have lots of duplicated ado-code. To consolidate the duplicated ado-code, I have **mynlexp3** and **mynlprobit3** call a single ado-command that does the work as seen in code blocks 6 and 7.

**Code block 6: mynlexp3.ado**

*! version 3.0.0 11May2016 program define mynlexp3 version 14.1 mynlwork expm `0' end

**Code block 7: mynlprobit3.ado**

*! version 3.0.0 11May2016 program define mynlprobit3, eclass sortpreserve version 14.1 mynlwork probit `0' end

Recall that whatever the user specified is contained in the local macro **0**. Line 5 of **mynlexp3** passes whatever the user specified prefixed with “expm” to **mynlwork**. Similarly, line 5 of **mynlprobit3** passes whatever the user specified prefixed with “probit” to **mynlwork**. **mynlexp3** and **mynlprobit3** are known as wrapper commands, because they just wrap calls to **mynlwork**, which does the actual work.

Code block 8 contains the code for **mynlwork**.

**Code block 8: mynlwork.ado**

*! version 1.0.0 11May2016 program define mynlwork, eclass sortpreserve version 14.1 gettoken model 0 : 0 if "`model'" == "expm" { local cname "mynlexp" } else if "`model'" == "probit" { local cname "mynlprobit" } else { diplay "{red}model `model' invalid" exit 498 } syntax varlist [if] [in] [, noCONStant ] marksample touse gettoken depvar indeps : varlist tempname b V N rank mata: MYNLWork("`depvar'", "`indeps'", "`touse'", "`constant'", /// "`b'", "`V'", "`N'", "`rank'", "`model'" ) if "`constant'" == "" { local indeps "`indeps' _cons" } matrix colnames `b' = `indeps' matrix colnames `V' = `indeps' matrix rownames `V' = `indeps' ereturn post `b' `V', esample(`touse') ereturn scalar N = `N' ereturn scalar rank = `rank' ereturn local cmd "`cname'" ereturn display end

Line 5 uses **gettoken** to put the model prefixed to the user input by **mynlexp3** or **mynlprobit3** in the local macro **model** and to put whatever the user specified in the local macro **0**. Lines 7–16 put the name of the calling command in the local macro **cname** or exit with an error message, if the model is not recognized. In theory, the error case handled in lines 13–16 is not necessary, because I should know how to call my own command. Experience has taught me that handling these extra error cases makes changing the code in the future much easier, so I consider this good practice.

By line 17, the local macro **model** and the local macro **cname** contain all that differs between the cases handled by **mynlexp3** and **mynlprobit3**. **model** is passed to **MYNLWork()** on line 26, and **cname** is used to store the command name in **e(cmd)** on line 38.

Examples 2 and 3 illustrate that **mynlexp3** and **mynlprobit3** produce output corresponding to the results produced by **nl** in examples 2 and 4 in Programming an estimation command in Stata: Nonlinear least-squares estimators.

**Example 2: mynlexp3 output**

. mynlexp3 accidents cvalue tickets Iteration 0: f(p) = -2530.846 Iteration 1: f(p) = -1116.4901 Iteration 2: f(p) = -248.56923 Iteration 3: f(p) = -225.91644 Iteration 4: f(p) = -225.89573 Iteration 5: f(p) = -225.89573 Iteration 6: f(p) = -225.89573 ---------------------------------------------------------------------------- | Coef. Std. Err. z P>|z| [95% Conf. Interval] -----------+---------------------------------------------------------------- cvalue | .1759434 .0323911 5.43 0.000 .1124581 .2394287 tickets | 1.447672 .0333599 43.40 0.000 1.382287 1.513056 _cons | -7.660608 .2355725 -32.52 0.000 -8.122322 -7.198894 ----------------------------------------------------------------------------

**Example 3: mynlprobit3 output**

. mynlprobit3 hadaccident cvalue tickets Iteration 0: f(p) = -132.90997 Iteration 1: f(p) = -16.917203 Iteration 2: f(p) = -10.995001 Iteration 3: f(p) = -10.437501 Iteration 4: f(p) = -10.427738 Iteration 5: f(p) = -10.427156 Iteration 6: f(p) = -10.427123 Iteration 7: f(p) = -10.427121 Iteration 8: f(p) = -10.42712 Iteration 9: f(p) = -10.42712 Iteration 10: f(p) = -10.42712 ---------------------------------------------------------------------------- | Coef. Std. Err. z P>|z| [95% Conf. Interval] -----------+---------------------------------------------------------------- cvalue | .3616322 .0918214 3.94 0.000 .1816656 .5415988 tickets | 2.177509 .1974173 11.03 0.000 1.790578 2.56444 _cons | -10.95166 1.05565 -10.37 0.000 -13.02069 -8.882622 ----------------------------------------------------------------------------

**Done and undone**

It is almost always better to share code than to repeat code. Shared code only needs to be changed in one place to add a feature or to fix a problem; repeated code must be changed everywhere. I introduced Mata libraries to share Mata functions across ado-commands, and I introduced wrapper commands to share ado-code.

Pingback: The Stata Blog » Programming an estimation command in Stata: A map to posted entries()