## Programming an estimation command in Stata: Mata functions

I show how to write a function in Mata, the matrix programming language that is part of Stata. This post uses concepts introduced in Programming an estimation command in Stata: Mata 101.

This is the twelfth post in the series **Programming an estimation command in Stata**. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

**Mata functions**

Commands do work in Stata. Functions do work in Mata. Commands operate on Stata objects, like variables, and users specify options to alter the behavior. Mata functions accept arguments, operate on the arguments, and may return a result or alter the value of an argument to contain a result.

Consider **myadd()** defined below.

**Code block 1: myadd()**

mata: function myadd(X, Y) { A = X + Y return(A) } end

**myadd()** accepts two arguments, **X** and **Y**, puts the sum of **X** and **Y** into **A**, and returns **A**. For example,

**Example 1: Defining and using a function**

. mata: ------------------------------------------------- mata (type end to exit) ------ : function myadd(X, Y) > { > A = X + Y > return(A) > } : C = J(3, 3, 4) : D = I(3) : W = myadd(C,D) : W [symmetric] 1 2 3 +-------------+ 1 | 5 | 2 | 4 5 | 3 | 4 4 5 | +-------------+ : end --------------------------------------------------------------------------------

After defining **myadd()**, I create the matrices **C** and **D**, and I pass **C** and **D** into **myadd()**, which returns their sum. Mata assigns the returned sum to **W**, which I display. Note that inside the function **myadd()**, **C** and **D** are respectively known as **X** and **Y**.

The **A** created in **myadd()** can only be accessed inside **myadd()**, and it does not conflict with an **A** defined outside **myadd()**; that is, **A** is local to the function **myadd()**. I now illustrate that **A** is local to **myadd()**.

**Example 2: A is local to myadd()**

. mata: ------------------------------------------------- mata (type end to exit) ------ : A = J(3, 3, 4) : A [symmetric] 1 2 3 +-------------+ 1 | 4 | 2 | 4 4 | 3 | 4 4 4 | +-------------+ : W = myadd(A,D) : A [symmetric] 1 2 3 +-------------+ 1 | 4 | 2 | 4 4 | 3 | 4 4 4 | +-------------+ : end --------------------------------------------------------------------------------

Having illustrated that the **A** defined inside **myadd()** is local to **myadd()**, I should point out that the **C** and **D** matrices I defined in example 1 are in global Mata memory. As in ado-programs, we do not want to use fixed names in global Mata memory in our programs so that we do not destroy the users’ data. Fortunately, this problem is easily solved by writing Mata functions that write their results out to Stata and do not return results. I will provide detailed discussions of this solution in the commands that I develop in subsequent posts.

When a Mata function changes the value of an argument inside the function, that changes the value of that argument outside the function; in other words, arguments are passed by address. Mata functions can compute more than one result by storing these results in arguments. For example, **sumproduct()** returns both the sum and the element-wise product of two matrices.

**Code block 2: sumproduct()**

function sumproduct(X, Y, S, P) { S = X + Y P = X :* Y return }

**sumproduct()** returns the sum of the arguments **X** and **Y** in the argument **S** and the element-wise product in **P**.

**Example 3: Returning results in arguments**

. mata: ------------------------------------------------- mata (type end to exit) ------ : function sumproduct(X, Y, S, P) > { > S = X + Y > P = X :* Y > return > } : A = I(3) : B = rowshape(1::9, 3) : A [symmetric] 1 2 3 +-------------+ 1 | 1 | 2 | 0 1 | 3 | 0 0 1 | +-------------+ : B 1 2 3 +-------------+ 1 | 1 2 3 | 2 | 4 5 6 | 3 | 7 8 9 | +-------------+ : W=. : W . : Z=. : Z . : sumproduct(A, B, W, Z) : W 1 2 3 +----------------+ 1 | 2 2 3 | 2 | 4 6 6 | 3 | 7 8 10 | +----------------+ : Z [symmetric] 1 2 3 +-------------+ 1 | 1 | 2 | 0 5 | 3 | 0 0 9 | +-------------+ : end --------------------------------------------------------------------------------

After defining **sumproduct()**, I use **I()** to create **A** and use **rowshape()** to create **B**. I then create **W** and **Z**; each is a scalar missing value. I must create **W** and **Z** before I pass them as arguments; otherwise, I would be referencing arguments that do not exist. After calling **sumproduct()**, I display **W** and **Z** to illustrate that they now contain the sum and element-wise product of **X** and **Y**.

In **myadd()** and **sumproduct()**, I did not specify what type of thing each argument must be, nor did I specify what type of thing each function would return. In other words, I used implicit declarations. Implicit declarations are easier to type than explicit declarations, but they make error messages and code less informative. I highly recommend explicitly declaring returns, arguments, and local variables to make your code and error messages more readable.

**myadd2()** is a version of **myadd()** that explicitly declares the type of thing returned, the type of thing that each argument must be, and the type that each local-to-the-function thing must be.

**Code block 3: myadd2(): Explicit declarations**

mata: numeric matrix myadd2(numeric matrix X, numeric matrix Y) { numeric matrix A A = X + Y return(A) } end

**myadd2()** returns a **numeric matrix** that it constructs by adding the **numeric matrix** **X** to the **numeric matrix** **Y**. The local-to-the-function object **A** is also a **numeric matrix**. A **numeric matrix** is either a **real matrix** or a **complex matrix**; it cannot be a **string matrix**.

Explicitly declaring returns, arguments, and local variables makes the code more informative. I immediately see that **myadd2()** does not work with string matrices, but this property is buried in the code for **myadd()**.

I cannot sufficiently stress the importance of writing easy-to-read code. Reading other people’s code is an essential part of programming. It is instructional, and it allows you to adopt the solutions that others have found or implemented. If you are new to programming, you may not yet realize that after a few months, reading your own code is like reading someone else’s code. Even if you never give your code to anyone else, it is essential that you write easy-to-read code so that you can read it at a later date.

Explicit declarations also make some error messages easier to track down. In examples 4 and 5, I pass a string matrix to **myadd()** and to **myadd2()**, respectively.

**Example 4: Passing a string matrix to myadd()**

. mata: ------------------------------------------------- mata (type end to exit) ------ : B = I(3) : C = J(3,3,"hello") : myadd(B,C) myadd(): 3250 type mismatch: - function returned error (0 lines skipped) -------------------------------------------------------------------------------- r(3250);

**Example 5: Passing a string matrix to myadd2()**

. mata: ------------------------------------------------- mata (type end to exit) ------ : B = I(3) : C = J(3,3,"hello") : numeric matrix myadd2(numeric matrix X, numeric matrix Y) > { > numeric matrix A > > A = X + Y > return(A) > } : myadd2(B,C) myadd2(): 3251 C[3,3] found where real or complex required: - function returned error (0 lines skipped) -------------------------------------------------------------------------------- r(3251); end of do-file

The error message in example 4 indicates that somewhere in **myadd()**, an operator or a function could not perform something on two objects because their types were not compatible. Do not be deluded by the simplicity of **myadd()**. Tracking down a **type mismatch** in real code can be difficult.

In contrast, the error message in example 5 says that the matrix **C** we passed to **myadd2()** is neither a real nor a complex matrix like the argument of **myadd2()** requires. Looking at the code and the error message immediately informs me that the problem is that I passed a string matrix to a function that requires a numeric matrix.

Explicit declarations are so highly recommended that Mata has a setting to require it, as illustrated below.

**Example 6: Turning on matastrict**

. mata: mata set matastrict on

Setting **matastrict** to **on** causes the Mata compiler to require that return and local variables be explicitly declared. By default, **matastrict** is set to **off**, in which case return and local variables may be implicitly declared.

When **matastrict** is set to **on**, arguments are not required to be explicitly declared because some arguments hold results whose input and output types could differ. Consider **makereal()** defined and used in example 7.

**Example 7: Changing an arguments type**

. mata: ------------------------------------------------- mata (type end to exit) ------ : void makereal(A) > { > A = substr(A, 11,1) > A = strtoreal(A) > } : A = J(2,2, "Number is 4") : A [symmetric] 1 2 +-----------------------------+ 1 | Number is 4 | 2 | Number is 4 Number is 4 | +-----------------------------+ : makereal(A) : A + I(2) [symmetric] 1 2 +---------+ 1 | 5 | 2 | 4 5 | +---------+ : end --------------------------------------------------------------------------------

The declaration of **makereal()** specifies that **makereal()** returns nothing because **void** comes before the name of the function. Even though **matastrict** is set to **on**, I did not declare what type of thing **A** must be. The two executable lines of **makereal()** clarify that **A** must be a string on input and that **A** will be real on output, which I subsequently illustrate.

I use the feature that arguments may be implicitly declared to make my code easier to read. Many of the Mata functions that I write replace arguments with results. I explicitly declare arguments that are inputs, and I implicitly declare arguments that contain outputs. Consider **sumproduct2()**.

**Code block 4: sumproduct2(): Explicit declarations of inputs but not outputs**

void sumproduct2(real matrix X, real matrix Y, S, P) { S = X + Y P = X :* Y return }

**sumproduct2()** returns nothing because **void** comes before the function name. The first argument **X** is **real matrix**, the second argument **Y** is a **real matrix**, the third argument **S** is implicitly declared, and the fourth argument **P** is implicitly declared. My coding convention that inputs are explicitly declared and that outputs are implicitly declared immediately informs me that **X** and **Y** are inputs but that **S** and **P** are outputs. That **X** and **Y** are inputs and that **S** and **P** are outputs is illustrated in example 8.

**Example 8: Explicitly declaring inputs but not outputs**

. mata: ------------------------------------------------- mata (type end to exit) ------ : void sumproduct2(real matrix X, real matrix Y, S, P) > { > S = X + Y > P = X :* Y > return > } : A = I(2) : B = rowshape(1::4, 2) : C = . : D = . : sumproduct2(A, B, C, D) : C 1 2 +---------+ 1 | 2 2 | 2 | 3 5 | +---------+ : D [symmetric] 1 2 +---------+ 1 | 1 | 2 | 0 4 | +---------+ : end --------------------------------------------------------------------------------

**Done and undone**

I showed how to write a function in Mata and discussed declarations in some detail. Type help m2_declarations for many more details.

In my next post, I use Mata functions to perform the computations for a simple estimation command.

Pingback: The Stata Blog » Programming an estimation command in Stata: A first ado-command using Mata()