\(\newcommand{\xb}{{\bf x}}
\newcommand{\betab}{\boldsymbol{\beta}}\)I show how to use **optimize()** in Mata to maximize a Poisson log-likelihood function and to obtain estimators of the variance–covariance of the estimator (**VCE**) based on independent and identically distributed (**IID**) observations or on robust methods.
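For reference, the objective in question is the Poisson log-likelihood function. Writing \(\xb_i\) for the row vector of covariates of observation \(i\) and treating \(\betab\) as a row vector so that \(\xb_i\betab'\) is the linear index (a notational choice made here for concreteness), the log likelihood has the standard form

\[
\ln L(\betab) = \sum_{i=1}^N \left\{ -\exp(\xb_i\betab') + y_i\,\xb_i\betab' - \ln(y_i!) \right\}
\]

where \(y_i\) is the count-valued dependent variable and \(N\) is the sample size.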

This is the eighteenth post in the series **Programming an estimation command in Stata**. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

**Using optimize()**

There are many optional choices that one may make when solving a nonlinear optimization problem, but very few that one must make. The **optimize*()** functions in Mata handle this trade-off by making a set of default choices for you, requiring that you specify only a few things, and allowing you to change any of the defaults.
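To make those steps concrete, here is a minimal sketch of the workflow, assuming a Stata dataset in memory with a count outcome y and covariates x1 and x2; the evaluator name plleval, the variable names, and the starting values are illustrative assumptions, not code from the post.

```
mata:
// Hypothetical d0-type evaluator: returns the overall Poisson
// log likelihood in val; y, x1, and x2 are assumed to exist in
// the Stata dataset in memory
void plleval(real scalar todo, real rowvector b,
             real scalar val, real rowvector g, real matrix H)
{
    real colvector y, xb
    real matrix    X

    y   = st_data(., "y")
    X   = (st_data(., ("x1", "x2")), J(st_nobs(), 1, 1))
    xb  = X*b'
    val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}

S = optimize_init()                      // problem with default choices
optimize_init_evaluator(S, &plleval())   // required: what to maximize
optimize_init_evaluatortype(S, "d0")     // we supply the objective only
optimize_init_params(S, J(1, 3, .01))    // required: starting values
bh = optimize(S)                         // maximize; bh holds estimates
V  = optimize_result_V_oim(S)            // IID-based VCE (observed info.)
end
```

Because a d0-type evaluator returns only the overall log likelihood, the IID-based VCE from the observed information matrix is what is available here; a robust VCE requires an evaluator that returns observation-level values (a gf-type evaluator), which the full post works through.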

When I use **optimize()** to solve a …

\(\newcommand{\betab}{\boldsymbol{\beta}}
\newcommand{\xb}{{\bf x}}
\newcommand{\yb}{{\bf y}}
\newcommand{\gb}{{\bf g}}
\newcommand{\Hb}{{\bf H}}
\newcommand{\thetab}{\boldsymbol{\theta}}
\newcommand{\Xb}{{\bf X}}
\)I review the theory behind nonlinear optimization and get more practice in Mata programming by implementing an optimizer in Mata. In real problems, I recommend using the **optimize()** or **moptimize()** functions instead of the one I describe here. In subsequent posts, I will discuss **optimize()** and **moptimize()**. This post will help you develop your Mata programming skills and will improve your understanding of how **optimize()** and **moptimize()** work.

This is the seventeenth post in the series **Programming an estimation command in Stata**. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

**A quick review of nonlinear optimization**

We want to maximize a real-valued function \(Q(\thetab)\), where \(\thetab\) is a \(p\times 1\) vector of parameters. Minimization is done by maximizing \(-Q(\thetab)\). We require that \(Q(\thetab)\) be twice continuously differentiable, so that we can use a second-order Taylor series to approximate \(Q(\thetab)\) in a neighborhood of the point \(\thetab_s\),

\[
Q(\thetab) \approx Q(\thetab_s) + \gb_s'(\thetab -\thetab_s)
+ \frac{1}{2} (\thetab -\thetab_s)'\Hb_s (\thetab -\thetab_s)
\tag{1}
\]

where \(\gb_s\) is the \(p\times 1\) vector of first derivatives of \(Q(\thetab)\) evaluated at \(\thetab_s\), and \(\Hb_s\), known as the Hessian matrix, is the \(p\times p\) matrix of second derivatives of \(Q(\thetab)\) evaluated at \(\thetab_s\).
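Equation (1) is also where the classic Newton–Raphson update comes from; the short derivation below is standard and uses only the notation already defined. Maximizing the quadratic approximation on the right-hand side of (1) with respect to \(\thetab\) gives the first-order condition

\[
\gb_s + \Hb_s(\thetab - \thetab_s) = {\bf 0}
\]

and solving for \(\thetab\) (assuming \(\Hb_s\) is negative definite, so the approximation has a unique maximum) yields the updated parameter vector

\[
\thetab_{s+1} = \thetab_s - \Hb_s^{-1}\gb_s
\]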

Nonlinear maximization algorithms start with …
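As a preview of what such an implementation looks like, here is a minimal Newton–Raphson sketch in Mata for a toy concave objective, \(Q(\thetab) = \theta_1 + \theta_2 - e^{\theta_1} - e^{\theta_2}\), whose maximizer is \((0, 0)\); the function name mynr, the objective, the iteration cap, and the \(10^{-8}\) tolerance are all assumptions made for this example, not code from the post.

```
mata:
// Toy Newton-Raphson maximizer for
// Q(theta) = theta_1 + theta_2 - exp(theta_1) - exp(theta_2)
// Gradient: g = 1 - exp(theta)   (elementwise, 1 x p)
// Hessian:  H = diag(-exp(theta)), negative definite everywhere
real rowvector mynr(real rowvector theta0)
{
    real rowvector theta, g
    real matrix    H
    real scalar    iter

    theta = theta0
    for (iter = 1; iter <= 100; iter++) {
        g = 1 :- exp(theta)                // gradient at current theta
        if (sqrt(g*g') < 1e-8) break       // converged: gradient near zero
        H     = diag(-exp(theta))          // Hessian at current theta
        theta = theta - (invsym(H)*g')'    // the update in the display above
    }
    return(theta)
}

mynr((2, -1))    // converges to (0, 0)
end
```

A production optimizer such as **optimize()** wraps this core update in safeguards, for example step-size adjustment when a full Newton step fails to increase the objective and alternative update rules when the Hessian is not well behaved.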