\(
\newcommand{\xb}{{\bf x}}
\newcommand{\betab}{\boldsymbol{\beta}}\)I show how to use optimize() in Mata to maximize a Poisson log-likelihood function and to obtain estimators of the variance–covariance of the estimator (VCE) based on independent and identically distributed (IID) observations or on robust methods.
This is the eighteenth post in the series Programming an estimation command in Stata.  I recommend that you start at the beginning.  See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.
Using optimize()
There are many optional choices that one may make when solving a nonlinear optimization problem, but there are very few that one must make.  The optimize*() functions in Mata handle this problem by making a set of default choices for you, requiring that you specify a few things, and allowing you to change any of the default choices.
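As a minimal sketch of the few things that must be specified, the Mata code below fits a Poisson model to simulated data. The evaluator name plleval(), the simulated data, the seed, and the starting values are assumptions made for illustration; every other choice, including the Newton–Raphson technique, is left at its default.

```mata
mata:
// Hypothetical evaluator (the name plleval is an assumption): returns the
// observation-level Poisson log likelihood, as a type "gf0" evaluator must.
void plleval(real scalar todo, real rowvector b,
             real colvector y, real matrix X,
             val, grad, hess)
{
    real colvector xb

    xb  = X*b'
    val = -exp(xb) + y:*xb - lnfactorial(y)
}

// Simulated data, for illustration only
rseed(12345)
n = 500
X = (rnormal(n, 1, 0, 1), J(n, 1, 1))
y = rpoisson(1, 1, exp(X*(0.5, 1)'))

// Required choices: evaluator, its type, extra arguments, starting values
S = optimize_init()
optimize_init_evaluator(S, &plleval())
optimize_init_evaluatortype(S, "gf0")
optimize_init_argument(S, 1, y)
optimize_init_argument(S, 2, X)
optimize_init_params(S, J(1, cols(X), 0.01))

bh = optimize(S)              // maximizes by default
bh
optimize_result_V_oim(S)      // VCE under correct specification (IID case)
optimize_result_V_robust(S)   // robust VCE
end
```

Because the evaluator returns one log-likelihood value per observation, optimize() can compute numerical derivatives itself and has the observation-level scores it needs for the robust VCE.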
When I use optimize() to solve a …
			\(\newcommand{\betab}{\boldsymbol{\beta}}
\newcommand{\xb}{{\bf x}}
\newcommand{\yb}{{\bf y}}
\newcommand{\gb}{{\bf g}}
\newcommand{\Hb}{{\bf H}}
\newcommand{\thetab}{\boldsymbol{\theta}}
\newcommand{\Xb}{{\bf X}}
\)I review the theory behind nonlinear optimization and get more practice in Mata programming by implementing an optimizer in Mata. In real problems, I recommend using the optimize() function or moptimize() function instead of the one I describe here. In subsequent posts, I will discuss optimize() and moptimize(). This post will help you develop your Mata programming skills and will improve your understanding of how optimize() and moptimize() work.
This is the seventeenth post in the series Programming an estimation command in Stata.  I recommend that you start at the beginning.  See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.
A quick review of nonlinear optimization
We want to maximize a real-valued function \(Q(\thetab)\), where \(\thetab\) is a \(p\times 1\) vector of parameters. Minimization is done by maximizing \(-Q(\thetab)\). We require that \(Q(\thetab)\) be twice continuously differentiable, so that we can use a second-order Taylor series to approximate \(Q(\thetab)\) in a neighborhood of the point \(\thetab_s\),
\[
Q(\thetab) \approx Q(\thetab_s) + \gb_s'(\thetab -\thetab_s)
+ \frac{1}{2} (\thetab -\thetab_s)'\Hb_s (\thetab -\thetab_s)
\tag{1}
\]
where \(\gb_s\) is the \(p\times 1\) vector of first derivatives of \(Q(\thetab)\) evaluated at \(\thetab_s\) and \(\Hb_s\) is the \(p\times p\) matrix of second derivatives of \(Q(\thetab)\) evaluated at \(\thetab_s\), known as the Hessian matrix.
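As a sketch of where this approximation leads (a standard step, assuming \(\Hb_s\) is nonsingular; the index \(s+1\) is notation introduced here), differentiating the right-hand side of (1) with respect to \(\thetab\) and setting the result to zero gives
\[
\gb_s + \Hb_s(\thetab - \thetab_s) = {\bf 0}
\quad\Longrightarrow\quad
\thetab_{s+1} = \thetab_s - \Hb_s^{-1}\gb_s
\]
When \(\Hb_s\) is negative definite, \(\thetab_{s+1}\) is the maximizer of the quadratic approximation; this is the Newton–Raphson update.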
Nonlinear maximization algorithms start with …