Programming an estimation command in Stata: Handling factor variables in optimize()
\(
\newcommand{\xb}{{\bf x}}
\newcommand{\betab}{\boldsymbol{\beta}}\)I discuss a method for handling factor variables when performing nonlinear optimization using optimize(). After illustrating the issue caused by factor variables, I present a method and apply it to an example using optimize().
This is the twentieth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.
How poisson handles factor variables
Consider the Poisson regression in which I include a full set of indicator variables created from the categorical variable kids and a constant term.
Example 1: Collinear factor variables
. clear all

. use accident3

. poisson accidents cvalue ibn.kids traffic, coeflegend
note: 3.kids omitted because of collinearity

Iteration 0:   log likelihood = -546.35782
Iteration 1:   log likelihood = -545.11016
Iteration 2:   log likelihood = -545.10898
Iteration 3:   log likelihood = -545.10898

Poisson regression                              Number of obs     =        505
                                                LR chi2(5)        =     361.62
                                                Prob > chi2       =     0.0000
Log likelihood = -545.10898                     Pseudo R2         =     0.2491

------------------------------------------------------------------------------
   accidents |      Coef.  Legend
-------------+----------------------------------------------------------------
      cvalue |  -.6582924  _b[cvalue]
             |
        kids |
          0  |   3.233932  _b[0bn.kids]
          1  |   1.571582  _b[1.kids]
          2  |   1.659241  _b[2.kids]
          3  |          0  _b[3o.kids]
             |
     traffic |   .1383977  _b[traffic]
       _cons |  -2.518175  _b[_cons]
------------------------------------------------------------------------------
The full set of indicator variables is collinear with the constant term. The output shows that no variable was dropped, but the name 3o.kids indicates that 3.kids was omitted. Omitted variables are not dropped; instead, their coefficients are constrained to zero.
Specifying variables as omitted instead of dropping them allows postestimation features such as margins to work properly.
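The collinearity behind the note in example 1 is easy to verify numerically: a full set of indicator columns sums to the constant column, so the design matrix loses one rank. Here is a minimal sketch with synthetic data (numpy, purely as an illustration; the post itself works in Stata and Mata):

```python
import numpy as np

# Synthetic categorical variable with levels 0-3 (not the accident3 data)
kids = np.arange(100) % 4

# Full set of indicator columns plus a constant term
D = (kids[:, None] == np.arange(4)).astype(float)  # 100 x 4 indicator matrix
X = np.column_stack([D, np.ones(100)])             # append constant column

# Each row of D sums to 1, so the indicators reproduce the constant column
# and the 5-column matrix X has rank 4
print(np.linalg.matrix_rank(X))                    # 4, not 5
```

Dropping any one of the five columns, or constraining one coefficient to zero as poisson does, restores full rank.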
For the case in example 1, poisson maximizes the log-likelihood function subject to the constraint that \(\beta_{3.kids}=0\). In terms of the parameter vector \(\betab\), I represent this constraint by
\[
\left[\begin{matrix}
0&0&0&0&1&0&0
\end{matrix}\right]
\betab' = 0
\]
where \(\betab=(\beta_{cvalue}, \beta_{0.kids}, \beta_{1.kids}, \beta_{2.kids}, \beta_{3.kids}, \beta_{traffic}, \beta_{\_cons})\).
In general, I can represent \(q\) linear equality constraints on a \(1\times k\) parameter vector as
\[
{\bf C}\betab' = {\bf c}
\]
where \({\bf C}\) is a \(q\times k\) matrix and \({\bf c}\) is a \(q\times 1\) vector. These constraints are conveniently represented as \(\widetilde{\bf C}=\left[{\bf C},{\bf c}\right]\).
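For the problem in example 1, \({\bf C}\) is the \(1\times 7\) unit row vector shown above and \({\bf c}=0\), so \(\widetilde{\bf C}\) is \(1\times 8\). A small numpy sketch of this bookkeeping (an illustration only; the post's code builds the same objects in Mata):

```python
import numpy as np

# Constraint that the fifth coefficient (3.kids) equals zero: C @ b' = c
C = np.eye(7)[[4], :]        # 1 x 7 unit row vector with a 1 in position 5
c = np.zeros((1, 1))         # 1 x 1 right-hand side
Ct = np.hstack([C, c])       # augmented matrix [C, c], shape 1 x 8

# The constrained point estimates from example 1 satisfy the constraint
b = np.array([-.6582924, 3.233932, 1.571582, 1.659241, 0, .1383977, -2.518175])
print(Ct.shape, bool(np.allclose(C @ b, c.ravel())))
```

More generally, stacking one unit row per constrained coefficient produces the \(q\times(k+1)\) matrix \(\widetilde{\bf C}\) used below.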
I now show how to use optimize() to solve optimization problems subject to linear equality constraints by putting \(\widetilde{\bf C}\) into the optimize object. In code block 1, I use optimize() to maximize the Poisson log-likelihood function for the problem in example 1. Code block 1 augments example 3 in Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters by using optimize_init_constraints() to impose a linear equality constraint on the coefficient vector.
Code block 1: Linear equality constraints in optimize()
  1  mata:
  2  void plleval3(real scalar todo, real vector b,  ///
  3                real vector y, real matrix X,     ///
  4                val, grad, hess)
  5  {
  6          real vector  xb
  7
  8          xb  = X*b'
  9          val = -exp(xb) + y:*xb - lnfactorial(y)
 10  }
 11
 12  y = st_data(., "accidents")
 13  X = st_data(., "cvalue ibn.kids traffic")
 14  X = X,J(rows(X), 1, 1)
 15
 16  C  = e(5, 7)
 17  c  = 0
 18  Ct = C,c
 19
 20  S = optimize_init()
 21  optimize_init_argument(S, 1, y)
 22  optimize_init_argument(S, 2, X)
 23  optimize_init_evaluator(S, &plleval3())
 24  optimize_init_evaluatortype(S, "gf0")
 25  optimize_init_params(S, J(1, 7, .01))
 26  optimize_init_constraints(S, Ct)
 27
 28  bh = optimize(S)
 29  optimize_result_params(S)
 30  end
Only line 13, lines 16–18, and line 26 differ from the code in example 3 in Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters. Line 13 illustrates that st_data() can create the indicator variables from the factor variable ibn.kids. Lines 16–18 define the constraint matrix \(\widetilde{\bf C}\) for this problem. Line 26 puts \(\widetilde{\bf C}\) into the optimize() object S, which causes optimize() to maximize the Poisson log-likelihood function computed by the evaluator plleval3() subject to the constraints specified in the matrix Ct.
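A useful way to think about what the constraint does: the feasible parameter vectors form an affine subspace, so constrained maximization over seven coefficients is really unconstrained maximization over six free parameters. The following numpy sketch illustrates the algebra (it is not a description of optimize()'s internal implementation):

```python
import numpy as np

# Constraint C b' = c for this example: the fifth coefficient is fixed at 0
C = np.eye(7)[[4], :]
c = np.zeros(1)

# Any feasible b can be written b = b0 + N @ z, where b0 is one feasible
# point and the columns of N span the null space of C (here via the SVD)
b0 = np.linalg.lstsq(C, c, rcond=None)[0]   # a particular solution
_, _, Vt = np.linalg.svd(C)
N = Vt[1:].T                                 # 7 x 6 null-space basis, C @ N = 0

# Whatever the six free parameters z are, the constraint holds exactly
z = np.arange(6, dtype=float)
b = b0 + N @ z
print(bool(np.allclose(C @ b, c)))
```

An objective maximized over z is therefore automatically maximized subject to the constraint on b.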
Example 2 illustrates that code block 1 reproduces the point estimates reported in example 1.
Example 2: Linear equality constraints in optimize()
. do pc

. mata:
------------------------------------------------- mata (type end to exit) -----
: void plleval3(real scalar todo, real vector b,  ///
>               real vector y, real matrix X,     ///
>               val, grad, hess)
> {
>         real vector  xb
>
>         xb  = X*b'
>         val = -exp(xb) + y:*xb - lnfactorial(y)
> }
note: argument todo unused
note: argument grad unused
note: argument hess unused

:
: y = st_data(., "accidents")
: X = st_data(., "cvalue ibn.kids traffic")
: X = X,J(rows(X), 1, 1)
:
: C  = e(5, 7)
: c  = 0
: Ct = C,c
:
: S = optimize_init()
: optimize_init_argument(S, 1, y)
: optimize_init_argument(S, 2, X)
: optimize_init_evaluator(S, &plleval3())
: optimize_init_evaluatortype(S, "gf0")
: optimize_init_params(S, J(1, 7, .01))
: optimize_init_constraints(S, Ct)
:
: bh = optimize(S)
Iteration 0:   f(p) = -845.47138
Iteration 1:   f(p) = -572.68676
Iteration 2:   f(p) = -545.68381
Iteration 3:   f(p) = -545.11241
Iteration 4:   f(p) = -545.10898
Iteration 5:   f(p) = -545.10898

: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------
  1 |  -.6582923624    3.233932519    1.571581623     1.65924145
    +-------------------------------------------------------------
                  5              6              7
     ----------------------------------------------+
  1              0      .13839766   -2.518174926   |
     ----------------------------------------------+

: end
-------------------------------------------------------------------------------

.
end of do-file
Code block 1 shows how to use a linear equality constraint to handle collinear variables when we know which variables are omitted. In the code for an estimation command, we must
- find which variables will be omitted, and
- create the constraint matrix \(\widetilde{\bf C}\) that imposes the constraints implied by omitting these variables.
Example 3 illustrates that _rmcoll stores in r(varlist) a list of variable names that identifies which variables will be omitted, thereby solving problem 1.
Example 3: Using _rmcoll to identify omitted variables
. _rmcoll cvalue ibn.kids traffic, expand
note: 3.kids omitted because of collinearity

. return list

scalars:
            r(k_omitted) =  1

macros:
              r(varlist) : "cvalue 0bn.kids 1.kids 2.kids 3o.kids traffic"

. local cnames "`r(varlist)' _cons"
I specified the option expand so that _rmcoll would expand any factor variables. The expanded variable list in the local r(varlist) identifies 3.kids as a variable that must be omitted. I then put this expanded variable list, augmented by the name _cons, in the local macro cnames.
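The key information in the expanded list is the o operator embedded in the omitted name 3o.kids. As a rough illustration of how omitted terms can be read off the names, here is a Python sketch using a deliberately simplified regex (real factor-variable notation has more operator forms, such as o. on continuous variables and on interactions, than this pattern handles; Stata's _ms_omit_info does this parsing for you):

```python
import re

# Expanded variable list returned by _rmcoll in example 3, plus _cons
cnames = "cvalue 0bn.kids 1.kids 2.kids 3o.kids traffic _cons".split()

# An omitted factor level carries the "o" operator in its prefix, as in
# "3o.kids".  This regex is a simplified stand-in for the parsing that
# _ms_omit_info performs; it does not cover every operator combination.
omit = [1 if re.match(r"\d+o\.", name) else 0 for name in cnames]
print(omit)  # [0, 0, 0, 0, 1, 0, 0]
```

The resulting 0/1 vector plays the same role as the r(omit) vector created in example 5 below.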
Here is an outline for the solution to problem 2 that I present in examples 4–6.
- In example 4, I create the Stata vector bt, whose column names are contained in cnames.
- In example 5, I use _ms_omit_info to create the Stata vector bto, whose elements indicate which variables in bt are omitted.
- In example 6, I create a Mata matrix specifying the constraints from bto.
Now for the details, beginning with example 4.
Example 4: Putting the coefficient names on a Stata vector
. matrix bt = J(1, 7, 0)

. matrix colnames bt = `cnames'

. matrix list bt

bt[1,7]
                  0.       1.       2.      3o.
    cvalue     kids     kids     kids     kids  traffic    _cons
r1       0        0        0        0        0        0        0
cnames contains the names of the coefficients for this problem, so I create a conformable row vector bt, make cnames the column names on bt, and display bt. The values in bt do not matter; the column names are the important information.
In example 5, _ms_omit_info uses the column names on a Stata vector to create the vector r(omit), which specifies which variables are omitted.
Example 5: Creating a vector that indicates omitted variables
. matrix bt = J(1, 7, 0)

. matrix colnames bt = `cnames'

. _ms_omit_info bt

. return list

scalars:
               r(k_omit) =  1

matrices:
                 r(omit) :  1 x 7

. matrix bto = r(omit)

. matrix list bto

bto[1,7]
    c1  c2  c3  c4  c5  c6  c7
r1   0   0   0   0   1   0   0
An element of r(omit) is 1 if the corresponding variable is omitted and 0 otherwise. I put a copy of r(omit) in bto.
In example 6, I create a constraint matrix from bto. The loop in example 6 will create the constraint matrix implied by any r(omit) vector created by _ms_omit_info.
Example 6: Creating a constraint matrix from r(omit)
. mata:
------------------------------------------------- mata (type end to exit) -----
: mo = st_matrix("bto")
: ko = sum(mo)
: p  = cols(mo)
: if (ko>0) {
>         Cm = J(0, p, .)
>         for(j=1; j<=p; j++) {
>                 if (mo[j]==1) {
>                         Cm = Cm \ e(j, p)
>                 }
>         }
>         Cm = Cm, J(ko, 1, 0)
> }
> else {
>         Cm = J(0, p+1, .)
> }
: "Constraint matrix is "
  Constraint matrix is

: Cm
        1   2   3   4   5   6   7   8
    +-----------------------------------+
  1 |   0   0   0   0   1   0   0   0   |
    +-----------------------------------+

: end
-------------------------------------------------------------------------------
After copying bto to the Mata vector mo, I put the number of constraints in the scalar ko and the number of parameters in p. If there are constraints, I initialize Cm to be a matrix with zero rows and p columns, use a for loop to iteratively append a new row corresponding to each constraint identified in mo, and finish by appending a ko \(\times\) 1 column of zeros onto Cm. If there are no constraints, I put a matrix with zero rows and p+1 columns in Cm.
Regardless of whether there are any omitted variables, I can put the Cm matrix created by the method in example 6 into an optimize() object. If there are no omitted variables, Cm will have zero rows, and no constraints will be imposed. If there are omitted variables, Cm will have ko rows, and the constraints for the omitted variables will be imposed.
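The same logic can be written as a small function, which makes both cases easy to check. Here is a Python/numpy sketch mirroring the Mata loop from example 6 (an illustration, not Stata code):

```python
import numpy as np

def omit_constraints(mo):
    # Build the augmented constraint matrix [C, c] from a 0/1 omit vector,
    # mirroring the Mata loop in example 6: one unit row per omitted
    # variable, plus a final column of zeros for the right-hand side c
    mo = np.asarray(mo)
    p = mo.size
    rows = [np.eye(p)[j] for j in range(p) if mo[j] == 1]
    if rows:
        C = np.vstack(rows)
        return np.hstack([C, np.zeros((len(rows), 1))])
    return np.empty((0, p + 1))  # zero rows: no constraints are imposed

# One omitted variable in position 5 of 7: a 1 x 8 constraint matrix
print(omit_constraints([0, 0, 0, 0, 1, 0, 0]).shape)  # (1, 8)

# No omitted variables: a 0 x 8 matrix that imposes nothing
print(omit_constraints([0] * 7).shape)                # (0, 8)
```

Either return value can be handed to the optimizer unchanged, which is what makes the zero-row convention convenient.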
Code block 2 combines these pieces into a coherent example.
Code block 2: Putting it all together
  1  clear all
  2  use accident3
  3  local depvar "accidents"
  4  local indepvars "cvalue ibn.kids traffic"
  5  _rmcoll `indepvars', expand
  6  local cnames "`r(varlist)' _cons"
  7  local p : word count `cnames'
  8  matrix bt = J(1, `p', 0)
  9  matrix colnames bt = `cnames'
 10  _ms_omit_info bt
 11  matrix bto = r(omit)
 12
 13  mata:
 14  void plleval3(real scalar todo, real vector b,  ///
 15                real vector y, real matrix X,     ///
 16                val, grad, hess)
 17  {
 18          real vector  xb
 19
 20          xb  = X*b'
 21          val = -exp(xb) + y:*xb - lnfactorial(y)
 22  }
 23
 24  y = st_data(., "`depvar'")
 25  X = st_data(., "`indepvars'")
 26  X = X,J(rows(X), 1, 1)
 27
 28  mo = st_matrix("bto")
 29  ko = sum(mo)
 30  p  = cols(mo)
 31  if (ko>0) {
 32          Ct = J(0, p, .)
 33          for(j=1; j<=p; j++) {
 34                  if (mo[j]==1) {
 35                          Ct = Ct \ e(j, p)
 36                  }
 37          }
 38          Ct = Ct, J(ko, 1, 0)
 39  }
 40  else {
 41          Ct = J(0, p+1, .)
 42  }
 43
 44  S = optimize_init()
 45  optimize_init_argument(S, 1, y)
 46  optimize_init_argument(S, 2, X)
 47  optimize_init_evaluator(S, &plleval3())
 48  optimize_init_evaluatortype(S, "gf0")
 49  optimize_init_params(S, J(1, 7, .01))
 50  optimize_init_constraints(S, Ct)
 51
 52  bh = optimize(S)
 53  optimize_result_params(S)
 54  end
Lines 1–2 drop all the objects that I created in previous examples and read the accident3 dataset into memory. Lines 3–4 create locals to hold the dependent variable and the independent variables.
Lines 5–6 use _rmcoll to identify which variables should be omitted and put the list of names in the local macro cnames, as in example 3. Lines 7–9 create bt, whose column names identify which variables should be omitted, as in example 4. Lines 10–11 create bto, whose entries indicate which variables should be omitted, as in example 5. Lines 28–42 create the constraint matrix Ct from bto, as in example 6.
Line 50 puts Ct into the optimize() object.
Example 7 illustrates that the code in code block 2 reproduces the previously obtained results.
Example 7: Putting it all together
. do pc2

. clear all

. use accident3

. local depvar "accidents"

. local indepvars "cvalue ibn.kids traffic"

. _rmcoll `indepvars', expand
note: 3.kids omitted because of collinearity

. local cnames "`r(varlist)' _cons"

. local p : word count `cnames'

. matrix bt = J(1, `p', 0)

. matrix colnames bt = `cnames'

. _ms_omit_info bt

. matrix bto = r(omit)

.
. mata:
------------------------------------------------- mata (type end to exit) -----
: void plleval3(real scalar todo, real vector b,  ///
>               real vector y, real matrix X,     ///
>               val, grad, hess)
> {
>         real vector  xb
>
>         xb  = X*b'
>         val = -exp(xb) + y:*xb - lnfactorial(y)
> }
note: argument todo unused
note: argument grad unused
note: argument hess unused

:
: y = st_data(., "`depvar'")
: X = st_data(., "`indepvars'")
: X = X,J(rows(X), 1, 1)
:
: mo = st_matrix("bto")
: ko = sum(mo)
: p  = cols(mo)
: if (ko>0) {
>         Ct = J(0, p, .)
>         for(j=1; j<=p; j++) {
>                 if (mo[j]==1) {
>                         Ct = Ct \ e(j, p)
>                 }
>         }
>         Ct = Ct, J(ko, 1, 0)
> }
> else {
>         Ct = J(0, p+1, .)
> }
:
: S = optimize_init()
: optimize_init_argument(S, 1, y)
: optimize_init_argument(S, 2, X)
: optimize_init_evaluator(S, &plleval3())
: optimize_init_evaluatortype(S, "gf0")
: optimize_init_params(S, J(1, 7, .01))
: optimize_init_constraints(S, Ct)
:
: bh = optimize(S)
Iteration 0:   f(p) = -845.47138
Iteration 1:   f(p) = -572.68676
Iteration 2:   f(p) = -545.68381
Iteration 3:   f(p) = -545.11241
Iteration 4:   f(p) = -545.10898
Iteration 5:   f(p) = -545.10898

: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------
  1 |  -.6582923624    3.233932519    1.571581623     1.65924145
    +-------------------------------------------------------------
                  5              6              7
     ----------------------------------------------+
  1              0      .13839766   -2.518174926   |
     ----------------------------------------------+

: end
-------------------------------------------------------------------------------

.
end of do-file
Done and undone
I discussed a method for handling factor variables when performing nonlinear optimization using optimize(). In my next post, I implement these methods in an estimation command for Poisson regression.