## Multiple-equation models: Estimation and marginal effects using gmm

We estimate the average treatment effect (ATE) for an exponential mean model with an endogenous treatment. We have a two-step estimation problem where the first step corresponds to the treatment model and the second to the outcome model. As shown in *Using gmm to solve two-step estimation problems*, this can be solved with the generalized method of moments using **gmm**.

This continues the series of posts where we illustrate how to obtain correct standard errors and marginal effects for models with multiple steps. In the previous posts, we used **gsem** and **mlexp** to estimate the parameters of models with separable likelihoods. In the current model, because the treatment is endogenous, the likelihood for the model is no longer separable. We demonstrate how we can use **gmm** to estimate the parameters in these situations.

**Model**

We begin by describing the potential-outcome framework used to define an average treatment effect. For each treatment level, there is an outcome that we would observe if a person were to receive that treatment level. When we have an outcome with an exponential mean and there are two treatment levels, we can specify how the means of the potential outcomes \(y_{0i}\) and \(y_{1i}\) are generated from the regressors \({\bf x}_i\) and error terms \(\epsilon_{0i}\) and \(\epsilon_{1i}\),

\begin{eqnarray*}

E(y_{0i}\vert{\bf x}_i,\epsilon_{0i}) &=& \exp({\bf x}_i{\boldsymbol \beta}_0 + \beta_{00}+\epsilon_{0i}) \cr

E(y_{1i}\vert{\bf x}_i,\epsilon_{1i}) &=& \exp({\bf x}_i{\boldsymbol \beta}_1 + \beta_{10}+\epsilon_{1i})

\end{eqnarray*}

where the parameters \(\beta_{00}\) and \(\beta_{10}\) are constant intercepts and \({\boldsymbol \beta}_0\) and \({\boldsymbol \beta}_1\) are the coefficients on the regressors \({\bf x}_i\). Note that the distribution of the potential outcomes could be Poisson, lognormal, or some other distribution with an exponential mean.

For treatment \(t_i\), we observe the outcome

\begin{equation}

y_i = (1-t_i) y_{0i} + t_i y_{1i}

\nonumber

\end{equation}

So we observe \(y_{1i}\) under the treatment (\(t_{i}=1\)) and \(y_{0i}\) when the treatment is withheld (\(t_{i}=0\)). The potential-outcome errors \(\epsilon_{0i}\) and \(\epsilon_{1i}\) are correlated with the treatment assignment.

The treatment \(t_i\) is determined by regressors \({\bf z}_i\) in a probit regression, such that

\begin{equation}

t_i = {\bf 1}({\bf z}_i{\boldsymbol \psi} + u_i > 0)

\nonumber

\end{equation}

where the treatment error \(u_i\) is standard normal.

We treat \(t_i\) as endogenous by allowing \(\epsilon_{0i}\) and \(\epsilon_{1i}\) to be correlated with \(u_i\). In this post we will assume that the variances of the unobserved errors are the same for the control and treatment group and that their correlation with \(u_i\) is also equal. We assume that the errors are trivariate normal with mean zero and covariance

\begin{equation}

\left[\begin{matrix}

\sigma^{2} & \sigma_{01} & \sigma_{t} \cr

\sigma_{01} & \sigma^{2} & \sigma_{t} \cr

\sigma_{t} & \sigma_{t} & 1

\end{matrix}\right]

\nonumber

\end{equation}

**Treatment effect**

We want to identify the treatment effect of \(t_i\) conditional on \({\bf x}_i\). This is just the difference of the potential-outcome means of \(y_{1i}\) and \(y_{0i}\), conditional on \({\bf x}_i\).

\begin{eqnarray*}

E(y_{1i}-y_{0i}\vert{\bf x}_i) &=& E\left\{E(y_{1i}-y_{0i}\vert {\bf x}_i, \epsilon_{0i}, \epsilon_{1i}) \vert{\bf x}_i \right\} \cr

&=& E\left\{\exp({\bf x}_i{\boldsymbol \beta}_1 + \beta_{10}+\epsilon_{1i})\vert{\bf x}_i\right\} –

E\left\{\exp({\bf x}_i{\boldsymbol \beta}_0 + \beta_{00}+\epsilon_{0i})\vert{\bf x}_i\right\}\cr

&=& \exp\left({\bf x}_i{\boldsymbol \beta}_1 + \beta_{10}+\frac{\sigma^2}{2}\right) –

\exp\left({\bf x}_i{\boldsymbol \beta}_0 + \beta_{00}+\frac{\sigma^2}{2}\right)

\cr

\end{eqnarray*}

So we can identify the treatment effect of \(t_i\) at \({\bf x}_i\) if we know \({\boldsymbol \beta}_0\), \({\boldsymbol \beta}_1\), \(\beta_{00}^{\star}=\beta_{00}+\sigma^2 / 2\), and \(\beta_{10}^{\star}=\beta_{10}+\sigma^2 / 2\).

The ATE is the margin of the conditional treatment effects over the regressors \({\bf x}_i\).

\begin{equation}

\mbox{ATE} = E(y_{1i} – y_{0i}) = E\left\{E(y_{1i}-y_{0i}\vert{\bf x}_i)\right\} \nonumber

\end{equation}

Our estimator will provide consistent estimates of the model parameters and the ATE.

**Estimator**

We cannot just run separate regressions for the control and treatment groups and difference the means to estimate the treatment effect (regression adjustment estimation of the treatment effect). The potential-outcome errors \(\epsilon_{0i}\) and \(\epsilon_{1i}\) are correlated with \(t_i\), and a regression that ignores this correlation will not give consistent point estimates.

By modeling the treatment, and using this information in a model for the exponential mean, we can account for the correlation and obtain consistent point estimates. This is a two-step problem.

Unlike as in the previous posts, we cannot perform the steps independently to get point estimates. We also do not need to make some of the parametric assumptions that we made in the previous posts. In our current model, \(y_i\) could be Poisson, lognormal, or some other distribution with an exponential mean.

We can specify our model in terms of moment conditions that are satisfied by the parameters, rather than strict distributional assumptions that impose a likelihood. Moment conditions are expected values that specify the model parameters in terms of the true moments. The generalized method of moments (GMM) finds parameter values that are closest to satisfying the sample equivalent of the moment conditions. For our two-step problem, we can estimate the moment conditions for both steps simultaneously, as shown in *Using gmm to solve two-step estimation problems*.

The treatment regressor coefficients \({\boldsymbol \psi}\) are identified by the moment conditions

\begin{equation}

E\left[{\bf z}_i\left\{t_i – \Phi\left({\bf z}_i{\boldsymbol \psi}\right)\right\}\right] = {\bf 0}

\nonumber

\end{equation}

This is the first step. We use the treatment assignment model parameters to help account for the endogeneity of the assignment and the outcome errors in the second step, where we model the exponential mean. The moment conditions for the second step come from a nonlinear least-squares estimator that models \(E(y_i\vert {\bf x}_i,t_i, {\bf z}_i)\) and are given by

\begin{equation}

E\left[ \frac{\partial E\left(y_{i}\vert {\bf x}_i, t_i, z_i \right)}{\partial {\boldsymbol \beta}} \left\{y_i –

E\left(y_{i}\vert {\bf x}_i, t_i, z_i \right)

\right\}\right]

\nonumber

\end{equation}

where

\begin{eqnarray*}

E\left(y_{i}\vert {\bf x}_i, t_i, z_i \right) &=&

E\left(t_i y_{1i} + (1-t_i) y_{0i} \vert {\bf x}_i, t_i, z_i \right) \cr

&=& \exp\left({\bf x}_i {\boldsymbol \beta}_{t_i} + \beta_{t_i0}^{\star}\right) \left\{\frac{\Phi(\sigma_{t}+ {\bf z}_i{\boldsymbol \psi)}}{\Phi({\bf z}_i{\boldsymbol \psi})}\right\}^{t_i} \left\{\frac{1-\Phi(\sigma_{t}+ {\bf z}_i{\boldsymbol \psi)}}{1-\Phi({\bf z}_i{\boldsymbol \psi})}\right\}^{1-t_i} \cr

\end{eqnarray*}

and \({\boldsymbol \beta}\) is

\begin{equation}

{\boldsymbol \beta} = \left[\begin{matrix} {\boldsymbol \beta}_{1} \cr

{\boldsymbol \beta}_{0} \cr

\beta_{00}^{\star} \cr

\beta_{10}^{\star} \cr

\sigma_{t}\end{matrix}\right]

\nonumber

\end{equation}

More details on the derivation of the conditional mean is provided in the appendix. To obtain consistent point estimates, we could find an estimate of \({\boldsymbol \psi}\) that satisfies the sample moment conditions for the first stage. Then this estimate, \(\widehat{\boldsymbol \psi}\), would be used in place of \({\boldsymbol \psi}\) in the sample equivalent of the second-stage moment conditions. The estimate \(\widehat{\boldsymbol \beta}\) would be computed using these conditions. With this estimate \(\widehat{\boldsymbol \beta}\), we could also estimate the ATE. The ATE is a function of the parameters of the second moments and has moment condition

\begin{equation}

E\left\{\mbox{ATE} – \left(

\exp\left({\bf x}_i{\boldsymbol \beta}_1 + \beta_{10}+\frac{\sigma^2}{2}\right) –

\exp\left({\bf x}_i{\boldsymbol \beta}_0 + \beta_{00}+\frac{\sigma^2}{2}\right)

\right)\right\}

\nonumber

\end{equation}

Our point estimates would be consistent, but we would need to adjust the standard errors of the second stage and the treatment effect for the multiple steps of estimation. Sections 10.3 and 2.5 of Cameron and Trivedi (2013) and section 18.5 of Wooldridge (2010) describe the current model and the two-step approach. Using GMM with all stages simultaneously would automatically adjust the standard errors. We will use the command **gmm** to estimate the first and second stages together with the treatment effect using GMM.

**Estimation**

We use the interactive version of **gmm** to estimate the parameters from simulated data. Our outcome has a lognormal distribution. As in the last post, we will use local macros to organize our work.

We store the moment equation for the treatment in the macro **first**. The instrument option for **first** is stored in the local macro **first_inst**. The moment conditions are calculated by multiplying the equation by the instruments.

. local first (first:t - normal({t: x1 z1 z2 _cons})) . local first_inst instruments(first: x1 z1 z2)

Now, we will look at the moment conditions for the second stage. We will begin with the moment conditions that correspond to the derivatives for the coefficients and intercepts of the exponential mean. We store the moment equation in the macro **second** and the instrument option in the macro **second_inst**. As in the first case, the moment conditions are calculated by multiplying the equation by the instruments. Note that we specify **noconstant** and use **ibn** for **t** in the instruments option. We will have separate intercepts for the treatment and control group and no marginal constant intercept.

. local second (second:(exp(0.t*({y0:x1 x2 x3 _cons})+ > 1.t*({y1:x1 x2 x3 _cons}))* > cond(t,normal({sigmat}+{t:})/normal({t:}), > (normal(-{sigmat}-{t:}))/(normal(-{t:}))))* > (y- (exp(0.t*({y0:})+ > 1.t*({y1:}))* > cond(t,normal({sigmat}+{t:})/normal({t:}), > (normal(-{sigmat}-{t:}))/(normal(-{t:})))))) . local second_inst instruments(second: c.(x1 x2 x3)#i.t ibn.t, noconstant)

Finally, we show the moment conditions that correspond to the derivative of the exponential mean for the covariance parameter and the ATE. These are stored in local macros **secondt** and **ate**. We do not need to specify the instruments for these moment equations; the constant is the instrument.

. local secondt (secondt:(exp(0.t*({y0:})+1.t*({y1:}))* > cond(t,normalden({sigmat}+{t:})/normal({t:}), > (-normalden(-{sigmat}-{t:}))/(normal(-{t:}))))* > (y- (exp(0.t*({y0:})+ > 1.t*({y1:}))* > cond(t,normal({sigmat}+{t:})/normal({t:}), > (normal(-{sigmat}-{t:}))/(normal(-{t:})))))) . local ate (ate: {ate} - (exp({y1:})-exp({y0:})))

Now, we use **gmm** to estimate the parameters of the exponential mean model with an endogenous treatment and its ATE. Our moment conditions exactly identify the model, so we use one-step estimation and an identity initial weight matrix.

. matrix I = I(14) . gmm `first' `second' `secondt' `ate', `first_inst' `second_inst' > onestep winitial(I) Step 1 Iteration 0: GMM criterion Q(b) = .47089834 Iteration 1: GMM criterion Q(b) = .01758819 Iteration 2: GMM criterion Q(b) = .00464735 Iteration 3: GMM criterion Q(b) = .00001541 Iteration 4: GMM criterion Q(b) = 4.462e-08 Iteration 5: GMM criterion Q(b) = 2.860e-13 note: model is exactly identified GMM estimation Number of parameters = 14 Number of moments = 14 Initial weight matrix: user Number of obs = 10,000 ------------------------------------------------------------------------------ | Robust | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- t | x1 | .4878797 .016208 30.10 0.000 .4561126 .5196469 z1 | .3201969 .0197232 16.23 0.000 .2815401 .3588537 z2 | -1.023134 .0204703 -49.98 0.000 -1.063255 -.9830127 _cons | -.5308057 .0275083 -19.30 0.000 -.5847209 -.4768905 -------------+---------------------------------------------------------------- y0 | x1 | .2897787 .0208321 13.91 0.000 .2489486 .3306088 x2 | .2154847 .0182694 11.79 0.000 .1796773 .2512922 x3 | -.2993678 .017933 -16.69 0.000 -.3345159 -.2642197 _cons | -.1904076 .0288861 -6.59 0.000 -.2470232 -.1337919 -------------+---------------------------------------------------------------- y1 | x1 | .1828499 .0307878 5.94 0.000 .1225069 .2431928 x2 | .4439466 .0296223 14.99 0.000 .3858879 .5020052 x3 | -.5825316 .0277333 -21.00 0.000 -.6368878 -.5281753 _cons | -.6311437 .0308619 -20.45 0.000 -.6916319 -.5706554 -------------+---------------------------------------------------------------- /sigmat | .292965 .0338062 8.67 0.000 .2267061 .359224 /ate | -.2984529 .0242588 -12.30 0.000 -.3459992 -.2509065 ------------------------------------------------------------------------------ Instruments for equation first: x1 z1 z2 _cons Instruments for equation second: 0.t#c.x1 1.t#c.x1 0.t#c.x2 1.t#c.x2 0.t#c.x3 1.t#c.x3 0.t 1.t Instruments for equation secondt: _cons Instruments for equation ate: _cons

We estimate that the ATE is \(-\)0.3. We will compare this estimate to the sample difference of \(y_{1}\) and \(y_{0}\).

. gen diff = y1 - y0 . sum diff Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- diff | 10,000 -.3016842 .6915832 -9.553974 5.870782

In our sample, the average difference of \(y_{1}\) and \(y_{0}\) is also \(-\)0.3.

**Final considerations**

We illustrated how to use **gmm** to estimate the parameters of an exponential mean model with an endogenous treatment and its average treatment effect. We demonstrate how to perform multistep estimation in Stata when the model is not separable, and estimation of each step requires information from the previous step.

In the previous posts, we used **gsem** and **mlexp** to estimate the parameters of models with separable likelihoods. In the current model, because the treatment is endogenous, the likelihood for the model is no longer separable. We demonstrated how we can use **gmm** to estimate the parameters in these situations.

**Appendix 1**

The moment conditions for the exponential mean correspond to the difference between \(y_i\) and \(E(y_i\vert {\bf x}_i,t_i, {\bf z}_i)\). The conditional mean of \(y_i\) depends on the conditional means of the unobserved outcome errors \(\epsilon_{0i}\) and \(\epsilon_{1i}\). This explicitly models the covariance between treatment assignment and the unobserved outcome errors and allows us to estimate the parameters needed to estimate conditional and average treatment effects.

For \(j=0,1\), the conditional mean of the potential outcome \(y_{ji}\) is

\begin{eqnarray*}

E\left(y_{ji}\vert {\bf x}_i, t_i, z_i \right) &=&

E\left\{ E\left(y_{ji}\vert {\bf x}_i, \epsilon_{ji}, t_i, z_i \right)\vert {\bf x}_i, t_i, z_i \right\} \cr

&=& E\left\{\exp\left({\bf x}_i{\boldsymbol \beta}_j + \beta_{j0}+\epsilon_{ji}\right) \vert {\bf x}_i, t_i, z_i \right\} \cr

&=&

\exp\left({\bf x}_i{\boldsymbol \beta}_j + \beta_{j0}\right)E\left\{\exp\left(\epsilon_{ji}\right) \vert {\bf x}_i, t_i, z_i \right\}

\end{eqnarray*}

Because \(\epsilon_{ji}\) is correlated with \(t_i\) via the unobserved error \(u_i\), we have

\begin{eqnarray*}

\epsilon_{ji} = \sigma_{t}u_i + c_{ji}

\end{eqnarray*}

where \(c_{ji}\) is normal with zero mean and variance \(\sigma^2-\sigma_{t}^2\) and independent of \(u_i\).

Terza (1998) derived the conditional expectation of \(\exp(\sigma_{t}u_i)\). Using his results, we obtain

\begin{eqnarray*}

E\left(y_{i}\vert {\bf x}_i, t_i, z_i \right) &=&

E\left(t_i y_{1i} + (1-t_i) y_{0i} \vert {\bf x}_i, t_i, z_i \right) \cr

&=& \exp\left({\bf x}_i {\boldsymbol \beta}_{t_i} + \beta_{t_i0}^{\star}\right) \left\{\frac{\Phi(\sigma_{t}+ {\bf z}_i{\boldsymbol \psi)}}{\Phi({\bf z}_i{\boldsymbol \psi})}\right\}^{t_i} \left\{\frac{1-\Phi(\sigma_{t}+ {\bf z}_i{\boldsymbol \psi)}}{1-\Phi({\bf z}_i{\boldsymbol \psi})}\right\}^{1-t_i} \cr

\end{eqnarray*}

**Appendix 2**

We simulate data from a lognormal model with an endogenous treatment. The code used to produce the data is given below.

. set seed 113432 . set obs 10000 number of observations (_N) was 0, now 10,000 . // Exogenous regressors . generate x1 = rnormal() . generate x2 = rnormal() . generate x3 = rpoisson(1) . // Treatment regressors . generate z1 = ln(rchi2(4)) . generate z2 = rnormal() . matrix corr = ( 1, .4, .4 \ > .4, 1, .4 \ > .4, .4, 1) . matrix sds = (.8, .8,1) . drawnorm e0 e1 u, corr(corr) sd(sds) . gen t = .5*x1 + .3*z1 - z2 -.5 + u> 0 . gen y0 = exp(.3*x1 + .2*x2 - .3*x3 + -.5 + e0) . gen y1 = exp(.2*x1 + .4*x2 - .6*x3 + -.9 + e1) . gen y = 0.t*y0 + 1.t*y1

The potential-outcome errors have different variances and correlations. We also use different coefficients for the control and treatment groups.

**References**

Cameron, A. C., and P. K. Trivedi. 2013. *Regression Analysis of Count Data*. 2nd ed. New York: Cambridge University Press.

Terza, J. V. 1998. Estimating count data models with endogenous switching: Sample selection and endogenous treatment effects. *Journal of Econometrics* 84: 129–154.

Wooldridge, J. M. 2010. *Econometric Analysis of Cross Section and Panel Data*. 2nd ed. Cambridge, MA: MIT Press.