## Using mlexp to estimate endogenous treatment effects in a heteroskedastic probit model

I use features new to Stata 14.1 to estimate an average treatment effect (ATE) for a heteroskedastic probit model with an endogenous treatment. In 14.1, we added new prediction statistics after **mlexp** that **margins** can use to estimate an ATE.

I am building on a previous post in which I demonstrated how to use **mlexp** to estimate the parameters of a probit model with an endogenous treatment and used **margins** to estimate the ATE for the model Using mlexp to estimate endogenous treatment effects in a probit model. Currently, no official commands estimate the heteroskedastic probit model with an endogenous treatment, so in this post I show how **mlexp** can be used to extend the models estimated by Stata.

**Heteroskedastic probit model**

For binary outcome \(y_i\) and regressors \({\bf x}_i\), the probit model assumes

\[\begin{equation}

y_i = {\bf 1}({\bf x}_i{\boldsymbol \beta} + \epsilon_i > 0)

\end{equation}\]

The indicator function \({\bf 1}(\cdot)\) outputs 1 when its input is true and outputs 0 otherwise. The error \(\epsilon_i\) is standard normal.

Assuming that the error has constant variance may not always be wise. Suppose we are studying a certain business decision. Large firms, because they have the resources to take chances, may exhibit more variation in the factors that affect their decision than small firms.

In the heteroskedastic probit model, regressors \({\bf w}_i\) determine the variance of \(\epsilon_i\). Following Harvey (1976), we have

\[\begin{equation} \mbox{Var}\left(\epsilon_i\right) = \left\{\exp\left({\bf

w}_i{\boldsymbol \gamma}\right)\right\}^2 \nonumber \end{equation}\]

**Heteroskedastic probit model with treatment**

In this section, I review the potential-outcome framework used to define an ATE and extend it for the heteroskedastic probit model. For each treatment level, there is an outcome that we would observe if a person were to select that treatment level. When the outcome is binary and there are two treatment levels, we can specify how the potential outcomes \(y_{0i}\) and \(y_{1i}\) are generated from the regressors \({\bf x}_i\) and the error terms \(\epsilon_{0i}\) and \(\epsilon_{1i}\):

\[\begin{eqnarray*}

y_{0i} &=& {\bf 1}({\bf x}_i{\boldsymbol \beta}_0 + \epsilon_{0i} > 0) \cr

y_{1i} &=& {\bf 1}({\bf x}_i{\boldsymbol \beta}_1 + \epsilon_{1i} > 0)

\end{eqnarray*}\]

We assume a heteroskedastic probit model for the potential outcomes. The errors are normal with mean \(0\) and conditional variance generated by regressors \({\bf w}_i\). In this post, we assume equal variance of the potential outcome errors.

\[\begin{equation}

\mbox{Var}\left(\epsilon_{0i}\right) = \mbox{Var}\left(\epsilon_{1i}\right) =

\left\{\exp\left({\bf w}_i{\boldsymbol \gamma}\right)\right\}^2 \nonumber

\end{equation}\]

The heteroskedastic probit model for potential outcomes \(y_{0i}\) and \(y_{1i}\) with treatment \(t_i\) assumes that we observe the outcome

\[\begin{equation}

y_i = (1-t_i) y_{0i} + t_i y_{1i}

\nonumber

\end{equation}\]

So we observe \(y_{1i}\) under the treatment (\(t_{i}=1\)) and \(y_{0i}\) when the treatment is withheld (\(t_{i}=0\)).

The treatment \(t_i\) is determined by regressors \({\bf z}_i\) and error \(u_i\):

\[\begin{equation}

t_i = {\bf 1}({\bf z}_i{\boldsymbol \psi} + u_i > 0)

\nonumber

\end{equation}\]

The treatment error \(u_i\) is normal with mean zero, and we allow its variance to be determined by another set of regressors \({\bf v}_i\):

\[\begin{equation}

\mbox{Var}\left(u_i\right) =

\left\{\exp\left({\bf v}_i{\boldsymbol \alpha}\right)\right\}^2 \nonumber

\end{equation}\]

**Heteroskedastic probit model with endogenous treatment**

In the previous post, I described how to model endogeneity for the treatment \(t_i\) by correlating the outcome errors \(\epsilon_{0i}\) and \(\epsilon_{1i}\) with the treatment error \(u_i\). We use the same framework for modeling endogeneity here. The variance of the errors may change depending on the heteroskedasticity regressors \({\bf w}_i\) and \({\bf v}_i\), but their correlation remains constant. The errors \(\epsilon_{0i}\), \(\epsilon_{1i}\), and \(u_i\) are trivariate normal with correlation

\[\begin{equation}

\left[\begin{matrix}

1 & \rho_{01} & \rho_{t} \cr

\rho_{01} & 1 & \rho_{t} \cr

\rho_{t} & \rho_{t} & 1

\end{matrix}\right]

\nonumber

\end{equation}\]

Now we have all the pieces we need to write the log likelihood of the heteroskedastic probit model with an endogenous treatment. The form of the likelihood is similar to what was given in the previous post. Now the inputs to the bivariate normal cumulative distribution function, \(\Phi_2\), are standardized by dividing by the conditional standard deviations of the errors.

The log likelihood for observation \(i\) is

\[\begin{eqnarray*}

\ln L_i = & & {\bf 1}(y_i =1 \mbox{ and } t_i = 1) \ln \Phi_2\left\{\frac{{\bf x}_i{\boldsymbol \beta}_1}{\exp\left({\bf w}_i{\boldsymbol \gamma}\right)}, \frac{{\bf z}_i{\boldsymbol \psi}}{\exp\left({\bf v}_i{\boldsymbol \alpha}\right)},\rho_t\right\} + \cr

& & {\bf 1}(y_i=0 \mbox{ and } t_i=1)\ln \Phi_2\left\{\frac{-{\bf x}_i{\boldsymbol \beta}_1}{\exp\left({\bf w}_i{\boldsymbol \gamma}\right)}, \frac{{\bf z}_i{\boldsymbol \psi}}{\exp\left({\bf v}_i{\boldsymbol \alpha}\right)},-\rho_t\right\} + \cr

& & {\bf 1}(y_i=1 \mbox{ and } t_i=0) \ln \Phi_2\left\{\frac{{\bf x}_i{\boldsymbol \beta}_0}{\exp\left({\bf w}_i{\boldsymbol \gamma}\right)}, \frac{-{\bf z}_i{\boldsymbol \psi}}{\exp\left({\bf v}_i{\boldsymbol \alpha}\right)},-\rho_t\right\} + \cr

& & {\bf 1}(y_i=0 \mbox{ and } t_i = 0)\ln \Phi_2\left\{\frac{-{\bf x}_i{\boldsymbol \beta}_0}{\exp\left({\bf w}_i{\boldsymbol \gamma}\right)}, \frac{-{\bf z}_i{\boldsymbol \psi}}{\exp\left({\bf v}_i{\boldsymbol \alpha}\right)},\rho_t\right\}

\end{eqnarray*}\]

**The data**

We will simulate data from a heteroskedastic probit model with an endogenous treatment and then estimate the parameters of the model with **mlexp**. Then, we will use **margins** to estimate the ATE.

. set seed 323 . set obs 10000 number of observations (_N) was 0, now 10,000 . generate x = .8*rnormal() + 4 . generate b = rpoisson(1) . generate z = rnormal() . matrix cm = (1, .3,.7 \ .3, 1, .7 \ .7, .7, 1) . drawnorm ey0 ey1 et, corr(cm)

We simulate a random sample of 10,000 observations. The treatment and outcome regressors are generated in a similar manner to their creation in the last post. As in the last post, we generate the errors with **drawnorm** to have correlation \(0.7\).

. generate g = runiform() . generate h = rnormal() . quietly replace ey0 = ey0*exp(.5*g) . quietly replace ey1 = ey1*exp(.5*g) . quietly replace et = et*exp(.1*h) . generate t = .5*x - .1*b + .5*z - 2.4 + et > 0 . generate y0 = .6*x - .8 + ey0 > 0 . generate y1 = .3*x - 1.3 + ey1 > 0 . generate y = (1-t)*y0 + t*y1

The uniform variable **g** is generated as a regressor for the outcome error variance, while **h** is a regressor for the treatment error variance. We scale the errors by using the variance regressors so that they are heteroskedastic, and then we generate the treatment and outcome indicators.

**Estimating the model parameters**

Now, we will use **mlexp** to estimate the parameters of the heteroskedastic probit model with an endogenous treatment. As in the previous post, we use the **cond()** function to calculate different values of the likelihood based on the different values of \(y\) and \(t\). We use the factor-variable operator **ibn** on \(t\) in equation **y** to allow for a different intercept at each level of \(t\). An interaction between \(t\) and \(x\) is also specified in equation **y**. This allows for a different coefficient on \(x\) at each level of \(t\).

. mlexp (ln(cond(t, /// > cond(y,binormal({y: i.t#c.x ibn.t}/exp({g:g}), /// > {t: x b z _cons}/exp({h:h}),{rho}), /// > binormal(-{y:}/exp({g:}),{t:}/exp({h:}),-{rho})), /// > cond(y,binormal({y:}/exp({g:}),-{t:}/exp({h:}),-{rho}), /// > binormal(-{y:}/exp({g:}),-{t:}/exp({h:}),{rho}) /// > )))), vce(robust) initial: log pseudolikelihood = -13862.944 alternative: log pseudolikelihood = -16501.619 rescale: log pseudolikelihood = -13858.877 rescale eq: log pseudolikelihood = -11224.877 Iteration 0: log pseudolikelihood = -11224.877 (not concave) Iteration 1: log pseudolikelihood = -10644.625 Iteration 2: log pseudolikelihood = -10074.998 Iteration 3: log pseudolikelihood = -9976.6027 Iteration 4: log pseudolikelihood = -9973.0988 Iteration 5: log pseudolikelihood = -9973.0913 Iteration 6: log pseudolikelihood = -9973.0913 Maximum likelihood estimation Log pseudolikelihood = -9973.0913 Number of obs = 10,000 ------------------------------------------------------------------------------ | Robust | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- y | t#c.x | 0 | .6178115 .0334521 18.47 0.000 .5522467 .6833764 1 | .2732094 .0365742 7.47 0.000 .2015253 .3448936 | t | 0 | -.8403294 .1130197 -7.44 0.000 -1.061844 -.6188149 1 | -1.215177 .1837483 -6.61 0.000 -1.575317 -.8550371 -------------+---------------------------------------------------------------- g | g | .4993187 .0513297 9.73 0.000 .3987143 .5999232 -------------+---------------------------------------------------------------- t | x | .4985802 .0183033 27.24 0.000 .4627065 .5344539 b | -.1140255 .0132988 -8.57 0.000 -.1400908 -.0879603 z | .4993995 .0150844 33.11 0.000 .4698347 .5289643 _cons | -2.402772 .0780275 -30.79 0.000 -2.555703 -2.249841 -------------+---------------------------------------------------------------- h | h | .1011185 .0199762 5.06 0.000 .0619658 .1402713 -------------+---------------------------------------------------------------- /rho | .7036964 .0326734 21.54 0.000 .6396577 .7677351 ------------------------------------------------------------------------------

Our parameter estimates are close to their true values.

**Estimating the ATE**

The ATE of \(t\) is the expected value of the difference between \(y_{1i}\) and \(y_{0i}\), the average difference between the potential outcomes. Using the law of iterated expectations, we have

\[\begin{eqnarray*}

E(y_{1i}-y_{0i})&=& E\left\{ E\left(y_{1i}-y_{0i}|{\bf x}_i,{\bf w}_i\right)\right\} \cr

&=& E\left\lbrack\Phi\left\{\frac{{\bf x}_i{\boldsymbol \beta}_1}{

\exp\left({\bf w}_i{\boldsymbol \gamma}\right)}\right\}-

\Phi\left\{\frac{{\bf x}_i{\boldsymbol \beta}_0}{

\exp\left({\bf w}_i{\boldsymbol \gamma}\right)}\right\}\right\rbrack \cr

\end{eqnarray*}\]

This can be estimated as a mean of predictions.

Now, we estimate the ATE by using **margins**. We specify the normal probability expression in the **expression()** option. We use the expression function **xb()** to get the linear predictions for the outcome equation and the outcome error variance equation. We can now predict these linear forms after **mlexp** in Stata 14.1. We specify **r.t** so that **margins** will take the difference of the expression under **t=1** and **t=0**. We specify **vce(unconditional)** to obtain standard errors for the population ATE rather than the sample ATE; we specified **vce(robust)** for **mlexp** so that we could specify **vce(unconditional)** for **margins**. The **contrast(nowald)** option is specified to omit the Wald test for the difference.

. margins r.t, expression(normal(xb(y)/exp(xb(g)))) /// > vce(unconditional) contrast(nowald) Contrasts of predictive margins Expression : normal(xb(y)/exp(xb(g))) -------------------------------------------------------------- | Unconditional | Contrast Std. Err. [95% Conf. Interval] -------------+------------------------------------------------ t | (1 vs 0) | -.4183043 .0202635 -.4580202 -.3785885 --------------------------------------------------------------

We estimate that the ATE of \(t\) on \(y\) is \(-0.42\). So taking the treatment decreases the probability of a positive outcome by \(0.42\) on average over the population.

We will compare this estimate to the average difference of \(y_{1}\) and \(y_{0}\) in the sample. We can do this because we simulated the data. In practice, only one potential outcome is observed for every observation, and this average difference cannot be computed.

. generate diff = y1 - y0 . sum diff Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- diff | 10,000 -.4164 .5506736 -1 1

In our sample, the average difference of \(y_{1}\) and \(y_{0}\) is also \(-0.42\).

**Conclusion**

I have demonstrated how to estimate the parameters of a model that is not available in Stata: the heteroskedastic probit model with an endogenous treatment using **mlexp**. See [R] mlexp for more details about **mlexp**. I have also demonstrated how to use **margins** to estimate the ATE for the heteroskedastic probit model with an endogenous treatment. See [R] margins for more details about **mlexp**.

**Reference**

Harvey, A. C. 1976. Estimating regression models with multiplicative heteroscedasticity. *Econometrica* 44: 461-465.