## Using mlexp to estimate endogenous treatment effects in a probit model

I use features new to Stata 14.1 to estimate an average treatment effect (ATE) for a probit model with an endogenous treatment. In 14.1, we added new prediction statistics after **mlexp** that **margins** can use to estimate an ATE.

I am building on a previous post in which I demonstrated how to use **mlexp** to estimate the parameters of a probit model with sample selection. Our results match those obtained with **biprobit**; see [R] biprobit for more details. In a future post, I use these techniques to estimate treatment-effect parameters not yet available from another Stata command.

**Probit model with treatment**

In this section, I describe the potential-outcome framework used to define an ATE. For each treatment level, there is an outcome that we would observe if a person were to select that treatment level. When the outcome is binary and there are two treatment levels, we can specify how the potential outcomes \(y_{0i}\) and \(y_{1i}\) are generated from the regressors \({\bf x}_i\) and the error terms \(\epsilon_{0i}\) and \(\epsilon_{1i}\):

\[\begin{eqnarray*}

y_{0i} &=& {\bf 1}({\bf x}_i{\boldsymbol \beta}_0 + \epsilon_{0i} > 0) \cr

y_{1i} &=& {\bf 1}({\bf x}_i{\boldsymbol \beta}_1 + \epsilon_{1i} > 0)

\end{eqnarray*}\]

(Assuming that each error is standard normal, this gives us a bivariate probit model.) The indicator function \({\bf 1}(\cdot)\) outputs 1 when its input is true and 0 otherwise.

The probit model for potential outcomes \(y_{0i}\) and \(y_{1i}\) with treatment \(t_i\) assumes that we observe the outcome

\[\begin{equation}

y_i = (1-t_i) y_{0i} + t_i y_{1i}

\nonumber

\end{equation}\]

So we observe \(y_{1i}\) under the treatment (\(t_{i}=1\)) and \(y_{0i}\) when the treatment is withheld (\(t_{i}=0\)).

The treatment \(t_i\) is determined by regressors \({\bf z}_i\) and standard normal error \(u_i\):

\[\begin{equation}

t_i = {\bf 1}({\bf z}_i{\boldsymbol \psi} + u_i > 0)

\nonumber

\end{equation}\]

**Probit model with endogenous treatment**

We could estimate the parameters \({\boldsymbol \beta}_0\) and \({\boldsymbol \beta}_1\) using a probit regression on \(y_i\) if \(t_i\) was not related to the unobserved errors \(\epsilon_{0i}\) and \(\epsilon_{1i}\). This may not always be the case. Suppose we modeled whether parents send their children to private school and used private tutoring for the child as a treatment. Unobserved factors that influence private school enrollment may be correlated with the unobserved factors that influence whether private tutoring is given. The treatment would be correlated with the unobserved errors of the outcome.

We can treat \(t_i\) as endogenous by allowing \(\epsilon_{0i}\) and \(\epsilon_{1i}\) to be correlated with \(u_i\). In this post, we will assume that these correlations are the same. Formally, \(\epsilon_{0i}\), \(\epsilon_{1i}\), and \(u_i\) are trivariate normal with covariance:

\[\begin{equation}

\left[\begin{matrix}

1 & \rho_{01} & \rho_{t} \cr

\rho_{01} & 1 & \rho_{t} \cr

\rho_{t} & \rho_{t} & 1

\end{matrix}\right]

\nonumber

\end{equation}\]

The correlation \(\rho_{01}\) cannot be identified because we never observe both \(y_{0i}\) and \(y_{1i}\). However, identification of \(\rho_{01}\) is not necessary to estimate the other parameters, because we will observe the covariates and outcome in observations from each treatment group.

The log-likelihood for observation \(i\) is

\[\begin{eqnarray*}

\ln L_i = & & {\bf 1}(y_i =1 \mbox{ and } t_i = 1) \ln \Phi_2({\bf x}_i{\boldsymbol \beta}_1, {\bf z}_i{\boldsymbol \gamma},\rho_t) + \cr

& & {\bf 1}(y_i=0 \mbox{ and } t_i=1)\ln \Phi_2(-{\bf x}_i{\boldsymbol \beta}_1, {\bf z}_i{\boldsymbol \gamma},-\rho_t) + \cr

& & {\bf 1}(y_i=1 \mbox{ and } t_i=0) \ln \Phi_2({\bf x}_i{\boldsymbol \beta}_0, -{\bf z}_i{\boldsymbol \gamma},-\rho_t) + \cr

& & {\bf 1}(y_i=0 \mbox{ and } t_i = 0)\ln \Phi_2(-{\bf x}_i{\boldsymbol \beta}_0, -{\bf z}_i{\boldsymbol \gamma},\rho_t)

\end{eqnarray*}\]

where \(\Phi_2\) is the bivariate normal cumulative distribution function.

This model is a variation of the bivariate probit model. For a good introduction to the bivariate probit model, see Pindyck and Rubinfeld (1998).

**The data**

We will simulate data from a probit model with an endogenous treatment and then estimate the parameters of the model using **mlexp**. Then, we will use **margins** to estimate the ATE. We simulate a random sample of 10,000 observations.

. set seed 3211 . set obs 10000 number of observations (_N) was 0, now 10,000 . gen x = rnormal() + 4 . gen b = rpoisson(1) . gen z = rnormal()

First, we generate the regressors. The variable \(x\) has a normal distribution with a mean of 4 and variance of 1. It is used as a regressor for the outcome and treatment. The variable \(b\) has a Poisson distribution with a mean of 1 and will be used as a treatment regressor. A standard normal variable \(z\) is also used as a treatment regressor.

. matrix cm = (1, .3,.7 \ .3, 1, .7 \ .7, .7, 1) . drawnorm ey0 ey1 et, corr(cm) . gen t = .5*x - .1*b + .4*z - 2.4 + et > 0 . gen y0 = .6*x - .8 + ey0 > 0 . gen y1 = .3*x - 1.2 + ey1 > 0 . gen y = (1-t)*y0 + t*y1

Next, we draw the unobserved errors. The potential outcome and treatment errors will have correlation \(.7\). We generate the errors using the **drawnorm** command. Finally, the outcome and treatment indicators are created.

**Estimating the model parameters**

Now, we will use **mlexp** to estimate the parameters of the probit model with an endogenous treatment. As in the previous post, we use the **cond()** function to calculate different values of the likelihood based on the different values of \(y\) and \(t\). We use the factor variable operator **ibn** on \(t\) in equation **y** to allow for a different intercept at each level of \(t\). An interaction between \(t\) and \(x\) is also specified in equation **y**. This allows for a different coefficient on \(x\) at each level of \(t\). We also specify **vce(robust)** so that we can use **vce(unconditional)** when we use **margins** later.

. mlexp (ln(cond(t,cond(y,binormal({y: i.t#c.x ibn.t}, /// > {t: x b z _cons}, {rho}), /// > binormal(-{y:},{t:}, -{rho})), /// > cond(y,binormal({y:},-{t:},-{rho}), /// > binormal(-{y:},-{t:},{rho}))))) /// > , vce(robust) initial: log pseudolikelihood = -13862.944 alternative: log pseudolikelihood = -15511.071 rescale: log pseudolikelihood = -13818.369 rescale eq: log pseudolikelihood = -10510.488 Iteration 0: log pseudolikelihood = -10510.488 (not concave) Iteration 1: log pseudolikelihood = -10004.946 Iteration 2: log pseudolikelihood = -9487.4032 Iteration 3: log pseudolikelihood = -9286.0118 Iteration 4: log pseudolikelihood = -9183.901 Iteration 5: log pseudolikelihood = -9181.9207 Iteration 6: log pseudolikelihood = -9172.0256 Iteration 7: log pseudolikelihood = -9170.8198 Iteration 8: log pseudolikelihood = -9170.7994 Iteration 9: log pseudolikelihood = -9170.7994 Maximum likelihood estimation Log pseudolikelihood = -9170.7994 Number of obs = 10,000 ------------------------------------------------------------------------------ | Robust | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- y | t#c.x | 0 | .5829362 .0223326 26.10 0.000 .5391651 .6267073 1 | .2745585 .0259477 10.58 0.000 .2237021 .325415 | t | 0 | -.7423227 .0788659 -9.41 0.000 -.896897 -.5877483 1 | -1.088765 .1488922 -7.31 0.000 -1.380589 -.7969419 -------------+---------------------------------------------------------------- t | x | .4900691 .0148391 33.03 0.000 .4609851 .5191532 b | -.1086717 .0132481 -8.20 0.000 -.1346375 -.0827059 z | .4135792 .0150112 27.55 0.000 .3841579 .4430006 _cons | -2.354418 .0640056 -36.78 0.000 -2.479867 -2.228969 -------------+---------------------------------------------------------------- /rho | .7146737 .0377255 18.94 0.000 .6407331 .7886143 ------------------------------------------------------------------------------

Our parameter estimates are close to their true values.

**Estimating the ATE**

The ATE of \(t\) is the expected value of the difference between \(y_{1i}\) and \(y_{0i}\), the average difference between the potential outcomes. Using the law of iterated expectations, we have

\[\begin{eqnarray*}

E(y_{1i}-y_{0i}) &=& E\{E(y_{1i}-y_{0i}|{\bf x}_i)\} \cr

&=& E\{\Phi({\bf x}_i{\boldsymbol \beta}_1)-

\Phi({\bf x}_i{\boldsymbol \beta}_0)\}

\end{eqnarray*}\]

This can be estimated as a predictive margin.

Now, we estimate the ATE using **margins**. We specify the normal probability expression in the **expression()** option. The **xb()** term refers to the linear prediction of the first equation, which we can now predict in Stata 14.1. We specify **r.t** so that **margins** will take the difference of the expression under \(t=1\) and \(t=0\). We specify **vce(unconditional)** to obtain standard errors for the population ATE rather than the sample ATE. The **contrast(nowald)** option is specified to omit the Wald test for the difference.

. margins r.t, expression(normal(xb())) vce(unconditional) contrast(nowald) Contrasts of predictive margins Expression : normal(xb()) -------------------------------------------------------------- | Unconditional | Contrast Std. Err. [95% Conf. Interval] -------------+------------------------------------------------ t | (1 vs 0) | -.4112345 .0248909 -.4600197 -.3624493 --------------------------------------------------------------

We estimate that the ATE of \(t\) on \(y\) is \(-.41\). So taking the treatment decreases the probability of a positive outcome by \(.41\) on average over the population.

We will compare this estimate to the sample difference of \(y_{1}\) and \(y_{0}\).

. gen diff = y1 - y0 . sum diff Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- diff | 10,000 -.4132 .5303715 -1 1

In our sample, the average difference of \(y_{1}\) and \(y_{0}\) is also \(-.41\).

**Conclusion**

I have demonstrated how to estimate the parameters of a model with a complex likelihood function: the probit model with an endogenous treatment using **mlexp**. See [R] mlexp for more details about **mlexp**. I have also demonstrated how to use **margins** to estimate the ATE for the probit model with an endogenous treatment. See [R] margins for more details about **margins**.

**Reference**

Pindyck, R. S., and D. L. Rubinfeld. 1998. *Econometric Models and Economic Forecasts*. 4th ed. New York: McGraw-Hill.