Home > Statistics > Using mlexp to estimate endogenous treatment effects in a heteroskedastic probit model

Using mlexp to estimate endogenous treatment effects in a heteroskedastic probit model

I use features new to Stata 14.1 to estimate an average treatment effect (ATE) for a heteroskedastic probit model with an endogenous treatment. In 14.1, we added new prediction statistics after mlexp that margins can use to estimate an ATE.

I am building on a previous post in which I demonstrated how to use mlexp to estimate the parameters of a probit model with an endogenous treatment and used margins to estimate the ATE for the model Using mlexp to estimate endogenous treatment effects in a probit model. Currently, no official commands estimate the heteroskedastic probit model with an endogenous treatment, so in this post I show how mlexp can be used to extend the models estimated by Stata.

Heteroskedastic probit model

For binary outcome \(y_i\) and regressors \({\bf x}_i\), the probit model assumes

\[\begin{equation}
y_i = {\bf 1}({\bf x}_i{\boldsymbol \beta} + \epsilon_i > 0)
\end{equation}\]

The indicator function \({\bf 1}(\cdot)\) outputs 1 when its input is true and outputs 0 otherwise. The error \(\epsilon_i\) is standard normal.

Assuming that the error has constant variance may not always be wise. Suppose we are studying a certain business decision. Large firms, because they have the resources to take chances, may exhibit more variation in the factors that affect their decision than small firms.

In the heteroskedastic probit model, regressors \({\bf w}_i\) determine the variance of \(\epsilon_i\). Following Harvey (1976), we have

\[\begin{equation} \mbox{Var}\left(\epsilon_i\right) = \left\{\exp\left({\bf
w}_i{\boldsymbol \gamma}\right)\right\}^2 \nonumber \end{equation}\]

Heteroskedastic probit model with treatment

In this section, I review the potential-outcome framework used to define an ATE and extend it for the heteroskedastic probit model. For each treatment level, there is an outcome that we would observe if a person were to select that treatment level. When the outcome is binary and there are two treatment levels, we can specify how the potential outcomes \(y_{0i}\) and \(y_{1i}\) are generated from the regressors \({\bf x}_i\) and the error terms \(\epsilon_{0i}\) and \(\epsilon_{1i}\):

\[\begin{eqnarray*}
y_{0i} &=& {\bf 1}({\bf x}_i{\boldsymbol \beta}_0 + \epsilon_{0i} > 0) \cr
y_{1i} &=& {\bf 1}({\bf x}_i{\boldsymbol \beta}_1 + \epsilon_{1i} > 0)
\end{eqnarray*}\]

We assume a heteroskedastic probit model for the potential outcomes. The errors are normal with mean \(0\) and conditional variance generated by regressors \({\bf w}_i\). In this post, we assume equal variance of the potential outcome errors.

\[\begin{equation}
\mbox{Var}\left(\epsilon_{0i}\right) = \mbox{Var}\left(\epsilon_{1i}\right) =
\left\{\exp\left({\bf w}_i{\boldsymbol \gamma}\right)\right\}^2 \nonumber
\end{equation}\]

The heteroskedastic probit model for potential outcomes \(y_{0i}\) and \(y_{1i}\) with treatment \(t_i\) assumes that we observe the outcome

\[\begin{equation}
y_i = (1-t_i) y_{0i} + t_i y_{1i}
\nonumber
\end{equation}\]

So we observe \(y_{1i}\) under the treatment (\(t_{i}=1\)) and \(y_{0i}\) when the treatment is withheld (\(t_{i}=0\)).

The treatment \(t_i\) is determined by regressors \({\bf z}_i\) and error \(u_i\):

\[\begin{equation}
t_i = {\bf 1}({\bf z}_i{\boldsymbol \psi} + u_i > 0)
\nonumber
\end{equation}\]

The treatment error \(u_i\) is normal with mean zero, and we allow its variance to be determined by another set of regressors \({\bf v}_i\):

\[\begin{equation}
\mbox{Var}\left(u_i\right) =
\left\{\exp\left({\bf v}_i{\boldsymbol \alpha}\right)\right\}^2 \nonumber
\end{equation}\]

Heteroskedastic probit model with endogenous treatment

In the previous post, I described how to model endogeneity for the treatment \(t_i\) by correlating the outcome errors \(\epsilon_{0i}\) and \(\epsilon_{1i}\) with the treatment error \(u_i\). We use the same framework for modeling endogeneity here. The variance of the errors may change depending on the heteroskedasticity regressors \({\bf w}_i\) and \({\bf v}_i\), but their correlation remains constant. The errors \(\epsilon_{0i}\), \(\epsilon_{1i}\), and \(u_i\) are trivariate normal with correlation

\[\begin{equation}
\left[\begin{matrix}
1 & \rho_{01} & \rho_{t} \cr
\rho_{01} & 1 & \rho_{t} \cr
\rho_{t} & \rho_{t} & 1
\end{matrix}\right]
\nonumber
\end{equation}\]

Now we have all the pieces we need to write the log likelihood of the heteroskedastic probit model with an endogenous treatment. The form of the likelihood is similar to what was given in the previous post. Now the inputs to the bivariate normal cumulative distribution function, \(\Phi_2\), are standardized by dividing by the conditional standard deviations of the errors.

The log likelihood for observation \(i\) is

\[\begin{eqnarray*}
\ln L_i = & & {\bf 1}(y_i =1 \mbox{ and } t_i = 1) \ln \Phi_2\left\{\frac{{\bf x}_i{\boldsymbol \beta}_1}{\exp\left({\bf w}_i{\boldsymbol \gamma}\right)}, \frac{{\bf z}_i{\boldsymbol \psi}}{\exp\left({\bf v}_i{\boldsymbol \alpha}\right)},\rho_t\right\} + \cr
& & {\bf 1}(y_i=0 \mbox{ and } t_i=1)\ln \Phi_2\left\{\frac{-{\bf x}_i{\boldsymbol \beta}_1}{\exp\left({\bf w}_i{\boldsymbol \gamma}\right)}, \frac{{\bf z}_i{\boldsymbol \psi}}{\exp\left({\bf v}_i{\boldsymbol \alpha}\right)},-\rho_t\right\} + \cr
& & {\bf 1}(y_i=1 \mbox{ and } t_i=0) \ln \Phi_2\left\{\frac{{\bf x}_i{\boldsymbol \beta}_0}{\exp\left({\bf w}_i{\boldsymbol \gamma}\right)}, \frac{-{\bf z}_i{\boldsymbol \psi}}{\exp\left({\bf v}_i{\boldsymbol \alpha}\right)},-\rho_t\right\} + \cr
& & {\bf 1}(y_i=0 \mbox{ and } t_i = 0)\ln \Phi_2\left\{\frac{-{\bf x}_i{\boldsymbol \beta}_0}{\exp\left({\bf w}_i{\boldsymbol \gamma}\right)}, \frac{-{\bf z}_i{\boldsymbol \psi}}{\exp\left({\bf v}_i{\boldsymbol \alpha}\right)},\rho_t\right\}
\end{eqnarray*}\]

The data

We will simulate data from a heteroskedastic probit model with an endogenous treatment and then estimate the parameters of the model with mlexp. Then, we will use margins to estimate the ATE.

. set seed 323

. set obs 10000
number of observations (_N) was 0, now 10,000

. generate x = .8*rnormal() + 4

. generate b = rpoisson(1)

. generate z = rnormal()

. matrix cm = (1, .3,.7 \ .3, 1, .7 \ .7, .7, 1)

. drawnorm ey0 ey1 et, corr(cm)

We simulate a random sample of 10,000 observations. The treatment and outcome regressors are generated in a similar manner to their creation in the last post. As in the last post, we generate the errors with drawnorm to have correlation \(0.7\).

. generate g = runiform()

. generate h = rnormal()

. quietly replace ey0 = ey0*exp(.5*g)

. quietly replace ey1 = ey1*exp(.5*g)

. quietly replace et = et*exp(.1*h)

. generate t = .5*x - .1*b + .5*z - 2.4 + et > 0

. generate y0 = .6*x - .8 + ey0 > 0

. generate y1 = .3*x - 1.3 + ey1 > 0

. generate y = (1-t)*y0 + t*y1

The uniform variable g is generated as a regressor for the outcome error variance, while h is a regressor for the treatment error variance. We scale the errors by using the variance regressors so that they are heteroskedastic, and then we generate the treatment and outcome indicators.

Estimating the model parameters

Now, we will use mlexp to estimate the parameters of the heteroskedastic probit model with an endogenous treatment. As in the previous post, we use the cond() function to calculate different values of the likelihood based on the different values of \(y\) and \(t\). We use the factor-variable operator ibn on \(t\) in equation y to allow for a different intercept at each level of \(t\). An interaction between \(t\) and \(x\) is also specified in equation y. This allows for a different coefficient on \(x\) at each level of \(t\).

. mlexp (ln(cond(t, ///                                          
>         cond(y,binormal({y: i.t#c.x ibn.t}/exp({g:g}), ///
>             {t: x b z _cons}/exp({h:h}),{rho}), /// 
>                 binormal(-{y:}/exp({g:}),{t:}/exp({h:}),-{rho})), ///
>         cond(y,binormal({y:}/exp({g:}),-{t:}/exp({h:}),-{rho}), ///
>                 binormal(-{y:}/exp({g:}),-{t:}/exp({h:}),{rho}) ///
>         )))), vce(robust)

initial:       log pseudolikelihood = -13862.944
alternative:   log pseudolikelihood = -16501.619
rescale:       log pseudolikelihood = -13858.877
rescale eq:    log pseudolikelihood = -11224.877
Iteration 0:   log pseudolikelihood = -11224.877  (not concave)
Iteration 1:   log pseudolikelihood = -10644.625  
Iteration 2:   log pseudolikelihood = -10074.998  
Iteration 3:   log pseudolikelihood = -9976.6027  
Iteration 4:   log pseudolikelihood = -9973.0988  
Iteration 5:   log pseudolikelihood = -9973.0913  
Iteration 6:   log pseudolikelihood = -9973.0913  

Maximum likelihood estimation

Log pseudolikelihood = -9973.0913               Number of obs     =     10,000

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
       t#c.x |
          0  |   .6178115   .0334521    18.47   0.000     .5522467    .6833764
          1  |   .2732094   .0365742     7.47   0.000     .2015253    .3448936
             |
           t |
          0  |  -.8403294   .1130197    -7.44   0.000    -1.061844   -.6188149
          1  |  -1.215177   .1837483    -6.61   0.000    -1.575317   -.8550371
-------------+----------------------------------------------------------------
g            |
           g |   .4993187   .0513297     9.73   0.000     .3987143    .5999232
-------------+----------------------------------------------------------------
t            |
           x |   .4985802   .0183033    27.24   0.000     .4627065    .5344539
           b |  -.1140255   .0132988    -8.57   0.000    -.1400908   -.0879603
           z |   .4993995   .0150844    33.11   0.000     .4698347    .5289643
       _cons |  -2.402772   .0780275   -30.79   0.000    -2.555703   -2.249841
-------------+----------------------------------------------------------------
h            |
           h |   .1011185   .0199762     5.06   0.000     .0619658    .1402713
-------------+----------------------------------------------------------------
        /rho |   .7036964   .0326734    21.54   0.000     .6396577    .7677351
------------------------------------------------------------------------------

Our parameter estimates are close to their true values.

Estimating the ATE

The ATE of \(t\) is the expected value of the difference between \(y_{1i}\) and \(y_{0i}\), the average difference between the potential outcomes. Using the law of iterated expectations, we have

\[\begin{eqnarray*}
E(y_{1i}-y_{0i})&=& E\left\{ E\left(y_{1i}-y_{0i}|{\bf x}_i,{\bf w}_i\right)\right\} \cr
&=& E\left\lbrack\Phi\left\{\frac{{\bf x}_i{\boldsymbol \beta}_1}{
\exp\left({\bf w}_i{\boldsymbol \gamma}\right)}\right\}-
\Phi\left\{\frac{{\bf x}_i{\boldsymbol \beta}_0}{
\exp\left({\bf w}_i{\boldsymbol \gamma}\right)}\right\}\right\rbrack \cr
\end{eqnarray*}\]

This can be estimated as a mean of predictions.

Now, we estimate the ATE by using margins. We specify the normal probability expression in the expression() option. We use the expression function xb() to get the linear predictions for the outcome equation and the outcome error variance equation. We can now predict these linear forms after mlexp in Stata 14.1. We specify r.t so that margins will take the difference of the expression under t=1 and t=0. We specify vce(unconditional) to obtain standard errors for the population ATE rather than the sample ATE; we specified vce(robust) for mlexp so that we could specify vce(unconditional) for margins. The contrast(nowald) option is specified to omit the Wald test for the difference.

. margins r.t, expression(normal(xb(y)/exp(xb(g)))) ///
>     vce(unconditional) contrast(nowald)

Contrasts of predictive margins

Expression   : normal(xb(y)/exp(xb(g)))

--------------------------------------------------------------
             |            Unconditional
             |   Contrast   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
           t |
   (1 vs 0)  |  -.4183043   .0202635     -.4580202   -.3785885
--------------------------------------------------------------

We estimate that the ATE of \(t\) on \(y\) is \(-0.42\). So taking the treatment decreases the probability of a positive outcome by \(0.42\) on average over the population.

We will compare this estimate to the average difference of \(y_{1}\) and \(y_{0}\) in the sample. We can do this because we simulated the data. In practice, only one potential outcome is observed for every observation, and this average difference cannot be computed.

. generate diff = y1 - y0

. sum diff

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        diff |     10,000      -.4164    .5506736         -1          1

In our sample, the average difference of \(y_{1}\) and \(y_{0}\) is also \(-0.42\).

Conclusion

I have demonstrated how to estimate the parameters of a model that is not available in Stata: the heteroskedastic probit model with an endogenous treatment using mlexp. See [R] mlexp for more details about mlexp. I have also demonstrated how to use margins to estimate the ATE for the heteroskedastic probit model with an endogenous treatment. See [R] margins for more details about mlexp.

Reference

Harvey, A. C. 1976. Estimating regression models with multiplicative heteroscedasticity. Econometrica 44: 461-465.