Fitting ordered probit models with endogenous covariates with Stata’s gsem command
The new command gsem allows us to fit a wide variety of models; among the many possibilities, we can account for endogeneity on different models. As an example, I will fit an ordinal model with endogenous covariates.
Parameterizations for an ordinal probit model
The ordinal probit model is used to model ordinal dependent variables. In the usual parameterization, we assume that there is an underlying linear regression, which relates an unobserved continuous variable \(y^*\) to the covariates \(x\).
\[y^*_{i} = x_{i}\gamma + u_i\]
The observed dependent variable \(y\) relates to \(y^*\) through a series of cut-points \(-\infty =\kappa_0<\kappa_1<\dots< \kappa_m=+\infty\) , as follows:
\[y_{i} = j {\mbox{ if }} \kappa_{j-1} < y^*_{i} \leq \kappa_j\]
Provided that the variance of \(u_i\) can’t be identified from the observed data, it is assumed to be equal to one. However, we can consider a re-scaled parameterization for the same model; a straightforward way of seeing this, is by noting that, for any positive number \(M\):
\[\kappa_{j-1} < y^*_{i} \leq \kappa_j \iff
M\kappa_{j-1} < M y^*_{i} \leq M\kappa_j
\]
that is,
\[\kappa_{j-1} < x_i\gamma + u_i \leq \kappa_j \iff
M\kappa_{j-1}< x_i(M\gamma) + Mu_i \leq M\kappa_j
\]
In other words, if the model is identified, it can be represented by multiplying the unobserved variable \(y\) by a positive number, and this will mean that the standard error of the residual component, the coefficients, and the cut-points will be multiplied by this number.
Let me show you an example; I will first fit a standard ordinal probit model, both with oprobit and with gsem. Then, I will use gsem to fit an ordinal probit model where the residual term for the underlying linear regression has a standard deviation equal to 2. I will do this by introducing a latent variable \(L\), with variance 1, and coefficient \(\sqrt 3\). This will be added to the underlying latent residual, with variance 1; then, the ‘new’ residual term will have variance equal to \(1+((\sqrt 3)^2\times Var(L))= 4\), so the standard deviation will be 2. We will see that as a result, the coefficients, as well as the cut-points, will be multiplied by 2.
. sysuse auto, clear (1978 Automobile Data) . oprobit rep mpg disp , nolog Ordered probit regression Number of obs = 69 LR chi2(2) = 14.68 Prob > chi2 = 0.0006 Log likelihood = -86.352646 Pseudo R2 = 0.0783 ------------------------------------------------------------------------------ rep78 | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- mpg | .0497185 .0355452 1.40 0.162 -.0199487 .1193858 displacement | -.0029884 .0021498 -1.39 0.165 -.007202 .0012252 -------------+---------------------------------------------------------------- /cut1 | -1.570496 1.146391 -3.81738 .6763888 /cut2 | -.7295982 1.122361 -2.929386 1.47019 /cut3 | .6580529 1.107838 -1.513269 2.829375 /cut4 | 1.60884 1.117905 -.5822132 3.799892 ------------------------------------------------------------------------------ . gsem (rep <- mpg disp, oprobit), nolog Generalized structural equation model Number of obs = 69 Log likelihood = -86.352646 -------------------------------------------------------------------------------- | Coef. Std. Err. z P>|z| [95% Conf. Interval] ---------------+---------------------------------------------------------------- rep78 <- | mpg | .0497185 .0355452 1.40 0.162 -.0199487 .1193858 displacement | -.0029884 .0021498 -1.39 0.165 -.007202 .0012252 ---------------+---------------------------------------------------------------- rep78 | /cut1 | -1.570496 1.146391 -1.37 0.171 -3.81738 .6763888 /cut2 | -.7295982 1.122361 -0.65 0.516 -2.929386 1.47019 /cut3 | .6580529 1.107838 0.59 0.553 -1.513269 2.829375 /cut4 | 1.60884 1.117905 1.44 0.150 -.5822132 3.799892 -------------------------------------------------------------------------------- . local a = sqrt(3) . gsem (rep <- mpg disp L@`a'), oprobit var(L@1) nolog Generalized structural equation model Number of obs = 69 Log likelihood = -86.353008 ( 1) [rep78]L = 1.732051 ( 2) [var(L)]_cons = 1 -------------------------------------------------------------------------------- | Coef. Std. Err. z P>|z| [95% Conf. Interval] ---------------+---------------------------------------------------------------- rep78 <- | mpg | .099532 .07113 1.40 0.162 -.0398802 .2389442 displacement | -.0059739 .0043002 -1.39 0.165 -.0144022 .0024544 L | 1.732051 (constrained) ---------------+---------------------------------------------------------------- rep78 | /cut1 | -3.138491 2.293613 -1.37 0.171 -7.63389 1.356907 /cut2 | -1.456712 2.245565 -0.65 0.517 -5.857938 2.944513 /cut3 | 1.318568 2.21653 0.59 0.552 -3.02575 5.662887 /cut4 | 3.220004 2.236599 1.44 0.150 -1.16365 7.603657 ---------------+---------------------------------------------------------------- var(L)| 1 (constrained) --------------------------------------------------------------------------------
Ordinal probit model with endogenous covariates
This model is defined analogously to the model fitted by -ivprobit- for probit models with endogenous covariates; we assume an underlying model with two equations,
\[
\begin{eqnarray}
y^*_{1i} =& y_{2i} \beta + x_{1i} \gamma + u_i & \\
y_{2i} =& x_{1i} \pi_1 + x_{2i} \pi_2 + v_i & \,\,\,\,\,\, (1)
\end{eqnarray}
\]
where \(u_i \sim N(0, 1) \), \(v_i\sim N(0,s^2) \), and \(corr(u_i, v_i) = \rho\).
We don’t observe \(y^*_{1i}\); instead, we observe a discrete variable \(y_{1i}\), such as, for a set of cut-points (to be estimated) \(\kappa_0 = -\infty < \kappa_1 < \kappa_2 \dots < \kappa_m = +\infty \),
\[y_{1i} = j {\mbox{ if }} \kappa_{j-1} < y^*_{1i} \leq \kappa_j \]
The parameterization we will use
I will re-scale the first equation, preserving the correlation. That is, I will consider the following system:
\[
\begin{eqnarray}
z^*_{1i} =&
y_{2i}b +x_{1i}c + t_i + \alpha L_i &\\
y_{2i} = &x_{1i}\pi_1 + x_{2i}\pi_2 + w_i + \alpha L_i & \,\,\,\,\,\, (2)
\end{eqnarray}
\]
where \(t_i, w_i, L_i\) are independent, \(t_i \sim N(0, 1)\) , \(w_i \sim N(0,\sigma^2)\), \(L_i \sim N(0, 1)\)
\[y_{1i} = j {\mbox{ if }} \lambda_{j-1} < z^*_{1i} \leq \lambda_j \]
By introducing a latent variable in both equations, I am modeling a correlation between the error terms. The fist equation is a re-scaled version of the original equation, that is, \(z^*_1 = My^*_1\),
\[ y_{2i}b +x_{1i}c + t_i + \alpha_i L_i
= M(y_{2i}\beta) +M x_{1i}\gamma + M u_i \]
This implies that
\[M u_i = t_i + \alpha_i L_i, \]
where \(Var(u_i) = 1\) and \(Var(t_i + \alpha L_i) = 1 + \alpha^2\), so the scale is \(M = \sqrt{1+\alpha^2} \).
The second equation remains the same, we just express \(v_i\) as \(w_i + \alpha L_i\). Now, after estimating the system (2), we can recover the parameters in (1) as follows:
\[\beta = \frac{1}{\sqrt{1+ \alpha^2}} b\]
\[\gamma = \frac{1}{\sqrt{1+ \alpha^2}} c\]
\[\kappa_j = \frac{1}{\sqrt{1+ \alpha^2}} \lambda_j \]
\[V(v_i) = V(w_i + \alpha L_i) =V(w_i) + \alpha^2\].
\[\rho = Cov(t_i + \alpha L_i, w_i + \alpha L_i) =
\frac{\alpha^2}{(\sqrt{1+\alpha^2}\sqrt{V(w_i)+\alpha^2)}}\]
Note: This parameterization assumes that the correlation is positive; for negative values of the correlation, \(L\) should be included in the second equation with a negative sign (that is, L@(-a) instead of L@a). When trying to perform the estimation with the wrong sign, the model most likely won’t achieve convergence. Otherwise, you will see a coefficient for L that is virtually zero. In Stata 13.1 we have included features that allow you to fit the model without this restriction. However, this time we will use the older parameterization, which will allow you to visualize the different components more easily.
Simulating data, and performing the estimation
clear set seed 1357 set obs 10000 forvalues i = 1(1)5 { gen x`i' =2* rnormal() + _n/1000 } mat C = [1,.5 \ .5, 1] drawnorm z1 z2, cov(C) gen y2 = 0 forvalues i = 1(1)5 { replace y2 = y2 + x`i' } replace y2 = y2 + z2 gen y1star = y2 + x1 + x2 + z1 gen xb1 = y2 + x1 + x2 gen y1 = 4 replace y1 = 3 if xb1 + z1 <=.8 replace y1 = 2 if xb1 + z1 <=.3 replace y1 = 1 if xb1 + z1 <=-.3 replace y1 = 0 if xb1 + z1 <=-.8 gsem (y1 <- y2 x1 x2 L@a, oprobit) (y2 <- x1 x2 x3 x4 x5 L@a), var(L@1) local y1 y1 local y2 y2 local xaux x1 x2 x3 x4 x5 local xmain y2 x1 x2 local s2 sqrt(1+_b[`y1':L]^2) foreach v in `xmain'{ local trans `trans' (`y1'_`v': _b[`y1':`v']/`s2') } foreach v in `xaux' _cons { local trans `trans' (`y2'_`v': _b[`y2':`v']) } qui tab `y1' if e(sample) local ncuts = r(r)-1 forvalues i = 1(1) `ncuts'{ local trans `trans' (cut_`i': _b[`y1'_cut`i':_cons]/`s2') } local s1 sqrt( _b[var(e.`y2'):_cons] +_b[`y1':L]^2) local trans `trans' (sig_2: `s1') local trans `trans' (rho_12: _b[`y1':L]^2/(`s1'*`s2')) nlcom `trans'
Results
This is the output from gsem:
Generalized structural equation model Number of obs = 10000 Log likelihood = -14451.117 ( 1) [y1]L - [y2]L = 0 ( 2) [var(L)]_cons = 1 ------------------------------------------------------------------------------ | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- y1 <- | y2 | 1.379511 .0775028 17.80 0.000 1.227608 1.531414 x1 | 1.355687 .0851558 15.92 0.000 1.188785 1.522589 x2 | 1.346323 .0833242 16.16 0.000 1.18301 1.509635 L | .7786594 .0479403 16.24 0.000 .6846982 .8726206 -------------+---------------------------------------------------------------- y2 <- | x1 | .9901353 .0044941 220.32 0.000 .981327 .9989435 x2 | 1.006836 .0044795 224.76 0.000 .998056 1.015615 x3 | 1.004249 .0044657 224.88 0.000 .9954963 1.013002 x4 | .9976541 .0044783 222.77 0.000 .9888767 1.006431 x5 | .9987587 .0044736 223.26 0.000 .9899907 1.007527 L | .7786594 .0479403 16.24 0.000 .6846982 .8726206 _cons | .0002758 .0192417 0.01 0.989 -.0374372 .0379887 -------------+---------------------------------------------------------------- y1 | /cut1 | -1.131155 .1157771 -9.77 0.000 -1.358074 -.9042358 /cut2 | -.5330973 .1079414 -4.94 0.000 -.7446585 -.321536 /cut3 | .2722794 .1061315 2.57 0.010 .0642654 .4802933 /cut4 | .89394 .1123013 7.96 0.000 .6738334 1.114047 -------------+---------------------------------------------------------------- var(L)| 1 (constrained) -------------+---------------------------------------------------------------- var(e.y2)| .3823751 .074215 .2613848 .5593696 ------------------------------------------------------------------------------
These are the results we obtain when we transform the values reported by gsem to the original parameterization:
------------------------------------------------------------------------------ | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- y1_y2 | 1.088455 .0608501 17.89 0.000 .9691909 1.207719 y1_x1 | 1.069657 .0642069 16.66 0.000 .943814 1.195501 y1_x2 | 1.062269 .0619939 17.14 0.000 .940763 1.183774 y2_x1 | .9901353 .0044941 220.32 0.000 .981327 .9989435 y2_x2 | 1.006836 .0044795 224.76 0.000 .998056 1.015615 y2_x3 | 1.004249 .0044657 224.88 0.000 .9954963 1.013002 y2_x4 | .9976541 .0044783 222.77 0.000 .9888767 1.006431 y2_x5 | .9987587 .0044736 223.26 0.000 .9899907 1.007527 y2__cons | .0002758 .0192417 0.01 0.989 -.0374372 .0379887 cut_1 | -.892498 .0895971 -9.96 0.000 -1.068105 -.7168909 cut_2 | -.4206217 .0841852 -5.00 0.000 -.5856218 -.2556217 cut_3 | .2148325 .0843737 2.55 0.011 .0494632 .3802018 cut_4 | .705332 .0905974 7.79 0.000 .5277644 .8828997 sig_2 | .9943267 .007031 141.42 0.000 .9805462 1.008107 rho_12 | .4811176 .0477552 10.07 0.000 .3875191 .574716 ------------------------------------------------------------------------------
The estimates are quite close to the values used for the simulation. If you try to perform the estimation with the wrong sign for the coefficient for L, you will get a number that is virtually zero (if you get convergence at all). In this case, the evaluator is telling us that the best value it can find, provided the restrictions we have imposed, is zero. If you see such results, you may want to try the opposite sign. If both give a zero coefficient, it means that this is the solution, and there is not endogeneity at all. If one of them is not zero, it means that the non-zero value is the solution. As stated before, in Stata 13.1, the model can be fitted without this restriction.