Fitting ordered probit models with endogenous covariates with Stata’s gsem command
The new command gsem allows us to fit a wide variety of models; among the many possibilities, we can account for endogeneity on different models. As an example, I will fit an ordinal model with endogenous covariates.
Parameterizations for an ordinal probit model
The ordinal probit model is used to model ordinal dependent variables. In the usual parameterization, we assume that there is an underlying linear regression, which relates an unobserved continuous variable \(y^*\) to the covariates \(x\).
\[y^*_{i} = x_{i}\gamma + u_i\]
The observed dependent variable \(y\) relates to \(y^*\) through a series of cut-points \(-\infty =\kappa_0<\kappa_1<\dots< \kappa_m=+\infty\) , as follows:
\[y_{i} = j {\mbox{ if }} \kappa_{j-1} < y^*_{i} \leq \kappa_j\]
Provided that the variance of \(u_i\) can’t be identified from the observed data, it is assumed to be equal to one. However, we can consider a re-scaled parameterization for the same model; a straightforward way of seeing this, is by noting that, for any positive number \(M\):
\[\kappa_{j-1} < y^*_{i} \leq \kappa_j \iff
M\kappa_{j-1} < M y^*_{i} \leq M\kappa_j
\]
that is,
\[\kappa_{j-1} < x_i\gamma + u_i \leq \kappa_j \iff
M\kappa_{j-1}< x_i(M\gamma) + Mu_i \leq M\kappa_j
\]
In other words, if the model is identified, it can be represented by multiplying the unobserved variable \(y\) by a positive number, and this will mean that the standard error of the residual component, the coefficients, and the cut-points will be multiplied by this number.
Let me show you an example; I will first fit a standard ordinal probit model, both with oprobit and with gsem. Then, I will use gsem to fit an ordinal probit model where the residual term for the underlying linear regression has a standard deviation equal to 2. I will do this by introducing a latent variable \(L\), with variance 1, and coefficient \(\sqrt 3\). This will be added to the underlying latent residual, with variance 1; then, the ‘new’ residual term will have variance equal to \(1+((\sqrt 3)^2\times Var(L))= 4\), so the standard deviation will be 2. We will see that as a result, the coefficients, as well as the cut-points, will be multiplied by 2.
. sysuse auto, clear
(1978 Automobile Data)
. oprobit rep mpg disp , nolog
Ordered probit regression Number of obs = 69
LR chi2(2) = 14.68
Prob > chi2 = 0.0006
Log likelihood = -86.352646 Pseudo R2 = 0.0783
------------------------------------------------------------------------------
rep78 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mpg | .0497185 .0355452 1.40 0.162 -.0199487 .1193858
displacement | -.0029884 .0021498 -1.39 0.165 -.007202 .0012252
-------------+----------------------------------------------------------------
/cut1 | -1.570496 1.146391 -3.81738 .6763888
/cut2 | -.7295982 1.122361 -2.929386 1.47019
/cut3 | .6580529 1.107838 -1.513269 2.829375
/cut4 | 1.60884 1.117905 -.5822132 3.799892
------------------------------------------------------------------------------
. gsem (rep <- mpg disp, oprobit), nolog
Generalized structural equation model Number of obs = 69
Log likelihood = -86.352646
--------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------------+----------------------------------------------------------------
rep78 <- |
mpg | .0497185 .0355452 1.40 0.162 -.0199487 .1193858
displacement | -.0029884 .0021498 -1.39 0.165 -.007202 .0012252
---------------+----------------------------------------------------------------
rep78 |
/cut1 | -1.570496 1.146391 -1.37 0.171 -3.81738 .6763888
/cut2 | -.7295982 1.122361 -0.65 0.516 -2.929386 1.47019
/cut3 | .6580529 1.107838 0.59 0.553 -1.513269 2.829375
/cut4 | 1.60884 1.117905 1.44 0.150 -.5822132 3.799892
--------------------------------------------------------------------------------
. local a = sqrt(3)
. gsem (rep <- mpg disp L@`a'), oprobit var(L@1) nolog
Generalized structural equation model Number of obs = 69
Log likelihood = -86.353008
( 1) [rep78]L = 1.732051
( 2) [var(L)]_cons = 1
--------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------------+----------------------------------------------------------------
rep78 <- |
mpg | .099532 .07113 1.40 0.162 -.0398802 .2389442
displacement | -.0059739 .0043002 -1.39 0.165 -.0144022 .0024544
L | 1.732051 (constrained)
---------------+----------------------------------------------------------------
rep78 |
/cut1 | -3.138491 2.293613 -1.37 0.171 -7.63389 1.356907
/cut2 | -1.456712 2.245565 -0.65 0.517 -5.857938 2.944513
/cut3 | 1.318568 2.21653 0.59 0.552 -3.02575 5.662887
/cut4 | 3.220004 2.236599 1.44 0.150 -1.16365 7.603657
---------------+----------------------------------------------------------------
var(L)| 1 (constrained)
--------------------------------------------------------------------------------
Ordinal probit model with endogenous covariates
This model is defined analogously to the model fitted by -ivprobit- for probit models with endogenous covariates; we assume an underlying model with two equations,
\[
\begin{eqnarray}
y^*_{1i} =& y_{2i} \beta + x_{1i} \gamma + u_i & \\
y_{2i} =& x_{1i} \pi_1 + x_{2i} \pi_2 + v_i & \,\,\,\,\,\, (1)
\end{eqnarray}
\]
where \(u_i \sim N(0, 1) \), \(v_i\sim N(0,s^2) \), and \(corr(u_i, v_i) = \rho\).
We don’t observe \(y^*_{1i}\); instead, we observe a discrete variable \(y_{1i}\), such as, for a set of cut-points (to be estimated) \(\kappa_0 = -\infty < \kappa_1 < \kappa_2 \dots < \kappa_m = +\infty \),
\[y_{1i} = j {\mbox{ if }} \kappa_{j-1} < y^*_{1i} \leq \kappa_j \]
The parameterization we will use
I will re-scale the first equation, preserving the correlation. That is, I will consider the following system:
\[
\begin{eqnarray}
z^*_{1i} =&
y_{2i}b +x_{1i}c + t_i + \alpha L_i &\\
y_{2i} = &x_{1i}\pi_1 + x_{2i}\pi_2 + w_i + \alpha L_i & \,\,\,\,\,\, (2)
\end{eqnarray}
\]
where \(t_i, w_i, L_i\) are independent, \(t_i \sim N(0, 1)\) , \(w_i \sim N(0,\sigma^2)\), \(L_i \sim N(0, 1)\)
\[y_{1i} = j {\mbox{ if }} \lambda_{j-1} < z^*_{1i} \leq \lambda_j \]
By introducing a latent variable in both equations, I am modeling a correlation between the error terms. The fist equation is a re-scaled version of the original equation, that is, \(z^*_1 = My^*_1\),
\[ y_{2i}b +x_{1i}c + t_i + \alpha_i L_i
= M(y_{2i}\beta) +M x_{1i}\gamma + M u_i \]
This implies that
\[M u_i = t_i + \alpha_i L_i, \]
where \(Var(u_i) = 1\) and \(Var(t_i + \alpha L_i) = 1 + \alpha^2\), so the scale is \(M = \sqrt{1+\alpha^2} \).
The second equation remains the same, we just express \(v_i\) as \(w_i + \alpha L_i\). Now, after estimating the system (2), we can recover the parameters in (1) as follows:
\[\beta = \frac{1}{\sqrt{1+ \alpha^2}} b\]
\[\gamma = \frac{1}{\sqrt{1+ \alpha^2}} c\]
\[\kappa_j = \frac{1}{\sqrt{1+ \alpha^2}} \lambda_j \]
\[V(v_i) = V(w_i + \alpha L_i) =V(w_i) + \alpha^2\].
\[\rho = Cov(t_i + \alpha L_i, w_i + \alpha L_i) =
\frac{\alpha^2}{(\sqrt{1+\alpha^2}\sqrt{V(w_i)+\alpha^2)}}\]
Note: This parameterization assumes that the correlation is positive; for negative values of the correlation, \(L\) should be included in the second equation with a negative sign (that is, L@(-a) instead of L@a). When trying to perform the estimation with the wrong sign, the model most likely won’t achieve convergence. Otherwise, you will see a coefficient for L that is virtually zero. In Stata 13.1 we have included features that allow you to fit the model without this restriction. However, this time we will use the older parameterization, which will allow you to visualize the different components more easily.
Simulating data, and performing the estimation
clear
set seed 1357
set obs 10000
forvalues i = 1(1)5 {
gen x`i' =2* rnormal() + _n/1000
}
mat C = [1,.5 \ .5, 1]
drawnorm z1 z2, cov(C)
gen y2 = 0
forvalues i = 1(1)5 {
replace y2 = y2 + x`i'
}
replace y2 = y2 + z2
gen y1star = y2 + x1 + x2 + z1
gen xb1 = y2 + x1 + x2
gen y1 = 4
replace y1 = 3 if xb1 + z1 <=.8
replace y1 = 2 if xb1 + z1 <=.3
replace y1 = 1 if xb1 + z1 <=-.3
replace y1 = 0 if xb1 + z1 <=-.8
gsem (y1 <- y2 x1 x2 L@a, oprobit) (y2 <- x1 x2 x3 x4 x5 L@a), var(L@1)
local y1 y1
local y2 y2
local xaux x1 x2 x3 x4 x5
local xmain y2 x1 x2
local s2 sqrt(1+_b[`y1':L]^2)
foreach v in `xmain'{
local trans `trans' (`y1'_`v': _b[`y1':`v']/`s2')
}
foreach v in `xaux' _cons {
local trans `trans' (`y2'_`v': _b[`y2':`v'])
}
qui tab `y1' if e(sample)
local ncuts = r(r)-1
forvalues i = 1(1) `ncuts'{
local trans `trans' (cut_`i': _b[`y1'_cut`i':_cons]/`s2')
}
local s1 sqrt( _b[var(e.`y2'):_cons] +_b[`y1':L]^2)
local trans `trans' (sig_2: `s1')
local trans `trans' (rho_12: _b[`y1':L]^2/(`s1'*`s2'))
nlcom `trans'
Results
This is the output from gsem:
Generalized structural equation model Number of obs = 10000
Log likelihood = -14451.117
( 1) [y1]L - [y2]L = 0
( 2) [var(L)]_cons = 1
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1 <- |
y2 | 1.379511 .0775028 17.80 0.000 1.227608 1.531414
x1 | 1.355687 .0851558 15.92 0.000 1.188785 1.522589
x2 | 1.346323 .0833242 16.16 0.000 1.18301 1.509635
L | .7786594 .0479403 16.24 0.000 .6846982 .8726206
-------------+----------------------------------------------------------------
y2 <- |
x1 | .9901353 .0044941 220.32 0.000 .981327 .9989435
x2 | 1.006836 .0044795 224.76 0.000 .998056 1.015615
x3 | 1.004249 .0044657 224.88 0.000 .9954963 1.013002
x4 | .9976541 .0044783 222.77 0.000 .9888767 1.006431
x5 | .9987587 .0044736 223.26 0.000 .9899907 1.007527
L | .7786594 .0479403 16.24 0.000 .6846982 .8726206
_cons | .0002758 .0192417 0.01 0.989 -.0374372 .0379887
-------------+----------------------------------------------------------------
y1 |
/cut1 | -1.131155 .1157771 -9.77 0.000 -1.358074 -.9042358
/cut2 | -.5330973 .1079414 -4.94 0.000 -.7446585 -.321536
/cut3 | .2722794 .1061315 2.57 0.010 .0642654 .4802933
/cut4 | .89394 .1123013 7.96 0.000 .6738334 1.114047
-------------+----------------------------------------------------------------
var(L)| 1 (constrained)
-------------+----------------------------------------------------------------
var(e.y2)| .3823751 .074215 .2613848 .5593696
------------------------------------------------------------------------------
These are the results we obtain when we transform the values reported by gsem to the original parameterization:
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1_y2 | 1.088455 .0608501 17.89 0.000 .9691909 1.207719
y1_x1 | 1.069657 .0642069 16.66 0.000 .943814 1.195501
y1_x2 | 1.062269 .0619939 17.14 0.000 .940763 1.183774
y2_x1 | .9901353 .0044941 220.32 0.000 .981327 .9989435
y2_x2 | 1.006836 .0044795 224.76 0.000 .998056 1.015615
y2_x3 | 1.004249 .0044657 224.88 0.000 .9954963 1.013002
y2_x4 | .9976541 .0044783 222.77 0.000 .9888767 1.006431
y2_x5 | .9987587 .0044736 223.26 0.000 .9899907 1.007527
y2__cons | .0002758 .0192417 0.01 0.989 -.0374372 .0379887
cut_1 | -.892498 .0895971 -9.96 0.000 -1.068105 -.7168909
cut_2 | -.4206217 .0841852 -5.00 0.000 -.5856218 -.2556217
cut_3 | .2148325 .0843737 2.55 0.011 .0494632 .3802018
cut_4 | .705332 .0905974 7.79 0.000 .5277644 .8828997
sig_2 | .9943267 .007031 141.42 0.000 .9805462 1.008107
rho_12 | .4811176 .0477552 10.07 0.000 .3875191 .574716
------------------------------------------------------------------------------
The estimates are quite close to the values used for the simulation. If you try to perform the estimation with the wrong sign for the coefficient for L, you will get a number that is virtually zero (if you get convergence at all). In this case, the evaluator is telling us that the best value it can find, provided the restrictions we have imposed, is zero. If you see such results, you may want to try the opposite sign. If both give a zero coefficient, it means that this is the solution, and there is not endogeneity at all. If one of them is not zero, it means that the non-zero value is the solution. As stated before, in Stata 13.1, the model can be fitted without this restriction.