## Fitting ordered probit models with endogenous covariates with Stata’s gsem command

The new command **gsem** allows us to fit a wide variety of models; among the many possibilities, we can account for endogeneity in different kinds of models. As an example, I will fit an ordinal probit model with endogenous covariates.

### Parameterizations for an ordinal probit model

The ordinal probit model is used to model ordinal dependent variables. In the usual parameterization, we assume that there is an underlying linear regression that relates an unobserved continuous variable \(y^*\) to the covariates \(x\):

\[y^*_{i} = x_{i}\gamma + u_i\]

The observed dependent variable \(y\) relates to \(y^*\) through a series of cut-points \(-\infty =\kappa_0<\kappa_1<\dots< \kappa_m=+\infty\), as follows:

\[y_{i} = j {\mbox{ if }} \kappa_{j-1} < y^*_{i} \leq \kappa_j\]
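
As a sketch of what this rule implies, assume (as is standard for a probit model) that \(u_i\) is standard normal, and write \(\Phi\) for the standard normal distribution function; the symbol \(\Phi\) is introduced here for illustration. The probability of observing category \(j\) is then:

\[\Pr(y_i = j \mid x_i) = \Phi(\kappa_j - x_i\gamma) - \Phi(\kappa_{j-1} - x_i\gamma)\]

With \(\kappa_0=-\infty\) and \(\kappa_m=+\infty\), the first and last categories reduce to \(\Phi(\kappa_1 - x_i\gamma)\) and \(1-\Phi(\kappa_{m-1} - x_i\gamma)\).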

Because the variance of \(u_i\) can’t be identified from the observed data, it is assumed to be equal to one. However, we can consider a rescaled parameterization of the same model; a straightforward way of seeing this is to note that, for any positive number \(M\):

\[\kappa_{j-1} < y^*_{i} \leq \kappa_j \iff M\kappa_{j-1} < M y^*_{i} \leq M\kappa_j\]

that is,

\[\kappa_{j-1} < x_i\gamma + u_i \leq \kappa_j \iff M\kappa_{j-1}< x_i(M\gamma) + Mu_i \leq M\kappa_j\]

In other words, if the model is identified, it can be represented equivalently by multiplying the unobserved variable \(y^*\) by a positive number; the standard deviation of the residual component, the coefficients, and the cut-points are then all multiplied by this number.
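
As a quick check of this equivalence, assume (as in a probit model) that \(u_i\) is standard normal with distribution function \(\Phi\), so that \(Mu_i\) is normal with standard deviation \(M\). The notation \(\Phi\) is an assumption introduced for this sketch. The probability of category \(j\) under the rescaled parameterization is:

\[\Phi\left(\frac{M\kappa_j - x_i(M\gamma)}{M}\right) - \Phi\left(\frac{M\kappa_{j-1} - x_i(M\gamma)}{M}\right) = \Phi(\kappa_j - x_i\gamma) - \Phi(\kappa_{j-1} - x_i\gamma)\]

The factor \(M\) cancels, so both parameterizations assign the same probability to every observed outcome and therefore produce the same likelihood.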

Let me show you an example. I will first fit a standard ordinal probit model, both with **oprobit** and with **gsem**. Then, I will use **gsem** to fit an ordinal probit model where the residual term for the underlying linear regression has a standard deviation equal to 2. I will do this by introducing a latent variable \(L\), with variance constrained to 1 and coefficient constrained to \(\sqrt 3\). Because \(L\) is added to the underlying latent residual, which has variance 1 and is independent of \(L\), the ‘new’ residual term will have variance equal to \(1+((\sqrt 3)^2\times Var(L))= 4\), so its standard deviation will be 2. We will see that, as a result, the coefficients and the cut-points are multiplied by 2 (up to small numerical differences, because **gsem** integrates over \(L\) numerically).

    . sysuse auto, clear
    (1978 Automobile Data)

    . oprobit rep mpg disp , nolog

    Ordered probit regression                         Number of obs   =         69
                                                      LR chi2(2)      =      14.68
                                                      Prob > chi2     =     0.0006
    Log likelihood = -86.352646                       Pseudo R2       =     0.0783

    ------------------------------------------------------------------------------
           rep78 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             mpg |   .0497185   .0355452     1.40   0.162    -.0199487    .1193858
    displacement |  -.0029884   .0021498    -1.39   0.165     -.007202    .0012252
    -------------+----------------------------------------------------------------
           /cut1 |  -1.570496   1.146391                      -3.81738    .6763888
           /cut2 |  -.7295982   1.122361                     -2.929386     1.47019
           /cut3 |   .6580529   1.107838                     -1.513269    2.829375
           /cut4 |    1.60884   1.117905                     -.5822132    3.799892
    ------------------------------------------------------------------------------

    . gsem (rep <- mpg disp, oprobit), nolog

    Generalized structural equation model             Number of obs   =         69
    Log likelihood = -86.352646

    --------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
    rep78 <-       |
               mpg |   .0497185   .0355452     1.40   0.162    -.0199487    .1193858
      displacement |  -.0029884   .0021498    -1.39   0.165     -.007202    .0012252
    ---------------+----------------------------------------------------------------
    rep78          |
             /cut1 |  -1.570496   1.146391    -1.37   0.171     -3.81738    .6763888
             /cut2 |  -.7295982   1.122361    -0.65   0.516    -2.929386     1.47019
             /cut3 |   .6580529   1.107838     0.59   0.553    -1.513269    2.829375
             /cut4 |    1.60884   1.117905     1.44   0.150    -.5822132    3.799892
    --------------------------------------------------------------------------------

    . local a = sqrt(3)

    . gsem (rep <- mpg disp L@`a'), oprobit var(L@1) nolog

    Generalized structural equation model             Number of obs   =         69
    Log likelihood = -86.353008

     ( 1)  [rep78]L = 1.732051
     ( 2)  [var(L)]_cons = 1
    --------------------------------------------------------------------------------
                   |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
    rep78 <-       |
               mpg |    .099532     .07113     1.40   0.162    -.0398802    .2389442
      displacement |  -.0059739   .0043002    -1.39   0.165    -.0144022    .0024544
                 L |   1.732051  (constrained)
    ---------------+----------------------------------------------------------------
    rep78          |
             /cut1 |  -3.138491   2.293613    -1.37   0.171     -7.63389    1.356907
             /cut2 |  -1.456712   2.245565    -0.65   0.517    -5.857938    2.944513
             /cut3 |   1.318568    2.21653     0.59   0.552     -3.02575    5.662887
             /cut4 |   3.220004   2.236599     1.44   0.150     -1.16365    7.603657
    ---------------+----------------------------------------------------------------
             var(L)|          1  (constrained)
    --------------------------------------------------------------------------------
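
To compare the two parameterizations without reading the tables side by side, the slope coefficients can be stored and divided. The following is a minimal sketch, assuming that the coefficients are accessible as `_b[rep78:mpg]` and `_b[rep78:displacement]` after **gsem**; the scalar names `b_mpg1` and `b_disp1` are made up for this illustration.

```stata
sysuse auto, clear

* Standard parameterization: residual standard deviation equal to 1
quietly gsem (rep78 <- mpg displacement, oprobit)
scalar b_mpg1  = _b[rep78:mpg]
scalar b_disp1 = _b[rep78:displacement]

* Rescaled parameterization: L has variance 1 and loading sqrt(3),
* so the combined residual has variance 1 + 3 = 4 (standard deviation 2)
local a = sqrt(3)
quietly gsem (rep78 <- mpg displacement L@`a'), oprobit var(L@1)

* Both ratios should be close to 2
display "ratio for mpg:          " _b[rep78:mpg]/b_mpg1
display "ratio for displacement: " _b[rep78:displacement]/b_disp1
```

From the output above, the ratios are .099532/.0497185 and .0059739/.0029884, both within about 0.2% of 2; the small discrepancy is consistent with the slightly different log likelihoods reported by the two **gsem** fits.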

### Ordinal probit model with endogenous covariates

This model is defined analogously to the model fitted by **ivprobit** for probit models with endogenous covariates; we assume …