## regress, probit, or logit?

In a previous post I illustrated that the probit model and the logit model produce statistically equivalent estimates of marginal effects. In this post, I compare the marginal effect estimates from a linear probability model (linear regression) with marginal effect estimates from probit and logit models.

My simulations show that when the true model is a probit or a logit, using a linear probability model can produce inconsistent estimates of the marginal effects of interest to researchers. The conclusions hinge on the probit or logit model being the true model.

**Simulation results**

For all simulations below, I use a sample size of 10,000 and 5,000 replications. The true data-generating processes (DGPs) are constructed using one discrete covariate and one continuous covariate. I study the average effect of a change in the continuous variable on the conditional probability (AME) and the average effect of a change in the discrete covariate on the conditional probability (ATE). I also look at the effect of a change in the continuous variable on the conditional probability, evaluated at the mean value of the covariates (MEM), and the effect of a change in the discrete covariate on the conditional probability, evaluated at the mean value of the covariates (TEM).
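To make the four estimands concrete, here is a minimal Python sketch (an illustration, not the post's Stata code) that computes the AME, ATE, MEM, and TEM analytically for the logit DGP used in the simulations below; the coefficients and covariate distributions match the data-generating program shown in the simulation-design section.

```python
import numpy as np

rng = np.random.default_rng(12345)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Coefficients and covariates from the logit DGP below:
# x1 continuous, x2 binary, index xb = .5*(1 - x1 + x2)
b0, b1, b2 = 0.5, -0.5, 0.5
n = 100_000
x1 = rng.chisquare(2, size=n) - 2
x2 = (rng.beta(4, 2, size=n) > 0.2).astype(float)
xb = b0 + b1 * x1 + b2 * x2

# AME: sample average of the derivative of P(y=1|x) with respect to x1
ame = np.mean(logistic(xb) * (1 - logistic(xb)) * b1)
# ATE: sample average of P(y=1|x2=1) - P(y=1|x2=0)
ate = np.mean(logistic(b0 + b1 * x1 + b2) - logistic(b0 + b1 * x1))
# MEM and TEM: the same effects evaluated at the covariate means
zbar = b0 + b1 * x1.mean() + b2 * x2.mean()
mem = logistic(zbar) * (1 - logistic(zbar)) * b1
tem = logistic(b0 + b1 * x1.mean() + b2) - logistic(b0 + b1 * x1.mean())
```

Because the logistic density varies over the covariate space, the effect at the mean (MEM, TEM) need not equal the average effect (AME, ATE); the gap between the two is exactly what the tables below illustrate.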

In Table 1, I present the results of a simulation when the true DGP satisfies the assumptions of a logit model. I show the average of the AME and the ATE estimates and the 5% rejection rate of the true null hypotheses. I also provide an approximate true value of the AME and ATE. I obtain the approximate true values by computing the ATE and AME, at the true values of the coefficients, using a sample of 20 million observations. I will provide more details on the simulation in a later section.

**Table 1: Average Marginal and Treatment Effects: True DGP Logit**

| Statistic | Approximate True Value | Logit | Regress (LPM) |
|---|---|---|---|
| AME of x1 | -.084 | -.084 | -.094 |
| 5% Rejection Rate | | .050 | .99 |
| ATE of x2 | .092 | .091 | .091 |
| 5% Rejection Rate | | .058 | .058 |

From Table 1, we see that the logit model estimates are close to the true value and that the rejection rate of the true null hypothesis is close to 5%. For the linear probability model, the rejection rate is 99% for the AME. For the ATE, the rejection rate and point estimates are close to what is estimated using a logit.

For the MEM and TEM, we have the following:

**Table 2: Marginal and Treatment Effects at Mean Values: True DGP Logit**

| Statistic | Approximate True Value | Logit | Regress (LPM) |
|---|---|---|---|
| MEM of x1 | -.099 | -.099 | -.094 |
| 5% Rejection Rate | | .054 | .618 |
| TEM of x2 | .109 | .109 | .092 |
| 5% Rejection Rate | | .062 | .073 |

Again, the logit estimates behave as expected. For the linear probability model, the rejection rate of the true null hypothesis is 62% for the MEM. For the TEM, the rejection rate is 7.3%, and the estimated effect is smaller than the true effect.

For the AME and ATE, when the true DGP is a probit, we have the following:

**Table 3: Average Marginal and Treatment Effects: True DGP Probit**

| Statistic | Approximate True Value | Probit | Regress (LPM) |
|---|---|---|---|
| AME of x1 | -.094 | -.094 | -.121 |
| 5% Rejection Rate | | .047 | 1 |
| ATE of x2 | .111 | .111 | .111 |
| 5% Rejection Rate | | .065 | .061 |

The probit model estimates are close to the true value, and the rejection rate of the true null hypothesis is close to 5%. For the linear probability model, the rejection rate is 100% for the AME. For the ATE, the rejection rate and point estimates are close to what is estimated using a probit.

For the MEM and TEM, we have the following:

**Table 4: Marginal and Treatment Effects at Mean Values: True DGP Probit**

| Statistic | Approximate True Value | Probit | Regress (LPM) |
|---|---|---|---|
| MEM of x1 | -.121 | -.122 | -.121 |
| 5% Rejection Rate | | .063 | .054 |
| TEM of x2 | .150 | .150 | .110 |
| 5% Rejection Rate | | .059 | .158 |

For the MEM, the probit and linear probability model produce reliable inference. For the TEM, the probit marginal effects behave as expected, but the linear probability model has a rejection rate of 16%, and the point estimates are not close to the true value.

**Simulation design**

Below is the code I used to generate the data for my simulations. In the first part, lines 6 to 13, I generate outcome variables that satisfy the assumptions of the logit model, **y**, and the probit model, **yp**. In the second part, lines 15 to 19, I compute the marginal effects for the logit and probit models. I have a continuous and a discrete covariate. For the discrete covariate, the marginal effect is a treatment effect. In the third part, lines 21 to 29, I compute the marginal effects evaluated at the means. I will use these estimates later to compute approximations to the true values of the effects.

```stata
program define mkdata
        syntax, [n(integer 1000)]
        clear
        quietly set obs `n'
        // 1. Generating data from probit, logit, and misspecified
        generate x1  = rchi2(2)-2
        generate x2  = rbeta(4,2)>.2
        generate u   = runiform()
        generate e   = ln(u) - ln(1-u)
        generate ep  = rnormal()
        generate xb  = .5*(1 - x1 + x2)
        generate y   = xb + e > 0
        generate yp  = xb + ep > 0
        // 2. Computing probit & logit marginal and treatment effects
        generate m1  = exp(xb)*(-.5)/(1+exp(xb))^2
        generate m2  = exp(1 - .5*x1)/(1 + exp(1 - .5*x1)) -  ///
                       exp(.5 - .5*x1)/(1 + exp(.5 - .5*x1))
        generate m1p = normalden(xb)*(-.5)
        generate m2p = normal(1 - .5*x1) - normal(.5 - .5*x1)
        // 3. Computing marginal and treatment effects at means
        quietly mean x1 x2
        matrix A  = r(table)
        scalar a  = .5 - .5*A[1,1] + .5*A[1,2]
        scalar b1 = 1 - .5*A[1,1]
        scalar b0 = .5 - .5*A[1,1]
        generate mean1  = exp(a)*(-.5)/(1+exp(a))^2
        generate mean2  = exp(b1)/(1+exp(b1)) - exp(b0)/(1+exp(b0))
        generate mean1p = normalden(a)*(-.5)
        generate mean2p = normal(b1) - normal(b0)
end
```
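A side note on part 1 of the program: the line `generate e = ln(u) - ln(1-u)` is the inverse-CDF (quantile) transform of a uniform draw, which yields a standard logistic error, so `y = xb + e > 0` satisfies the logit model's assumptions. A small Python sketch (illustrative, not part of the original code) verifies the transform:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Inverse-CDF transform: if u ~ Uniform(0,1),
# then e = ln(u) - ln(1-u) ~ Logistic(0,1)
u = rng.uniform(size=n)
e = np.log(u) - np.log(1 - u)

def logistic_cdf(z):
    return 1.0 / (1.0 + np.exp(-z))

# The empirical CDF of e should match the logistic CDF at every point
max_gap = max(abs(np.mean(e <= z) - logistic_cdf(z))
              for z in (-2.0, -1.0, 0.0, 1.0, 2.0))
```

The probit outcome `yp` works the same way, with a standard normal draw `ep` in place of the logistic error.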

I approximate the true marginal effects using a sample of 20 million observations. This is a reasonable strategy in this case. For example, take the average marginal effect for a continuous covariate, \(x_{k}\), in the case of the probit model:

\[\begin{equation*}
\frac{1}{N}\sum_{i=1}^N \phi\left(x_{i}\boldsymbol{\beta}\right)\beta_{k}
\end{equation*}\]

The expression above approximates \(E\left(\phi\left(x_{i}\boldsymbol{\beta}\right)\beta_{k}\right)\). To obtain this expected value exactly, we would need to integrate over the joint distribution of all the covariates, which is not practical and would limit my choice of covariates. Instead, I draw a sample of 20 million observations, compute \(\frac{1}{N}\sum_{i=1}^N \phi\left(x_{i}\boldsymbol{\beta}\right)\beta_{k}\), and take it to be the true value. I follow the same logic for the other marginal effects.
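The same idea can be sketched in Python (again illustrative; the post uses Stata with 20 million draws): the sample average of \(\phi(x_{i}\boldsymbol{\beta})\beta_{k}\) stabilizes as the number of draws grows, so a sufficiently large sample stands in for the true value.

```python
import numpy as np

rng = np.random.default_rng(42)

def std_normal_pdf(z):
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

def approx_probit_ame(n):
    # Covariates and coefficients follow the DGP in the program above
    x1 = rng.chisquare(2, size=n) - 2
    x2 = (rng.beta(4, 2, size=n) > 0.2).astype(float)
    xb = 0.5 * (1 - x1 + x2)
    # Average of phi(x*b)*beta_k, with beta_k = -.5 for x1
    return np.mean(std_normal_pdf(xb) * (-0.5))

small = approx_probit_ame(10_000)
large = approx_probit_ame(2_000_000)   # stands in for the 20 million draws
```

The large-sample average lands near the -.094 reported as the approximate true AME in Table 3, while the small-sample average is noticeably noisier.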

Below is the code I use to compute the approximate true marginal effects. I draw the 20 million observations, compute the averages that I will use in my simulation, and create locals for each approximate true value.

```stata
. mkdata, n(`L')
(2 missing values generated)

. local values "m1 m2 mean1 mean2 m1p m2p mean1p mean2p"

. local means "mx1 mx2 meanx1 meanx2 mx1p mx2p meanx1p meanx2p"

. local n : word count `values'

. forvalues i = 1/`n' {
  2.         local a: word `i' of `values'
  3.         local b: word `i' of `means'
  4.         sum `a', meanonly
  5.         local `b' = r(mean)
  6. }
```

Now, I am ready to run all the simulations that I used to produce the results in the previous section. The code that I used for the simulations for the TEM and the MEM when the true DGP is a logit is given by:

```stata
. postfile lpm y1l y1l_r y1lp y1lp_r y2l y2l_r y2lp y2lp_r ///
>         using simslpm, replace

. forvalues i = 1/`R' {
  2.         quietly {
  3.                 mkdata, n(`N')
  4.                 logit y x1 i.x2, vce(robust)
  5.                 margins, dydx(*) atmeans post vce(unconditional)
  6.                 local y1l = _b[x1]
  7.                 test _b[x1] = `meanx1'
  8.                 local y1l_r = (r(p)<.05)
  9.                 local y2l = _b[1.x2]
 10.                 test _b[1.x2] = `meanx2'
 11.                 local y2l_r = (r(p)<.05)
 12.                 regress y x1 i.x2, vce(robust)
 13.                 margins, dydx(*) atmeans post vce(unconditional)
 14.                 local y1lp = _b[x1]
 15.                 test _b[x1] = `meanx1'
 16.                 local y1lp_r = (r(p)<.05)
 17.                 local y2lp = _b[1.x2]
 18.                 test _b[1.x2] = `meanx2'
 19.                 local y2lp_r = (r(p)<.05)
 20.                 post lpm (`y1l') (`y1l_r') (`y1lp') (`y1lp_r') ///
>                         (`y2l') (`y2l_r') (`y2lp') (`y2lp_r')
 21.         }
 22. }

. postclose lpm

. use simslpm, clear

. sum

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         y1l |      5,000   -.0985646      .00288  -.1083639  -.0889075
       y1l_r |      5,000       .0544     .226828          0          1
        y1lp |      5,000   -.0939211    .0020038  -.1008612  -.0868043
      y1lp_r |      5,000       .6182    .4858765          0          1
         y2l |      5,000    .1084959     .065586  -.1065291   .3743112
-------------+---------------------------------------------------------
       y2l_r |      5,000       .0618     .240816          0          1
        y2lp |      5,000    .0915894     .055462  -.0975456   .3184061
      y2lp_r |      5,000       .0732    .2604906          0          1
```

For the results for the AME and the ATE when the true DGP is a logit, I use **margins** without the **atmeans** option. The other cases are similar. I use robust standard errors for all computations because my likelihood model is an approximation to the true likelihood, and I use the option **vce(unconditional)** to account for the fact that I am using two-step M-estimation. See Wooldridge (2010) for more details on two-step M-estimation.
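To make the rejection-rate columns concrete: each replication tests the null hypothesis that the estimated effect equals the approximate true value and records an indicator for p < .05; averaging that indicator over replications gives the rejection rate, which should be near 5% when the estimator and test behave well. A toy Python sketch of this bookkeeping (with a simple z-test standing in for the margins-based Wald test):

```python
import numpy as np

rng = np.random.default_rng(7)
reps, n, true_effect = 2_000, 500, 0.1   # illustrative values, not the post's

rejections = []
for _ in range(reps):
    # Stand-in for one replication's estimation noise around the true effect
    draws = rng.normal(true_effect, 1.0, size=n)
    # Test H0: effect = true_effect with a two-sided 5% z-test
    z = (draws.mean() - true_effect) / (draws.std(ddof=1) / np.sqrt(n))
    rejections.append(abs(z) > 1.96)

rejection_rate = np.mean(rejections)     # close to .05 under the null
```

When the estimator is inconsistent for the target effect, the test statistic is centered away from zero and the rejection rate climbs far above 5%, which is exactly the pattern the linear probability model shows for the AME in Tables 1 and 3.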

You can obtain the code used to produce these results here.

**Conclusion**

Using a probit or a logit model yields statistically equivalent estimates of marginal effects. I provide evidence that the same cannot be said of the marginal effect estimates of the linear probability model when compared with those of the logit and probit models.

**Acknowledgment**

This post was inspired by a question posed by Stephen Jenkins after my previous post.

**Reference**

Wooldridge, J. M. 2010. *Econometric Analysis of Cross Section and Panel Data*. 2nd ed. Cambridge, Massachusetts: MIT Press.