regress, probit, or logit?
In a previous post I illustrated that the probit model and the logit model produce statistically equivalent estimates of marginal effects. In this post, I compare the marginal effect estimates from a linear probability model (linear regression) with marginal effect estimates from probit and logit models.
My simulations show that when the true model is a probit or a logit, using a linear probability model can produce inconsistent estimates of the marginal effects of interest to researchers. The conclusions hinge on the probit or logit model being the true model.
Simulation results
For all simulations below, I use a sample size of 10,000 and 5,000 replications. The true data-generating processes (DGPs) are constructed using one discrete covariate and one continuous covariate. I study the average effect of a change in the continuous variable on the conditional probability (AME) and the average effect of a change in the discrete covariate on the conditional probability (ATE). I also look at the effect of a change in the continuous variable on the conditional probability, evaluated at the mean value of the covariates (MEM), and the effect of a change in the discrete covariate on the conditional probability, evaluated at the mean value of the covariates (TEM).
In Table 1, I present the results of a simulation when the true DGP satisfies the assumptions of a logit model. I show the average of the AME and the ATE estimates and the 5% rejection rate of the true null hypotheses. I also provide an approximate true value of the AME and ATE. I obtain the approximate true values by computing the ATE and AME, at the true values of the coefficients, using a sample of 20 million observations. I will provide more details on the simulation in a later section.
Table 1: Average Marginal and Treatment Effects: True DGP Logit
Statistic | Approximate True Value | Logit | Regress (LPM) |
---|---|---|---|
AME of x1 | -.084 | -.084 | -.094 |
5% Rejection Rate | | .050 | .99 |
ATE of x2 | .092 | .091 | .091 |
5% Rejection Rate | | .058 | .058 |
From Table 1, we see that the logit model estimates are close to the true value and that the rejection rate of the true null hypothesis is close to 5%. For the linear probability model, the rejection rate is 99% for the AME. For the ATE, the rejection rate and point estimates are close to what is estimated using a logit.
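One way to judge how close to 5% a simulated rejection rate can be expected to fall is its Monte Carlo standard error. The sketch below is a back-of-the-envelope calculation rather than part of the original simulation code; it assumes independent replications and a test with exactly its nominal size, and the macro names are illustrative.

```stata
// Monte Carlo standard error of an estimated rejection rate.
// Assumptions (not from the post): replications are independent and the
// test has exactly its nominal 5% size.
local R    = 5000
local size = .05
local se   = sqrt(`size'*(1 - `size')/`R')
local lo   = `size' - 1.96*`se'
local hi   = `size' + 1.96*`se'
display "simulation s.e. of the rejection rate: " %6.4f `se'
display "approx. 95% band around .05: " %5.3f `lo' " to " %5.3f `hi'
```

With 5,000 replications the standard error is about .003, so rejection rates between roughly .044 and .056 cannot be distinguished from the nominal level on simulation noise alone.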
For the MEM and TEM, we have the following:
Table 2: Marginal and Treatment Effects at Mean Values: True DGP Logit
Statistic | Approximate True Value | Logit | Regress (LPM) |
---|---|---|---|
MEM of x1 | -.099 | -.099 | -.094 |
5% Rejection Rate | | .054 | .618 |
TEM of x2 | .109 | .109 | .092 |
5% Rejection Rate | | .062 | .073 |
Again, logit estimates behave as expected. For the linear probability model, the rejection rate of the true null hypothesis is 62% for the MEM. For the TEM the rejection rate is 7.3%, and the estimated effect is smaller than the true effect.
For the AME and ATE, when the true DGP is a probit, we have the following:
Table 3: Average Marginal and Treatment Effects: True DGP Probit
Statistic | Approximate True Value | Probit | Regress (LPM) |
---|---|---|---|
AME of x1 | -.094 | -.094 | -.121 |
5% Rejection Rate | | .047 | 1 |
ATE of x2 | .111 | .111 | .111 |
5% Rejection Rate | | .065 | .061 |
The probit model estimates are close to the true value, and the rejection rate of the true null hypothesis is close to 5%. For the linear probability model, the rejection rate is 100% for the AME. For the ATE, the rejection rate and point estimates are close to what is estimated using a probit.
For the MEM and TEM, we have the following:
Table 4: Marginal and Treatment Effects at Mean Values: True DGP Probit
Statistic | Approximate True Value | Probit | Regress (LPM) |
---|---|---|---|
MEM of x1 | -.121 | -.122 | -.121 |
5% Rejection Rate | | .063 | .054 |
TEM of x2 | .150 | .150 | .110 |
5% Rejection Rate | | .059 | .158 |
For the MEM, the probit and linear probability model produce reliable inference. For the TEM, the probit marginal effects behave as expected, but the linear probability model has a rejection rate of 16%, and the point estimates are not close to the true value.
Simulation design
Below is the code I used to generate the data for my simulations. In the first part, lines 6 to 13, I generate outcome variables that satisfy the assumptions of the logit model, y, and the probit model, yp. In the second part, lines 15 to 19, I compute the marginal effects for the logit and probit models. I have a continuous and a discrete covariate. For the discrete covariate, the marginal effect is a treatment effect. In the third part, lines 21 to 29, I compute the marginal effects evaluated at the means. I will use these estimates later to compute approximations to the true values of the effects.
    program define mkdata
        syntax, [n(integer 1000)]
        clear
        quietly set obs `n'
        // 1. Generating data from probit, logit, and misspecified
        generate x1 = rchi2(2)-2
        generate x2 = rbeta(4,2)>.2
        generate u  = runiform()
        generate e  = ln(u) -ln(1-u)
        generate ep = rnormal()
        generate xb = .5*(1 - x1 + x2)
        generate y  = xb + e > 0
        generate yp = xb + ep > 0
        // 2. Computing probit & logit marginal and treatment effects
        generate m1   = exp(xb)*(-.5)/(1+exp(xb))^2
        generate m2   = exp(1 -.5*x1)/(1+ exp(1 -.5*x1 )) - ///
                        exp(.5 -.5*x1)/(1+ exp(.5 -.5*x1 ))
        generate m1p  = normalden(xb)*(-.5)
        generate m2p  = normal(1 -.5*x1 ) - normal(.5 -.5*x1)
        // 3. Computing marginal and treatment effects at means
        quietly mean x1 x2
        matrix A        = r(table)
        scalar a        = .5 -.5*A[1,1] + .5*A[1,2]
        scalar b1       = 1 -.5*A[1,1]
        scalar b0       = .5 -.5*A[1,1]
        generate mean1  = exp(a)*(-.5)/(1+exp(a))^2
        generate mean2  = exp(b1)/(1+ exp(b1)) - exp(b0)/(1+ exp(b0))
        generate mean1p = normalden(a)*(-.5)
        generate mean2p = normal(b1) - normal(b0)
    end
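The logit outcome relies on the fact that ln(u) - ln(1-u) is a standard logistic draw when u is uniform on (0,1). A short derivation, using \(\Lambda\) for the logistic distribution function (notation introduced here for illustration, not taken from the code), makes the step explicit:

\[
\begin{aligned}
P\left(e \leq t\right) &= P\left(\ln\frac{u}{1-u} \leq t\right) = P\left(u \leq \frac{\exp(t)}{1+\exp(t)}\right) = \frac{\exp(t)}{1+\exp(t)} = \Lambda(t) \\
P\left(y = 1 \mid x\right) &= P\left(xb + e > 0\right) = 1 - \Lambda\left(-xb\right) = \Lambda\left(xb\right)
\end{aligned}
\]

The last equality uses the symmetry of the logistic distribution. The same argument with rnormal() in place of the logistic draw gives \(P\left(yp = 1 \mid x\right) = \Phi\left(xb\right)\), the probit model.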
I approximate the true marginal effects using a sample of 20 million observations. This is a reasonable strategy in this case. For example, take the average marginal effect for a continuous covariate, \(x_{k}\), in the case of the probit model:
\[
\frac{1}{N}\sum_{i=1}^N \phi\left(x_{i}\boldsymbol{\beta}\right)\beta_{k}
\]
The expression above is an approximation of \(E\left(\phi\left(x_{i}\boldsymbol{\beta}\right)\beta_{k}\right)\). To obtain this expected value, we would need to integrate over the distribution of all the covariates. This is not practical and would limit my choice of covariates. Instead, I draw a sample of 20 million observations, compute \(\frac{1}{N}\sum_{i=1}^N \phi\left(x_{i}\boldsymbol{\beta}\right)\beta_{k}\), and take it to be the true value. I follow the same logic for the other marginal effects.
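Written out for the probit DGP, the other three quantities computed in mkdata take the form below. The scalar notation is an assumption made here for readability: \(\beta_0\), \(\beta_1\), and \(\beta_2\) denote the intercept and the coefficients on x1 and x2 (.5, -.5, and .5 in mkdata), and \(\bar{x}_1\) and \(\bar{x}_2\) are the sample means of the covariates.

\[
\begin{aligned}
\text{ATE of } x_2 &: \frac{1}{N}\sum_{i=1}^N \left[\Phi\left(\beta_0 + \beta_1 x_{1i} + \beta_2\right) - \Phi\left(\beta_0 + \beta_1 x_{1i}\right)\right] \\
\text{MEM of } x_1 &: \phi\left(\beta_0 + \beta_1 \bar{x}_1 + \beta_2 \bar{x}_2\right)\beta_1 \\
\text{TEM of } x_2 &: \Phi\left(\beta_0 + \beta_1 \bar{x}_1 + \beta_2\right) - \Phi\left(\beta_0 + \beta_1 \bar{x}_1\right)
\end{aligned}
\]

For the logit DGP, the normal distribution function \(\Phi\) is replaced by the logistic distribution function \(\exp(z)/\left(1+\exp(z)\right)\), and the normal density \(\phi\) by the logistic density \(\exp(z)/\left(1+\exp(z)\right)^2\).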
Below is the code I use to compute the approximate true marginal effects. I draw the 20 million observations, compute the averages that I will use in my simulation, and create locals for each approximate true value.
    . mkdata, n(`L')
    (2 missing values generated)

    . local values "m1 m2 mean1 mean2 m1p m2p mean1p mean2p"

    . local means  "mx1 mx2 meanx1 meanx2 mx1p mx2p meanx1p meanx2p"

    . local n : word count `values'

    .
    . forvalues i= 1/`n' {
      2.     local a: word `i' of `values'
      3.     local b: word `i' of `means'
      4.     sum `a', meanonly
      5.     local `b' = r(mean)
      6. }
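A scaled-down version of this approximation can be run on its own as a quick check. The sketch below inlines the covariates and the logit AME formula from mkdata; the seed and the sample size of 1 million (rather than 20 million) are assumptions chosen only to keep the run fast.

```stata
// Approximate the true AME of x1 under the logit DGP with a large sample.
// Assumptions (not from the post): seed 12345 and 1 million observations.
clear
set seed 12345
set obs 1000000
generate x1 = rchi2(2) - 2
generate x2 = rbeta(4,2) > .2
generate xb = .5*(1 - x1 + x2)
// Logistic density at xb times the coefficient on x1 (-.5)
generate m1 = exp(xb)*(-.5)/(1 + exp(xb))^2
summarize m1, meanonly
display "approximate true AME of x1 (logit): " %6.3f r(mean)
```

The displayed value should land near the approximate true value of -.084 reported for the AME in Table 1.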
Now, I am ready to run all the simulations that I used to produce the results in the previous section. The code that I used for the simulations for the TEM and the MEM when the true DGP is a logit is given by:
    . postfile lpm y1l y1l_r y1lp y1lp_r y2l y2l_r y2lp y2lp_r ///
    >         using simslpm, replace

    . forvalues i=1/`R' {
      2.     quietly {
      3.         mkdata, n(`N')
      4.         logit y x1 i.x2, vce(robust)
      5.         margins, dydx(*) atmeans post vce(unconditional)
      6.         local y1l = _b[x1]
      7.         test _b[x1] = `meanx1'
      8.         local y1l_r = (r(p)<.05)
      9.         local y2l = _b[1.x2]
     10.         test _b[1.x2] = `meanx2'
     11.         local y2l_r = (r(p)<.05)
     12.         regress y x1 i.x2, vce(robust)
     13.         margins, dydx(*) atmeans post vce(unconditional)
     14.         local y1lp = _b[x1]
     15.         test _b[x1] = `meanx1'
     16.         local y1lp_r = (r(p)<.05)
     17.         local y2lp = _b[1.x2]
     18.         test _b[1.x2] = `meanx2'
     19.         local y2lp_r = (r(p)<.05)
     20.         post lpm (`y1l') (`y1l_r') (`y1lp') (`y1lp_r') ///
    >                 (`y2l') (`y2l_r') (`y2lp') (`y2lp_r')
     21.     }
     22. }

    . postclose lpm

    . use simslpm, clear

    . sum

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
             y1l |      5,000   -.0985646      .00288  -.1083639  -.0889075
           y1l_r |      5,000       .0544     .226828          0          1
            y1lp |      5,000   -.0939211    .0020038  -.1008612  -.0868043
          y1lp_r |      5,000       .6182    .4858765          0          1
             y2l |      5,000    .1084959     .065586  -.1065291   .3743112
    -------------+---------------------------------------------------------
           y2l_r |      5,000       .0618     .240816          0          1
            y2lp |      5,000    .0915894     .055462  -.0975456   .3184061
          y2lp_r |      5,000       .0732    .2604906          0          1
For the results for the AME and the ATE when the true DGP is a logit, I use margins without the atmeans option. The other cases are similar. I use robust standard errors for all computations because my likelihood model is an approximation to the true likelihood, and I use the option vce(unconditional) to account for the fact that I am using two-step M-estimation. See Wooldridge (2010) for more details on two-step M-estimation.
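A single replication of the AME and ATE case might look like the sketch below. It inlines the logit part of the DGP from mkdata so that it runs on its own; the seed is an illustrative assumption, and the tests against the approximate true values are left out because those locals come from the 20-million-observation run.

```stata
// One replication of the AME/ATE comparison under the logit DGP.
// Assumptions (not from the post): seed 12345; DGP copied from mkdata.
clear
set seed 12345
set obs 10000
generate x1 = rchi2(2) - 2
generate x2 = rbeta(4,2) > .2
generate u  = runiform()
generate y  = .5*(1 - x1 + x2) + ln(u) - ln(1 - u) > 0

// Logit: AME of x1 and ATE of x2 (no atmeans option)
logit y x1 i.x2, vce(robust)
margins, dydx(*) vce(unconditional)

// Linear probability model: same margins call
regress y x1 i.x2, vce(robust)
margins, dydx(*) vce(unconditional)
```

Adding the post option to margins, as in the simulation loop, makes the effects available through _b[] so that they can be tested against the approximate true values.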
You can obtain the code used to produce these results here.
Conclusion
Using a probit or a logit model yields equivalent marginal effects. I provide evidence that the same cannot be said of the marginal effect estimates of the linear probability model when compared with those of the logit and probit models.
Acknowledgment
This post was inspired by a question posed by Stephen Jenkins after my previous post.
Reference
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.