## Flexible discrete choice modeling using a multinomial probit model, part 2

**Overview**

In the first part of this post, I discussed the multinomial probit model from a random utility model perspective. In this part, we will have a closer look at how to interpret our estimation results.

**How do we interpret our estimation results?**

We created a fictitious dataset of individuals who were presented with a set of three health insurance plans (**Sickmaster**, **Allgood**, and **Cowboy Health**). We pretended to have a random sample of 20- to 60-year-old persons who were asked which plan they would choose if they had to enroll in one of them. We expected a person’s utility related to each of the three alternatives to be a function of both personal characteristics (household income and age) and characteristics of the insurance plan (insurance price). We used Stata’s **asmprobit** command to fit our model, and these were the results:

    . asmprobit choice price, case(id) alternatives(alt) casevars(hhinc age)
    > basealternative(1) scalealternative(2) nolog

    Alternative-specific multinomial probit         Number of obs      =     60,000
    Case variable: id                               Number of cases    =     20,000

    Alternative variable: alt                       Alts per case: min =          3
                                                                   avg =        3.0
                                                                   max =          3
    Integration sequence:      Hammersley
    Integration points:               150              Wald chi2(5)    =    4577.15
    Log simulated-likelihood = -11219.181              Prob > chi2     =     0.0000

    ------------------------------------------------------------------------------
          choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    alt          |
           price |  -.4896106   .0523626    -9.35   0.000    -.5922394   -.3869818
    -------------+----------------------------------------------------------------
    Sickmaster   |  (base alternative)
    -------------+----------------------------------------------------------------
    Allgood      |
           hhinc |  -.5006212   .0302981   -16.52   0.000    -.5600043    -.441238
             age |   2.001367   .0306663    65.26   0.000     1.941262    2.061472
           _cons |  -4.980841   .1968765   -25.30   0.000    -5.366711    -4.59497
    -------------+----------------------------------------------------------------
    Cowboy_Hea~h |
           hhinc |  -1.991202   .1092118   -18.23   0.000    -2.205253    -1.77715
             age |   1.494056   .0446662    33.45   0.000     1.406512    1.581601
           _cons |   3.038869   .4066901     7.47   0.000     2.241771    3.835967
    -------------+----------------------------------------------------------------
         /lnl2_2 |   .5550228   .0742726     7.47   0.000     .4094512    .7005944
    -------------+----------------------------------------------------------------
           /l2_1 |    .667308   .1175286     5.68   0.000     .4369562    .8976598
    ------------------------------------------------------------------------------
    (alt=Sickmaster is the alternative normalizing location)
    (alt=Allgood is the alternative normalizing scale)

And this was our estimated variance–covariance matrix of error differences:

    . estat covariance

      +-------------------------------------+
      |              |   Allgood  Cowboy_~h |
      |--------------+----------------------|
      |      Allgood |         2            |
      | Cowboy_Hea~h |   .943716   3.479797 |
      +-------------------------------------+
    Note: Covariances are for alternatives differenced with Sickmaster.

Although these parameters determine the effects of interest, the nonlinear mapping from parameters to effects makes the parameters themselves difficult to interpret. The normalized covariance matrix provides little substantive information because it describes error differences rather than the errors themselves. The coefficients do not convey much information either, and their magnitudes depend on the arbitrary choice of scale. For example, if we used the third alternative instead of the second for setting the scale, we would get different parameter estimates simply because of the different scaling. To get something more informative, we will focus on estimating response probabilities and marginal effects.
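The scale dependence can be made explicit with a short derivation. The notation here is an assumption introduced for illustration and may differ from the notation used in part 1: \(U_{ij}\) denotes the utility of alternative \(j\) for individual \(i\), and \(c\) is an arbitrary positive constant.

\[
\Pr(y_i = j) = \Pr\left(U_{ij} > U_{ik} \ \text{for all } k \neq j\right) = \Pr\left(c\,U_{ij} > c\,U_{ik} \ \text{for all } k \neq j\right), \qquad c > 0 .
\]

Multiplying every utility by \(c\) multiplies all coefficients by \(c\) and all error variances and covariances by \(c^2\), yet it leaves every choice probability unchanged. The data therefore cannot identify \(c\), and one variance has to be fixed. In the output above, this is the variance of the differenced **Allgood** error, which is set to 2. Normalizing on a different alternative should, in principle, rescale the coefficients without changing the predicted probabilities, which is one reason to interpret probabilities rather than coefficients.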

**Predicted probabilities**

Let’s focus on response probabilities first. After fitting our model, we predict the probability that the \(i\)th individual chooses alternative \(j\). That is, for each individual, we will have a probability related to each alternative. Let’s take a look at this:

    . predict double pr
    (option pr assumed; Pr(alt))

    . list id alt choice pr in 1/9, sepby(id)

         +-----------------------------------------+
         | id             alt   choice          pr |
         |-----------------------------------------|
      1. |  1      Sickmaster        1   .62054511 |
      2. |  1         Allgood        0   .01856341 |
      3. |  1   Cowboy Health        0   .36088805 |
         |-----------------------------------------|
      4. |  2      Sickmaster        0   .01680147 |
      5. |  2         Allgood        1   .39319731 |
      6. |  2   Cowboy Health        0    .5899949 |
         |-----------------------------------------|
      7. |  3      Sickmaster        0   .07440388 |
      8. |  3         Allgood        0   .02010558 |
      9. |  3   Cowboy Health        1   .90549014 |
         +-----------------------------------------+

Looking at the first individual (**id==1**), we predict that this person has a 62% chance of choosing **Sickmaster**, a 2% chance of choosing **Allgood**, and a 36% chance of choosing **Cowboy Health**. If we were doing a classification based on the most likely choice, we would find that this person is correctly classified because he or she actually chose **Sickmaster**. If we average these probabilities over individuals for each alternative, we obtain the unconditional mean probabilities for choosing each alternative, and we will notice that these averages reflect our marginal distribution of cases across alternatives:

    . predict double pr
    (option pr assumed; Pr(alt))

    . bysort alt : summarize pr

    -------------------------------------------------------------------------------
    -> alt = Sickmaster

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
              pr |     20,000    .3158523    .3549359   5.70e-14   .9999658

    -------------------------------------------------------------------------------
    -> alt = Allgood

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
              pr |     20,000    .4155579    .3305044   .0000342   .9972706

    -------------------------------------------------------------------------------
    -> alt = Cowboy Health

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
              pr |     20,000    .2685856    .2892927   1.62e-14   .9998705
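The classification by most likely choice mentioned above can be sketched in a few lines. This is only a minimal sketch, assuming the variable **pr** created by **predict** is still in memory; **pr_max** and **predicted** are hypothetical helper variables that are not part of the original analysis.

```stata
* Assumption: -pr- holds the predicted probabilities from -predict double pr-.
* pr_max and predicted are hypothetical helper variables.

* Highest predicted probability within each case
bysort id: egen double pr_max = max(pr)

* Flag the alternative with the highest predicted probability
generate byte predicted = (pr == pr_max)

* Cross-tabulate actual choices against predicted choices
* (one row per case-alternative combination)
tabulate choice predicted

* Share of cases whose chosen alternative is also the predicted one
summarize predicted if choice == 1
```

Because every case has exactly one row with **choice==1**, the mean reported by the last command can be read as the share of correctly classified cases.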

Now, we typically would like to summarize the probabilities in a way that allows us to learn something about how the covariates affect choice probabilities. We begin by estimating the choice probabilities for an average person in the population. In this case, an average person could be defined as one of average age with average income who was offered average prices per plan. If we had a special interest in the effect of age, we could use several evaluation points for **age**. In the example below, we predict the probabilities at the sample mean of age and for 60-year-olds, holding household income at its sample mean and setting prices to their alternative-specific means:

    . preserve

    . collapse (mean) age hhinc price, by(alt)

    . generate id=1

    . quietly expand 2

    . quietly replace id = 2 in 4/6

    . quietly replace age = 6 if id == 2

    . predictnl pr_at = predict(pr), ci(ci95_lo ci95_hi) force
    note: confidence intervals calculated using Z critical values

    . format %5.3f age hhinc price price pr_at ci95_lo ci95_hi

    . list, sepby(id)

         +------------------------------------------------------------------------+
         |           alt     age   hhinc   price   id   pr_at   ci95_lo   ci95_hi |
         |------------------------------------------------------------------------|
      1. |    Sickmaster   3.995   4.982   2.000    1   0.195     0.185     0.206 |
      2. |       Allgood   3.995   4.982   1.249    1   0.586     0.574     0.599 |
      3. | Cowboy Health   3.995   4.982   0.751    1   0.218     0.208     0.229 |
         |------------------------------------------------------------------------|
      4. |    Sickmaster   6.000   4.982   2.000    2   0.000     0.000     0.000 |
      5. |       Allgood   6.000   4.982   1.249    2   0.878     0.867     0.889 |
      6. | Cowboy Health   6.000   4.982   0.751    2   0.122     0.111     0.133 |
         +------------------------------------------------------------------------+

    . restore

Using **collapse** results in a new dataset that has only three observations, one for each alternative. Prior to using **expand**, the variables **age** and **hhinc** contain the sample means, and the variable **price** stores the alternative-specific average prices. By using **expand 2**, we tell Stata to duplicate each of the three observations in the dataset, and then we replace **age** with the value 6 (to specify 60 years of age) in the newly added set of observations. The variable **id** now identifies our prediction scenario, and we use **preserve** and **restore** so that our original dataset is left untouched. Also, instead of **predict**, we use **predictnl** here because this allows us to estimate confidence intervals for the predicted probabilities. The newly created variable **pr_at** stores the predicted probabilities: for our average person, we predict a 20% chance of choosing **Sickmaster**, a 59% chance of choosing **Allgood**, and a 22% chance of choosing **Cowboy Health**. If we look at the second set of predictions (**id==2**), we see that the chance of choosing **Allgood** increases with age (from 59% to 88%), at least when holding household income and prices at their means. Consequently, the chances of choosing **Sickmaster** and **Cowboy Health** decrease, but the chance of choosing **Sickmaster** decreases more drastically, from 20% to essentially zero: we would not really expect anyone at age 60 with average household income to choose **Sickmaster** when offered average insurance prices.
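The same steps extend to more than two evaluation points. The sketch below is an illustration under stated assumptions, not part of the original analysis: it assumes **age** is measured in decades (so that 2 to 6 covers ages 20 to 60, in line with the value 6 standing for 60 years), and the names **pr_grid**, **lo95**, and **hi95** are hypothetical.

```stata
* Assumptions: -asmprobit- results are in memory; age is coded in decades.
* pr_grid, lo95, and hi95 are hypothetical variable names.
preserve

* One row per alternative, holding the sample and alternative-specific means
collapse (mean) age hhinc price, by(alt)

* Five copies of the three rows, one copy per age scenario
expand 5
bysort alt: generate id = _n

* Scenario ages in decades: 2, 3, 4, 5, 6
replace age = id + 1
sort id alt

* Predicted probabilities with confidence intervals for each scenario
predictnl pr_grid = predict(pr), ci(lo95 hi95) force
list id alt age pr_grid lo95 hi95, sepby(id)

restore
```

As in the two-scenario example, **id** serves as the case identifier for each prediction scenario, so each block of three rows is treated as one hypothetical person.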

**Marginal effects**

While looking at the predicted probabilities in this way can be useful, we are often interested in estimating the expected *change* in probability per unit change in a predictor variable, which we approximate by marginal effects. Marginal effects are the first derivatives of the predicted probabilities with respect to both alternative- and case-specific covariates. Let’s look at our case-specific variable **age** first. We start by evaluating the marginal effects of age at the means of the covariates, including age. Here we use the postestimation command **estat mfx**:

    . estat mfx, varlist(age)

    Equation Name           Alternative
    --------------------------------------------------
    Sickmaster              Sickmaster
    Allgood                 Allgood
    Cowboy_Health           Cowboy Health

    Pr(choice = Sickmaster) = .195219
    ------------------------------------------------------------------------------
    variable     |     dp/dx Std. Err.       z  P>|z|  [    95% C.I.    ]        X
    -------------+----------------------------------------------------------------
    casevars     |
             age |  -.378961    .00669  -56.64  0.000  -.392073  -.365848   3.9953
    ------------------------------------------------------------------------------

    Pr(choice = Allgood) = .5864454
    ------------------------------------------------------------------------------
    variable     |     dp/dx Std. Err.       z  P>|z|  [    95% C.I.    ]        X
    -------------+----------------------------------------------------------------
    casevars     |
             age |   .363996   .006009   60.58  0.000   .352218   .375773   3.9953
    ------------------------------------------------------------------------------

    Pr(choice = Cowboy Health) = .21831866
    ------------------------------------------------------------------------------
    variable     |     dp/dx Std. Err.       z  P>|z|  [    95% C.I.    ]        X
    -------------+----------------------------------------------------------------
    casevars     |
             age |   .015001   .004654    3.22  0.001    .00588   .024123   3.9953
    ------------------------------------------------------------------------------

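Inspecting the above output, we see that we estimated a marginal effect for each alternative. If we increase the age of our average person by 10 years (which corresponds to one unit in **age**), we expect the chance of choosing **Sickmaster** to decrease by roughly 38 percentage points and the chance of choosing **Allgood** to increase by roughly 36 percentage points. Because marginal effects are derivatives evaluated at a single point, these numbers are best read as local rates of change rather than exact predictions for a full 10-year difference. The probability of choosing **Cowboy Health** changes only slightly (about 1.5 percentage points), although this small effect is statistically distinguishable from zero.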
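As a rough check on how far this linear reading can be pushed, we can compare it with the predictions reported earlier for a 60-year-old. This is only a back-of-the-envelope comparison using numbers from the tables above, with the age difference taken as \(6 - 3.995 = 2.005\) units.

\[
\begin{aligned}
\text{linear extrapolation (Sickmaster):}\quad & -0.379 \times 2.005 \approx -0.76 \\
\text{change in predicted probability (Sickmaster):}\quad & 0.000 - 0.195 = -0.195
\end{aligned}
\]

The extrapolation would push the probability far below zero, whereas the model-based prediction stays within the unit interval. The marginal effect therefore describes the slope at the evaluation point only. Note also that the three marginal effects of age add up to approximately zero,

\[
-0.378961 + 0.363996 + 0.015001 = 0.000036 \approx 0 ,
\]

which is what we would expect because the three choice probabilities sum to one (up to simulation and rounding error).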

For illustrative purposes, and to better understand the quantities that we are estimating here, let’s look at a manual calculation of these effects:

    . preserve

    . * Sample and alternative-specific means:
    . collapse (mean) age hhinc price, by(alt)

    . generate id=1

    . * Computing numerical derivative of the predicted
    . * probability with respect to -age-:
    . scalar h = 1e-5

    . clonevar age_clone = age

    . qui replace age = age_clone + h

    . qui predict double pr_ph

    . qui replace age = age_clone - h

    . qui predict double pr_mh

    . qui generate dpdx = (pr_ph-pr_mh)/(2*h)

    . * Results:
    . list alt dpdx in 1/3, sepby(id)

         +---------------------------+
         |           alt        dpdx |
         |---------------------------|
      1. |    Sickmaster   -.3789608 |
      2. |       Allgood    .3639956 |
      3. | Cowboy Health    .0150014 |
         +---------------------------+

    . restore

In the above piece of code, we first set the case-specific variables to their sample means and **price** to its alternative-specific means, again by using **collapse**. We then calculate the numerical derivative of the predicted probability with respect to age. We do this by evaluating our prediction function twice: one time, we add a small amount to the mean of age, and the other time, we subtract the same amount prior to using **predict**. In other words, we predict the probabilities at two points right around our point of interest and then divide the difference between these two predictions by the difference between the two evaluation points. This gives us an approximation of the derivative at the point right in the middle, in this case the mean of age. We see that our manual calculation of the marginal effects matches the **estat mfx** results.
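Written as a formula, the manual calculation is a central finite difference. The symbols are introduced here for illustration: \(p_j(\cdot)\) is the predicted probability of alternative \(j\) as a function of age with all other covariates held at their means, \(\overline{\text{age}}\) is the sample mean of age, and \(h\) is the step size set in the code.

\[
\left.\frac{\partial p_j}{\partial\,\text{age}}\right|_{\text{age} = \overline{\text{age}}}
\approx \frac{p_j\left(\overline{\text{age}} + h\right) - p_j\left(\overline{\text{age}} - h\right)}{2h},
\qquad h = 10^{-5} .
\]

For a smooth function, the error of this approximation shrinks with the square of \(h\), which helps explain why the manual results (for example, \(-.3789608\) for **Sickmaster**) agree with the reported marginal effects (\(-.378961\)) to the displayed precision.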

Finally, let’s look at our alternative-specific variable **price**. For this variable, we can estimate the expected change in the probability that the \(i\)th case chooses the \(j\)th alternative with respect to *each* of the alternative-specific variables. This means that in our example, we can estimate \(3 \times 3\) marginal effects for **price**. That is, we can estimate the marginal effect of **Sickmaster** prices on the probability of choosing **Sickmaster**, **Allgood**, and **Cowboy Health**, the effect of **Allgood** prices on the probability of choosing **Sickmaster**, **Allgood**, and **Cowboy Health**, and so on. Let’s do this for the effect of the **Sickmaster** price on the probability of choosing **Sickmaster**, **Allgood**, and **Cowboy Health**. Again we use **estat mfx** first:

    . estat mfx, varlist(price)

    Equation Name           Alternative
    --------------------------------------------------
    Sickmaster              Sickmaster
    Allgood                 Allgood
    Cowboy_Health           Cowboy Health

    Pr(choice = Sickmaster) = .195219
    ------------------------------------------------------------------------------
    variable     |     dp/dx Std. Err.       z  P>|z|  [    95% C.I.    ]        X
    -------------+----------------------------------------------------------------
    price        |
      Sickmaster |  -.098769   .010944   -9.02  0.000   -.12022  -.077318   1.9999
         Allgood |   .074859   .008579    8.73  0.000   .058044   .091673   1.2493
    Cowboy_Hea~h |    .02391   .003151    7.59  0.000   .017734   .030087   .75072
    ------------------------------------------------------------------------------

    Pr(choice = Allgood) = .5864454
    ------------------------------------------------------------------------------
    variable     |     dp/dx Std. Err.       z  P>|z|  [    95% C.I.    ]        X
    -------------+----------------------------------------------------------------
    price        |
      Sickmaster |    .07487    .00858    8.73  0.000   .058053   .091687   1.9999
         Allgood |  -.130799   .013278   -9.85  0.000  -.156823  -.104774   1.2493
    Cowboy_Hea~h |   .055928   .006829    8.19  0.000   .042543   .069314   .75072
    ------------------------------------------------------------------------------

    Pr(choice = Cowboy Health) = .21831866
    ------------------------------------------------------------------------------
    variable     |     dp/dx Std. Err.       z  P>|z|  [    95% C.I.    ]        X
    -------------+----------------------------------------------------------------
    price        |
      Sickmaster |   .023907   .003151    7.59  0.000   .017731   .030083   1.9999
         Allgood |    .05593    .00683    8.19  0.000   .042544   .069315   1.2493
    Cowboy_Hea~h |  -.079837   .008946   -8.92  0.000   -.09737  -.062303   .75072
    ------------------------------------------------------------------------------

Inspecting the output, we see that a one-unit increase in the **Sickmaster** price (here units are in $100/month) reduces the chance of choosing **Sickmaster** by about 10 percentage points. This appears reasonable because price typically has a negative effect on utility. The effects of the **Sickmaster** price on the probabilities of choosing the other plans are both positive, which means that each of the other plans becomes more likely to be chosen as the **Sickmaster** price increases. Also, because the effect of the **Sickmaster** price is stronger for **Allgood** (about 7 percentage points) than for **Cowboy Health** (about 2 percentage points), we could conclude that the average person would be more likely to switch to **Allgood** than to **Cowboy Health** if **Sickmaster** prices go up. Again we replicate these results by performing some manual calculations:

    . preserve

    . * Sample and alternative-specific means:
    . collapse (mean) age hhinc price, by(alt)

    . generate id=1

    . * Derivative
    . scalar h = 1e-5

    . clonevar price_clone = price

    . qui replace price = price_clone + h if alt==1

    . qui predict double pr_ph

    . qui replace price = price_clone - h if alt==1

    . qui predict double pr_mh

    . gen dpdx = (pr_ph-pr_mh)/(2*h)

    . * Results
    . list alt dpdx in 1/3, sepby(id)

         +--------------------------+
         |           alt       dpdx |
         |--------------------------|
      1. |    Sickmaster   -.098769 |
      2. |       Allgood   .0748703 |
      3. | Cowboy Health   .0239071 |
         +--------------------------+

    .
    . restore

Notice that our manual calculations correspond to the **Sickmaster** effects shown at the top of each of the three table panels from the **estat mfx** output. The **Allgood** and **Cowboy Health** rows in the first panel are numerically very close to these values (.074859 versus .07487 and .02391 versus .023907), but they have a different interpretation: they are the effects on the probability of choosing **Sickmaster** per unit increase in the **Allgood** and **Cowboy Health** prices, respectively.
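Two adding-up checks on the **estat mfx** table help to see how the nine price effects hang together. The arithmetic below uses only the reported numbers; the notation \(p_j\) for the choice probabilities and \(V_k\) for the systematic utility of alternative \(k\) is introduced here for illustration.

\[
\begin{aligned}
\text{Sickmaster price, summed over the three probabilities:}\quad & -0.098769 + 0.074870 + 0.023907 = 0.000008 \approx 0 \\
\text{all three prices, effect on } \Pr(\text{Sickmaster})\text{:}\quad & -0.098769 + 0.074859 + 0.023910 = 0.000000
\end{aligned}
\]

The first sum reflects that the probabilities add up to one, so a gain for one plan must be offset by losses for the others. The second reflects that **price** enters every utility with the same coefficient: raising all three prices by the same amount leaves the utility differences, and hence the probabilities, unchanged. The near-equality of the cross effects is also what we would expect in this setting. With a common price coefficient \(\beta\),

\[
\frac{\partial p_j}{\partial\,\text{price}_k} = \beta\,\frac{\partial p_j}{\partial V_k},
\qquad
\frac{\partial p_j}{\partial V_k} = \frac{\partial p_k}{\partial V_j},
\]

where the second equality is a general property of additive random utility models. The small discrepancies in the table (for example, .074859 versus .07487) are plausibly due to simulation and numerical differentiation error.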

**Conclusion**

In this post, I showed how we can interpret the results of the multinomial probit model using predicted probabilities and marginal effects. We used a model with flexible covariance structure to allow for unequal variances, correlation across alternatives, and alternative-specific variables in a discrete choice setting. While we employed the most general covariance structure in our example, one needs to keep in mind that this is not always the most appropriate one. Stata’s **asmprobit** allows for fully customizable structures, and researchers are well advised to carefully consider which structure to impose.