
Flexible discrete choice modeling using a multinomial probit model, part 2

Overview

In the first part of this post, I discussed the multinomial probit model from a random utility model perspective. In this part, we take a closer look at how to interpret our estimation results.

How do we interpret our estimation results?

We created a fictitious dataset of individuals who were presented with a set of three health insurance plans (Sickmaster, Allgood, and Cowboy Health). We pretended to have a random sample of 20- to 60-year-old persons who were asked which plan they would choose if they had to enroll in one of them. We assumed that a person’s utility from each of the three alternatives is a function of both personal characteristics (household income and age) and a characteristic of the insurance plan (its price). We used Stata’s asmprobit command to fit our model, and these were the results:

. asmprobit choice price, case(id) alternatives(alt) casevars(hhinc age)
> basealternative(1) scalealternative(2) nolog

Alternative-specific multinomial probit      Number of obs      =     60,000
Case variable: id                            Number of cases    =     20,000

Alternative variable: alt                    Alts per case: min =          3
                                                            avg =        3.0
                                                            max =          3
Integration sequence:      Hammersley
Integration points:               150           Wald chi2(5)    =    4577.15
Log simulated-likelihood = -11219.181           Prob > chi2     =     0.0000

----------------------------------------------------------------------------
      choice |      Coef.   Std. Err.     z   P>|z|     [95% Conf. Interval]
-------------+--------------------------------------------------------------
alt          |
       price |  -.4896106   .0523626   -9.35  0.000    -.5922394   -.3869818
-------------+--------------------------------------------------------------
Sickmaster   |  (base alternative)
-------------+--------------------------------------------------------------
Allgood      |
       hhinc |  -.5006212   .0302981  -16.52  0.000    -.5600043    -.441238
         age |   2.001367   .0306663   65.26  0.000     1.941262    2.061472
       _cons |  -4.980841   .1968765  -25.30  0.000    -5.366711    -4.59497
-------------+--------------------------------------------------------------
Cowboy_Hea~h |
       hhinc |  -1.991202   .1092118  -18.23  0.000    -2.205253    -1.77715
         age |   1.494056   .0446662   33.45  0.000     1.406512    1.581601
       _cons |   3.038869   .4066901    7.47  0.000     2.241771    3.835967
-------------+--------------------------------------------------------------
     /lnl2_2 |   .5550228   .0742726    7.47  0.000     .4094512    .7005944
-------------+--------------------------------------------------------------
       /l2_1 |    .667308   .1175286    5.68  0.000     .4369562    .8976598
----------------------------------------------------------------------------
(alt=Sickmaster is the alternative normalizing location)
(alt=Allgood is the alternative normalizing scale)

And this was our estimated variance–covariance matrix of error differences:

. estat covariance

  +-------------------------------------+
  |              |   Allgood  Cowboy_~h |
  |--------------+----------------------|
  |      Allgood |         2            |
  | Cowboy_Hea~h |   .943716   3.479797 |
  +-------------------------------------+
Note: Covariances are for alternatives differenced with Sickmaster.
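The two ancillary parameters at the bottom of the coefficient table, /lnl2_2 and /l2_1, are what asmprobit estimates for the covariance. The Python sketch below assumes a particular reading of them. It treats /lnl2_2 as the log of the (2,2) element and /l2_1 as the (2,1) element of a lower-triangular Cholesky factor \(L\) of the differenced covariance matrix. It also assumes the (1,1) element is fixed at \(\sqrt{2}\) by the scale normalization. Under that reading, \(LL'\) reproduces the estat covariance output.

```python
import math

# Ancillary parameters reported by asmprobit
lnl2_2 = 0.5550228   # assumed: log of the (2,2) Cholesky element
l2_1 = 0.667308      # assumed: (2,1) Cholesky element

# Assumption: Allgood is the scale alternative, so the variance of its
# error difference is fixed at 2 and the (1,1) Cholesky element is sqrt(2).
l1_1 = math.sqrt(2.0)
l2_2 = math.exp(lnl2_2)

# Sigma = L L' for a 2 x 2 lower-triangular L
var_allgood = l1_1 ** 2
cov = l2_1 * l1_1
var_cowboy = l2_1 ** 2 + l2_2 ** 2
corr = cov / math.sqrt(var_allgood * var_cowboy)  # scale-free summary

print(f"Var(Allgood)         = {var_allgood:.6f}")
print(f"Cov(Allgood, Cowboy) = {cov:.6f}")
print(f"Var(Cowboy)          = {var_cowboy:.6f}")
print(f"Correlation          = {corr:.4f}")
```

Unlike the variances, the correlation of about 0.36 between the two error differences does not depend on the scale normalization.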

Although these parameters determine the effects of interest, the mapping from parameters to effects is nonlinear, which makes the parameters themselves difficult to interpret. The normalized covariance matrix conveys little substantive information because it describes differenced errors. The coefficients are not very informative either, and their magnitudes depend on the arbitrary scale normalization. For example, if we used the third alternative instead of the second to set the scale, we would get different parameter estimates simply because of the different scaling. To get something more informative, we will focus on estimating response probabilities and marginal effects.
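To make the scale point concrete, here is a small Python sketch. It assumes, hypothetically, that switching the scale alternative to Cowboy Health would fix Var(Cowboy Health - Sickmaster) at 2 instead of Var(Allgood - Sickmaster). With an unstructured covariance, such a switch only rescales the model. Every coefficient and every error standard deviation is multiplied by the same factor \(c\). Apart from numerical differences, the new estimates would therefore be the old ones times \(c\), while ratios of coefficients to error standard deviations stay unchanged.

```python
import math

var_allgood = 2.0        # fixed by scalealternative(2)
var_cowboy = 3.479797    # estimated, from estat covariance
price_coef = -0.4896106  # estimated price coefficient

# Hypothetical renormalization: make Var(Cowboy Health - Sickmaster) equal 2.
# Utilities are rescaled by c, so variances are rescaled by c**2.
c = math.sqrt(2.0 / var_cowboy)
price_coef_rescaled = c * price_coef
var_allgood_rescaled = c ** 2 * var_allgood
var_cowboy_rescaled = c ** 2 * var_cowboy

# Coefficient relative to an error standard deviation: unaffected by scaling
ratio_before = price_coef / math.sqrt(var_allgood)
ratio_after = price_coef_rescaled / math.sqrt(var_allgood_rescaled)

print(f"scale factor c          = {c:.4f}")
print(f"rescaled price coef     = {price_coef_rescaled:.4f}")
print(f"rescaled Var(Allgood)   = {var_allgood_rescaled:.4f}")
print(f"ratio before / after    = {ratio_before:.4f} / {ratio_after:.4f}")
```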

Predicted probabilities

Let’s focus on response probabilities first. After fitting our model, we predict the probability that the \(i\)th individual chooses alternative \(j\). This gives us one predicted probability per alternative for each individual. Let’s take a look at this:

. predict double pr
(option pr assumed; Pr(alt))

. list id alt choice pr in 1/9, sepby(id)

     +-----------------------------------------+
     | id             alt   choice          pr |
     |-----------------------------------------|
  1. |  1      Sickmaster        1   .62054511 |
  2. |  1         Allgood        0   .01856341 |
  3. |  1   Cowboy Health        0   .36088805 |
     |-----------------------------------------|
  4. |  2      Sickmaster        0   .01680147 |
  5. |  2         Allgood        1   .39319731 |
  6. |  2   Cowboy Health        0    .5899949 |
     |-----------------------------------------|
  7. |  3      Sickmaster        0   .07440388 |
  8. |  3         Allgood        0   .02010558 |
  9. |  3   Cowboy Health        1   .90549014 |
     +-----------------------------------------+

Looking at the first individual (id==1), we predict that this person has a 62% chance of choosing Sickmaster, a 2% chance of choosing Allgood, and a 36% chance of choosing Cowboy Health. If we classified each person by their most likely choice, this person would be correctly classified because he or she actually chose Sickmaster. The second individual, in contrast, would be misclassified: the most likely choice is Cowboy Health (59%), but this person chose Allgood. If we average these probabilities over individuals for each alternative, we obtain the unconditional mean probability of choosing each alternative. These averages reflect the marginal distribution of cases across alternatives:

. bysort alt : summarize pr

-------------------------------------------------------------------------------
-> alt = Sickmaster

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          pr |     20,000    .3158523    .3549359   5.70e-14   .9999658

-------------------------------------------------------------------------------
-> alt = Allgood

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          pr |     20,000    .4155579    .3305044   .0000342   .9972706

-------------------------------------------------------------------------------
-> alt = Cowboy Health

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          pr |     20,000    .2685856    .2892927   1.62e-14   .9998705
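The checks described above can be reproduced by hand from the printed numbers. This Python sketch (all names are illustrative) does three things:

- It confirms that each of the first three individuals’ probabilities sum to one.
- It classifies each of these individuals by their most likely alternative and compares that with the actual choice.
- It adds up the three mean probabilities from the summarize output.

```python
# Predicted probabilities from the listing, in the order
# Sickmaster, Allgood, Cowboy Health
alts = ["Sickmaster", "Allgood", "Cowboy Health"]
pr = {
    1: [0.62054511, 0.01856341, 0.36088805],
    2: [0.01680147, 0.39319731, 0.5899949],
    3: [0.07440388, 0.02010558, 0.90549014],
}
chosen = {1: "Sickmaster", 2: "Allgood", 3: "Cowboy Health"}

# Probabilities over the three alternatives should sum to one per person
row_sums = {i: sum(p) for i, p in pr.items()}

# Classify by the most likely alternative and compare with the actual choice
predicted = {i: alts[max(range(3), key=lambda j: p[j])] for i, p in pr.items()}
correct = {i: predicted[i] == chosen[i] for i in pr}

for i in pr:
    print(i, f"{row_sums[i]:.5f}", predicted[i], correct[i])

# Means of pr by alternative, taken from the summarize output
mean_pr = {"Sickmaster": 0.3158523, "Allgood": 0.4155579, "Cowboy Health": 0.2685856}
print(f"sum of mean probabilities = {sum(mean_pr.values()):.5f}")
```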

Typically, we would like to summarize the probabilities in a way that tells us how the covariates affect choice probabilities. We begin by estimating the choice probabilities for an average person in the population. We define this person as someone of average age and average household income who is offered the average price of each plan. If we had a special interest in the effect of age, we could use several evaluation points for age. In the example below, we predict the probabilities at the sample mean of age and for 60-year-olds. In both cases, household income is held at its sample mean and prices are set to their alternative-specific means:

. preserve

. collapse (mean) age hhinc price, by(alt)

. generate id=1

. quietly expand 2

. quietly replace id = 2 in 4/6

. quietly replace age = 6 if id == 2

. predictnl pr_at = predict(pr), ci(ci95_lo ci95_hi) force
note: confidence intervals calculated using Z critical values

. format %5.3f age hhinc price pr_at ci95_lo ci95_hi

. list, sepby(id)

     +---------------------------------------------------------------------+
     |           alt    age   hhinc   price   id   pr_at  ci95_lo  ci95_hi |
     |---------------------------------------------------------------------|
  1. |    Sickmaster  3.995   4.982   2.000    1   0.195    0.185    0.206 |
  2. |       Allgood  3.995   4.982   1.249    1   0.586    0.574    0.599 |
  3. | Cowboy Health  3.995   4.982   0.751    1   0.218    0.208    0.229 |
     |---------------------------------------------------------------------|
  4. |    Sickmaster  6.000   4.982   2.000    2   0.000    0.000    0.000 |
  5. |       Allgood  6.000   4.982   1.249    2   0.878    0.867    0.889 |
  6. | Cowboy Health  6.000   4.982   0.751    2   0.122    0.111    0.133 |
     +---------------------------------------------------------------------+

. restore

Using collapse results in a new dataset that has only three observations, one for each alternative. Before we run expand, the variables age and hhinc contain the sample means, and the variable price stores the alternative-specific average prices. The command expand 2 duplicates each of the three observations. In the newly added observations, we set id to 2 and replace age with the value 6. Age is measured in decades, so this corresponds to 60 years of age. The variable id thus identifies our prediction scenario. We use preserve and restore so that the original dataset is left intact. Also, instead of predict, we use predictnl here because it allows us to estimate confidence intervals for the predicted probabilities.

The newly created variable pr_at stores the predicted probabilities. For our average person, we predict a 20% chance of choosing Sickmaster, a 59% chance of choosing Allgood, and a 22% chance of choosing Cowboy Health. The second set of predictions (id==2) shows that the chance of choosing Allgood increases with age, from 59% to 88%, at least when household income and prices are held at their means. Consequently, the chances of choosing Sickmaster and Cowboy Health both decrease. The drop is far sharper for Sickmaster (from 20% to essentially 0%) than for Cowboy Health (from 22% to 12%). We would not expect anyone aged 60 with average household income to choose Sickmaster when offered average insurance prices.
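As a cross-check under stated assumptions, the Python sketch below recomputes these predictions from the point estimates. With Sickmaster as the base, define the utility differences \(d_2 = U_{\text{Allgood}} - U_{\text{Sickmaster}}\) and \(d_3 = U_{\text{Cowboy}} - U_{\text{Sickmaster}}\). A person chooses Sickmaster when both are negative. The differences are bivariate normal with the covariance reported by estat covariance.

asmprobit approximates these probabilities by simulation. The sketch instead integrates numerically over one difference and uses the conditional normal distribution of the other, so agreement is expected only to about the third decimal. The evaluation point uses rounded means: hhinc = 4.982 from the listing above, and age and prices from the X column of the estat mfx output. All function names are ours.

```python
import math

# Point estimates from the asmprobit output (Sickmaster is the base alternative)
B_PRICE = -0.4896106
CASE_COEFS = {  # coefficients on (hhinc, age, _cons)
    "Allgood": (-0.5006212, 2.001367, -4.980841),
    "Cowboy Health": (-1.991202, 1.494056, 3.038869),
}
# Covariance of error differences with Sickmaster (estat covariance)
S22, S23, S33 = 2.0, 0.943716, 3.479797


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def integrate(g, lo, hi, n=4000):
    """Midpoint-rule integral of g on [lo, hi]; 0 for an empty interval."""
    if hi <= lo:
        return 0.0
    step = (hi - lo) / n
    return step * sum(g(lo + (k + 0.5) * step) for k in range(n))


def utility_index(alt, hhinc, age, price):
    """Deterministic utility of a non-base alternative."""
    b_inc, b_age, cons = CASE_COEFS[alt]
    return B_PRICE * price + b_inc * hhinc + b_age * age + cons


def choice_probs(age, hhinc, prices, width=10.0):
    """Choice probabilities; prices = (Sickmaster, Allgood, Cowboy Health).

    Each probability is a 1-D integral of one marginal density of the
    error differences times the conditional normal CDF of the other.
    """
    v1 = B_PRICE * prices[0]
    m2 = utility_index("Allgood", hhinc, age, prices[1]) - v1
    m3 = utility_index("Cowboy Health", hhinc, age, prices[2]) - v1
    sd2, sd3 = math.sqrt(S22), math.sqrt(S33)
    s3_2 = math.sqrt(S33 - S23 ** 2 / S22)  # sd of d3 given d2
    s2_3 = math.sqrt(S22 - S23 ** 2 / S33)  # sd of d2 given d3

    def f2(x):  # density of d2
        return norm_pdf((x - m2) / sd2) / sd2

    def f3(y):  # density of d3
        return norm_pdf((y - m3) / sd3) / sd3

    def mean3(x):  # E[d3 | d2 = x]
        return m3 + S23 / S22 * (x - m2)

    def mean2(y):  # E[d2 | d3 = y]
        return m2 + S23 / S33 * (y - m3)

    lo2, hi2 = m2 - width * sd2, m2 + width * sd2
    lo3, hi3 = m3 - width * sd3, m3 + width * sd3
    # Sickmaster: d2 < 0 and d3 < 0
    p_sick = integrate(lambda x: f2(x) * norm_cdf(-mean3(x) / s3_2), lo2, min(0.0, hi2))
    # Allgood: d2 > 0 and d3 < d2
    p_all = integrate(lambda x: f2(x) * norm_cdf((x - mean3(x)) / s3_2), max(0.0, lo2), hi2)
    # Cowboy Health: d3 > 0 and d2 < d3
    p_cow = integrate(lambda y: f3(y) * norm_cdf((y - mean2(y)) / s2_3), max(0.0, lo3), hi3)
    return {"Sickmaster": p_sick, "Allgood": p_all, "Cowboy Health": p_cow}


AGE_MEAN, HHINC_MEAN = 3.9953, 4.982   # rounded sample means (age in decades)
PRICES = (1.9999, 1.2493, 0.75072)     # mean prices: Sickmaster, Allgood, Cowboy Health

avg = choice_probs(AGE_MEAN, HHINC_MEAN, PRICES)
age60 = choice_probs(6.0, HHINC_MEAN, PRICES)
for label, probs in (("average person", avg), ("age 60", age60)):
    print(label, {alt: round(p, 3) for alt, p in probs.items()})
```

With these inputs, the sketch should land within a few thousandths of the predictnl estimates in both scenarios.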

Marginal effects

While looking at the predicted probabilities in this way can be useful, we are often interested in the expected change in probability per unit change in a predictor variable, which we approximate by marginal effects. Marginal effects are the first derivatives of the predicted probabilities with respect to the covariates, whether case specific or alternative specific. Let’s look at our case-specific variable age first. We start by evaluating the marginal effects of age at the means of the covariates, including age. Here we use the postestimation command estat mfx:

. estat mfx, varlist(age)

Equation Name           Alternative
--------------------------------------------------
Sickmaster              Sickmaster
Allgood                 Allgood
Cowboy_Health           Cowboy Health


Pr(choice = Sickmaster) =   .195219
----------------------------------------------------------------------------
variable     |   dp/dx  Std. Err.   z     P>|z|  [    95% C.I.    ]      X
-------------+--------------------------------------------------------------
casevars     |
         age | -.378961   .00669 -56.64   0.000  -.392073  -.365848   3.9953
----------------------------------------------------------------------------

Pr(choice = Allgood) =  .5864454
----------------------------------------------------------------------------
variable     |   dp/dx  Std. Err.   z     P>|z|  [    95% C.I.    ]      X
-------------+--------------------------------------------------------------
casevars     |
         age |  .363996  .006009  60.58   0.000   .352218   .375773   3.9953
-----------------------------------------------------------------------------

Pr(choice = Cowboy Health) = .21831866
----------------------------------------------------------------------------
variable     |   dp/dx  Std. Err.   z     P>|z|  [    95% C.I.    ]      X
-------------+--------------------------------------------------------------
casevars     |
         age |  .015001  .004654   3.22   0.001    .00588   .024123   3.9953
----------------------------------------------------------------------------

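The output contains one marginal effect for each alternative. For our average person, the probability of choosing Sickmaster decreases at a rate of 38 percentage points per unit of age. One unit corresponds to 10 years. Over the same unit, the probability of choosing Allgood increases at a rate of 36 percentage points. The effect on the probability of choosing Cowboy Health is small (1.5 percentage points per unit), although statistically significant. Keep in mind that these are local, first-order approximations. The predicted probability of choosing Sickmaster at the means is only about 20%, so an actual 10-year increase in age cannot lower it by 38 percentage points.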
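Because the three choice probabilities sum to one for every person, the marginal effects of any covariate must sum to zero across alternatives. Writing \(P_{ij}\) for the probability that individual \(i\) chooses alternative \(j\) (our notation), the estat mfx results for age satisfy this up to a small remainder, presumably due to rounding and simulation error:

\[
\sum_{j=1}^{3} \frac{\partial P_{ij}}{\partial \text{age}_i}
= \frac{\partial}{\partial \text{age}_i} \sum_{j=1}^{3} P_{ij}
= \frac{\partial\, 1}{\partial \text{age}_i} = 0,
\qquad
-0.378961 + 0.363996 + 0.015001 = 0.000036 \approx 0
\]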

For illustrative purposes, and to better understand the quantities that we are estimating here, let’s look at a manual calculation of these effects:

. preserve

. * Sample and alternative-specific means:
. collapse (mean) age hhinc price, by(alt)

. generate id=1

. * Computing numerical derivative of the predicted
. * probability with respect to -age-:
. scalar h = 1e-5

. clonevar age_clone = age

. qui replace age = age_clone + h

. qui predict double pr_ph

. qui replace age = age_clone - h

. qui predict double pr_mh

. qui generate dpdx = (pr_ph-pr_mh)/(2*h)

. * Results:
. list alt dpdx in 1/3, sepby(id)

     +---------------------------+
     |           alt        dpdx |
     |---------------------------|
  1. |    Sickmaster   -.3789608 |
  2. |       Allgood    .3639956 |
  3. | Cowboy Health    .0150014 |
     +---------------------------+

. restore

In the code above, we again use collapse to set the case-specific variables to their sample means and price to its alternative-specific means. We then calculate the numerical derivative of the predicted probability with respect to age. To do this, we evaluate our prediction function twice: once after adding a small amount h to the mean of age, and once after subtracting the same amount. In other words, we predict the probabilities at two points just around our point of interest. We then divide the difference between the two predictions by the distance between the two evaluation points, 2h. This central difference approximates the derivative at the point in the middle, in this case the mean of age. Our manual calculation of the marginal effects matches the estat mfx results.
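The same central-difference idea can be sketched outside Stata. The Python code below is a self-contained, hypothetical re-implementation. It computes the three choice probabilities by numerically integrating the bivariate normal distribution of the utility differences relative to Sickmaster, rather than by simulation as asmprobit does. It then applies the same formula with h = 1e-5. It also computes the discrete change for a full one-unit (10-year) increase in age. This shows how far a first-order marginal effect can be from the actual change when the step is large.

```python
import math

# Point estimates from the asmprobit output (Sickmaster is the base alternative)
B_PRICE = -0.4896106
CASE_COEFS = {  # coefficients on (hhinc, age, _cons)
    "Allgood": (-0.5006212, 2.001367, -4.980841),
    "Cowboy Health": (-1.991202, 1.494056, 3.038869),
}
S22, S23, S33 = 2.0, 0.943716, 3.479797  # differenced covariance (estat covariance)


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def integrate(g, lo, hi, n=4000):
    """Midpoint-rule integral of g on [lo, hi]; 0 for an empty interval."""
    if hi <= lo:
        return 0.0
    step = (hi - lo) / n
    return step * sum(g(lo + (k + 0.5) * step) for k in range(n))


def choice_probs(age, hhinc, prices, width=10.0):
    """Probabilities from d2 = U_Allgood - U_Sickmaster, d3 = U_Cowboy - U_Sickmaster."""
    v1 = B_PRICE * prices[0]
    m = []
    for alt, price in (("Allgood", prices[1]), ("Cowboy Health", prices[2])):
        b_inc, b_age, cons = CASE_COEFS[alt]
        m.append(B_PRICE * price + b_inc * hhinc + b_age * age + cons - v1)
    m2, m3 = m
    sd2, sd3 = math.sqrt(S22), math.sqrt(S33)
    s3_2 = math.sqrt(S33 - S23 ** 2 / S22)  # sd of d3 given d2
    s2_3 = math.sqrt(S22 - S23 ** 2 / S33)  # sd of d2 given d3
    f2 = lambda x: norm_pdf((x - m2) / sd2) / sd2      # density of d2
    f3 = lambda y: norm_pdf((y - m3) / sd3) / sd3      # density of d3
    mean3 = lambda x: m3 + S23 / S22 * (x - m2)        # E[d3 | d2 = x]
    mean2 = lambda y: m2 + S23 / S33 * (y - m3)        # E[d2 | d3 = y]
    lo2, hi2 = m2 - width * sd2, m2 + width * sd2
    lo3, hi3 = m3 - width * sd3, m3 + width * sd3
    return {
        # Sickmaster: d2 < 0 and d3 < 0
        "Sickmaster": integrate(lambda x: f2(x) * norm_cdf(-mean3(x) / s3_2), lo2, min(0.0, hi2)),
        # Allgood: d2 > 0 and d3 < d2
        "Allgood": integrate(lambda x: f2(x) * norm_cdf((x - mean3(x)) / s3_2), max(0.0, lo2), hi2),
        # Cowboy Health: d3 > 0 and d2 < d3
        "Cowboy Health": integrate(lambda y: f3(y) * norm_cdf((y - mean2(y)) / s2_3), max(0.0, lo3), hi3),
    }


def age_effects(age, hhinc, prices, h=1e-5):
    """Central difference (P(age + h) - P(age - h)) / (2h), as in the Stata code."""
    up = choice_probs(age + h, hhinc, prices)
    down = choice_probs(age - h, hhinc, prices)
    return {alt: (up[alt] - down[alt]) / (2.0 * h) for alt in up}


AGE_MEAN, HHINC_MEAN = 3.9953, 4.982   # rounded sample means (age in decades)
PRICES = (1.9999, 1.2493, 0.75072)     # mean prices: Sickmaster, Allgood, Cowboy Health

me = age_effects(AGE_MEAN, HHINC_MEAN, PRICES)
base = choice_probs(AGE_MEAN, HHINC_MEAN, PRICES)
older = choice_probs(AGE_MEAN + 1.0, HHINC_MEAN, PRICES)  # 10 years older
for alt in me:
    print(f"{alt:14s} dp/dx = {me[alt]: .4f}  change for +10 years = {older[alt] - base[alt]: .4f}")
```

The derivatives should agree with estat mfx to about three decimals. The Sickmaster discrete change, however, is necessarily smaller in magnitude than 0.38, because the predicted probability at the means is only about 0.20.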

Finally, let’s look at our alternative-specific variable price. For this variable, we can estimate how the probability that the \(i\)th case chooses the \(j\)th alternative changes with the price of each of the alternatives. In our example, this gives \(3 \times 3\) marginal effects for price. We can estimate the effect of the Sickmaster price on the probability of choosing Sickmaster, Allgood, and Cowboy Health. We can likewise estimate the effect of the Allgood price on those three probabilities, and so on. Let’s do this for the effect of the Sickmaster price on the probability of choosing Sickmaster, Allgood, and Cowboy Health. Again we use estat mfx first:

. estat mfx, varlist(price)

Equation Name           Alternative
--------------------------------------------------
Sickmaster              Sickmaster
Allgood                 Allgood
Cowboy_Health           Cowboy Health


Pr(choice = Sickmaster) =   .195219
----------------------------------------------------------------------------
variable     |   dp/dx  Std. Err.   z     P>|z|  [    95% C.I.    ]      X
-------------+--------------------------------------------------------------
price        |
  Sickmaster | -.098769  .010944  -9.02   0.000   -.12022  -.077318   1.9999
     Allgood |  .074859  .008579   8.73   0.000   .058044   .091673   1.2493
Cowboy_Hea~h |   .02391  .003151   7.59   0.000   .017734   .030087   .75072
----------------------------------------------------------------------------

Pr(choice = Allgood) =  .5864454
----------------------------------------------------------------------------
variable     |   dp/dx  Std. Err.   z     P>|z|  [    95% C.I.    ]      X
-------------+--------------------------------------------------------------
price        |
  Sickmaster |   .07487   .00858   8.73   0.000   .058053   .091687   1.9999
     Allgood | -.130799  .013278  -9.85   0.000  -.156823  -.104774   1.2493
Cowboy_Hea~h |  .055928  .006829   8.19   0.000   .042543   .069314   .75072
----------------------------------------------------------------------------

Pr(choice = Cowboy Health) = .21831866
-----------------------------------------------------------------------------
variable     |   dp/dx  Std. Err.   z     P>|z|  [    95% C.I.    ]      X
-------------+---------------------------------------------------------------
price        |
  Sickmaster |  .023907  .003151   7.59   0.000   .017731   .030083   1.9999
     Allgood |   .05593   .00683   8.19   0.000   .042544   .069315   1.2493
Cowboy_Hea~h | -.079837  .008946  -8.92   0.000   -.09737  -.062303   .75072
-----------------------------------------------------------------------------

Inspecting the output, we see that the probability of choosing Sickmaster decreases by about 10 percentage points per one-unit increase in the Sickmaster price (prices are measured in units of $100/month). This seems reasonable because higher prices typically lower utility. The effects of the Sickmaster price on the probability of choosing each of the other plans are both positive. In other words, the other plans become more likely to be chosen as the Sickmaster price rises. The effect is larger for Allgood (7.5 versus 2.4 percentage points). We could therefore conclude that an average person who moves away from Sickmaster as its price rises is more likely to switch to Allgood than to Cowboy Health. Again we replicate these results by performing some manual calculations:

. preserve

. * Sample and alternative-specific means:
. collapse (mean) age hhinc price, by(alt)

. generate id=1

. * Derivative
. scalar h = 1e-5

. clonevar price_clone = price

. qui replace price = price_clone + h if alt==1

. qui predict double pr_ph

. qui replace price = price_clone - h if alt==1

. qui predict double pr_mh

. gen dpdx = (pr_ph-pr_mh)/(2*h)

. * Results
. list alt dpdx in 1/3, sepby(id)

     +--------------------------+
     |           alt       dpdx |
     |--------------------------|
  1. |    Sickmaster   -.098769 |
  2. |       Allgood   .0748703 |
  3. | Cowboy Health   .0239071 |
     +--------------------------+

.
. restore

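Our manual calculations match the Sickmaster-price effects, which appear in the top row of each of the three panels of the estat mfx output. The first panel contains nearly identical numbers, but they have a different interpretation. The Allgood and Cowboy Health estimates in this panel are the effects on the probability of choosing Sickmaster per unit increase in the Allgood and Cowboy Health prices, respectively. The two sets of numbers coincide because price enters every alternative’s utility with the same coefficient. In that case, the effect of the price of alternative \(k\) on the probability of choosing \(j\) equals the effect of the price of \(j\) on the probability of choosing \(k\).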
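Two properties of the full \(3 \times 3\) matrix of price effects can be checked directly on the reported estimates. The matrix should be symmetric, and each column should sum to zero because the probabilities sum to one. The Python sketch below copies the estimates from the estat mfx output into a matrix. Rows are the probability being affected, columns are the plan whose price changes, and the layout is our choice.

```python
# Price marginal effects from estat mfx.
# Rows: Pr(choose Sickmaster / Allgood / Cowboy Health)
# Columns: price of Sickmaster / Allgood / Cowboy Health
alts = ["Sickmaster", "Allgood", "Cowboy Health"]
me = [
    [-0.098769, 0.074859, 0.02391],
    [0.07487, -0.130799, 0.055928],
    [0.023907, 0.05593, -0.079837],
]

# Symmetry: dP_j/dp_k should equal dP_k/dp_j
max_asym = max(abs(me[r][c] - me[c][r]) for r in range(3) for c in range(3))

# Adding up: for each price, the effects on the three probabilities cancel
col_sums = [sum(me[r][c] for r in range(3)) for c in range(3)]

print(f"largest asymmetry = {max_asym:.6f}")
print("column sums:", dict(zip(alts, (round(s, 6) for s in col_sums))))
```

Both properties hold up to differences in the fifth decimal, which is consistent with rounding in the printed output.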

Conclusion

In this post, I showed how we can interpret the results of the multinomial probit model using predicted probabilities and marginal effects. We used a model with a flexible covariance structure that allows for unequal variances and correlation across alternatives, together with alternative-specific variables, in a discrete choice setting. While we employed the most general covariance structure in our example, keep in mind that it is not always the most appropriate one. Stata’s asmprobit allows for fully customizable structures, and researchers are well advised to carefully consider which structure to impose.