Home > Statistics > Bayesian binary item response theory models using bayesmh

Bayesian binary item response theory models using bayesmh

This post was written jointly with Yulia Marchenko, Executive Director of Statistics, StataCorp.

Table of Contents

Overview
1PL model
2PL model
3PL model
4PL model
5PL model
Conclusion

Overview

Item response theory (IRT) is used for modeling the relationship between the latent abilities of a group of subjects and the examination items used for measuring their abilities. Stata 14 introduced a suite of commands for fitting IRT models using maximum likelihood; see, for example, the blog post Spotlight on irt by Rafal Raciborski and the [IRT] Item Response Theory manual for more details. In this post, we demonstrate how to fit Bayesian binary IRT models by using the redefine() option introduced for the bayesmh command in Stata 14.1. We also use the likelihood option dbernoulli() available as of the update on 03 Mar 2016 for fitting Bernoulli distribution. If you are not familiar with the concepts and jargon of Bayesian statistics, you may want to watch the introductory videos on the Stata Youtube channel before proceeding.

Introduction to Bayesian analysis, part 1 : The basic concepts
Introduction to Bayesian analysis, part 2: MCMC and the Metropolis-Hastings algorithm

We use the abridged version of the mathematics and science data from DeBoeck and Wilson (2004), masc1. The dataset includes 800 student responses to 9 test questions intended to measure mathematical ability.

The irt suite fits IRT models using data in the wide form – one observation per subject with items recorded in separate variables. To fit IRT models using bayesmh, we need data in the long form, where items are recorded as multiple observations per subject. We thus reshape the dataset in a long form: we have a single binary response variable, y, and two index variables, item and id, which identify the items and subjects, respectively. This allows us to formulate our IRT models as multilevel models. The following commands load and prepare the dataset.

. webuse masc1
(Data from De Boeck & Wilson (2004))

. generate id = _n

. quietly reshape long q, i(id) j(item)

. rename q y

To ensure that we include all levels of item and id in our models, we use fvset base none to keep the base categories.

. fvset base none id item

In what follows, we present eight Bayesian binary IRT models increasing in complexity and explanatory power. We perform Bayesian model comparison to gain insight into what would be the more appropriate model for the data at hand.

For high-dimensional models such as IRT models, you may see differences in the estimation results between different platforms or different flavors of Stata because of the nature of the Markov chain Monte Carlo (MCMC) sampling and finite numerical precision. These differences are not a source of concern; they will be within the range of the MCMC variability and will lead to similar inferential conclusions. The differences will diminish as the MCMC sample size increases. The results in this post are obtained from Stata/SE on the 64-bit Linux platform using the default 10,000 MCMC sample size.

Let the items be indexed by \(i=1,\dots,9\) and the subjects by \(j=1,\dots,800\). Let \(\theta_j\) be the latent mathematical ability of subject \(j\), and let \(Y_{ij}\) be the response of subject \(j\) to item \(i\).

Back to table of contents

1PL model

In the one-parameter logistic (1PL) model, the probability of getting a correct response is modeled as an inverse-logit function of location parameters \(b_i\), also called item difficulties, and a common slope parameter \(a\), also called item discrimination:

\[
P(Y_{ij}=1) = {\rm InvLogit}\{a(\theta_j-b_i)\} =
\frac{\exp\{a(\theta_j-b_i)\}}{1+\exp\{a(\theta_j-b_i)\}}
\]

Typically, the abilities are assumed to be normally distributed:
\[
\theta_j \sim {\rm N}(0,1)
\]
In a multilevel framework, the \(\theta_j\)’s represent random effects. In a Bayesian framework, we use the term “random effects” to refer to the parameters corresponding to levels of grouping variables identifying the hierarchy of the data.

A Bayesian formulation of the 1PL model also requires prior specification for the model parameters \(a\) and \(b_i\). The discrimination parameter \(a\) is assumed to be positive and is often modeled in the log scale. Because we have no prior knowledge about the discrimination and difficulty parameters, we assume that the prior distributions of \(\ln(a)\) and \(b_i\) have support on the whole real line, are symmetric, and are centered at 0. A normal prior distribution is thus a natural choice. We furthermore assume that \(\ln(a)\) and \(b_i\) are close to 0 and have prior variance of 1, which is an entirely subjective decision. We thus assign \(\ln(a)\) and \(b_i\) standard normal prior distributions:

\[\ln(a) \sim {\rm N}(0, 1)\] \[b_i \sim {\rm N}(0, 1) \]

To specify the likelihood function of the 1PL model in bayesmh, we use a nonlinear equation specification for the response variable y. The direct nonlinear specification for this model is

bayesmh y = ({discrim}*({subj:i.id}-{diff:i.item})), likelihood(logit) ...

where {discrim} is the discrimination parameter \(a\), {subj:i.id} are latent abilities \(\theta_j\), and {diff:i.item} are item difficulties \(b_i\). The logit model is used for the probability of a success, \(P(Y_{ij}=1)\). Specification {subj:i.id} in the above nonlinear expression is viewed as a substitutable expression for linear combinations of indicators associated with the id variable and parameters \(\theta_j\). This specification may be computationally prohibitive with a large number of subjects. A more efficient solution is to use the redefine() option to include subject random effects \(\theta_j\) in the model. The same argument may apply to the {diff:i.item} specification when there are many items. Thus, it may be computationally convenient to treat the \(b_i\) parameters as “random effects” in the specification and use the redefine() option to include them in the model.

A more efficient specification is thus

bayesmh y = ({discrim}*({subj:}-{diff:})), likelihood(logit) ///
               redefine(subj:i.id) redefine(diff:i.item) ... ///

where {subj:} and {diff:} in the nonlinear specification now represent the \(\theta_j\) and \(b_i\) parameters, respectively, without using expansions into linear combinations of indicator variables.

Below, we show the full bayesmh specification of the 1PL model and the output summary. In our examples, we treat the abilities {subj:i.id} as nuisance parameters and exclude them from the final results. The discrimination model parameter {discrim} must be positive and is thus initialized with 1. A longer burn-in period, burnin(5000), allows for longer adaptation of the MCMC sampler, which is needed given the large number of parameters in the model. Finally, the estimation results are stored for later model comparison.

. set seed 14

. bayesmh y = ({discrim}*({subj:}-{diff:})), likelihood(logit) ///
>         redefine(diff:i.item) redefine(subj:i.id)            ///
>         prior({subj:i.id},    normal(0, 1))                  ///
>         prior({discrim},      lognormal(0, 1))               ///
>         prior({diff:i.item},  normal(0, 1))                  ///
>         init({discrim} 1) exclude({subj:i.id})               ///
>         burnin(5000) saving(sim1pl, replace)
  
Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood: 
  y ~ logit({discrim}*(xb_subj-xb_diff))

Priors: 
  {diff:i.item} ~ normal(0,1)                                              (1)
    {subj:i.id} ~ normal(0,1)                                              (2)
      {discrim} ~ lognormal(0,1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_diff.
(2) Parameters are elements of the linear form xb_subj.

Bayesian logistic regression                     MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      7,200
                                                 Acceptance rate  =      .3074
                                                 Efficiency:  min =     .02691
                                                              avg =     .06168
Log marginal likelihood =          .                          max =     .09527
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
diff         |
        item |
          1  | -.6934123   .0998543   .003576  -.6934789  -.8909473  -.4917364
          2  | -.1234553   .0917187   .002972  -.1241642  -.3030341   .0597863
          3  | -1.782762   .1323252    .00566  -1.781142   -2.05219  -1.534451
          4  |  .3152835   .0951978   .003289   .3154714   .1279147   .4981263
          5  |  1.622545    .127213   .005561   1.619388   1.377123   1.883083
          6  |  .6815517   .0978777   .003712   .6788345   .4911366    .881128
          7  |  1.303482   .1173994   .005021   1.302328   1.084295   1.544913
          8  | -2.353975   .1620307   .008062  -2.351207  -2.672983  -2.053112
          9  | -1.168668   .1120243   .004526  -1.163922  -1.392936  -.9549209
-------------+----------------------------------------------------------------
     discrim |  .8644787   .0439804   .002681   .8644331   .7818035   .9494433
------------------------------------------------------------------------------

file sim1pl.dta saved

. estimates store est1pl

The sampling efficiency is acceptable, about 6% on average, with no indication of convergence problems. Although detailed convergence inspection of all parameters is outside the scope of this post, we recommend that you do so by using, for example, the bayesgraph diagnostics command.

Though we used informative priors for the model parameters, the estimation results from our Bayesian model are not that different from the maximum likelihood estimates obtained using the irt 1pl command (see example 1 in [IRT] irt 1pl). For example, the posterior mean estimate for {discrim} is 0.86 with an MCMC standard error of 0.003, whereas irt 1pl reports 0.85 with a standard error of 0.05.

The log-marginal likelihood is reported missing because we have excluded the {subj:i.id} parameters from the simulation results and the Laplace-Metropolis estimator of the log-marginal likelihood is not available in such cases. This estimator requires simulation results for all model parameters to compute the log-marginal likelihood.

Back to table of contents

2PL model

The two-parameter logistic (2PL) model extends the 1PL model by allowing for item-specific discrimination. The probability of correct response is now modeled as a function of item-specific slope parameters \(a_i\):
\[
P(Y_{ij}=1) = {\rm InvLogit}\{a_i(\theta_j-b_i)\} =
\frac{\exp\{a_i(\theta_j-b_i)\}}{1+\exp\{a_i(\theta_j-b_i)\}}
\]

The prior specification for \(\theta_j\) stays the same as in the 1PL model. We will, however, apply more elaborate prior specifications for the \(a_i\)’s and \(b_i\)’s. It is a good practice to use proper prior specifications without overwhelming the evidence from the data. The impact of the priors can be controlled by introducing additional hyperparameters. For example, Kim and Bolt (2007) proposed the use of a normal prior for the difficulty parameters with unknown mean and variance. Extending this approach to the discrimination parameters as well, we apply a hierarchical Bayesian model in which the \(\ln(a_i)\) and \(b_i\) parameters have the following prior specifications:

\[ \ln(a_i) \sim {\rm N}(\mu_a, \sigma_a^2) \] \[ b_i \sim {\rm N}(\mu_b, \sigma_b^2) \]

The mean hyperparameters, \(\mu_a\) and \(\mu_b\), and variance hyperparameters, \(\sigma_a^2\) and \(\sigma_b^2\), require informative prior specifications. We assume that the means are centered at 0 with a variation of 0.1:
\[
\mu_a, \mu_b \sim {\rm N}(0, 0.1)
\]

To lower the variability of the \(\ln(a_i)\) and \(b_i\) parameters, we apply an inverse-gamma prior with shape 10 and scale 1 for the variance parameters:

\[
\sigma_a^2, \sigma_b^2 \sim {\rm InvGamma}(10, 1)
\]

Thus, the prior mean of \(\sigma_a^2\) and \(\sigma_b^2\) is about 0.1.

In the bayesmh specification, the hyperparameters \(\mu_a\), \(\mu_b\), \(\sigma_a^2\), and \(\sigma_a^2\) are denoted as {mu_a}, {mu_b}, {var_a}, and {var_b}, respectively. We use the redefine(discrim:i.item) option to include in the model the discrimination parameters \(a_i\), referred to as {discrim:} in the likelihood specification.

Regarding the MCMC simulation, we change some of the default options. The hyperparameters {mu_a}, {mu_b}, {var_a}, and {var_b} are placed in separate blocks to improve the simulation efficiency. The discrimination parameters {discrim:i.item} must be positive and are thus initialized with 1s.

. set seed 14

. bayesmh y = ({discrim:}*({subj:}-{diff:})), likelihood(logit) ///
>         redefine(discrim:i.item) redefine(diff:i.item)        ///
>         redefine(subj:i.id)                                   ///
>         prior({subj:i.id},      normal(0, 1))                 ///
>         prior({discrim:i.item}, lognormal({mu_a}, {var_a}))   ///
>         prior({diff:i.item},    normal({mu_b}, {var_b}))      ///
>         prior({mu_a} {mu_b},    normal(0, 0.1))               ///
>         prior({var_a} {var_b},  igamma(10, 1))                ///
>         block({mu_a mu_b var_a var_b}, split)                 ///
>         init({discrim:i.item} 1)                              ///
>         exclude({subj:i.id}) burnin(5000) saving(sim2pl, replace)
  
Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood: 
  y ~ logit(xb_discrim*(xb_subj-xb_diff))

Priors: 
  {discrim:i.item} ~ lognormal({mu_a},{var_a})                             (1)
     {diff:i.item} ~ normal({mu_b},{var_b})                                (2)
       {subj:i.id} ~ normal(0,1)                                           (3)

Hyperpriors: 
    {mu_a mu_b} ~ normal(0,0.1)
  {var_a var_b} ~ igamma(10,1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_discrim.
(2) Parameters are elements of the linear form xb_diff.
(3) Parameters are elements of the linear form xb_subj.

Bayesian logistic regression                     MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      7,200
                                                 Acceptance rate  =      .3711
                                                 Efficiency:  min =     .01617
                                                              avg =     .04923
Log marginal likelihood =          .                          max =      .1698
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
discrim      |
        item |
          1  |  1.430976   .1986011   .010953   1.413063   1.089405   1.850241
          2  |  .6954823   .1081209   .004677   .6897267   .4985004   .9276975
          3  |  .9838528   .1343908   .009079   .9780275   .7506566   1.259427
          4  |  .8167792   .1169157   .005601   .8136229   .5992495   1.067578
          5  |  .9402715   .1351977   .010584   .9370298   .6691103   1.214885
          6  |  .9666747   .1420065   .008099   .9616285   .7038868   1.245007
          7  |  .5651287   .0864522   .006201   .5617302   .3956216   .7431265
          8  |  1.354053   .2048404   .015547   1.344227   .9791096   1.761437
          9  |  .7065096   .1060773   .006573   .6999745   .5102749   .9271799
-------------+----------------------------------------------------------------
diff         |
        item |
          1  | -.5070314   .0784172   .003565   -.507922   -.671257  -.3596057
          2  | -.1467198    .117422   .003143  -.1456633  -.3895978   .0716841
          3  | -1.630259   .1900103   .013494  -1.612534  -2.033169  -1.304171
          4  |  .3273735   .1073891   .003565   .3231703   .1248782   .5492114
          5  |  1.529584   .1969554    .01549   1.507982   1.202271   1.993196
          6  |  .6325194    .115724   .005613   .6243691   .4272131   .8851649
          7  |  1.827013   .2884057   .019582    1.79828   1.349654   2.490633
          8  | -1.753744   .1939559   .014743  -1.738199  -2.211475  -1.438146
          9  | -1.384486   .2059005   .012105  -1.361195  -1.838918  -1.059687
-------------+----------------------------------------------------------------
        mu_a | -.1032615   .1148176   .003874   -.102376  -.3347816   .1277031
       var_a |  .1129835   .0356735   .001269   .1056105    .063403   .1981331
        mu_b | -.0696525   .2039387   .004949   -.072602  -.4641566   .3298393
       var_b |  .6216005   .2023137   .008293   .5843444   .3388551   1.101153
------------------------------------------------------------------------------

file sim2pl.dta saved

. estimates store est2pl

The average simulation efficiency is about 5%, but some of the parameters converge slower than the others, such as {diff:7.item}, which has the largest MCMC standard error (0.02) among the difficulty parameters. If this was a rigorous study, to lower the MCMC standard errors, we would recommend longer simulations with MCMC sample sizes of at least 50,000.

We can compare the 1PL and 2PL models by using the deviance information criterion (DIC) available with the bayesstats ic command.

. bayesstats ic est1pl est2pl, diconly

Deviance information criterion

------------------------
             |       DIC
-------------+----------
      est1pl |  8122.428
      est2pl |  8055.005
------------------------

DIC is often used in Bayesian model selection as an alternative to AIC and BIC criteria and can be easily obtained from an MCMC sample. Larger MCMC samples produce more reliable DIC estimates. Because different MCMC samples produce different sample DIC values and the sample approximation error in calculating DIC is not known, one should not rely solely on DIC when choosing a model.

Lower DIC values indicate better fit. The DIC of the 2PL model (8,055) is markedly lower than the DIC of the 1PL model (8,122), implying better fit of the 2PL model.

Back to table of contents

3PL model

The three-parameter logistic (3PL) model introduces lower asymptote parameters \(c_i\), also called guessing parameters. The probability of giving a correct response is given by

\[
P(Y_{ij}=1) = c_i + (1-c_i){\rm InvLogit}\{a_i(\theta_j-b_i)\} ,\ c_i > 0
\]

The guessing parameters may be difficult to estimate using maximum likelihood. Indeed, the irt 3pl command with the sepguessing option fails to converge, as you can verify by typing

. irt 3pl q1-q9, sepguessing

on the original dataset.

It is thus important to specify an informative prior for \(c_i\). We assume that the prior mean of the guessing parameters is about 0.1 and thus apply
\[
c_i \sim {\rm InvGamma}(10, 1)
\]

Similarly to the discrimination and difficulty parameters, the \(c_i\)’s are introduced as random-effects parameters in the bayesmh specification and are referred to as {gues:} in the likelihood specification.

Unlike 1PL and 2PL models, we cannot use the likelihood(logit) option to model the probability of success because the probability of correct response is no longer an inverse-logit transformation of the parameters. Instead, we use likelihood(dbernoulli()) to model the probability of success of a Bernoulli outcome directly.

To have a valid initialization of the MCMC sampler, we assign the \(c_i\)’s positive starting values, 0.1.

. set seed 14

. bayesmh y, likelihood(dbernoulli({gues:}+(1-{gues:})*                     ///
>                                  invlogit({discrim:}*({subj:}-{diff:})))) ///
>         redefine(discrim:i.item) redefine(diff:i.item)                    ///
>         redefine(gues:i.item)    redefine(subj:i.id)                      ///
>         prior({subj:i.id},      normal(0, 1))                             ///
>         prior({discrim:i.item}, lognormal({mu_a}, {var_a}))               ///
>         prior({diff:i.item},    normal({mu_b}, {var_b}))                  ///
>         prior({gues:i.item},    igamma(10, 1))                            ///
>         prior({mu_a} {mu_b},    normal(0, 0.1))                           ///
>         prior({var_a} {var_b},  igamma(10, 1))                            ///
>         block({mu_a mu_b var_a var_b}, split)                             ///
>         init({discrim:i.item} 1 {gues:i.item} 0.1)                        ///
>         exclude({subj:i.id}) burnin(5000) saving(sim3pls, replace)
  
Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood: 
  y ~ binomial(xb_gues+(1-xb_gues)*invlogit(xb_discrim*(xb_subj-xb_diff)),1)

Priors: 
  {discrim:i.item} ~ lognormal({mu_a},{var_a})                             (1)
     {diff:i.item} ~ normal({mu_b},{var_b})                                (2)
     {gues:i.item} ~ igamma(10,1)                                          (3)
       {subj:i.id} ~ normal(0,1)                                           (4)

Hyperpriors: 
    {mu_a mu_b} ~ normal(0,0.1)
  {var_a var_b} ~ igamma(10,1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_discrim.
(2) Parameters are elements of the linear form xb_diff.
(3) Parameters are elements of the linear form xb_gues.
(4) Parameters are elements of the linear form xb_subj.

Bayesian Bernoulli model                         MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      7,200
                                                 Acceptance rate  =      .3496
                                                 Efficiency:  min =      .0148
                                                              avg =     .03748
Log marginal likelihood =          .                          max =      .2044
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
discrim      |
        item |
          1  |  1.712831   .2839419   .018436   1.681216   1.232644   2.351383
          2  |  .8540871   .1499645   .008265   .8414399   .6058463   1.165732
          3  |  1.094723   .1637954    .01126   1.081756    .817031   1.454845
          4  |  1.090891   .2149095   .013977   1.064651   .7488589   1.588164
          5  |  1.363236   .2525573   .014858   1.338075   .9348136   1.954695
          6  |  1.388325   .3027436   .024245   1.336303   .9466695   2.068181
          7  |  .9288217   .2678741   .021626   .8750048   .5690308   1.603375
          8  |  1.457763   .2201065    .01809   1.438027   1.068937   1.940431
          9  |  .7873631    .127779   .007447   .7796568    .563821    1.06523
-------------+----------------------------------------------------------------
diff         |
        item |
          1  | -.2933734   .0976177   .006339  -.2940499  -.4879558  -.0946848
          2  |  .2140365    .157158   .008333   .2037788  -.0553537   .5550411
          3  | -1.326351   .1981196   .013101  -1.326817  -1.706671  -.9307443
          4  |  .6367877   .1486799   .007895   .6277349   .3791045   .9509913
          5  |  1.616056   .1799378    .00966   1.606213   1.303614   2.006817
          6  |  .8354059    .124184    .00656   .8191839    .614221   1.097801
          7  |  2.066205   .3010858   .018377   2.034757   1.554484   2.709601
          8  | -1.555583   .1671435   .012265   -1.54984   -1.89487  -1.267001
          9  | -.9775626   .2477279   .016722  -.9936727  -1.431964  -.4093629
-------------+----------------------------------------------------------------
gues         |
        item |
          1  |  .1078598   .0337844     .0019   .1020673   .0581353   .1929404
          2  |  .1128113   .0372217   .002162   .1065996   .0596554   .2082417
          3  |   .123031   .0480042   .002579   .1127147   .0605462   .2516237
          4  |  .1190103   .0390721   .002369   .1123544   .0617698   .2095427
          5  |  .0829503   .0185785   .001275   .0807116   .0514752   .1232547
          6  |  .1059315   .0289175   .001708   .1022741   .0584959   .1709483
          7  |  .1235553   .0382661   .002964   .1186648   .0626495   .2067556
          8  |  .1142118   .0408348   .001733   .1062507   .0592389   .2134006
          9  |  .1270767   .0557821   .003939    .113562   .0621876   .2825752
-------------+----------------------------------------------------------------
        mu_a |   .109161   .1218499   .005504   .1126253   -.135329   .3501061
       var_a |   .108864   .0331522   .001053   .1030106   .0604834   .1860996
        mu_b |  .0782094   .1974657   .004367   .0755023  -.3067717   .4638104
       var_b |  .5829738   .1803167   .006263   .5562159   .3260449   1.034225
------------------------------------------------------------------------------

file sim3pls.dta saved

. estimates store est3pls

The estimated posterior means of the \(c_i\)’s range between 0.08 and 0.13. Clearly, the introduction of guessing parameters has an impact on the item discrimination and difficulty parameters. For example, the estimated posterior means of \(\mu_a\) and \(\mu_b\) shift from -0.10 and -0.07, respectively, for the 2PL model to 0.11 and 0.08, respectively, for the 3PL model.

Because the estimated guessing parameters are not that different, one may ask whether item-specific guessing parameters are really necessary. To answer this question, we fit a model with a common guessing parameter, {gues}, and compare it with the previous model.

. set seed 14

. bayesmh y, likelihood(dbernoulli({gues}+(1-{gues})*                       ///
>                                  invlogit({discrim:}*({subj:}-{diff:})))) ///
>         redefine(discrim:i.item) redefine(diff:i.item)                    ///
>         redefine(subj:i.id)                                               ///
>         prior({subj:i.id},      normal(0, 1))                             ///
>         prior({discrim:i.item}, lognormal({mu_a}, {var_a}))               ///
>         prior({diff:i.item},    normal({mu_b}, {var_b}))                  ///
>         prior({gues},           igamma(10, 1))                            ///
>         prior({mu_a} {mu_b},    normal(0, 0.1))                           ///
>         prior({var_a} {var_b},  igamma(10, 1))                            ///
>         block({mu_a mu_b var_a var_b gues}, split)                        ///
>         init({discrim:i.item} 1 {gues} 0.1)                               ///
>         exclude({subj:i.id}) burnin(5000) saving(sim3pl, replace)
  
Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood: 
  y ~ binomial({gues}+(1-{gues})*invlogit(xb_discrim*(xb_subj-xb_diff)),1)

Priors: 
  {discrim:i.item} ~ lognormal({mu_a},{var_a})                             (1)
     {diff:i.item} ~ normal({mu_b},{var_b})                                (2)
       {subj:i.id} ~ normal(0,1)                                           (3)
            {gues} ~ igamma(10,1)

Hyperpriors: 
    {mu_a mu_b} ~ normal(0,0.1)
  {var_a var_b} ~ igamma(10,1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_discrim.
(2) Parameters are elements of the linear form xb_diff.
(3) Parameters are elements of the linear form xb_subj.

Bayesian Bernoulli model                         MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      7,200
                                                 Acceptance rate  =      .3753
                                                 Efficiency:  min =     .01295
                                                              avg =     .03714
Log marginal likelihood =          .                          max =      .1874
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
discrim      |
        item |
          1  |  1.692894   .2748163   .021944   1.664569   1.232347   2.299125
          2  |  .8313512   .1355267    .00606   .8218212   .5928602   1.125729
          3  |  1.058833   .1611742   .014163   1.054126   .7676045   1.393611
          4  |  1.041808   .1718472   .008782   1.029867   .7398569   1.397073
          5  |  1.534997   .3208687   .023965   1.497019   1.019998   2.266078
          6  |   1.38296   .2581948   .019265   1.355706   .9559487   1.979358
          7  |  .8310222   .1698206   .012896   .8107371   .5736484   1.248736
          8  |  1.442949   .2266268   .017562   1.431204   1.066646   1.930829
          9  |    .77944   .1159669   .007266   .7750891   .5657258   1.014941
-------------+----------------------------------------------------------------
diff         |
        item |
          1  | -.3043161   .0859905   .005373  -.2968324  -.4870583  -.1407109
          2  |  .1814508   .1289251   .006543   .1832146  -.0723988   .4313265
          3  | -1.391216   .1924384   .014986  -1.373093  -1.809343  -1.050919
          4  |  .5928491   .1262631   .006721   .5829347    .356614    .857743
          5  |  1.617348   .1929263   .011604   1.601534   1.293032   2.061096
          6  |   .817635   .1172884   .006125    .812838   .5990503   1.064322
          7  |  2.006949   .2743517    .01785   1.981052   1.556682   2.594236
          8  | -1.576235   .1747855   .013455  -1.559435  -1.952676  -1.272108
          9  | -1.039362   .1840773    .01138   -1.02785  -1.432058  -.7160181
-------------+----------------------------------------------------------------
        gues |  .1027336   .0214544   .001753   .1022211   .0627299   .1466367
        mu_a |  .1009741    .123915   .006567   .0965353  -.1343028   .3510697
       var_a |  .1121003   .0344401   .001154   .1059563   .0628117   .1970842
        mu_b |  .0632173   .1979426   .004572   .0666684  -.3292497   .4482957
       var_b |  .5861236   .1818885   .006991   .5574743   .3239369   1.053172
------------------------------------------------------------------------------

file sim3pl.dta saved

. estimates store est3pl

We can again compare the two 3PL models by using the bayesstats ic command:

. bayesstats ic est3pls est3pl, diconly

Deviance information criterion

------------------------
             |       DIC
-------------+----------
     est3pls |  8049.425
      est3pl |  8049.426
------------------------

Although the estimated DICs of the two 3PL models are essentially the same, we decide for demonstration purposes to proceed with the model with item-specific guessing parameters.

Back to table of contents

4PL model

The four-parameter logistic (4PL) model extends the 3PL model by adding item-specific upper asymptote parameters \(d_i\):
\[
P(Y_{ij}=1) = c_i + (d_i-c_i){\rm InvLogit}\{a_i(\theta_j-b_i)\}
,\ c_i < d_i < 1 \] The \(d_i\) parameter can be viewed as an upper limit on the probability of correct response to the \(i\)th item. The probability of giving correct answers by subjects with very high ability can thus be no greater than \(d_i\). We restrict the \(d_i\)'s to the (0.8,1) range and assign them a \({\rm Uniform}(0.8,1)\) prior. For other parameters, we use the same priors as in the 3PL model. In the bayesmh specification of the model, the condition \(c_i < d_i\) is incorporated in the likelihood, and the condition \(d_i < 1\) is implied by the specified prior for the \(d_i\)'s. We initialize the \(d_i\)'s to 0.9. We use the notable option to suppress the long table output.

. set seed 14

. bayesmh y, likelihood(dbernoulli(({gues:}+({d:}-{gues:})*                 ///
>                                  invlogit({discrim:}*({subj:}-{diff:})))* ///
>                                  cond({gues:}<{d:},1,.)))                 ///
>         redefine(discrim:i.item) redefine(diff:i.item)                    ///
>         redefine(gues:i.item)    redefine(d:i.item)  redefine(subj:i.id)  ///
>         prior({subj:i.id},      normal(0, 1))                             ///
>         prior({discrim:i.item}, lognormal({mu_a}, {var_a}))               ///
>         prior({diff:i.item},    normal({mu_b}, {var_b}))                  ///
>         prior({gues:i.item},    igamma(10, 1))                            ///
>         prior({d:i.item},       uniform(0.8, 1))                          ///
>         prior({mu_a} {mu_b},    normal(0, 0.1))                           ///
>         prior({var_a} {var_b},  igamma(10, 1))                            ///
>         block({mu_a mu_b var_a var_b}, split)                             ///
>         init({discrim:i.item} 1 {gues:i.item} 0.1 {d:i.item} 0.9)         ///
>         exclude({subj:i.id}) burnin(5000) saving(sim4pls, replace) notable
  
Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood: 
  y ~ binomial(<expr1>,1)

Priors: 
  {discrim:i.item} ~ lognormal({mu_a},{var_a})                             (1)
     {diff:i.item} ~ normal({mu_b},{var_b})                                (2)
     {gues:i.item} ~ igamma(10,1)                                          (3)
        {d:i.item} ~ uniform(0.8,1)                                        (4)
       {subj:i.id} ~ normal(0,1)                                           (5)

Hyperpriors: 
    {mu_a mu_b} ~ normal(0,0.1)
  {var_a var_b} ~ igamma(10,1)

Expression: 
  expr1 : (xb_gues+(xb_d-xb_gues)*invlogit(xb_discrim*(xb_subj-xb_diff)))* con
          d(xb_gues<xb_d,1,.)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_discrim.
(2) Parameters are elements of the linear form xb_diff.
(3) Parameters are elements of the linear form xb_gues.
(4) Parameters are elements of the linear form xb_d.
(5) Parameters are elements of the linear form xb_subj.

Bayesian Bernoulli model                         MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      7,200
                                                 Acceptance rate  =      .3639
                                                 Efficiency:  min =    .004825
                                                              avg =      .0219
Log marginal likelihood =          .                          max =      .1203
Note: There is a high autocorrelation after 500 lags.

file sim4pls.dta saved

. estimates store est4pls
 

We use bayesstats summary to display results of selected model parameters.

. bayesstats summary {d:i.item} {mu_a var_a mu_b var_b}

Posterior summary statistics                      MCMC sample size =    10,000
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
d            |
        item |
          1  |  .9598183   .0255321   .001948   .9621874   .9044441   .9981723
          2  |  .9024564   .0565702   .007407   .9019505   .8066354   .9944216
          3  |  .9525519   .0281878   .002845   .9551054   .8972454   .9971564
          4  |  .8887963   .0561697   .005793   .8859503   .8036236   .9916784
          5  |  .8815547   .0588907   .007215   .8708021   .8031737   .9926549
          6  |  .8891188   .0586482   .006891    .881882   .8024593   .9935512
          7  |   .874271   .0561718   .008087   .8635082   .8018176   .9880433
          8  |  .9663644   .0147606   .001121   .9667563   .9370666   .9950912
          9  |   .889164   .0486038   .005524   .8834207   .8084921   .9857415
-------------+----------------------------------------------------------------
        mu_a |  .3336887   .1436216   .009742    .334092   .0562924   .6164115
       var_a |  .1221547   .0406908   .002376   .1144729   .0642768   .2229326
        mu_b | -.0407488   .1958039   .005645  -.0398847  -.4220523   .3323791
       var_b |  .4991736   .1612246    .00629   .4660071   .2802531   .9023824
------------------------------------------------------------------------------

The bayesmh command issued a note indicating high autocorrelation for some of the model parameters. This may be related to slower MCMC convergence or more substantial problems in the model specification. It is thus worthwhile to inspect the individual autocorrelation of the parameters. We can do so by using the bayesstats ess command. The parameters with lower estimated sample size (ESS) have higher autocorrelation and vice versa.

. bayesstats ess {d:i.item} {mu_a var_a mu_b var_b}

Efficiency summaries    MCMC sample size =    10,000
 
----------------------------------------------------
             |        ESS   Corr. time    Efficiency
-------------+--------------------------------------
d            |
        item |
          1  |     171.82        58.20        0.0172
          2  |      58.33       171.43        0.0058
          3  |      98.17       101.87        0.0098
          4  |      94.02       106.36        0.0094
          5  |      66.62       150.11        0.0067
          6  |      72.44       138.05        0.0072
          7  |      48.25       207.26        0.0048
          8  |     173.30        57.70        0.0173
          9  |      77.41       129.19        0.0077
-------------+--------------------------------------
        mu_a |     217.35        46.01        0.0217
       var_a |     293.34        34.09        0.0293
        mu_b |    1203.20         8.31        0.1203
       var_b |     656.92        15.22        0.0657
----------------------------------------------------

We observe that the parameters with ESS lower than 200 are among the asymptote parameter’s \(d_i\)’s. This may be caused, for example, by overparameterization of the likelihood model and subsequent nonidentifiability, which is not resolved by the specified priors.

We can also fit a model with a common upper asymptote parameter, \(d\), and compare it with the model with the item-specific upper asymptote.

. set seed 14

. bayesmh y, likelihood(dbernoulli(({gues:}+({d}-{gues:})*                  ///
>                                  invlogit({discrim:}*({subj:}-{diff:})))* ///
>                                  cond({gues:}<{d},1,.)))                  ///
>         redefine(discrim:i.item) redefine(diff:i.item)                    ///
>         redefine(gues:i.item)    redefine(subj:i.id)                      ///
>         prior({subj:i.id},      normal(0, 1))                             ///
>         prior({discrim:i.item}, lognormal({mu_a}, {var_a}))               ///
>         prior({diff:i.item},    normal({mu_b}, {var_b}))                  ///
>         prior({gues:i.item},    igamma(10, 1))                            ///
>         prior({d},              uniform(0.8, 1))                          ///
>         prior({mu_a} {mu_b},    normal(0, 0.1))                           ///
>         prior({var_a} {var_b},  igamma(10, 1))                            ///
>         block({mu_a mu_b var_a var_b d}, split)                           ///
>         init({discrim:i.item} 1 {gues:i.item} 0.1 {d} 0.9)                ///
>         exclude({subj:i.id}) burnin(5000) saving(sim4pl, replace) notable
  
Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood: 
  y ~ binomial(<expr>>,1)

Priors: 
  {discrim:i.item} ~ lognormal({mu_a},{var_a})                             (1)
     {diff:i.item} ~ normal({mu_b},{var_b})                                (2)
     {gues:i.item} ~ igamma(10,1)                                          (3)
       {subj:i.id} ~ normal(0,1)                                           (4)
               {d} ~ uniform(0.8,1)

Hyperpriors: 
    {mu_a mu_b} ~ normal(0,0.1)
  {var_a var_b} ~ igamma(10,1)

Expression: 
  expr1 : (xb_gues+({d}-xb_gues)*invlogit(xb_discrim*(xb_subj-xb_diff)))* cond
          (xb_gues<{d},1,.)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_discrim.
(2) Parameters are elements of the linear form xb_diff.
(3) Parameters are elements of the linear form xb_gues.
(4) Parameters are elements of the linear form xb_subj.

Bayesian Bernoulli model                         MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      7,200
                                                 Acceptance rate  =      .3877
                                                 Efficiency:  min =      .0107
                                                              avg =     .03047
Log marginal likelihood =          .                          max =      .1626

file sim4pl.dta saved

. estimates store est4pl

. bayesstats summary {d mu_a var_a mu_b var_b}

Posterior summary statistics                      MCMC sample size =    10,000
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
           d |  .9664578   .0144952   .001293   .9668207   .9371181   .9924572
        mu_a |  .2206696   .1387873    .01113   .2208302  -.0483587   .4952625
       var_a |  .1245785   .0391551   .001806   .1188779   .0658243   .2187058
        mu_b |  .0371722   .2020157    .00501   .0331742  -.3481366   .4336587
       var_b |  .5603447   .1761812   .006817   .5279243   .3157048   .9805077
------------------------------------------------------------------------------

We now compare the two 4PL models by using the bayesstats ic command:

. bayesstats ic est4pls est4pl, diconly

Deviance information criterion

------------------------
             |       DIC
-------------+----------
     est4pls |  8050.805
      est4pl |  8037.075
------------------------

The DIC of the more complex 4PL model (8,051) is substantially higher than the DIC of the simpler model (8,037). This and the potential nonidentifiability of the more complex est4pls model, indicated by high autocorrelation in the simulated MCMC sample, compel us to proceed with the model with a common upper asymptote, est4pl.

The posterior distribution of \(d\) has an estimated 95% equal-tailed credible interval of (0.93, 0.99) and is concentrated about 0.97. The \({\rm Uniform}(0.8,1)\) prior on \(d\) does not seem to be too restrictive. The estimated DIC of the est4pl model (8,037) is lower than the DIC of the est3pls 3PL model from the previous section (8,049), implying that the introduction of the upper asymptote parameter \(d\) does improve the model fit.

Back to table of contents

5PL model

The five-parameter logistic (5PL) model extends the 4PL model by adding item-specific asymmetry parameters \(e_i\):
\[
P(Y_{ij}=1) = c_i + (d_i-c_i){\rm InvLogit}\big[{\{a_i(\theta_j-b_i)\}}^{e_i}\big]
,\ c_i < d_i < 1, \ 0 < e_i < 1 \] In the previous section, we found the 4PL model with common upper asymptote \(d\), est4pl, to be the best one so far. We thus consider here a 5PL model with common upper asymptote \(d\).

Typically, we expect the \(e_i\) parameters to be close to 1. Similarly to the upper asymptote parameter \(d\), the \(e_i\) parameters are assumed to be in the (0.8,1) range and are assigned \({\rm Uniform}(0.8,1)\) prior. We initialize the \(e_i\)s to 0.9. We again use the notable option to suppress the long table output, and we display a subset of results by using bayesstats summary. (We could have used bayesmh's noshow() option instead to achieve the same result.)

. set seed 14

. bayesmh y, likelihood(dbernoulli(({gues:}+({d}-{gues:})*                  ///
>                           (invlogit({discrim:}*({subj:}-{diff:})))^{e:})* ///
>                           cond({gues:}<{d},1,.)))                         ///
>         redefine(discrim:i.item) redefine(diff:i.item)                    ///
>         redefine(gues:i.item)    redefine(e:i.item)  redefine(subj:i.id)  ///
>         prior({subj:i.id},      normal(0, 1))                             ///
>         prior({discrim:i.item}, lognormal({mu_a}, {var_a}))               ///
>         prior({diff:i.item},    normal({mu_b}, {var_b}))                  ///
>         prior({gues:i.item},    igamma(10, 1))                            ///
>         prior({d},              uniform(0.8, 1))                          ///
>         prior({e:i.item},       uniform(0.8, 1))                          ///
>         prior({mu_a} {mu_b},    normal(0, 0.1))                           ///
>         prior({var_a} {var_b},  igamma(10, 1))                            ///
>         block({mu_a mu_b var_a var_b d}, split)                           ///
>         init({discrim:i.item} 1 {gues:i.item} 0.1 {d} {e:i.item} 0.9)     ///
>         exclude({subj:i.id}) burnin(5000) saving(sim5pls, replace) notable
  
Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood: 
  y ~ binomial(<expr1>,1)

Priors: 
  {discrim:i.item} ~ lognormal({mu_a},{var_a})                             (1)
     {diff:i.item} ~ normal({mu_b},{var_b})                                (2)
     {gues:i.item} ~ igamma(10,1)                                          (3)
        {e:i.item} ~ uniform(0.8,1)                                        (4)
       {subj:i.id} ~ normal(0,1)                                           (5)
               {d} ~ uniform(0.8,1)

Hyperpriors: 
    {mu_a mu_b} ~ normal(0,0.1)
  {var_a var_b} ~ igamma(10,1)

Expression: 
  expr1 : (xb_gues+({d}-xb_gues)*(invlogit(xb_discrim*(xb_subj-xb_diff)))^xb_e
          )* cond(xb_gues<{d},1,.)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_discrim.
(2) Parameters are elements of the linear form xb_diff.
(3) Parameters are elements of the linear form xb_gues.
(4) Parameters are elements of the linear form xb_e.
(5) Parameters are elements of the linear form xb_subj.

Bayesian Bernoulli model                         MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      7,200
                                                 Acceptance rate  =      .3708
                                                 Efficiency:  min =    .007341
                                                              avg =     .02526
Log marginal likelihood =          .                          max =      .1517

file sim5pls.dta saved

. estimates store est5pls

. bayesstats summary {e:i.item} {d mu_a var_a mu_b var_b}

Posterior summary statistics                      MCMC sample size =    10,000
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
e            |
        item |
          1  |   .897859   .0578428   .006083   .8939272   .8050315   .9957951
          2  |  .9042669   .0585023   .005822     .90525   .8053789   .9956565
          3  |    .88993   .0562398   .005013    .887011    .803389   .9930454
          4  |  .9010241   .0574186   .006492   .9042044   .8030981   .9925598
          5  |  .9126369   .0545625    .00521   .9178927   .8098596   .9964487
          6  |  .9037269   .0583833   .006814   .9086704   .8054932   .9961268
          7  |  .9136308   .0558911   .005373   .9203899   .8112029    .996217
          8  |   .889775   .0568656   .005119   .8849938    .803912   .9938777
          9  |  .8808435    .056257   .004743   .8727194   .8030522   .9904972
-------------+----------------------------------------------------------------
           d |  .9671374   .0144004   .001165   .9670598   .9382404   .9933374
        mu_a |  .2770211   .1353777    .00832   .2782552   .0141125   .5418087
       var_a |   .122635   .0404159   .002148   .1160322   .0666951   .2208711
        mu_b |  .1211885   .1929743   .004955   .1199136  -.2515431    .503733
       var_b |  .5407642   .1747674   .006353   .5088269   .3016315   .9590086
------------------------------------------------------------------------------

We also want to compare the above model with a simpler one using a common asymmetry parameter \(e\).

. set seed 14

. bayesmh y, likelihood(dbernoulli(({gues:}+({d}-{gues:})*                  ///
>                            (invlogit({discrim:}*({subj:}-{diff:})))^{e})* ///
>                            cond({gues:}<{d},1,.)))                        ///
>         redefine(discrim:i.item) redefine(diff:i.item)                    ///
>         redefine(gues:i.item)    redefine(subj:i.id)                      ///
>         prior({subj:i.id},      normal(0, 1))                             ///
>         prior({discrim:i.item}, lognormal({mu_a}, {var_a}))               ///
>         prior({diff:i.item},    normal({mu_b}, {var_b}))                  ///
>         prior({gues:i.item},    igamma(10, 1))                            ///
>         prior({d} {e},          uniform(0.8, 1))                          ///
>         prior({mu_a} {mu_b},    normal(0, 0.1))                           ///
>         prior({var_a} {var_b},  igamma(10, 1))                            ///
>         block({mu_a mu_b var_a var_b d e}, split)                         ///
>         init({discrim:i.item} 1 {gues:i.item} 0.1 {d e} 0.9)              ///
>         exclude({subj:i.id}) burnin(5000) saving(sim5pl, replace) notable
  
Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood: 
  y ~ binomial(<expr1>,1)

Priors: 
  {discrim:i.item} ~ lognormal({mu_a},{var_a})                             (1)
     {diff:i.item} ~ normal({mu_b},{var_b})                                (2)
     {gues:i.item} ~ igamma(10,1)                                          (3)
       {subj:i.id} ~ normal(0,1)                                           (4)
             {d e} ~ uniform(0.8,1)

Hyperpriors: 
    {mu_a mu_b} ~ normal(0,0.1)
  {var_a var_b} ~ igamma(10,1)

Expression: 
  expr1 : (xb_gues+({d}-xb_gues)*(invlogit(xb_discrim*(xb_subj-xb_diff)))^{e})
          * cond(xb_gues<{d},1,.)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_discrim.
(2) Parameters are elements of the linear form xb_diff.
(3) Parameters are elements of the linear form xb_gues.
(4) Parameters are elements of the linear form xb_subj.

Bayesian Bernoulli model                         MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      7,200
                                                 Acceptance rate  =      .3805
                                                 Efficiency:  min =    .008179
                                                              avg =     .02768
Log marginal likelihood =          .                          max =     .08904

file sim5pl.dta saved

. estimates store est5pl

. bayesstats summary {e d mu_a var_a mu_b var_b}

Posterior summary statistics                      MCMC sample size =    10,000
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
           e |  .9118363   .0558178   .004194   .9175841   .8063153   .9960286
           d |  .9655166   .0147373   .001495   .9659029   .9354708   .9924492
        mu_a |  .2674271   .1368926   .008485    .270597   .0102798   .5443345
       var_a |  .1250759   .0428095   .002635   .1173619   .0654135   .2340525
        mu_b |  .1015121   .2048178   .006864    .103268  -.3052377   .4934158
       var_b |  .5677309   .1824591   .006981   .5331636   .3079868   1.016762
------------------------------------------------------------------------------

We use bayesstats ic to compare the DIC values of the two 5PL models:

. bayesstats ic est5pls est5pl, diconly

Deviance information criterion

------------------------
             |       DIC
-------------+----------
     est5pls |  8030.894
      est5pl |  8034.517
------------------------

The estimated DIC of the more complex est5pls model (8,031) is lower than the DIC of the simpler model (8,035), suggesting a better fit.

Back to table of contents

Conclusion

Finally, we compare all eight fitted models.

. bayesstats ic est1pl est2pl est3pl est3pls est4pl est4pls est5pl est5pls, ///
>         diconly

Deviance information criterion

------------------------
             |       DIC
-------------+----------
      est1pl |  8122.428
      est2pl |  8055.005
      est3pl |  8049.426
     est3pls |  8049.425
      est4pl |  8037.075
     est4pls |  8050.805
      est5pl |  8034.517
     est5pls |  8030.894
------------------------

The est5pls model has the lowest overall DIC. To confirm this result, we run another set of simulations with a larger MCMC sample size of 50,000. (We simply added the mcmcsize(50000) option to the bayesmh specification of the above eight models.) The following DIC values, based on the larger MCMC sample size, are more reliably estimated.

. bayesstats ic est1pl est2pl est3pl est3pls est4pl est4pls est5pl est5pls, ///
>         diconly

Deviance information criterion

------------------------
             |       DIC
-------------+----------
      est1pl |  8124.015
      est2pl |  8052.068
      est3pl |  8047.067
     est3pls |  8047.738
      est4pl |  8032.417
     est4pls |  8049.712
      est5pl |  8031.375
     est5pls |  8031.905
------------------------

Again, the 5PL models have the lowest DIC values and seem to provide the best fit. However, the DIC differences between models est4pl, est5pl, and est5pls are minimal and may very well be within the estimation error. Regardless, these three models appear to be better than the simpler 1PL, 2PL, and 3PL models.

Additional model checking may be needed to assess the models' fit, and we should not rely solely on the DIC values to make our final model selection. A practitioner may still prefer the simpler est4pl 4PL model to the 5PL models even though it has a slightly higher DIC. In fact, given that the posterior mean estimate of the upper asymptote parameter \(d\) is 0.96 with a 95% equal-tailed credible interval of (0.94, 0.99), some practitioners may prefer the even simpler est3pl 3PL model.

References

De Boeck, P., and M. Wilson, ed. 2004. Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach. New York: Springer.

Kim, J.-S., and D. M. Bolt. 2007. Estimating item response theory models using Markov chain Monte Carlo methods. Educational Measurement: Issues and Practice 26: 38-51.

  • PACES Consulting

    Hi Nikolay and Yulia,

    I wanted to try out using the example to specify a Rasch model, which seems correct (the parameter estimates are also reasonably close to those derived from jMetrik using the same data):

    webuse masc1, clear
    qui: g int id = _n
    qui: reshape long q, i(id) j(item)
    fvset base none id item
    set seed 14
    #d ;

    // 1PL Example from Blog Post
    bayesmh q = ({discrim}*({subj:}-{diff:})), likelihood(logit)
    redefine(diff:i.item) redefine(subj:i.id) prior({subj:i.id}, normal(0, 1))
    prior({discrim}, lognormal(0, 1)) prior({diff:i.item}, normal(0, 1))
    init({discrim} 1) exclude({subj:i.id}) burnin(5000);

    // Rasch example based on blog post
    bayesmh q = (1*({subj:}-{diff:})), likelihood(logit) redefine(diff:i.item)
    redefine(subj:i.id) prior({subj:i.id}, normal(0, 1))
    prior({diff:i.item}, normal(0, 1)) exclude({subj:i.id}) burnin(5000) ;

    However, it isn’t quite as clear how one could derive infit/outfit statistics, residuals for the person and item estimates, or the best way to fix the person estimate (e.g., using the sum score across items). It seems like the estimates in this example are all the Bayesian derivatives of the Marginal MLE estimator, but is there a way to fit the same models using the Joint MLE (for cases when the person parameters need to be estimated at the same time as the item parameters)? Last, is there any chance for a follow up to this post that would potentially show how to fit many facet rasch models and/or multidimensional IRT models using bayesmh (either case would be pretty amazing to see)?

    Thanks again and nice work on the blog post,
    Billy

  • Nikolay Balov

    Dear Billy,

    See our answers to each of your questions below.

    1. Rasch model

    Your specification of the Rasch model is correct. That is, for our math and science data example, a Rasch model can be specified as


    . bayesmh y = ({subj:}-{delta:}), likelihood(logit)
    > redefine(subj:i.id) redefine(delta:i.item) ...

    Here, we labelled the item-specific parameters as “delta” instead of “diff” as in our 1PL example to emphasize that the estimates from this Rasch model will be different than those from the fitted 1PL model.

    There is also more details about how to fit a Rasch model using
    bayesmh and its link to 1PL IRT model in example 28 in [BAYES] bayesmh.

    2. Joint estimation of person- and item-specific parameters

    bayesmh does estimate the person-specific and item-specific parameters jointly. In our IRT examples, we were not interested in the person-specific estimates so we used the exclude() option to exclude them from final results. If you do not use this option, the person-specific estimates will be saved with all MCMC estimates and will be displayed in the estimation table.

    3. Outfit/infit statistics and residuals

    Within Bayesian framework, model fit statistics are obtained by using a
    so-called posterior predictive distribution, the distribution of the outcome Y given the observed data y. A posterior predictive p-value associated with a statistic of interest is often used to access model fit. We will consider writing a follow-up blog entry about Bayesian posterior predictive assessment of IRT models.

    4. Many-facet Rasch models and multidimensional IRT models

    By viewing many-facet Rasch models as having additional “random-effects” parameters, we can extend the basic specification by simply adding more random-effects terms.

    Continuing our example of a Rasch model, suppose that there is another “facet” represented by a variable task in the dataset.

    id task item q
    ——————-
    1 1 1 0
    1 1 2 1

    100 10 5 0

    We simply add the random-effects parameters associated with task to our model specification as follows:


    . fvset base none id task item
    . bayesmh q = ({subj:}+{task:}-{delta:}), likelihood(logit)
    > redefine(subj:i.id)
    > redefine(task:i.task)
    > redefine(delta:i.item) ...

    Within the IRT context, the corresponding 2-dimensional IRT model could be fit as follows. (We use the specification of a multidimensional IRT model given by formula (3) in Reckase (2007, p. 612).)


    . fvset base none id task item
    . bayesmh q = ({a1}*{subj:}+{a2}*{task:}+{d:}), likelihood(logit)
    > redefine(subj:i.id)
    > redefine(task:i.task)
    > redefine(d:i.item) ...

    where parameters {a1} and {a2} are common across items. If we wanted to make these parameters item-specific, we can use the following specification:


    . fvset base none id task item
    . bayesmh q = ({a1:}*{subj:}+{a2:}*{task:}+{d:}), likelihood(logit)
    > redefine(subj:i.id)
    > redefine(task:i.task)
    > redefine(d:i.item)
    > redefine(a1:i.item)
    > redefine(a2:i.item) ...

    You can extend models above in a straightforward way to accommodate more facets or dimensions.

    Reference:

    Reckase, M. D. 2007. Multidimensional Item Response Theory. In Vol. 26 of Handbook of Statistics: Psychometrics, ed. C. R. Rao and S. Sinharay, 607-642. Amsterdam: Elseiver.

    — Nikolay and Yulia

  • PACES Consulting

    Hi Nikolay and Yulia,

    Awesome. I think we’re talking about slightly different things with regards to #3. Here’s an extremely brief explanation of the infit/outfit statistics from the Rasch perspective: http://www.rasch.org/rmt/rmt162f.htm as well as a partially useful excerpt from Wright, B. D., & Masters, G. N. (1982). Rating Scale Analysis. Chicago, Il: MESA Press: http://www.rasch.org/rmt/rmt34e.htm.

    The infit/outfit statistics are used when making decisions about retaining/dropping an item from the scoring/item bank/calibration and to some degree the person analogs of these statistics could be useful in detecting possible cases of test irregularities (e.g., a student with low theta answering difficult questions correctly and easier questions at chance levels, etc…). Nothing I’ve read thus far has talked about an omnibus style goodness of fit, and the discussion (at least with the folks from the Rasch camp) tends to be reversed into testing how well the data fit the model (instead of how well the model fits the data).

    In either case, this is awesome and timely (there was a comparison of Stata’s Bayesian capabilities to JAGS and Stan on Andrew Gelman’s blog today). If it’s possible to throw out another potential idea for a future blog post, if it isn’t too trouble some anything that demonstrates fitting any latent class models and/or mixture measurement models would be truly outstanding.

    Thanks again,
    Billy

  • PACES Consulting

    Hi Nikolay and Yulia,

    As a brief follow up regarding the Joint MLE and Infit/Outfit statistics, I put together a quick demonstration of some of the differences that I observed. The program is a wrapper that passed the data from Stata into some of the classes used by jMetrik (see https://github.com/meyerjp3/psychometrics for more info) to fit the Rasch model using the Joint Maximum Likelihood Estimator (as well as values for infit/outfit). The program also creates variables in the data set in memory with the person level estimates of theta, the SE around theta, and the person level infit/outfit statistics:

    net inst raschjmle, from(“http://www.paces-consulting.org/stata”)

    webuse masc1.dta, clear

    raschjmle q1-q9

    800

    Iteration Delta Log-likelihood

    ————————————————————–

    1 0.502591842208104 -3402.304331969046

    2 0.142412255554409 -3397.822027114892

    3 0.020979991419945 -3397.719031584525

    4 0.003561687956111 -3397.716620516149

    5 0.000591506681447 -3397.716599152711

    ————————————————————–

    =================================================================================================

    Item Difficulty Std. Error WMS Std. WMS UMS Std. UMS

    ————————————————————————————————-

    q1 -0.40 0.08 0.85 -4.32 0.84 -2.86

    q2 0.11 0.08 1.03 1.04 1.05 1.04

    q3 -1.36 0.10 0.93 -1.39 0.86 -1.39

    q4 0.49 0.08 0.99 -0.25 1.02 0.38

    q5 1.66 0.09 0.93 -1.54 1.02 0.28

    q6 0.82 0.08 0.93 -2.05 0.95 -0.82

    q7 1.37 0.09 1.10 2.42 1.17 1.99

    q8 -1.87 0.11 0.77 -3.81 0.85 -1.14

    q9 -0.81 0.09 1.04 1.04 1.13 1.66

    =================================================================================================

    SCALE QUALITY STATISTICS

    ==================================================

    Statistic Items Persons

    ————————————————–

    Observed Variance 1.3031 1.4411

    Observed Std. Dev. 1.1415 1.2005

    Mean Square Error 0.0080 0.7097

    Root MSE 0.0894 0.8425

    Adjusted Variance 1.2951 0.7314

    Adjusted Std. Dev. 1.1380 0.8552

    Separation Index 12.7235 1.0151

    Number of Strata 17.2980 1.6868

    Reliability 0.9939 0.5075

    ==================================================

    SCORE TABLE

    ==================================

    Score Theta Std. Err

    ———————————-

    0.00 -3.94 1.89

    1.00 -2.55 1.12

    2.00 -1.59 0.89

    3.00 -0.89 0.80

    4.00 -0.28 0.77

    5.00 0.31 0.76

    6.00 0.91 0.79

    7.00 1.59 0.87

    8.00 2.53 1.11

    9.00 3.89 1.89

    ==================================

  • Nikolay Balov

    Dear Billy,

    As we have mentioned earlier, the bayesmh formulation of the
    Rasch model does estimate item-specific and person-specific parameters jointly.
    The Bayesian estimates of the parameters should be reasonably close to those
    obtained using joint maximum likelihood estimation. Some difference however are
    permissible due to the use of informative prior distributions in the Bayesian
    model specification.

    The estimates you report using the raschjmle command are
    significantly different though.
    I believe that the reason for this is that raschjmle‘s estimates
    are centered. For example, if I center the posterior mean estimates reported by bayesmh the results seem to agree.

    Below I show the entire Bayesian model specification


    set seed 14
    bayesmh q = (1*({subj:}-{diff:})), likelihood(logit) ///
    redefine(diff:i.item) redefine(subj:i.id) ///
    prior({subj:i.id}, normal(0, 1)) ///
    prior({diff:i.item}, normal(0, 1)) ///
    exclude({subj:i.id}) burnin(5000) dots saving("sim1", replace)

    With the following few lines of code I center the posterior mean estimates for
    item difficulties and list them.


    matrix mitem = e(mean)'
    clear
    svmat mitem, name(item)
    summ item, meanonly
    gen citem = item - r(mean)
    list

    The result is

    +———————–+
    | item1 citem |
    |———————–|
    1. | -.6146172 -.4006386 |
    2. | -.104734 .1092446 |
    3. | -1.578288 -1.364309 |
    4. | .2841987 .4981773 |
    5. | 1.444101 1.65808 |
    |———————–|
    6. | .6083501 .8223287 |
    7. | 1.159187 1.373166 |
    8. | -2.090234 -1.876255 |
    9. | -1.033772 -.8197934 |
    +———————–+

    — Nikolay

  • PACES Consulting

    Hi Nikolay,

    Thanks again for the additional information. I thought there may have been more substantial differences between the marginal and joint MLE algorithms for estimating the item and person parameters, but this definitely helps quite a bit.

    Thanks again,
    Billy

  • Pingback: The Stata Blog » Fitting distributions using bayesmh()

  • Christian Gregory

    Hi, I really appreciate this post and the work you have done on these models. I have a much simpler problem. In trying to set up a simple bayesian IRT model with data that has 37,000 observations with 10 questions each. Every time I try to run the 1PL outlined above, I get the following error: variable id is missing or contains non-integer values r(198). I’m sure this has something to do with the way id is being stored, but I’ve yet to figure out how to correct this.

  • Billy

    Nice presentation at the Stata conference Nikolay. Definitely cool to get a bit more explanation of things and see other uses of the techniques.

  • Nikolay Balov

    Thank you Billy. It was pleasure meeting you in Chicago.

  • Adelina Morgan

    Doing Trade is extremely profitable when using the right techniques and strategies and also frustrating for those without better system to trade and signal provider. If you need an assistance in doing a beneficiary trade and you are losing out all your investment instead of gaining, there is still a big hope for you to recover all your lost funds.(Some traders will tell you that is either you loose or gain) in my case loosing and failure is not an option…You can contact me if interested on (adelinamorgan3310@gmaildotcom) Your assurance on gaining instead of losing is 95% on binary trade or forex. Take a try and thank me.

  • Billy Bass

    Hi Nikolay and Yulia,

    I have a new question/suggestion for a follow up to this post. How would one go about fitting Bayesian Partial Credit, Graded Response, and/or Rating Scale models? In particular I’m thinking about applications related to Many Facet Rasch models where we would want to adjust the estimates of theta for the individual rater effects.

    Thanks again,
    Billy

  • Billy Bass

    Are you generating the id variable as the row indices like the example above (e.g., g id = _n)? That should work fairly consistently. Also, if you post the exact code you used it will be much easier for others to give you advice/help.