Bayesian binary item response theory models using bayesmh
This post was written jointly with Yulia Marchenko, Executive Director of Statistics, StataCorp.
Table of Contents
Overview
1PL model
2PL model
3PL model
4PL model
5PL model
Conclusion
Overview
Item response theory (IRT) is used for modeling the relationship between the latent abilities of a group of subjects and the examination items used for measuring their abilities. Stata 14 introduced a suite of commands for fitting IRT models using maximum likelihood; see, for example, the blog post Spotlight on irt by Rafal Raciborski and the [IRT] Item Response Theory manual for more details. In this post, we demonstrate how to fit Bayesian binary IRT models by using the redefine() option introduced for the bayesmh command in Stata 14.1. We also use the likelihood option dbernoulli() available as of the update on 03 Mar 2016 for fitting Bernoulli distribution. If you are not familiar with the concepts and jargon of Bayesian statistics, you may want to watch the introductory videos on the Stata Youtube channel before proceeding.
Introduction to Bayesian analysis, part 1 : The basic concepts
Introduction to Bayesian analysis, part 2: MCMC and the Metropolis-Hastings algorithm
We use the abridged version of the mathematics and science data from DeBoeck and Wilson (2004), masc1. The dataset includes 800 student responses to 9 test questions intended to measure mathematical ability.
The irt suite fits IRT models using data in the wide form – one observation per subject with items recorded in separate variables. To fit IRT models using bayesmh, we need data in the long form, where items are recorded as multiple observations per subject. We thus reshape the dataset in a long form: we have a single binary response variable, y, and two index variables, item and id, which identify the items and subjects, respectively. This allows us to formulate our IRT models as multilevel models. The following commands load and prepare the dataset.
. webuse masc1 (Data from De Boeck & Wilson (2004)) . generate id = _n . quietly reshape long q, i(id) j(item) . rename q y
To ensure that we include all levels of item and id in our models, we use fvset base none to keep the base categories.
. fvset base none id item
In what follows, we present eight Bayesian binary IRT models increasing in complexity and explanatory power. We perform Bayesian model comparison to gain insight into what would be the more appropriate model for the data at hand.
For high-dimensional models such as IRT models, you may see differences in the estimation results between different platforms or different flavors of Stata because of the nature of the Markov chain Monte Carlo (MCMC) sampling and finite numerical precision. These differences are not a source of concern; they will be within the range of the MCMC variability and will lead to similar inferential conclusions. The differences will diminish as the MCMC sample size increases. The results in this post are obtained from Stata/SE on the 64-bit Linux platform using the default 10,000 MCMC sample size.
Let the items be indexed by \(i=1,\dots,9\) and the subjects by \(j=1,\dots,800\). Let \(\theta_j\) be the latent mathematical ability of subject \(j\), and let \(Y_{ij}\) be the response of subject \(j\) to item \(i\).
1PL model
In the one-parameter logistic (1PL) model, the probability of getting a correct response is modeled as an inverse-logit function of location parameters \(b_i\), also called item difficulties, and a common slope parameter \(a\), also called item discrimination:
\[
P(Y_{ij}=1) = {\rm InvLogit}\{a(\theta_j-b_i)\} =
\frac{\exp\{a(\theta_j-b_i)\}}{1+\exp\{a(\theta_j-b_i)\}}
\]
Typically, the abilities are assumed to be normally distributed:
\[
\theta_j \sim {\rm N}(0,1)
\]
In a multilevel framework, the \(\theta_j\)’s represent random effects. In a Bayesian framework, we use the term “random effects” to refer to the parameters corresponding to levels of grouping variables identifying the hierarchy of the data.
A Bayesian formulation of the 1PL model also requires prior specification for the model parameters \(a\) and \(b_i\). The discrimination parameter \(a\) is assumed to be positive and is often modeled in the log scale. Because we have no prior knowledge about the discrimination and difficulty parameters, we assume that the prior distributions of \(\ln(a)\) and \(b_i\) have support on the whole real line, are symmetric, and are centered at 0. A normal prior distribution is thus a natural choice. We furthermore assume that \(\ln(a)\) and \(b_i\) are close to 0 and have prior variance of 1, which is an entirely subjective decision. We thus assign \(\ln(a)\) and \(b_i\) standard normal prior distributions:
\[\ln(a) \sim {\rm N}(0, 1)\] \[b_i \sim {\rm N}(0, 1) \]
To specify the likelihood function of the 1PL model in bayesmh, we use a nonlinear equation specification for the response variable y. The direct nonlinear specification for this model is
bayesmh y = ({discrim}*({subj:i.id}-{diff:i.item})), likelihood(logit) ...
where {discrim} is the discrimination parameter \(a\), {subj:i.id} are latent abilities \(\theta_j\), and {diff:i.item} are item difficulties \(b_i\). The logit model is used for the probability of a success, \(P(Y_{ij}=1)\). Specification {subj:i.id} in the above nonlinear expression is viewed as a substitutable expression for linear combinations of indicators associated with the id variable and parameters \(\theta_j\). This specification may be computationally prohibitive with a large number of subjects. A more efficient solution is to use the redefine() option to include subject random effects \(\theta_j\) in the model. The same argument may apply to the {diff:i.item} specification when there are many items. Thus, it may be computationally convenient to treat the \(b_i\) parameters as “random effects” in the specification and use the redefine() option to include them in the model.
A more efficient specification is thus
bayesmh y = ({discrim}*({subj:}-{diff:})), likelihood(logit) /// redefine(subj:i.id) redefine(diff:i.item) ... ///
where {subj:} and {diff:} in the nonlinear specification now represent the \(\theta_j\) and \(b_i\) parameters, respectively, without using expansions into linear combinations of indicator variables.
Below, we show the full bayesmh specification of the 1PL model and the output summary. In our examples, we treat the abilities {subj:i.id} as nuisance parameters and exclude them from the final results. The discrimination model parameter {discrim} must be positive and is thus initialized with 1. A longer burn-in period, burnin(5000), allows for longer adaptation of the MCMC sampler, which is needed given the large number of parameters in the model. Finally, the estimation results are stored for later model comparison.
. set seed 14 . bayesmh y = ({discrim}*({subj:}-{diff:})), likelihood(logit) /// > redefine(diff:i.item) redefine(subj:i.id) /// > prior({subj:i.id}, normal(0, 1)) /// > prior({discrim}, lognormal(0, 1)) /// > prior({diff:i.item}, normal(0, 1)) /// > init({discrim} 1) exclude({subj:i.id}) /// > burnin(5000) saving(sim1pl, replace) Burn-in ... Simulation ... Model summary ------------------------------------------------------------------------------ Likelihood: y ~ logit({discrim}*(xb_subj-xb_diff)) Priors: {diff:i.item} ~ normal(0,1) (1) {subj:i.id} ~ normal(0,1) (2) {discrim} ~ lognormal(0,1) ------------------------------------------------------------------------------ (1) Parameters are elements of the linear form xb_diff. (2) Parameters are elements of the linear form xb_subj. Bayesian logistic regression MCMC iterations = 15,000 Random-walk Metropolis-Hastings sampling Burn-in = 5,000 MCMC sample size = 10,000 Number of obs = 7,200 Acceptance rate = .3074 Efficiency: min = .02691 avg = .06168 Log marginal likelihood = . max = .09527 ------------------------------------------------------------------------------ | Equal-tailed | Mean Std. Dev. MCSE Median [95% Cred. Interval] -------------+---------------------------------------------------------------- diff | item | 1 | -.6934123 .0998543 .003576 -.6934789 -.8909473 -.4917364 2 | -.1234553 .0917187 .002972 -.1241642 -.3030341 .0597863 3 | -1.782762 .1323252 .00566 -1.781142 -2.05219 -1.534451 4 | .3152835 .0951978 .003289 .3154714 .1279147 .4981263 5 | 1.622545 .127213 .005561 1.619388 1.377123 1.883083 6 | .6815517 .0978777 .003712 .6788345 .4911366 .881128 7 | 1.303482 .1173994 .005021 1.302328 1.084295 1.544913 8 | -2.353975 .1620307 .008062 -2.351207 -2.672983 -2.053112 9 | -1.168668 .1120243 .004526 -1.163922 -1.392936 -.9549209 -------------+---------------------------------------------------------------- discrim | .8644787 .0439804 .002681 .8644331 .7818035 .9494433 ------------------------------------------------------------------------------ file sim1pl.dta saved . estimates store est1pl
The sampling efficiency is acceptable, about 6% on average, with no indication of convergence problems. Although detailed convergence inspection of all parameters is outside the scope of this post, we recommend that you do so by using, for example, the bayesgraph diagnostics command.
Though we used informative priors for the model parameters, the estimation results from our Bayesian model are not that different from the maximum likelihood estimates obtained using the irt 1pl command (see example 1 in [IRT] irt 1pl). For example, the posterior mean estimate for {discrim} is 0.86 with an MCMC standard error of 0.003, whereas irt 1pl reports 0.85 with a standard error of 0.05.
The log-marginal likelihood is reported missing because we have excluded the {subj:i.id} parameters from the simulation results and the Laplace-Metropolis estimator of the log-marginal likelihood is not available in such cases. This estimator requires simulation results for all model parameters to compute the log-marginal likelihood.
2PL model
The two-parameter logistic (2PL) model extends the 1PL model by allowing for item-specific discrimination. The probability of correct response is now modeled as a function of item-specific slope parameters \(a_i\):
\[
P(Y_{ij}=1) = {\rm InvLogit}\{a_i(\theta_j-b_i)\} =
\frac{\exp\{a_i(\theta_j-b_i)\}}{1+\exp\{a_i(\theta_j-b_i)\}}
\]
The prior specification for \(\theta_j\) stays the same as in the 1PL model. We will, however, apply more elaborate prior specifications for the \(a_i\)’s and \(b_i\)’s. It is a good practice to use proper prior specifications without overwhelming the evidence from the data. The impact of the priors can be controlled by introducing additional hyperparameters. For example, Kim and Bolt (2007) proposed the use of a normal prior for the difficulty parameters with unknown mean and variance. Extending this approach to the discrimination parameters as well, we apply a hierarchical Bayesian model in which the \(\ln(a_i)\) and \(b_i\) parameters have the following prior specifications:
\[ \ln(a_i) \sim {\rm N}(\mu_a, \sigma_a^2) \] \[ b_i \sim {\rm N}(\mu_b, \sigma_b^2) \]
The mean hyperparameters, \(\mu_a\) and \(\mu_b\), and variance hyperparameters, \(\sigma_a^2\) and \(\sigma_b^2\), require informative prior specifications. We assume that the means are centered at 0 with a variation of 0.1:
\[
\mu_a, \mu_b \sim {\rm N}(0, 0.1)
\]
To lower the variability of the \(\ln(a_i)\) and \(b_i\) parameters, we apply an inverse-gamma prior with shape 10 and scale 1 for the variance parameters:
\[
\sigma_a^2, \sigma_b^2 \sim {\rm InvGamma}(10, 1)
\]
Thus, the prior mean of \(\sigma_a^2\) and \(\sigma_b^2\) is about 0.1.
In the bayesmh specification, the hyperparameters \(\mu_a\), \(\mu_b\), \(\sigma_a^2\), and \(\sigma_a^2\) are denoted as {mu_a}, {mu_b}, {var_a}, and {var_b}, respectively. We use the redefine(discrim:i.item) option to include in the model the discrimination parameters \(a_i\), referred to as {discrim:} in the likelihood specification.
Regarding the MCMC simulation, we change some of the default options. The hyperparameters {mu_a}, {mu_b}, {var_a}, and {var_b} are placed in separate blocks to improve the simulation efficiency. The discrimination parameters {discrim:i.item} must be positive and are thus initialized with 1s.
. set seed 14 . bayesmh y = ({discrim:}*({subj:}-{diff:})), likelihood(logit) /// > redefine(discrim:i.item) redefine(diff:i.item) /// > redefine(subj:i.id) /// > prior({subj:i.id}, normal(0, 1)) /// > prior({discrim:i.item}, lognormal({mu_a}, {var_a})) /// > prior({diff:i.item}, normal({mu_b}, {var_b})) /// > prior({mu_a} {mu_b}, normal(0, 0.1)) /// > prior({var_a} {var_b}, igamma(10, 1)) /// > block({mu_a mu_b var_a var_b}, split) /// > init({discrim:i.item} 1) /// > exclude({subj:i.id}) burnin(5000) saving(sim2pl, replace) Burn-in ... Simulation ... Model summary ------------------------------------------------------------------------------ Likelihood: y ~ logit(xb_discrim*(xb_subj-xb_diff)) Priors: {discrim:i.item} ~ lognormal({mu_a},{var_a}) (1) {diff:i.item} ~ normal({mu_b},{var_b}) (2) {subj:i.id} ~ normal(0,1) (3) Hyperpriors: {mu_a mu_b} ~ normal(0,0.1) {var_a var_b} ~ igamma(10,1) ------------------------------------------------------------------------------ (1) Parameters are elements of the linear form xb_discrim. (2) Parameters are elements of the linear form xb_diff. (3) Parameters are elements of the linear form xb_subj. Bayesian logistic regression MCMC iterations = 15,000 Random-walk Metropolis-Hastings sampling Burn-in = 5,000 MCMC sample size = 10,000 Number of obs = 7,200 Acceptance rate = .3711 Efficiency: min = .01617 avg = .04923 Log marginal likelihood = . max = .1698 ------------------------------------------------------------------------------ | Equal-tailed | Mean Std. Dev. MCSE Median [95% Cred. Interval] -------------+---------------------------------------------------------------- discrim | item | 1 | 1.430976 .1986011 .010953 1.413063 1.089405 1.850241 2 | .6954823 .1081209 .004677 .6897267 .4985004 .9276975 3 | .9838528 .1343908 .009079 .9780275 .7506566 1.259427 4 | .8167792 .1169157 .005601 .8136229 .5992495 1.067578 5 | .9402715 .1351977 .010584 .9370298 .6691103 1.214885 6 | .9666747 .1420065 .008099 .9616285 .7038868 1.245007 7 | .5651287 .0864522 .006201 .5617302 .3956216 .7431265 8 | 1.354053 .2048404 .015547 1.344227 .9791096 1.761437 9 | .7065096 .1060773 .006573 .6999745 .5102749 .9271799 -------------+---------------------------------------------------------------- diff | item | 1 | -.5070314 .0784172 .003565 -.507922 -.671257 -.3596057 2 | -.1467198 .117422 .003143 -.1456633 -.3895978 .0716841 3 | -1.630259 .1900103 .013494 -1.612534 -2.033169 -1.304171 4 | .3273735 .1073891 .003565 .3231703 .1248782 .5492114 5 | 1.529584 .1969554 .01549 1.507982 1.202271 1.993196 6 | .6325194 .115724 .005613 .6243691 .4272131 .8851649 7 | 1.827013 .2884057 .019582 1.79828 1.349654 2.490633 8 | -1.753744 .1939559 .014743 -1.738199 -2.211475 -1.438146 9 | -1.384486 .2059005 .012105 -1.361195 -1.838918 -1.059687 -------------+---------------------------------------------------------------- mu_a | -.1032615 .1148176 .003874 -.102376 -.3347816 .1277031 var_a | .1129835 .0356735 .001269 .1056105 .063403 .1981331 mu_b | -.0696525 .2039387 .004949 -.072602 -.4641566 .3298393 var_b | .6216005 .2023137 .008293 .5843444 .3388551 1.101153 ------------------------------------------------------------------------------ file sim2pl.dta saved . estimates store est2pl
The average simulation efficiency is about 5%, but some of the parameters converge slower than the others, such as {diff:7.item}, which has the largest MCMC standard error (0.02) among the difficulty parameters. If this was a rigorous study, to lower the MCMC standard errors, we would recommend longer simulations with MCMC sample sizes of at least 50,000.
We can compare the 1PL and 2PL models by using the deviance information criterion (DIC) available with the bayesstats ic command.
. bayesstats ic est1pl est2pl, diconly Deviance information criterion ------------------------ | DIC -------------+---------- est1pl | 8122.428 est2pl | 8055.005 ------------------------
DIC is often used in Bayesian model selection as an alternative to AIC and BIC criteria and can be easily obtained from an MCMC sample. Larger MCMC samples produce more reliable DIC estimates. Because different MCMC samples produce different sample DIC values and the sample approximation error in calculating DIC is not known, one should not rely solely on DIC when choosing a model.
Lower DIC values indicate better fit. The DIC of the 2PL model (8,055) is markedly lower than the DIC of the 1PL model (8,122), implying better fit of the 2PL model.
3PL model
The three-parameter logistic (3PL) model introduces lower asymptote parameters \(c_i\), also called guessing parameters. The probability of giving a correct response is given by
\[
P(Y_{ij}=1) = c_i + (1-c_i){\rm InvLogit}\{a_i(\theta_j-b_i)\} ,\ c_i > 0
\]
The guessing parameters may be difficult to estimate using maximum likelihood. Indeed, the irt 3pl command with the sepguessing option fails to converge, as you can verify by typing
. irt 3pl q1-q9, sepguessing
on the original dataset.
It is thus important to specify an informative prior for \(c_i\). We assume that the prior mean of the guessing parameters is about 0.1 and thus apply
\[
c_i \sim {\rm InvGamma}(10, 1)
\]
Similarly to the discrimination and difficulty parameters, the \(c_i\)’s are introduced as random-effects parameters in the bayesmh specification and are referred to as {gues:} in the likelihood specification.
Unlike 1PL and 2PL models, we cannot use the likelihood(logit) option to model the probability of success because the probability of correct response is no longer an inverse-logit transformation of the parameters. Instead, we use likelihood(dbernoulli()) to model the probability of success of a Bernoulli outcome directly.
To have a valid initialization of the MCMC sampler, we assign the \(c_i\)’s positive starting values, 0.1.
. set seed 14 . bayesmh y, likelihood(dbernoulli({gues:}+(1-{gues:})* /// > invlogit({discrim:}*({subj:}-{diff:})))) /// > redefine(discrim:i.item) redefine(diff:i.item) /// > redefine(gues:i.item) redefine(subj:i.id) /// > prior({subj:i.id}, normal(0, 1)) /// > prior({discrim:i.item}, lognormal({mu_a}, {var_a})) /// > prior({diff:i.item}, normal({mu_b}, {var_b})) /// > prior({gues:i.item}, igamma(10, 1)) /// > prior({mu_a} {mu_b}, normal(0, 0.1)) /// > prior({var_a} {var_b}, igamma(10, 1)) /// > block({mu_a mu_b var_a var_b}, split) /// > init({discrim:i.item} 1 {gues:i.item} 0.1) /// > exclude({subj:i.id}) burnin(5000) saving(sim3pls, replace) Burn-in ... Simulation ... Model summary ------------------------------------------------------------------------------ Likelihood: y ~ binomial(xb_gues+(1-xb_gues)*invlogit(xb_discrim*(xb_subj-xb_diff)),1) Priors: {discrim:i.item} ~ lognormal({mu_a},{var_a}) (1) {diff:i.item} ~ normal({mu_b},{var_b}) (2) {gues:i.item} ~ igamma(10,1) (3) {subj:i.id} ~ normal(0,1) (4) Hyperpriors: {mu_a mu_b} ~ normal(0,0.1) {var_a var_b} ~ igamma(10,1) ------------------------------------------------------------------------------ (1) Parameters are elements of the linear form xb_discrim. (2) Parameters are elements of the linear form xb_diff. (3) Parameters are elements of the linear form xb_gues. (4) Parameters are elements of the linear form xb_subj. Bayesian Bernoulli model MCMC iterations = 15,000 Random-walk Metropolis-Hastings sampling Burn-in = 5,000 MCMC sample size = 10,000 Number of obs = 7,200 Acceptance rate = .3496 Efficiency: min = .0148 avg = .03748 Log marginal likelihood = . max = .2044 ------------------------------------------------------------------------------ | Equal-tailed | Mean Std. Dev. MCSE Median [95% Cred. Interval] -------------+---------------------------------------------------------------- discrim | item | 1 | 1.712831 .2839419 .018436 1.681216 1.232644 2.351383 2 | .8540871 .1499645 .008265 .8414399 .6058463 1.165732 3 | 1.094723 .1637954 .01126 1.081756 .817031 1.454845 4 | 1.090891 .2149095 .013977 1.064651 .7488589 1.588164 5 | 1.363236 .2525573 .014858 1.338075 .9348136 1.954695 6 | 1.388325 .3027436 .024245 1.336303 .9466695 2.068181 7 | .9288217 .2678741 .021626 .8750048 .5690308 1.603375 8 | 1.457763 .2201065 .01809 1.438027 1.068937 1.940431 9 | .7873631 .127779 .007447 .7796568 .563821 1.06523 -------------+---------------------------------------------------------------- diff | item | 1 | -.2933734 .0976177 .006339 -.2940499 -.4879558 -.0946848 2 | .2140365 .157158 .008333 .2037788 -.0553537 .5550411 3 | -1.326351 .1981196 .013101 -1.326817 -1.706671 -.9307443 4 | .6367877 .1486799 .007895 .6277349 .3791045 .9509913 5 | 1.616056 .1799378 .00966 1.606213 1.303614 2.006817 6 | .8354059 .124184 .00656 .8191839 .614221 1.097801 7 | 2.066205 .3010858 .018377 2.034757 1.554484 2.709601 8 | -1.555583 .1671435 .012265 -1.54984 -1.89487 -1.267001 9 | -.9775626 .2477279 .016722 -.9936727 -1.431964 -.4093629 -------------+---------------------------------------------------------------- gues | item | 1 | .1078598 .0337844 .0019 .1020673 .0581353 .1929404 2 | .1128113 .0372217 .002162 .1065996 .0596554 .2082417 3 | .123031 .0480042 .002579 .1127147 .0605462 .2516237 4 | .1190103 .0390721 .002369 .1123544 .0617698 .2095427 5 | .0829503 .0185785 .001275 .0807116 .0514752 .1232547 6 | .1059315 .0289175 .001708 .1022741 .0584959 .1709483 7 | .1235553 .0382661 .002964 .1186648 .0626495 .2067556 8 | .1142118 .0408348 .001733 .1062507 .0592389 .2134006 9 | .1270767 .0557821 .003939 .113562 .0621876 .2825752 -------------+---------------------------------------------------------------- mu_a | .109161 .1218499 .005504 .1126253 -.135329 .3501061 var_a | .108864 .0331522 .001053 .1030106 .0604834 .1860996 mu_b | .0782094 .1974657 .004367 .0755023 -.3067717 .4638104 var_b | .5829738 .1803167 .006263 .5562159 .3260449 1.034225 ------------------------------------------------------------------------------ file sim3pls.dta saved . estimates store est3pls
The estimated posterior means of the \(c_i\)’s range between 0.08 and 0.13. Clearly, the introduction of guessing parameters has an impact on the item discrimination and difficulty parameters. For example, the estimated posterior means of \(\mu_a\) and \(\mu_b\) shift from -0.10 and -0.07, respectively, for the 2PL model to 0.11 and 0.08, respectively, for the 3PL model.
Because the estimated guessing parameters are not that different, one may ask whether item-specific guessing parameters are really necessary. To answer this question, we fit a model with a common guessing parameter, {gues}, and compare it with the previous model.
. set seed 14 . bayesmh y, likelihood(dbernoulli({gues}+(1-{gues})* /// > invlogit({discrim:}*({subj:}-{diff:})))) /// > redefine(discrim:i.item) redefine(diff:i.item) /// > redefine(subj:i.id) /// > prior({subj:i.id}, normal(0, 1)) /// > prior({discrim:i.item}, lognormal({mu_a}, {var_a})) /// > prior({diff:i.item}, normal({mu_b}, {var_b})) /// > prior({gues}, igamma(10, 1)) /// > prior({mu_a} {mu_b}, normal(0, 0.1)) /// > prior({var_a} {var_b}, igamma(10, 1)) /// > block({mu_a mu_b var_a var_b gues}, split) /// > init({discrim:i.item} 1 {gues} 0.1) /// > exclude({subj:i.id}) burnin(5000) saving(sim3pl, replace) Burn-in ... Simulation ... Model summary ------------------------------------------------------------------------------ Likelihood: y ~ binomial({gues}+(1-{gues})*invlogit(xb_discrim*(xb_subj-xb_diff)),1) Priors: {discrim:i.item} ~ lognormal({mu_a},{var_a}) (1) {diff:i.item} ~ normal({mu_b},{var_b}) (2) {subj:i.id} ~ normal(0,1) (3) {gues} ~ igamma(10,1) Hyperpriors: {mu_a mu_b} ~ normal(0,0.1) {var_a var_b} ~ igamma(10,1) ------------------------------------------------------------------------------ (1) Parameters are elements of the linear form xb_discrim. (2) Parameters are elements of the linear form xb_diff. (3) Parameters are elements of the linear form xb_subj. Bayesian Bernoulli model MCMC iterations = 15,000 Random-walk Metropolis-Hastings sampling Burn-in = 5,000 MCMC sample size = 10,000 Number of obs = 7,200 Acceptance rate = .3753 Efficiency: min = .01295 avg = .03714 Log marginal likelihood = . max = .1874 ------------------------------------------------------------------------------ | Equal-tailed | Mean Std. Dev. MCSE Median [95% Cred. Interval] -------------+---------------------------------------------------------------- discrim | item | 1 | 1.692894 .2748163 .021944 1.664569 1.232347 2.299125 2 | .8313512 .1355267 .00606 .8218212 .5928602 1.125729 3 | 1.058833 .1611742 .014163 1.054126 .7676045 1.393611 4 | 1.041808 .1718472 .008782 1.029867 .7398569 1.397073 5 | 1.534997 .3208687 .023965 1.497019 1.019998 2.266078 6 | 1.38296 .2581948 .019265 1.355706 .9559487 1.979358 7 | .8310222 .1698206 .012896 .8107371 .5736484 1.248736 8 | 1.442949 .2266268 .017562 1.431204 1.066646 1.930829 9 | .77944 .1159669 .007266 .7750891 .5657258 1.014941 -------------+---------------------------------------------------------------- diff | item | 1 | -.3043161 .0859905 .005373 -.2968324 -.4870583 -.1407109 2 | .1814508 .1289251 .006543 .1832146 -.0723988 .4313265 3 | -1.391216 .1924384 .014986 -1.373093 -1.809343 -1.050919 4 | .5928491 .1262631 .006721 .5829347 .356614 .857743 5 | 1.617348 .1929263 .011604 1.601534 1.293032 2.061096 6 | .817635 .1172884 .006125 .812838 .5990503 1.064322 7 | 2.006949 .2743517 .01785 1.981052 1.556682 2.594236 8 | -1.576235 .1747855 .013455 -1.559435 -1.952676 -1.272108 9 | -1.039362 .1840773 .01138 -1.02785 -1.432058 -.7160181 -------------+---------------------------------------------------------------- gues | .1027336 .0214544 .001753 .1022211 .0627299 .1466367 mu_a | .1009741 .123915 .006567 .0965353 -.1343028 .3510697 var_a | .1121003 .0344401 .001154 .1059563 .0628117 .1970842 mu_b | .0632173 .1979426 .004572 .0666684 -.3292497 .4482957 var_b | .5861236 .1818885 .006991 .5574743 .3239369 1.053172 ------------------------------------------------------------------------------ file sim3pl.dta saved . estimates store est3pl
We can again compare the two 3PL models by using the bayesstats ic command:
. bayesstats ic est3pls est3pl, diconly Deviance information criterion ------------------------ | DIC -------------+---------- est3pls | 8049.425 est3pl | 8049.426 ------------------------
Although the estimated DICs of the two 3PL models are essentially the same, we decide for demonstration purposes to proceed with the model with item-specific guessing parameters.
4PL model
The four-parameter logistic (4PL) model extends the 3PL model by adding item-specific upper asymptote parameters \(d_i\):
\[
P(Y_{ij}=1) = c_i + (d_i-c_i){\rm InvLogit}\{a_i(\theta_j-b_i)\}
,\ c_i < d_i < 1
\]
The \(d_i\) parameter can be viewed as an upper limit on the probability of correct response to the \(i\)th item. The probability of giving correct answers by subjects with very high ability can thus be no greater than \(d_i\).
We restrict the \(d_i\)'s to the (0.8,1) range and assign them a \({\rm Uniform}(0.8,1)\) prior. For other parameters, we use the same priors as in the 3PL model.
In the bayesmh specification of the model, the condition \(c_i < d_i\) is incorporated in the likelihood, and the condition \(d_i < 1\) is implied by the specified prior for the \(d_i\)'s. We initialize the \(d_i\)'s to 0.9. We use the notable option to suppress the long table output.
. set seed 14 . bayesmh y, likelihood(dbernoulli(({gues:}+({d:}-{gues:})* /// > invlogit({discrim:}*({subj:}-{diff:})))* /// > cond({gues:}<{d:},1,.))) /// > redefine(discrim:i.item) redefine(diff:i.item) /// > redefine(gues:i.item) redefine(d:i.item) redefine(subj:i.id) /// > prior({subj:i.id}, normal(0, 1)) /// > prior({discrim:i.item}, lognormal({mu_a}, {var_a})) /// > prior({diff:i.item}, normal({mu_b}, {var_b})) /// > prior({gues:i.item}, igamma(10, 1)) /// > prior({d:i.item}, uniform(0.8, 1)) /// > prior({mu_a} {mu_b}, normal(0, 0.1)) /// > prior({var_a} {var_b}, igamma(10, 1)) /// > block({mu_a mu_b var_a var_b}, split) /// > init({discrim:i.item} 1 {gues:i.item} 0.1 {d:i.item} 0.9) /// > exclude({subj:i.id}) burnin(5000) saving(sim4pls, replace) notable Burn-in ... Simulation ... Model summary ------------------------------------------------------------------------------ Likelihood: y ~ binomial(<expr1>,1) Priors: {discrim:i.item} ~ lognormal({mu_a},{var_a}) (1) {diff:i.item} ~ normal({mu_b},{var_b}) (2) {gues:i.item} ~ igamma(10,1) (3) {d:i.item} ~ uniform(0.8,1) (4) {subj:i.id} ~ normal(0,1) (5) Hyperpriors: {mu_a mu_b} ~ normal(0,0.1) {var_a var_b} ~ igamma(10,1) Expression: expr1 : (xb_gues+(xb_d-xb_gues)*invlogit(xb_discrim*(xb_subj-xb_diff)))* con d(xb_gues<xb_d,1,.) ------------------------------------------------------------------------------ (1) Parameters are elements of the linear form xb_discrim. (2) Parameters are elements of the linear form xb_diff. (3) Parameters are elements of the linear form xb_gues. (4) Parameters are elements of the linear form xb_d. (5) Parameters are elements of the linear form xb_subj. Bayesian Bernoulli model MCMC iterations = 15,000 Random-walk Metropolis-Hastings sampling Burn-in = 5,000 MCMC sample size = 10,000 Number of obs = 7,200 Acceptance rate = .3639 Efficiency: min = .004825 avg = .0219 Log marginal likelihood = . max = .1203 Note: There is a high autocorrelation after 500 lags. file sim4pls.dta saved . estimates store est4pls
We use bayesstats summary to display results of selected model parameters.
. bayesstats summary {d:i.item} {mu_a var_a mu_b var_b} Posterior summary statistics MCMC sample size = 10,000 ------------------------------------------------------------------------------ | Equal-tailed | Mean Std. Dev. MCSE Median [95% Cred. Interval] -------------+---------------------------------------------------------------- d | item | 1 | .9598183 .0255321 .001948 .9621874 .9044441 .9981723 2 | .9024564 .0565702 .007407 .9019505 .8066354 .9944216 3 | .9525519 .0281878 .002845 .9551054 .8972454 .9971564 4 | .8887963 .0561697 .005793 .8859503 .8036236 .9916784 5 | .8815547 .0588907 .007215 .8708021 .8031737 .9926549 6 | .8891188 .0586482 .006891 .881882 .8024593 .9935512 7 | .874271 .0561718 .008087 .8635082 .8018176 .9880433 8 | .9663644 .0147606 .001121 .9667563 .9370666 .9950912 9 | .889164 .0486038 .005524 .8834207 .8084921 .9857415 -------------+---------------------------------------------------------------- mu_a | .3336887 .1436216 .009742 .334092 .0562924 .6164115 var_a | .1221547 .0406908 .002376 .1144729 .0642768 .2229326 mu_b | -.0407488 .1958039 .005645 -.0398847 -.4220523 .3323791 var_b | .4991736 .1612246 .00629 .4660071 .2802531 .9023824 ------------------------------------------------------------------------------
The bayesmh command issued a note indicating high autocorrelation for some of the model parameters. This may be related to slower MCMC convergence or more substantial problems in the model specification. It is thus worthwhile to inspect the individual autocorrelation of the parameters. We can do so by using the bayesstats ess command. The parameters with lower estimated sample size (ESS) have higher autocorrelation and vice versa.
. bayesstats ess {d:i.item} {mu_a var_a mu_b var_b} Efficiency summaries MCMC sample size = 10,000 ---------------------------------------------------- | ESS Corr. time Efficiency -------------+-------------------------------------- d | item | 1 | 171.82 58.20 0.0172 2 | 58.33 171.43 0.0058 3 | 98.17 101.87 0.0098 4 | 94.02 106.36 0.0094 5 | 66.62 150.11 0.0067 6 | 72.44 138.05 0.0072 7 | 48.25 207.26 0.0048 8 | 173.30 57.70 0.0173 9 | 77.41 129.19 0.0077 -------------+-------------------------------------- mu_a | 217.35 46.01 0.0217 var_a | 293.34 34.09 0.0293 mu_b | 1203.20 8.31 0.1203 var_b | 656.92 15.22 0.0657 ----------------------------------------------------
We observe that the parameters with ESS lower than 200 are among the asymptote parameter’s \(d_i\)’s. This may be caused, for example, by overparameterization of the likelihood model and subsequent nonidentifiability, which is not resolved by the specified priors.
We can also fit a model with a common upper asymptote parameter, \(d\), and compare it with the model with the item-specific upper asymptote.
. set seed 14 . bayesmh y, likelihood(dbernoulli(({gues:}+({d}-{gues:})* /// > invlogit({discrim:}*({subj:}-{diff:})))* /// > cond({gues:}<{d},1,.))) /// > redefine(discrim:i.item) redefine(diff:i.item) /// > redefine(gues:i.item) redefine(subj:i.id) /// > prior({subj:i.id}, normal(0, 1)) /// > prior({discrim:i.item}, lognormal({mu_a}, {var_a})) /// > prior({diff:i.item}, normal({mu_b}, {var_b})) /// > prior({gues:i.item}, igamma(10, 1)) /// > prior({d}, uniform(0.8, 1)) /// > prior({mu_a} {mu_b}, normal(0, 0.1)) /// > prior({var_a} {var_b}, igamma(10, 1)) /// > block({mu_a mu_b var_a var_b d}, split) /// > init({discrim:i.item} 1 {gues:i.item} 0.1 {d} 0.9) /// > exclude({subj:i.id}) burnin(5000) saving(sim4pl, replace) notable Burn-in ... Simulation ... Model summary ------------------------------------------------------------------------------ Likelihood: y ~ binomial(<expr>>,1) Priors: {discrim:i.item} ~ lognormal({mu_a},{var_a}) (1) {diff:i.item} ~ normal({mu_b},{var_b}) (2) {gues:i.item} ~ igamma(10,1) (3) {subj:i.id} ~ normal(0,1) (4) {d} ~ uniform(0.8,1) Hyperpriors: {mu_a mu_b} ~ normal(0,0.1) {var_a var_b} ~ igamma(10,1) Expression: expr1 : (xb_gues+({d}-xb_gues)*invlogit(xb_discrim*(xb_subj-xb_diff)))* cond (xb_gues<{d},1,.) ------------------------------------------------------------------------------ (1) Parameters are elements of the linear form xb_discrim. (2) Parameters are elements of the linear form xb_diff. (3) Parameters are elements of the linear form xb_gues. (4) Parameters are elements of the linear form xb_subj. Bayesian Bernoulli model MCMC iterations = 15,000 Random-walk Metropolis-Hastings sampling Burn-in = 5,000 MCMC sample size = 10,000 Number of obs = 7,200 Acceptance rate = .3877 Efficiency: min = .0107 avg = .03047 Log marginal likelihood = . max = .1626 file sim4pl.dta saved . estimates store est4pl . bayesstats summary {d mu_a var_a mu_b var_b} Posterior summary statistics MCMC sample size = 10,000 ------------------------------------------------------------------------------ | Equal-tailed | Mean Std. Dev. MCSE Median [95% Cred. Interval] -------------+---------------------------------------------------------------- d | .9664578 .0144952 .001293 .9668207 .9371181 .9924572 mu_a | .2206696 .1387873 .01113 .2208302 -.0483587 .4952625 var_a | .1245785 .0391551 .001806 .1188779 .0658243 .2187058 mu_b | .0371722 .2020157 .00501 .0331742 -.3481366 .4336587 var_b | .5603447 .1761812 .006817 .5279243 .3157048 .9805077 ------------------------------------------------------------------------------
We now compare the two 4PL models by using the bayesstats ic command:
. bayesstats ic est4pls est4pl, diconly Deviance information criterion ------------------------ | DIC -------------+---------- est4pls | 8050.805 est4pl | 8037.075 ------------------------
The DIC of the more complex 4PL model (8,051) is substantially higher than the DIC of the simpler model (8,037). This and the potential nonidentifiability of the more complex est4pls model, indicated by high autocorrelation in the simulated MCMC sample, compel us to proceed with the model with a common upper asymptote, est4pl.
The posterior distribution of \(d\) has an estimated 95% equal-tailed credible interval of (0.93, 0.99) and is concentrated about 0.97. The \({\rm Uniform}(0.8,1)\) prior on \(d\) does not seem to be too restrictive. The estimated DIC of the est4pl model (8,037) is lower than the DIC of the est3pls 3PL model from the previous section (8,049), implying that the introduction of the upper asymptote parameter \(d\) does improve the model fit.
5PL model
The five-parameter logistic (5PL) model extends the 4PL model by adding item-specific asymmetry parameters \(e_i\):
\[
P(Y_{ij}=1) = c_i + (d_i-c_i){\rm InvLogit}\big[{\{a_i(\theta_j-b_i)\}}^{e_i}\big]
,\ c_i < d_i < 1, \ 0 < e_i < 1
\]
In the previous section, we found the 4PL model with common upper asymptote \(d\), est4pl, to be the best one so far. We thus consider here a 5PL model with common upper asymptote \(d\).
Typically, we expect the \(e_i\) parameters to be close to 1. Similarly to the upper asymptote parameter \(d\), the \(e_i\) parameters are assumed to be in the (0.8,1) range and are assigned \({\rm Uniform}(0.8,1)\) prior. We initialize the \(e_i\)s to 0.9. We again use the notable option to suppress the long table output, and we display a subset of results by using bayesstats summary. (We could have used bayesmh's noshow() option instead to achieve the same result.)
. set seed 14 . bayesmh y, likelihood(dbernoulli(({gues:}+({d}-{gues:})* /// > (invlogit({discrim:}*({subj:}-{diff:})))^{e:})* /// > cond({gues:}<{d},1,.))) /// > redefine(discrim:i.item) redefine(diff:i.item) /// > redefine(gues:i.item) redefine(e:i.item) redefine(subj:i.id) /// > prior({subj:i.id}, normal(0, 1)) /// > prior({discrim:i.item}, lognormal({mu_a}, {var_a})) /// > prior({diff:i.item}, normal({mu_b}, {var_b})) /// > prior({gues:i.item}, igamma(10, 1)) /// > prior({d}, uniform(0.8, 1)) /// > prior({e:i.item}, uniform(0.8, 1)) /// > prior({mu_a} {mu_b}, normal(0, 0.1)) /// > prior({var_a} {var_b}, igamma(10, 1)) /// > block({mu_a mu_b var_a var_b d}, split) /// > init({discrim:i.item} 1 {gues:i.item} 0.1 {d} {e:i.item} 0.9) /// > exclude({subj:i.id}) burnin(5000) saving(sim5pls, replace) notable Burn-in ... Simulation ... Model summary ------------------------------------------------------------------------------ Likelihood: y ~ binomial(<expr1>,1) Priors: {discrim:i.item} ~ lognormal({mu_a},{var_a}) (1) {diff:i.item} ~ normal({mu_b},{var_b}) (2) {gues:i.item} ~ igamma(10,1) (3) {e:i.item} ~ uniform(0.8,1) (4) {subj:i.id} ~ normal(0,1) (5) {d} ~ uniform(0.8,1) Hyperpriors: {mu_a mu_b} ~ normal(0,0.1) {var_a var_b} ~ igamma(10,1) Expression: expr1 : (xb_gues+({d}-xb_gues)*(invlogit(xb_discrim*(xb_subj-xb_diff)))^xb_e )* cond(xb_gues<{d},1,.) ------------------------------------------------------------------------------ (1) Parameters are elements of the linear form xb_discrim. (2) Parameters are elements of the linear form xb_diff. (3) Parameters are elements of the linear form xb_gues. (4) Parameters are elements of the linear form xb_e. (5) Parameters are elements of the linear form xb_subj. Bayesian Bernoulli model MCMC iterations = 15,000 Random-walk Metropolis-Hastings sampling Burn-in = 5,000 MCMC sample size = 10,000 Number of obs = 7,200 Acceptance rate = .3708 Efficiency: min = .007341 avg = .02526 Log marginal likelihood = . max = .1517 file sim5pls.dta saved . estimates store est5pls . bayesstats summary {e:i.item} {d mu_a var_a mu_b var_b} Posterior summary statistics MCMC sample size = 10,000 ------------------------------------------------------------------------------ | Equal-tailed | Mean Std. Dev. MCSE Median [95% Cred. Interval] -------------+---------------------------------------------------------------- e | item | 1 | .897859 .0578428 .006083 .8939272 .8050315 .9957951 2 | .9042669 .0585023 .005822 .90525 .8053789 .9956565 3 | .88993 .0562398 .005013 .887011 .803389 .9930454 4 | .9010241 .0574186 .006492 .9042044 .8030981 .9925598 5 | .9126369 .0545625 .00521 .9178927 .8098596 .9964487 6 | .9037269 .0583833 .006814 .9086704 .8054932 .9961268 7 | .9136308 .0558911 .005373 .9203899 .8112029 .996217 8 | .889775 .0568656 .005119 .8849938 .803912 .9938777 9 | .8808435 .056257 .004743 .8727194 .8030522 .9904972 -------------+---------------------------------------------------------------- d | .9671374 .0144004 .001165 .9670598 .9382404 .9933374 mu_a | .2770211 .1353777 .00832 .2782552 .0141125 .5418087 var_a | .122635 .0404159 .002148 .1160322 .0666951 .2208711 mu_b | .1211885 .1929743 .004955 .1199136 -.2515431 .503733 var_b | .5407642 .1747674 .006353 .5088269 .3016315 .9590086 ------------------------------------------------------------------------------
We also want to compare the above model with a simpler one using a common asymmetry parameter \(e\).
. set seed 14 . bayesmh y, likelihood(dbernoulli(({gues:}+({d}-{gues:})* /// > (invlogit({discrim:}*({subj:}-{diff:})))^{e})* /// > cond({gues:}<{d},1,.))) /// > redefine(discrim:i.item) redefine(diff:i.item) /// > redefine(gues:i.item) redefine(subj:i.id) /// > prior({subj:i.id}, normal(0, 1)) /// > prior({discrim:i.item}, lognormal({mu_a}, {var_a})) /// > prior({diff:i.item}, normal({mu_b}, {var_b})) /// > prior({gues:i.item}, igamma(10, 1)) /// > prior({d} {e}, uniform(0.8, 1)) /// > prior({mu_a} {mu_b}, normal(0, 0.1)) /// > prior({var_a} {var_b}, igamma(10, 1)) /// > block({mu_a mu_b var_a var_b d e}, split) /// > init({discrim:i.item} 1 {gues:i.item} 0.1 {d e} 0.9) /// > exclude({subj:i.id}) burnin(5000) saving(sim5pl, replace) notable Burn-in ... Simulation ... Model summary ------------------------------------------------------------------------------ Likelihood: y ~ binomial(<expr1>,1) Priors: {discrim:i.item} ~ lognormal({mu_a},{var_a}) (1) {diff:i.item} ~ normal({mu_b},{var_b}) (2) {gues:i.item} ~ igamma(10,1) (3) {subj:i.id} ~ normal(0,1) (4) {d e} ~ uniform(0.8,1) Hyperpriors: {mu_a mu_b} ~ normal(0,0.1) {var_a var_b} ~ igamma(10,1) Expression: expr1 : (xb_gues+({d}-xb_gues)*(invlogit(xb_discrim*(xb_subj-xb_diff)))^{e}) * cond(xb_gues<{d},1,.) ------------------------------------------------------------------------------ (1) Parameters are elements of the linear form xb_discrim. (2) Parameters are elements of the linear form xb_diff. (3) Parameters are elements of the linear form xb_gues. (4) Parameters are elements of the linear form xb_subj. Bayesian Bernoulli model MCMC iterations = 15,000 Random-walk Metropolis-Hastings sampling Burn-in = 5,000 MCMC sample size = 10,000 Number of obs = 7,200 Acceptance rate = .3805 Efficiency: min = .008179 avg = .02768 Log marginal likelihood = . max = .08904 file sim5pl.dta saved . estimates store est5pl . bayesstats summary {e d mu_a var_a mu_b var_b} Posterior summary statistics MCMC sample size = 10,000 ------------------------------------------------------------------------------ | Equal-tailed | Mean Std. Dev. MCSE Median [95% Cred. Interval] -------------+---------------------------------------------------------------- e | .9118363 .0558178 .004194 .9175841 .8063153 .9960286 d | .9655166 .0147373 .001495 .9659029 .9354708 .9924492 mu_a | .2674271 .1368926 .008485 .270597 .0102798 .5443345 var_a | .1250759 .0428095 .002635 .1173619 .0654135 .2340525 mu_b | .1015121 .2048178 .006864 .103268 -.3052377 .4934158 var_b | .5677309 .1824591 .006981 .5331636 .3079868 1.016762 ------------------------------------------------------------------------------
We use bayesstats ic to compare the DIC values of the two 5PL models:
. bayesstats ic est5pls est5pl, diconly Deviance information criterion ------------------------ | DIC -------------+---------- est5pls | 8030.894 est5pl | 8034.517 ------------------------
The estimated DIC of the more complex est5pls model (8,031) is lower than the DIC of the simpler model (8,035), suggesting a better fit.
Conclusion
Finally, we compare all eight fitted models.
. bayesstats ic est1pl est2pl est3pl est3pls est4pl est4pls est5pl est5pls, /// > diconly Deviance information criterion ------------------------ | DIC -------------+---------- est1pl | 8122.428 est2pl | 8055.005 est3pl | 8049.426 est3pls | 8049.425 est4pl | 8037.075 est4pls | 8050.805 est5pl | 8034.517 est5pls | 8030.894 ------------------------
The est5pls model has the lowest overall DIC. To confirm this result, we run another set of simulations with a larger MCMC sample size of 50,000. (We simply added the mcmcsize(50000) option to the bayesmh specification of the above eight models.) The following DIC values, based on the larger MCMC sample size, are more reliably estimated.
. bayesstats ic est1pl est2pl est3pl est3pls est4pl est4pls est5pl est5pls, /// > diconly Deviance information criterion ------------------------ | DIC -------------+---------- est1pl | 8124.015 est2pl | 8052.068 est3pl | 8047.067 est3pls | 8047.738 est4pl | 8032.417 est4pls | 8049.712 est5pl | 8031.375 est5pls | 8031.905 ------------------------
Again, the 5PL models have the lowest DIC values and seem to provide the best fit. However, the DIC differences between models est4pl, est5pl, and est5pls are minimal and may very well be within the estimation error. Regardless, these three models appear to be better than the simpler 1PL, 2PL, and 3PL models.
Additional model checking may be needed to assess the models' fit, and we should not rely solely on the DIC values to make our final model selection. A practitioner may still prefer the simpler est4pl 4PL model to the 5PL models even though it has a slightly higher DIC. In fact, given that the posterior mean estimate of the upper asymptote parameter \(d\) is 0.96 with a 95% equal-tailed credible interval of (0.94, 0.99), some practitioners may prefer the even simpler est3pl 3PL model.
References
De Boeck, P., and M. Wilson, ed. 2004. Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach. New York: Springer.
Kim, J.-S., and D. M. Bolt. 2007. Estimating item response theory models using Markov chain Monte Carlo methods. Educational Measurement: Issues and Practice 26: 38-51.