The Accuracy of Bayesian Model Fit Indices in Selecting Among Multidimensional Item Response Theory Models

Educational and Psychological Measurement, Ahead of Print.
Item response theory (IRT) models are often compared with respect to predictive performance to determine the dimensionality of rating scale data. However, such model comparisons could be biased toward nested-dimensionality IRT models (e.g., the bifactor model) when those models are compared with non-nested-dimensionality IRT models (e.g., a unidimensional or a between-item-dimensionality model). The reason is that, compared with non-nested-dimensionality models, nested-dimensionality models could have a greater propensity to fit data that do not represent a specific dimensional structure. It remains unclear, however, to what degree model comparison results are biased toward nested-dimensionality IRT models when the data do represent specific dimensional structures and when Bayesian estimation and model comparison indices are used. We conducted a simulation study to clarify this issue, examining the accuracy of four Bayesian predictive performance indices in differentiating among non-nested- and nested-dimensionality IRT models. The deviance information criterion (DIC), a commonly used index for comparing Bayesian models, was extremely biased toward nested-dimensionality IRT models, favoring them even when non-nested-dimensionality models were the correct models. The Pareto-smoothed importance sampling approximation of leave-one-out cross-validation was the least biased, followed closely by the Watanabe information criterion and the log-predicted marginal likelihood. The findings demonstrate that nested-dimensionality IRT models are not automatically favored when the data represent specific dimensional structures, as long as an appropriate predictive performance index is used.