Educational and Psychological Measurement, Volume 83, Issue 5, Page 953-983, October 2023.
Rating scale analysis techniques provide researchers with practical tools for examining the degree to which ordinal rating scales (e.g., Likert-type scales or performance assessment rating scales) function in psychometrically useful ways. When rating scales function as expected, researchers can interpret ratings in the intended direction (i.e., lower ratings mean “less” of a construct than higher ratings), distinguish between categories in the scale (i.e., each category reflects a unique level of the construct), and compare ratings across elements of the measurement instrument, such as individual items. Although researchers have used these techniques in a variety of contexts, studies are limited that systematically explore their sensitivity to problematic rating scale characteristics (i.e., “rating scale malfunctioning”). I used a real data analysis and a simulation study to systematically explore the sensitivity of rating scale analysis techniques based on two popular polytomous item response theory (IRT) models: the partial credit model (PCM) and the generalized partial credit model (GPCM). Overall, results indicated that both models provide valuable information about rating scale threshold ordering and precision that can help researchers understand how their rating scales are functioning and identify areas for further investigation or revision. However, there were some differences between models in their sensitivity to rating scale malfunctioning in certain conditions. Implications for research and practice are discussed.