Lee, H. S., & Winke, P. (2013). The differences among three-, four-, and five-option-item formats in the context of a high-stakes English-language listening test. Language Testing, 30(1), 99-123.
Abstract: We adapted three practice College Scholastic Ability Tests (CSAT) of English listening, each with five-option items, to create four- and three-option versions by asking 73 Korean speakers or learners of English to eliminate the least plausible options in two rounds. Two hundred and sixty-four Korean high school English-language learners formed three groups. Each took three of the nine tests, one with five-option items, one with four-, and one with three-, with administrations counterbalanced to control for order and practice effects. Mean test scores of the three-option tests were significantly higher than those of four- and five-option tests. While no difference was found in mean item discriminations across the three different test formats, reliability coefficients showed inconsistent patterns depending on the number of options and test versions. One possible interpretation of the low correlations among the scores of the three test formats is that items with different numbers of options tap into skills other than listening. The findings suggest that statistically, three options may or may not be optimal depending on the point of view taken: that of the test score users or that of the test stakeholders. Test developers must consider multiple statistical, affective, and contextual factors in determining the optimal number of options.