Rationale: the quality of the items determines the quality of the test (i.e., its reliability and validity). Item analysis may suggest ways of improving the measurement of a test, and it can help with understanding why certain tests predict some criteria but not others.
Item Analysis
When analyzing the test items, we have several questions about the performance of each item. Some of these questions include:
Are the items congruent with the test objectives?
Are the items valid? Do they measure what they're supposed to measure?
Are the items reliable? Do they measure consistently?
How long does it take an examinee to complete each item?
What items are most difficult to answer correctly? What items are easy?
Are there any poor-performing items that need to be discarded?
DISTRACTOR ANALYSIS
A. Multiple-Choke
B. Multiply-Choice
C. Multiple-Choice
D. Multi-Choice
Distractor Analysis
The first question of item analysis: How many people choose each response? If there is only one best response, then all other response options are distractors. Example from an in-class assignment (N = 35):
Which method has the best internal consistency?
a) projective test       1
b) peer ratings          1
c) forced choice        21  (keyed answer)
d) differences n.s.     12
Of the 35 examinees, 14 chose a distractor. If the 3 distractors were equally attractive, each would draw about 14 / 3 = 4.7 examinees; distractor d), chosen by 12, is far more popular than that. This result indicates a potential problem with the question. This distractor may be too similar to the correct answer and/or there may be something in either the stem or the alternatives that is misleading.
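The tally above can be reproduced in a short script. A minimal sketch in Python, with the responses hard-coded to match the example item; the "more than twice the expected count" flag is an illustrative threshold, not a rule from the lecture:

```python
from collections import Counter

# Responses to the example item (N = 35), matching the counts on the slide:
# a) 1, b) 1, c) 21 (keyed answer), d) 12
responses = ["a"] * 1 + ["b"] * 1 + ["c"] * 21 + ["d"] * 12
key = "c"

counts = Counter(responses)
n = len(responses)
n_incorrect = n - counts[key]      # 35 - 21 = 14
n_distractors = 3                  # options a, b, d

# If all distractors were equally attractive, each would draw about:
expected_per_distractor = n_incorrect / n_distractors  # 14 / 3 = 4.7

for option in "abcd":
    note = ""
    if option == key:
        note = "  <-- keyed answer"
    elif counts[option] > 2 * expected_per_distractor:  # illustrative cutoff
        note = "  <-- over-popular distractor"
    print(f"{option}) {counts[option]:2d}{note}")
print(f"expected count per distractor: {expected_per_distractor:.1f}")
```

Running this flags distractor d), which attracts far more than its expected share of the incorrect responses.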
A distractor that is extremely popular is likely to lower the reliability and validity of the test. It is often difficult to explain or define difficulty in terms of some intrinsic characteristic of the item; the only common thread among difficult items is that individuals did not know the answer.
Item Difficulty
What if p = .00?
What if p = 1.00?
Item Difficulty
An item with a p value of .0 or 1.0 does not contribute to measuring individual differences and is therefore useless. When comparing two test scores, we are interested in who had the higher score, or in the differences between scores. Items with p values near .5 have the most variation, so seek items in this range and remove those with extreme values. p values can also be examined to determine the proportion answering in a particular way for items that don't have a correct answer.
An exception arises when wanting to screen only the very top group of applicants (i.e., admission to university or medical school): there, more difficult items (low p values) are preferred, since an easy item that nearly everyone answers correctly cannot separate the strongest candidates.
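The link between p values and item variance can be shown directly. A minimal sketch with made-up 0/1 score data (none of the numbers come from the lecture); for a dichotomous item the variance is p(1 − p), which peaks at p = .5 and vanishes at the extremes:

```python
# rows = examinees, columns = items (1 = correct, 0 = incorrect); illustrative data
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
]

n_examinees = len(scores)
# item difficulty p = proportion of examinees answering the item correctly
p_values = [sum(row[j] for row in scores) / n_examinees
            for j in range(len(scores[0]))]

# For a dichotomous item, variance = p * (1 - p):
# maximal at p = .5, zero at p = .0 or 1.0 -- which is why items with
# extreme p values contribute nothing to measuring individual differences.
variances = [p * (1 - p) for p in p_values]
print(p_values)   # item 3 has p = .0, item 4 has p = 1.0
print(variances)  # both extreme items have variance 0.0
```

Here items 3 and 4 (p = .0 and p = 1.0) have zero variance, so they cannot distinguish any examinee from any other.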
ITEM DISCRIMINATION
The extent to which an item differentiates people on the behavior that the test is designed to assess.
It is the computed difference between the percentage of high achievers and the percentage of low achievers who got the item right. Compare the performance of the upper group (with high test scores) and the lower group (with low test scores) on each item: the percentage of test takers in each group who answered correctly.
Divide sample into TOP half and BOTTOM half (or TOP and BOTTOM third)
Compute Discrimination Index (D)
Item Discrimination
D = U - L
where
U = (# in the upper group with the correct response) / (total # in upper group)
L = (# in the lower group with the correct response) / (total # in lower group)
The higher the value of D, the more adequately the item discriminates (the highest possible value is 1.0).
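The steps above (sort by total test score, split the sample into halves, compare proportions correct) can be sketched as follows; the scores and item responses are invented for illustration:

```python
# Illustrative data: total test scores and per-item correctness for 10 examinees
test_scores = [95, 90, 88, 85, 80, 60, 55, 50, 45, 40]
item_correct = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]  # 1 = answered this item correctly

# Sort examinees by total score and split into top and bottom halves
order = sorted(range(len(test_scores)), key=lambda i: test_scores[i], reverse=True)
half = len(order) // 2
upper, lower = order[:half], order[half:]

# U and L are the proportions answering correctly in each group
U = sum(item_correct[i] for i in upper) / len(upper)
L = sum(item_correct[i] for i in lower) / len(lower)
D = U - L  # discrimination index
print(f"U = {U:.2f}, L = {L:.2f}, D = {D:.2f}")
```

With these numbers, 4 of 5 high scorers but only 1 of 5 low scorers get the item right, giving D = .80 − .20 = .60 -- a well-discriminating item.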
Item Discrimination
Seek items with high positive D values (those who do well on the test tend to get the item correct). Items with negative values (lower scorers on the test are more likely to get the item correct) and low positive values (about the same proportion of low and high scorers get the item correct) don't discriminate well and are discarded.
The more each item correlates with the test as a whole, the higher all items correlate with each other (= higher alpha, i.e., greater internal consistency).
A correlation matrix displays the correlation of each item with every other item and provides important information for increasing the test's internal consistency. Each item should be highly correlated with every other item measuring the same construct and uncorrelated with items measuring a different construct. Items that are not highly correlated with other items measuring the same construct can and should be dropped to increase internal consistency.