Test objective
Subjects
Natapon Kidrai 4436733 SCAL/M
SCLG 637 Testing and Evaluation
Project on Test Development
Twenty-six Mathayom Two students took the test. All of the participants
were in the same class. Among them were one student whose family had
immigrated from the United States and another whose family had come
from Japan, so the test might seem unequal for the other, Thai, students.
However, the test scores turned out to be unexpectedly unsatisfactory.
From the test scores, I then broke each student's total score down into
individual item scores. Table 2 shows how many points each student
received on each item. The mean and standard deviation of the total
scores are given at the bottom of the table.
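The totals, mean, and standard deviation reported at the bottom of Table 2 can be sketched in Python as follows. The item-score matrix is hypothetical (the project's raw scores are not reproduced here). The report does not say whether the population or the sample formula was used for the standard deviation, so the sketch assumes the population formula.

```python
import statistics

# Hypothetical item-score matrix: one row per student, one column per item
# (1 = correct, 0 = incorrect or blank). These are NOT the project's data.
item_scores = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],
]

# Each student's total score is the sum of his or her item scores.
totals = [sum(row) for row in item_scores]

mean_total = statistics.mean(totals)
# pstdev treats the class as the whole population (divides by N);
# statistics.stdev would use the sample formula (divides by N - 1).
sd_total = statistics.pstdev(totals)

print("Totals:", totals)
print(f"Mean = {mean_total:.3f}, SD = {sd_total:.3f}")
```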
Using the raw scores, mean, and standard deviation, I then analyzed the
item facility (IF) of the test. Item facility is calculated by adding up the
number of students who answered a particular item correctly and dividing
that sum by the total number of students who took the test (James Brown,
1996: p. 65). The formula is:
$$IF = \frac{N_{correct}}{N_{total}}$$
where $N_{correct}$ = number of students answering correctly
$N_{total}$ = number of students taking the test
IF values range from 0.00 to 1.00. Items left blank are counted as
incorrect. The IF value indicates how easy or difficult each item is: the
closer it is to 1.00, the easier the item.
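A minimal Python sketch of the IF formula, assuming each student's answer to an item is recorded as an option letter and a blank as `None`. The function name `item_facility` and the sample answers are hypothetical.

```python
from typing import Optional, Sequence


def item_facility(responses: Sequence[Optional[str]], key: str) -> float:
    """IF = N_correct / N_total; a blank (None) counts as incorrect."""
    n_correct = sum(1 for r in responses if r == key)
    return n_correct / len(responses)


# Hypothetical answers from ten students to one item whose key is "C";
# None marks an item left blank.
answers = ["C", "A", "C", None, "C", "B", "C", "C", "D", "C"]
print(f"IF = {item_facility(answers, 'C'):.2f}")
```

Because blanks simply fail the `r == key` test, they stay in the denominator, which matches the rule that unanswered items are scored as incorrect.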
The IF values are then used to derive a second useful statistic for
interpreting each item. Item discrimination (ID) is the degree to which an
item separates the students who performed well on the whole test from
those who performed poorly. ID lets the teacher contrast the performance
of the upper-group students on the test with that of the lower-group
students. Tables 1 and 2 show how the students taking the test were
divided into these groups.
ID can be calculated with the following formula:
$$ID = IF_{upper} - IF_{lower}$$
where $ID$ = item discrimination for an individual item
$IF_{upper}$ = item facility for the upper group on the whole test
$IF_{lower}$ = item facility for the lower group on the whole test
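The upper/lower split and the ID formula can be sketched as below. The sketch assumes students are ranked by total score and cut into an upper and a lower group of equal size. The IF values in Tables 3 and 4 are all multiples of 1/13, which is consistent with the 26 students being split into two halves of 13, but the report does not state the cut-off explicitly. Ties at the cut-off are broken here by the students' original order, an arbitrary choice. The function names and data are hypothetical.

```python
from typing import List, Tuple


def split_groups(totals: List[float], group_size: int) -> Tuple[List[int], List[int]]:
    """Return student indices of the upper and lower groups, ranked by total score.

    sorted() is stable, so students with equal totals keep their original order.
    """
    order = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    return order[:group_size], order[-group_size:]


def item_discrimination(item_correct: List[int], totals: List[float], group_size: int) -> float:
    """ID = IF_upper - IF_lower for one item (item_correct holds 1/0 per student)."""
    upper, lower = split_groups(totals, group_size)
    if_upper = sum(item_correct[i] for i in upper) / group_size
    if_lower = sum(item_correct[i] for i in lower) / group_size
    return if_upper - if_lower


# Hypothetical data for six students: total test scores, and whether each
# student answered one particular item correctly (1) or not (0).
totals = [18, 15, 12, 10, 7, 5]
item = [1, 1, 0, 1, 0, 0]
print(f"ID = {item_discrimination(item, totals, group_size=3):+.2f}")
```

A negative result means the lower group did better on the item than the upper group.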
Table 3 below shows the IF and ID values of individual items.
IF Score and ID Score
Part I
            Item Number
Statistic   1     2     3     4     5     6     7     8     9     10
IF total    0.69  0.65  0.65  0.58  0.62  0.58  0.50  0.46  0.62  0.62
IF upper    0.85  0.85  0.85  0.92  0.77  0.85  0.62  0.62  0.62  0.77
IF lower    0.54  0.46  0.46  0.23  0.46  0.31  0.38  0.31  0.62  0.46
ID          0.31  0.38  0.38  0.69  0.31  0.54  0.23  0.31  0.00  0.31
Part II
            Item Number
Statistic   1     2     3     4     5     6     7     8     9     10
IF total    0.35  0.35  0.27  0.77  0.38  0.50  0.54  0.38  0.38  0.38
IF upper    0.46  0.46  0.46  0.77  0.54  0.62  0.69  0.46  0.46  0.62
IF lower    0.23  0.23  0.08  0.77  0.23  0.38  0.38  0.31  0.31  0.15
ID          0.23  0.23  0.38  0.00  0.31  0.23  0.31  0.15  0.15  0.46
Part II
                          Options
Item  IF    ID     Group  A.     B.     C.     D.     Notes
1     0.35  0.23   High   0.00   0.38   0.08   0.46*  Improvement Needed
                   Low    0.31   0.31   0.15   0.23*
2     0.35  0.23   High   0.23   0.23   0.46*  0.08   Improvement Needed
                   Low    0.31   0.23   0.23*  0.23
3     0.27  0.38   High   0.46*  0.23   0.31   0.00   Reasonable
                   Low    0.08*  0.15   0.54   0.23
4     0.77  0.00   High   0.77*  0.08   0.08   0.08   Rejected
                   Low    0.77*  0.08   0.00   0.08
5     0.38  0.31   High   0.08   0.00   0.54*  0.15   Reasonable
                   Low    0.08   0.23   0.23*  0.46
6     0.50  0.24   High   0.31   0.62*  0.00   0.08   Improvement Needed
                   Low    0.31   0.38*  0.23   0.08
7     0.54  0.31   High   0.31   0.00   0.15   0.69*  Reasonable
                   Low    0.08   0.23   0.23   0.38*
8     0.38  0.15   High   0.23   0.15   0.15   0.46*  Rejected
                   Low    0.15   0.31   0.15   0.31*
9     0.38  0.15   High   0.23   0.46*  0.08   0.23   Rejected
                   Low    0.38   0.31*  0.15   0.08
10    0.38  0.47   High   0.62*  0.15   0.00   0.23   Good
                   Low    0.15*  0.31   0.08   0.46
11    0.69  0.15   High   0.15   0.08   0.77*  0.00   Rejected
                   Low    0.08   0.08   0.62*  0.08
12    0.38  0.00   High   0.46   0.08   0.38*  0.00   Rejected
                   Low    0.23   0.31   0.38*  0.00
13    0.58  0.54   High   0.08   0.85*  0.00   0.08   Good
                   Low    0.38   0.31*  0.08   0.15
14    0.31  0.15   High   0.46   0.08   0.08   0.38*  Rejected
                   Low    0.31   0.31   0.15   0.23*
15    0.50  0.38   High   0.00   0.00   0.31   0.69*  Reasonable
                   Low    0.00   0.08   0.54   0.31*
16    0.46  0.31   High   0.23   0.08   0.62*  0.08   Reasonable
                   Low    0.31   0.31   0.31*  0.00
17    0.27  0.23   High   0.38*  0.15   0.31   0.08   Improvement Needed
                   Low    0.15*  0.15   0.54   0.08
18    0.35  -0.07  High   0.31*  0.00   0.46   0.23   Rejected
                   Low    0.38*  0.23   0.15   0.15
19    0.31  0.15   High   0.15   0.38*  0.31   0.15   Rejected
                   Low    0.23   0.23*  0.38   0.08
20    0.38  0.15   High   0.15   0.23   0.46*  0.15   Rejected
                   Low    0.15   0.46   0.31*  0.08
*correct option
Table 4 Distractor Efficiency Analysis
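The High/Low option proportions in Table 4 can be tabulated with a sketch like the following. The two rules used here to flag a weak distractor (no student chose it, or it attracted more upper-group than lower-group students) are common rules of thumb assumed for illustration; they are not the criteria behind the Good/Reasonable/Improvement Needed/Rejected notes in the table, which the report does not spell out. The function names and answers are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

OPTIONS = ["A", "B", "C", "D"]


def option_proportions(choices: List[Optional[str]]) -> Dict[str, float]:
    """Share of a group choosing each option. Blanks (None) match no option,
    so one group's proportions can add up to less than 1."""
    counts = Counter(c for c in choices if c is not None)
    return {opt: counts[opt] / len(choices) for opt in OPTIONS}


def weak_distractors(upper: List[Optional[str]], lower: List[Optional[str]], key: str) -> List[str]:
    """Flag distractors that nobody chose, or that drew more upper-group
    than lower-group students (assumed rules of thumb)."""
    high, low = option_proportions(upper), option_proportions(lower)
    flagged = []
    for opt in OPTIONS:
        if opt == key:
            continue
        unused = high[opt] == 0 and low[opt] == 0
        if unused or high[opt] > low[opt]:
            flagged.append(opt)
    return flagged


# Hypothetical answers to one item keyed "A": six upper-group and six
# lower-group students; None marks a blank.
upper = ["A", "C", "C", "A", None, "C"]
lower = ["B", "C", "A", "D", "B", "C"]

high, low = option_proportions(upper), option_proportions(lower)
for opt in OPTIONS:
    star = "*" if opt == "A" else " "
    print(f"{opt}{star}  High {high[opt]:.2f}  Low {low[opt]:.2f}")
print(f"ID = {high['A'] - low['A']:+.2f}")
print("Weak distractors:", weak_distractors(upper, lower, "A"))
```

In this made-up item, distractor C pulls half of the upper group away from the key, which is the kind of pattern the discussion below describes for several Part II items.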
As the tables show, Part I seems to discriminate well between strong and
weak students, but Part II does not. Many of the Part II distractors work so
well that even strong students could not answer correctly. For example,
on item 12 the correct answer and two of the distractors attract nearly
equal proportions of students. An item like this is good at distracting
students who do not really know the correct answer; on the other hand, it
may also reflect the way the students had been taught, or how well they
retained previous knowledge. Item 18 is a very poor item and should be the
first to be eliminated, because it does not discriminate between strong
and weak students: more lower-group students than upper-group students
answered it correctly (ID = -0.07), which was unexpected. While item 4
For Part I
For Part II
But how reliable is the test? It is assumed that a test should give the same
examine the reliability of the test depending on what type of test it is.
Part II
Item number  FV (IF)  1 - IF   IF(1 - IF)
1            0.0385   0.9615   0.0370
2            0.0769   0.9231   0.0710
3            0.1154   0.8846   0.1021
4            0.1538   0.8462   0.1302
5            0.1923   0.8077   0.1553
6            0.2308   0.7692   0.1775
7            0.2692   0.7308   0.1967
8            0.3077   0.6923   0.2130
9            0.3462   0.6538   0.2263
10           0.3846   0.6154   0.2367
11           0.4231   0.5769   0.2441
12           0.4615   0.5385   0.2485
13           0.5000   0.5000   0.2500
14           0.5385   0.4615   0.2485
15           0.5769   0.4231   0.2441
16           0.6154   0.3846   0.2367
17           0.6538   0.3462   0.2263
18           0.6923   0.3077   0.2130
19           0.7308   0.2692   0.1967
20           0.7692   0.2308   0.1775
Variance total                 6.9127
Table 5 Calculation of Item Variances
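Each item variance in Table 5 is IF(1 - IF), and the bottom row is the sum over all items. A short sketch, using hypothetical IF values rather than the project's; the function name `item_variances` is illustrative only.

```python
from typing import List, Tuple


def item_variances(if_values: List[float]) -> Tuple[List[float], float]:
    """Return each item's variance IF * (1 - IF) and the sum over all items."""
    variances = [p * (1 - p) for p in if_values]
    return variances, sum(variances)


# Hypothetical IF values for three items (not the project's data).
variances, total = item_variances([0.50, 0.80, 0.25])
for number, v in enumerate(variances, start=1):
    print(f"Item {number}: IF(1 - IF) = {v:.4f}")
print(f"Variance total = {total:.4f}")
```

An item's variance is largest (0.25) when IF = 0.50 and shrinks toward 0 as the item becomes very easy or very hard.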
Test Statistics
Mean 14.615
S 14.651
K-R20 0.029
Table 6 Test Statistics
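The K-R20 coefficient in Table 6 is conventionally computed as $K\text{-}R20 = \frac{k}{k-1}\left(1 - \frac{\sum IF(1-IF)}{S_t^2}\right)$, where $k$ is the number of items and $S_t^2$ is the variance of the total scores. The sketch below applies that formula to a hypothetical 0/1 response matrix; it uses the population variance for the total scores so that it matches the IF(1 - IF) item variances, which is an assumption about convention, not a statement of how Table 6 was produced.

```python
import statistics
from typing import List


def kr20(matrix: List[List[int]]) -> float:
    """Kuder-Richardson formula 20 for items scored 1 (correct) or 0."""
    n_students = len(matrix)
    k = len(matrix[0])
    if k < 2:
        raise ValueError("K-R20 needs at least two items")
    # Item facility of each item, then the summed item variances IF * (1 - IF).
    facilities = [sum(row[j] for row in matrix) / n_students for j in range(k)]
    sum_item_var = sum(p * (1 - p) for p in facilities)
    # Population variance of the total scores, matching the IF * (1 - IF) convention.
    total_var = statistics.pvariance([sum(row) for row in matrix])
    if total_var == 0:
        raise ValueError("total scores have no variance")
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Hypothetical 5-student, 4-item response matrix (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(f"K-R20 = {kr20(responses):.3f}")
```

This perfectly ordered pattern gives 0.800. K-R20 falls toward 0 as the summed item variances approach the variance of the total scores, that is, when items do not rank students consistently with one another.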
The reliability (K-R20) of this test came out to be 0.029, which is very low.
participants would have to retake the test so that the consistency of the
test-takers remains the same. In other cases, this test was rated by only one
Conclusion
Achieving the English Proficiency Test for Mathayom Two
This range of scores implies that further curriculum and course design
development is needed. Although the test was given to prepare the
students for the coming year, it also indicates the areas where
improvement is needed. Although many factors could have influenced the
test results, the overall scores suggest that the students need an
extensive course to prepare them for the classroom. Based on the test
results, the course designer should plan more carefully how to direct and
explain the specific skills required. And for NRT test developers, it is
The reason such items should be eliminated, and why the scores were
unsatisfactory, was that my idealized assumption that private school
But this idea was proved wrong once the test had been given. The
reasons behind this lie in the curriculum design and lesson planning. The
participants were also a factor: they were not ready to take the test,
and they were not concentrating on it, as the test was taken nearly at
Reference