
Natapon Kidrai 4436733 SCAL/M

SCLG 637 Testing and Evaluation


Project on Test Development
English Proficiency Test for M.2 Students
at Bangna Demonstration School

The test aims to measure the language proficiency of students. It was
administered by Anuchit Nasomboon, an English teacher at Bangna
Demonstration School. This private school is located in the Bangna area,
so most students come from wealthy families. Surprisingly, quite a few
students choose to attend this school. There are at least two classes
for each grade at the primary level; however, there is only one class
for each grade at the secondary level. Some students come from foreign
families whose parents have come to work in Thailand. The school is
attempting to create its own teaching curriculum for every subject.
Mr. Nasomboon therefore set out to measure how much his students knew
before beginning the lessons. The participants were twenty-six Mathayom
Two students, and the time allowed for the test was fifty minutes.

Test objective

This test was given to measure the students' background knowledge. The
analysis of the test scores will be used to adjust the English
curriculum for Mathayom Two at Bangna Demonstration School. At the same
time, the scores will be analyzed item by item to see how and where each
item needs improvement.

Subjects
Twenty-six Mathayom Two students took the test, all from the same class.
Among them were one student whose family had immigrated from the United
States and another whose family had come from Japan, so the test may
have been somewhat unfair to the Thai students. Nevertheless, the test
scores turned out to be unexpectedly unsatisfactory.

Students      Total     Students      Total
Narit           29      Nattaporn       13
Porntip         25      Chalrmachai     13
Maturot         19      Staporn         13
Wannisa         19      Prakorn         13
Pravee          19      Pawetre         12
Sorratat        19      Phornphan       11
Piyada          17      Witawat         11
Warunya         17      Tanasan         11
Sutthida        16      Kanok-karn      10
Wareewan        16      Julawat          9
Manecha         16      Teerapat         9
Utomphorn       16      Jinnaput         7
Wiliya          15      Chatchai         5
Mean = 14.6154          Standard Deviation = 14.6511
Table 1 Test Score
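
As a cross-check on the summary row of Table 1, the mean and standard
deviation can be recomputed from the listed totals. The sketch below
uses Python's standard statistics module and assumes the population
formula (dividing by N); the variable names are illustrative only.

```python
import statistics

# Total scores copied from Table 1 (26 students, 30 items).
scores = [29, 25, 19, 19, 19, 19, 17, 17, 16, 16, 16, 16, 15,
          13, 13, 13, 13, 12, 11, 11, 11, 10, 9, 9, 7, 5]

mean = statistics.fmean(scores)
# Population standard deviation (divide by N);
# statistics.stdev would divide by N - 1 instead.
sd = statistics.pstdev(scores)

print(f"N = {len(scores)}, mean = {mean:.4f}, SD = {sd:.4f}")
```

For these totals the sketch gives a mean of about 14.62, which matches
Table 1, and a standard deviation of about 5.20, so the printed standard
deviation may be worth rechecking against the raw scores.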

From the test scores, I then broke down each total into scores for the
individual items. Table 2 shows how many points each student got on each
item; the mean and standard deviation of the total scores are given at
the bottom of the table.
From the raw scores, mean, and standard deviation, I then turned to the
item facility of the test. Item facility (IF) is found by adding up the
number of students who correctly answered a particular item and dividing
that sum by the total number of students who took the test (Brown, 1996,
p. 65). The formula can be written like this:
\[ IF = \frac{N_{correct}}{N_{total}} \]
where $N_{correct}$ = number of students answering correctly
      $N_{total}$ = number of students taking the test
The IF value can range from 0.00 to 1.00 for different items. Items left
blank are counted as incorrect answers. The IF value indicates how
difficult or easy each test item is.
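
The item facility calculation can be sketched in a few lines of Python.
This is only an illustration: the function name and the way responses
are encoded (True, False, or None for a blank) are assumptions, and the
example data are hypothetical.

```python
def item_facility(responses):
    """Return the proportion of correct answers for one item.

    `responses` holds one entry per student: True for a correct answer,
    False for an incorrect one, and None for an item left blank.
    Blanks are counted as incorrect.
    """
    if not responses:
        raise ValueError("at least one student is required")
    n_correct = sum(1 for r in responses if r is True)
    return n_correct / len(responses)


# Hypothetical item answered by 26 students: 18 correct, 6 wrong, 2 blank.
example = [True] * 18 + [False] * 6 + [None] * 2
print(round(item_facility(example), 2))  # 18 / 26 -> 0.69
```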
The IF values also feed into a second statistic that is useful for
interpreting an item. Item discrimination (ID) is the degree to which an
item separates the students who performed well from those who performed
poorly. The ID value lets the teacher contrast the performance of the
upper-group students on the test with that of the lower-group students.
Tables 1 and 2 show how the students taking the test divide into these
groups.
The ID value can be calculated with this formula:
\[ ID = IF_{upper} - IF_{lower} \]
where $ID$ = item discrimination for an individual item
      $IF_{upper}$ = item facility for the upper group on the whole test
      $IF_{lower}$ = item facility for the lower group on the whole test
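
A minimal Python sketch of the ID formula follows. It assumes the
students are ranked by total score and split into an upper and a lower
half; other splits, such as upper and lower thirds, are also possible.
The function name and the sample data are hypothetical.

```python
def item_discrimination(total_scores, item_correct):
    """ID = IF_upper - IF_lower for one item.

    total_scores: each student's total test score.
    item_correct: 1 if that student answered the item correctly, else 0.
    Assumes an upper/lower split into halves by total score; with an odd
    number of students the middle student belongs to neither group.
    """
    ranked = sorted(zip(total_scores, item_correct),
                    key=lambda pair: pair[0], reverse=True)
    half = len(ranked) // 2
    if half == 0:
        raise ValueError("at least two students are required")
    upper, lower = ranked[:half], ranked[-half:]
    if_upper = sum(c for _, c in upper) / half
    if_lower = sum(c for _, c in lower) / half
    return if_upper - if_lower


# Hypothetical data: six students and their answers to one item.
totals = [29, 25, 19, 13, 9, 5]
correct = [1, 1, 1, 0, 1, 0]
print(round(item_discrimination(totals, correct), 2))  # 3/3 - 1/3 -> 0.67
```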
Table 3 below shows the IF and ID values of the individual items.
IF score and ID score
Part I
Item                              Item Number
Statistics      1      2      3      4      5      6      7      8      9     10
IF total     0.69   0.65   0.65   0.58   0.62   0.58   0.50   0.46   0.62   0.62
IF upper     0.85   0.85   0.85   0.92   0.77   0.85   0.62   0.62   0.62   0.77
IF lower     0.54   0.46   0.46   0.23   0.46   0.31   0.38   0.31   0.62   0.46
ID           0.31   0.38   0.38   0.69   0.31   0.54   0.23   0.31   0.00   0.31

Part II
Item                              Item Number
Statistics      1      2      3      4      5      6      7      8      9     10
IF total     0.35   0.35   0.27   0.77   0.38   0.50   0.54   0.38   0.38   0.38
IF upper     0.46   0.46   0.46   0.77   0.54   0.62   0.69   0.46   0.46   0.62
IF lower     0.23   0.23   0.08   0.77   0.23   0.38   0.38   0.31   0.31   0.15
ID           0.23   0.23   0.38   0.00   0.31   0.23   0.31   0.15   0.15   0.46

Item                              Item Number
Statistics     11     12     13     14     15     16     17     18     19     20
IF total     0.69   0.38   0.58   0.31   0.50   0.46   0.27   0.35   0.31   0.38
IF upper     0.77   0.38   0.85   0.38   0.69   0.62   0.38   0.31   0.38   0.46
IF lower     0.62   0.38   0.31   0.23   0.31   0.31   0.15   0.38   0.23   0.31
ID           0.15   0.00   0.54   0.15   0.38   0.31   0.23  -0.08   0.15   0.15
Table 3 IF score and ID score of the whole test

Since this proficiency test is a norm-referenced test (NRT), an ideal
item should have an IF value of about 0.50 and the highest possible ID.
IF values between 0.30 and 0.70 are considered acceptable. Ebel (1979,
p. 267) has suggested the following guidelines for making decisions
based on ID:
0.40 and up     Very good items
0.30 to 0.39    Reasonably good but possibly subject to improvement
0.20 to 0.29    Marginal items, usually needing and being subject to
                improvement
Below 0.19      Poor items, to be rejected or improved by revision
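
These guidelines can be applied mechanically, as the sketch below
suggests. The function name and the short labels are illustrative, and
one assumption is made explicit: any value under 0.20 is treated as
"poor", which closes the small gap between "below 0.19" and "0.20 to
0.29" in the guidelines.

```python
def classify_id(id_value):
    """Map an ID value to a label based on Ebel's (1979) guidelines.

    Assumption: any value under 0.20 counts as "poor".
    """
    if id_value >= 0.40:
        return "very good"
    if id_value >= 0.30:
        return "reasonably good"
    if id_value >= 0.20:
        return "marginal"
    return "poor"


# ID values for Part I, items 1-10, copied from Table 3.
part_one_ids = [0.31, 0.38, 0.38, 0.69, 0.31, 0.54, 0.23, 0.31, 0.00, 0.31]
for number, value in enumerate(part_one_ids, start=1):
    print(number, value, classify_id(value))
```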
With the IF and ID values in hand, the next step is to analyze
distractor efficiency. The goal of distractor efficiency analysis is to
examine the degree to which the distractors attract students who do not
know the correct answer, and so whether the distractors are functioning
efficiently. As mentioned above, the IF value helps to show which items
need improvement or elimination. For example, an item might be
considered too easy when its IF value is very high and its ID value is
very low. But an easy item is sometimes useful for letting students work
from the simplest item to the harder ones. For each item, the proportion
of students in each group who chose each option is analyzed. Table 4
below shows the distractor efficiency of the test.
Distractor Efficiency
Part I
Item Number  IF  ID  Group  + ing  + ed  Notes
1 0.69 0.31 High 0.85* 0.15 Reasonable
Low 0.54* 0.46
2 0.65 0.38 High 0.85* 0.15 Reasonable
Low 0.46* 0.54
3 0.65 0.38 High 0.15 0.85* Reasonable
Low 0.54 0.46*
4 0.58 0.69 High 0.92* 0.08 Good
Low 0.23* 0.77
5 0.62 0.31 High 0.23 0.77* Reasonable
Low 0.54 0.46*
6 0.58 0.54 High 0.15 0.85* Good
Low 0.69 0.31*
7 0.5 0.23 High 0.31 0.69* Improvement
Low 0.62 0.38* Needed
8 0.46 0.31 High 0.62* 0.38 Reasonable
Low 0.31* 0.69
9 0.62 0 High 0.31 0.69* Rejected
Low 0.38 0.62*
10 0.62 0.31 High 0.85* 0.15 Reasonable
Low 0.46* 0.54
*correct option

Part II
Item Number  IF  ID  Group  A.  B.  C.  D.  Notes
1 0.35 0.23 High 0.00 0.38 0.08 0.46* Improvement
Low 0.31 0.31 0.15 0.23* Needed
2 0.35 0.23 High 0.23 0.23 0.46* 0.08 Improvement
Low 0.31 0.23 0.23* 0.23 Needed
3 0.27 0.38 High 0.46* 0.23 0.31 0.00 Reasonable
Low 0.08* 0.15 0.54 0.23
4 0.77 0 High 0.77* 0.08 0.08 0.08 Rejected
Low 0.77* 0.08 0.00 0.08
5 0.38 0.31 High 0.08 0.00 0.54* 0.15 Reasonable
Low 0.08 0.23 0.23* 0.46
6 0.50 0.23 High 0.31 0.62* 0.00 0.08 Improvement
Low 0.31 0.38* 0.23 0.08 Needed
7 0.54 0.31 High 0.31 0.00 0.15 0.69* Reasonable
Low 0.08 0.23 0.23 0.38*
8 0.38 0.15 High 0.23 0.15 0.15 0.46* Rejected
Low 0.15 0.31 0.15 0.31*
9 0.38 0.15 High 0.23 0.46* 0.08 0.23 Rejected
Low 0.38 0.31* 0.15 0.08
10 0.38 0.46 High 0.62* 0.15 0.00 0.23 Good
Low 0.15* 0.31 0.08 0.46
11 0.69 0.15 High 0.15 0.08 0.77* 0.00 Rejected
Low 0.08 0.08 0.62* 0.08
12 0.38 0 High 0.46 0.08 0.38* 0.00 Rejected
Low 0.23 0.31 0.38* 0.00
13 0.58 0.54 High 0.08 0.85* 0.00 0.08 Good
Low 0.38 0.31* 0.08 0.15
14 0.31 0.15 High 0.46 0.08 0.08 0.38* Rejected
Low 0.31 0.31 0.15 0.23*
15 0.5 0.38 High 0.00 0.00 0.31 0.69* Reasonable
Low 0.00 0.08 0.54 0.31*
16 0.46 0.31 High 0.23 0.08 0.62* 0.08 Reasonable
Low 0.31 0.31 0.31* 0.00
17 0.27 0.23 High 0.38* 0.15 0.31 0.08 Improvement
Low 0.15* 0.15 0.54 0.08 Needed
18 0.35 -0.08 High 0.31* 0.00 0.46 0.23 Rejected
Low 0.38* 0.23 0.15 0.15
19 0.31 0.15 High 0.15 0.38* 0.31 0.15 Rejected
Low 0.23 0.23* 0.38 0.08
20 0.38 0.15 High 0.15 0.23 0.46* 0.15 Rejected
Low 0.15 0.46 0.31* 0.08
*correct option
Table 4 Distractor Efficiency Analysis
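
The proportions in Table 4 come from counting, within each group, how
many students chose each option. The sketch below shows that tally in
Python. The answer lists are hypothetical: they were chosen so that a
13-student upper group and a 13-student lower group reproduce the
proportions shown for Part II item 10, and the function name is
illustrative.

```python
from collections import Counter


def option_proportions(choices, options="ABCD"):
    """Proportion of one group choosing each option on a single item.

    choices: the option letter picked by each student in the group
    (None for a blank answer, which counts toward no option).
    """
    counts = Counter(c for c in choices if c is not None)
    return {opt: counts[opt] / len(choices) for opt in options}


# Hypothetical answers from a 13-student upper group and lower group.
upper = list("AAAAAAAABBDDD")
lower = list("AABBBBCDDDDDD")
key = "A"  # correct option, marked with * in the output

for name, group in (("High", upper), ("Low", lower)):
    props = option_proportions(group)
    row = "  ".join(
        f"{opt}={p:.2f}{'*' if opt == key else ''}" for opt, p in props.items()
    )
    print(f"{name:<5}{row}")
```

A distractor that draws more of the lower group than the upper group,
like option D here, is doing its job; one that nobody chooses adds
nothing to the item.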

As can be seen, Part I of the test seems to discriminate good students
from poor students very well, but Part II does not. Many of the
distractors work so well that even good students could not answer
correctly. For example, in item 12 the correct answer and two of the
distractors attract nearly equal proportions of students. An item like
this is good at distracting students who do not really know the correct
answer. On the other hand, it reflects the way the students had been
taught, or how much previous knowledge they had retained. Item 18 is a
very bad item and should be the first to be eliminated because it does
not discriminate between good and poor students: poor students answered
it correctly more often than good students, which was unexpected. Items
4 and 12 are to be rejected as well, as they cannot differentiate the
good students from the rest of the class.

The items to be rejected should therefore be replaced by new items. The
following are substitutes for those items:

For Part I

9. You speak English very (good/well).

For Part II

4. She sang …………
A. beautiful B. beautifully C. beauty D. beautily
8. He can paint the fence ………
A. fastly B. fasten C. fastness D. fast
9. He is ……….. right.
A. quite B. quiet C. quitely D. quietly
11. He killed a cat ……….. yesterday.
A. accident B. accidental C. accidentally D. accidently
12. The employees were ………. afraid of their new boss.
A. terrifying B. terrified C. terrible D. terribly
14. They entered the room …………… because they were ………..
A. quiet, late B. quietly, late
C. quietly, lately D. quiet, lately
18. Our teacher explained things very ………… We all understand
him ……..
A. clear, perfect B. clearly, perfect
C. clear, perfectly D. clearly, perfectly
19. Please carry the glasses ………… They were very expensive.
A. careful B. carefully C. carely D. care
20. She speaks ……….. She has a ……….. voice.
A. soft, soft B. softly, soft
C. soft, softly D. softly, softly

Reliability of the test

The test may be good at discriminating among students, but how reliable
is it? A good test should give the same results every time it is used
under the same conditions, should measure what it is supposed to
measure, and should be practical to use. Every measurement instrument
inevitably has flaws that cause inaccuracies. For a language test, there
are various ways to examine reliability, depending on the type of test.

The English Proficiency Test is an NRT. Its reliability can be estimated
with Kuder-Richardson Formula 20 (K-R20), because this formula avoids
the problem of underestimating the reliability of certain language
tests. The formula is as follows:


\[ K\text{-}R20 = \frac{k}{k-1}\left(1 - \frac{\sum IV}{S_t^2}\right) \]
where $K\text{-}R20$ = Kuder-Richardson Formula 20
      $k$ = number of items
      $IV$ = item variance
      $S_t^2$ = variance for the whole test (that is, the standard
      deviation of the test scores squared)
Calculating the K-R20 value involves several other quantities. The
calculation of the item variances is shown below.

Calculating Item Variances


Part I
Item number IF 1-IF IF(1-IF)
1 0.6923 0.3077 0.2130
2 0.6538 0.3462 0.2263
3 0.6538 0.3462 0.2263
4 0.5769 0.4231 0.2441
5 0.6154 0.3846 0.2367
6 0.5769 0.4231 0.2441
7 0.5000 0.5000 0.2500
8 0.4615 0.5385 0.2485
9 0.6154 0.3846 0.2367
10 0.6154 0.3846 0.2367

Part II
Item number IF 1-IF IF(1-IF)
1 0.3462 0.6538 0.2263
2 0.3462 0.6538 0.2263
3 0.2692 0.7308 0.1967
4 0.7692 0.2308 0.1775
5 0.3846 0.6154 0.2367
6 0.5000 0.5000 0.2500
7 0.5385 0.4615 0.2485
8 0.3846 0.6154 0.2367
9 0.3846 0.6154 0.2367
10 0.3846 0.6154 0.2367
11 0.6923 0.3077 0.2130
12 0.3846 0.6154 0.2367
13 0.5769 0.4231 0.2441
14 0.3077 0.6923 0.2130
15 0.5000 0.5000 0.2500
16 0.4615 0.5385 0.2485
17 0.2692 0.7308 0.1967
18 0.3462 0.6538 0.2263
19 0.3077 0.6923 0.2130
20 0.3846 0.6154 0.2367
Variance Total 6.9127
Table 5 Calculation of Item Variances
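
Each row of Table 5 follows the same rule: the variance of a right/wrong
item is IF multiplied by (1 - IF). A short Python sketch, with an
illustrative function name, makes the arithmetic explicit. It assumes
that the IF of 0.6923 for Part I item 1 corresponds to 18 correct
answers out of 26.

```python
def item_variance(item_facility):
    """Variance of a right/wrong item: IF * (1 - IF)."""
    return item_facility * (1 - item_facility)


# Part I item 1: IF = 18 / 26 = 0.6923, so the variance is about 0.2130.
print(round(item_variance(18 / 26), 4))
# An item with IF = 0.50 has the largest possible variance.
print(item_variance(0.5))
```

Because the product peaks at IF = 0.50, items of middling difficulty
contribute the most to the sum of item variances.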

In addition to the values in Table 5, other values are needed. See
Table 6 for the rest of the calculation.

Test Statistics
Mean 14.615
S 14.651
K-R20 0.029
Table 6 Test Statistics
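
The K-R20 calculation can be reproduced with the short Python sketch
below. It takes the total scores from Table 1 and the item-variance
total of 6.9127 from Table 5, and it assumes that the test variance is
the population variance of the total scores. The function name is
illustrative.

```python
import statistics


def kr20(n_items, item_variance_sum, total_scores):
    """K-R20 = k / (k - 1) * (1 - sum(IV) / S_t^2).

    Assumes S_t^2 is the population variance of the total scores.
    """
    test_variance = statistics.pvariance(total_scores)
    return (n_items / (n_items - 1)) * (1 - item_variance_sum / test_variance)


# Total scores from Table 1; 30 items; item-variance total from Table 5.
scores = [29, 25, 19, 19, 19, 19, 17, 17, 16, 16, 16, 16, 15,
          13, 13, 13, 13, 12, 11, 11, 11, 10, 9, 9, 7, 5]
print(round(kr20(30, 6.9127, scores), 2))
```

With these inputs the test variance is about 27.0 and the sketch returns
roughly 0.77, which differs markedly from the value in Table 6, so the
figures in that table may be worth rechecking against the raw scores.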

The reliability of this test came out at 0.029, which is very low. If
the test is re-administered with new items in place of those to be
eliminated, the reliability will, of course, change. In that case, the
same participants should retake the test so that the group of
test-takers remains consistent. In addition, this test was scored by
only one rater, so the question of inter-rater reliability does not
arise.

Conclusion
Administering the English Proficiency Test to Mathayom Two students at
Bangna Demonstration School produced a wide range of results. This range
implies that curriculum and course design need further development.
Although the test was given to prepare the students for the coming year,
it also indicates the areas where improvement is needed. Although many
factors could have influenced the test results, the overall scores
showed that the students need an extensive course to prepare them for
the classroom. In light of the results, the course designer should plan
more carefully how to direct and explain the specific skills required.
NRT developers, for their part, are advised to make sure that a test is
as long as possible, is well designed and carefully written, assesses
relatively homogeneous material, has items that discriminate well, is
normally distributed, and is administered to a group of students whose
abilities are as wide as logically possible within the context (Brown,
1996, p. 209).

I was surprised that so many items had to be eliminated and that the
scores were unsatisfactory, because I had assumed that private schools
provide better classroom language learning than government schools. This
assumption was proved wrong once the test had been completed. The
reasons lie in the curriculum design and lesson planning. The
participants were also a factor: they were not ready to take the test,
and they could not concentrate on it because it was given near the end
of the school day.

Reference

Brown, James Dean. (1996). Testing in language programs. New Jersey:
Prentice Hall Regents.
Appendix

Circle the appropriate words in the brackets to complete the sentences.


1. I think this film is (bored/boring)………..
2. I don’t find politics (interested/interesting) ………….
3. Walking makes me (tired/tiring) …………
4. This book is really (excited/exciting) ………..
5. Kate is doing her exams and is (worried/worrying) …………
6. Are you (interested/interesting) ………….. in basketball?
7. Dang always feels (bored/boring) ……………
8. Jan finds computers (confused/confusing) …………..
9. We were all feeling (tired/tiring) ………….
10. What an (excited/exciting) ………………. day.

Circle the appropriate items to complete the sentences.


1. He bought a(n) …………. from the antique shop.
A. rosewood old round table B. old rose wood round table
C. round old rosewood table D. old round rosewood table
2. It is a(n) ……………
A. horrifying old mysterious story B. horrifying mysterious old story
C. old horrifying mysterious story D. mysterious old horrifying story
3. His voice is …………….
A. loud B. aloud C. loudly D. aloudly
4. The lesson seems …………….
A. interesting B. interested C. interestingly D. interest
5. We arrived at the destination ……………
A. save B. safe C. safely D. safety
6. I am sure the soup tastes …………..
A. well B. good C. goodness D. goodly
7. The ……….. parents scolded the child for his …………. results.
A. disappointing, disappointing B. disappointed, disappointed
C. disappointing, disappointed D. disappointed, disappointing
8. Give him that ………..
A. yellow old leather case B. old leather yellow case
C. leather yellow old case D. old yellow leather case
9. The curry smells …….. but it doesn’t taste …………
A. well, deliciously B. good, delicious
C. good, deliciously D. well, delicious
10. I feel ………… when I think of my housework.
A. bad B. badly C. badness D. worse
11. We were already ………..
A. worry B. worrying C. worried D. worrily
12. The children are ……….. by the animals.
A. frightening B. frighten C. frightened D. frightingly
13. It was a very ………… journey.
A. tired B. tiring C. tiresome D. tireness
14. We were all very ………… in what he said.
A. interesting B. interest C. interestingly D. interested
15. Why do you look so …………. at school?
A. boringly B. boredom C. boring D. bored
16. It was a terribly ………… day.
A. excited B. excitement C. exciting D. excitedly
17. Didn’t you think it was an ………….. play?
A. amusing B. amusement C. amused D. amusingly
18. We had a ………….. trip home.
A. tiring B. tiredness C. tired D. tiredly
19. The last half hour was a …………. time.
A. worry B. worrying C. worried D. worrily
20. I’ve never been so ………… in my life.
A. frightening B. frighten C. frightened D. frighteningly
