
Modern Foreign Languages (Quarterly), Vol. 26, No. 1, January 2003: 69-75. Guangzhou, China.

Rasch Model Applied to Score Equating in the College English Test

ZHU Zheng-cai, YANG Hui-zhong, YANG Hao-ran

Of all the facets of Item Response Theory, the Rasch model for dichotomously scored items is
probably the most widely recognized by practitioners. In the College English Test (CET) of China, this model
has been used for 15 years and has yielded a large body of score-equating data. This paper presents a detailed
description of the CET score-equating process based on the Rasch model, together with a pilot study using CET test data.
For equating purposes, a comparative test is given to a randomly sampled group of candidates at
selected observation universities one week before each live CET administration. The comparative test paper is the one
used on the norm group (e.g., the 1987 paper). An insignificant difference between the performance of these
candidates on the comparative test and the performance of the norm group would indicate little change in
ability level; if significant differences then appeared on the live CET, they would signal a change in the
difficulty level of the test. Since 1990, the Rasch model has been used to equate CET scores. Because the
comparative test paper and the current test paper are administered to the same group of candidates, items
from both papers can be placed on the same ability scale, which makes equating possible. In this process, an equating
model is built on a random sample of the candidature before it is applied to the whole test population.
It is essential that the comparative test be administered under strict quality control so that the data gathered are
reliable. As the comparative test and the live test are taken by the same group of candidates
only one week apart, neither time nor proficiency level can be a main factor affecting the
test results. Moreover, since CET items are written by well-trained specialists under thorough
quality control, and since they are pretested and analyzed with the overall difficulty of the test
rigorously controlled, no significant differences should arise between the results of the two tests. With these
factors excluded, only two reasons could produce significant variance between the scores the same
group of candidates obtain on the comparative test and the scores they obtain on the live test. One is
their attitude towards the tests (for example, they may take the live test seriously but treat the comparative
test as a mere exercise); the other is possible exposure of the comparative test paper (even partial leakage
of the paper to the candidates would render the data meaningless). In view of these two
factors, two measures have been taken since 1994 to increase the precision of the equating model.
First, before the comparative test is administered, the candidates who take it are told
that their scores on both the comparative test and the live test will be valid and that the higher of the two
will be reported as their final test result. This is done to eliminate the influence of test attitude
on test results. Second, if a candidate's score difference (Xd) between the comparative
and live tests is too large, the candidate is judged unfit and his or her scores are excluded from
the equating procedure.
With its fixed comparative test serving as the anchor, the CET score-equating design is simple and reasonable,
and the equating results have been satisfactory. The final part of this paper also discusses a few
disadvantages of this score-equating method.
Key words: Item Response Theory, Score equating, Rasch model

Correspondence: School of Foreign Languages, Shanghai Jiao Tong University, Shanghai 200030, P. R. China




CLC number: H030   Document code: A   Article ID: 1003-6105(2003)01-0069-07

1. The Rasch model

The Rasch model dates back some forty years. Rasch (1960) first proposed the model, and Birnbaum (1968)
developed the logistic formulation of latent trait theory. Wright & Panchapakesan (1969) published a
procedure for sample-free item analysis together with the BICAL calibration program, which made
large-scale Rasch analysis practical, and Wright & Stone (1979) give a systematic account of Rasch
measurement and test design.
For a person with ability θ_i and an item with difficulty b_j, the probability P_i that the person answers
the item correctly depends only on the difference θ_i − b_j: when θ_i − b_j = 0, P_i = 0.5; when
θ_i − b_j > 0, P_i > 0.5; and when θ_i − b_j < 0, P_i < 0.5.

The difference enters through the exponential function (e = 2.71828…): exp(θ_i − b_j) ranges over
(0, +∞), and formula (1.1) maps it onto the interval (0, 1). The Rasch model is

$$P_i = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)} \qquad (1.1)$$

where b_j is the difficulty of item j, θ_i is the ability of person i, and P_i is the probability of a
correct response by person i to item j.
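As a numerical check, formula (1.1) can be evaluated directly; the short sketch below (plain Python; the function name is ours) confirms the three cases just listed.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model, formula (1.1)."""
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

# theta == b gives exactly 0.5; the probability rises with theta - b.
print(rasch_p(0.0, 0.0))   # 0.5
print(rasch_p(1.0, 0.0))   # ~0.731
print(rasch_p(-1.0, 0.0))  # ~0.269
```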
Birnbaum (1968) developed this latent trait framework further, but the Rasch model remains its simplest
member. Suppose N persons take a test of n dichotomously scored (0/1) items, and let x_ij denote the
response of person i to item j, with x_ij = 0 or 1. The probability of the observed response is

$$P(X_{ij} = x_{ij} \mid \theta_i, b_j) = \frac{\exp[x_{ij}(\theta_i - b_j)]}{1 + \exp(\theta_i - b_j)} \qquad (1.2)$$
Under the local independence assumption, responses are independent given the person and item parameters,
so the probability of the whole N × n response matrix is the product

$$P(X = x \mid \theta, b) = \prod_{i=1}^{N} \prod_{j=1}^{n} \frac{\exp[x_{ij}(\theta_i - b_j)]}{1 + \exp(\theta_i - b_j)} \qquad (1.3)$$
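For estimation it is more convenient to work with the logarithm of (1.3). A minimal sketch (the function names are ours) for an N × n response matrix:

```python
import math

def rasch_p(theta, b):
    """Rasch probability of a correct response, formula (1.1)."""
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))

def log_likelihood(x, thetas, bs):
    """Log of (1.3): sum over persons i and items j of
    x_ij*(theta_i - b_j) - log(1 + exp(theta_i - b_j))."""
    ll = 0.0
    for i, theta in enumerate(thetas):
        for j, b in enumerate(bs):
            z = theta - b
            ll += x[i][j] * z - math.log(1.0 + math.exp(z))
    return ll

# Two persons, three items.
x = [[1, 1, 0],
     [0, 1, 0]]
print(log_likelihood(x, [0.5, -0.5], [-1.0, 0.0, 1.0]))
```

Exponentiating the result recovers the product form (1.3), which is easy to verify directly on small data.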


A distinctive property of the Rasch model is that the raw score

$$R_i = \sum_{j=1}^{n} x_{ij} \qquad (1.4)$$

is a sufficient statistic for the ability θ_i: all the information about ability in a response pattern is
carried by R_i. Sufficiency makes conditional maximum likelihood estimation (CMLE) possible:
conditioning on the raw scores removes the incidental (person) parameters, so the structural (item)
parameters can be estimated consistently by maximum likelihood estimation (MLE). This underlies the
model's specific objectivity: item comparisons do not depend on which persons happen to be measured,
and person comparisons do not depend on which items are used. (Earlier latent trait work was based on
the normal-ogive model, Lord 1952, later replaced by the more tractable logistic form.)
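Sufficiency of the raw score can be illustrated numerically: for two response patterns with the same raw score, the ratio of their probabilities does not involve θ. A small sketch (all names and difficulty values below are our own illustration):

```python
import math

def pattern_prob(pattern, theta, diffs):
    """Probability of a full 0/1 response pattern under the Rasch model."""
    prob = 1.0
    for xj, b in zip(pattern, diffs):
        pj = math.exp(theta - b) / (1.0 + math.exp(theta - b))
        prob *= pj if xj == 1 else 1.0 - pj
    return prob

diffs = [-1.0, 0.0, 1.0]
p1 = [1, 0, 0]  # raw score 1: only the easiest item correct
p2 = [0, 0, 1]  # raw score 1: only the hardest item correct

# The odds of p1 against p2 are the same at every ability level:
# theta cancels, so only the raw score carries information about theta.
ratios = [pattern_prob(p1, t, diffs) / pattern_prob(p2, t, diffs)
          for t in (-2.0, 0.0, 2.0)]
print(ratios)
```

The three printed ratios are identical (here exp(2), since the two patterns differ only in which of two items, one logit apart on each side, was answered correctly).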


Before the Rasch model is applied to equating the College English Test (CET-4/6), its assumptions have
to be examined against CET data (Weir 1998); since 1987, CET equating has been anchored on the norm
paper calibrated under the Rasch model.
(1) The Rasch model assumes that the test measures a single underlying ability (unidimensionality).
(2) The Rasch model assumes that all items discriminate equally. For CET items the point-biserial
coefficient (rb) mostly falls between 0.3 and 0.6, as Table 1 shows, so this assumption is approximately
met for the CET.

Table 1. Distribution of the point-biserial coefficient (rb) for 90 CET items (Weir 1998)
  0.0 ≤ rb < 0.3   13 items   14.4%
  0.3 ≤ rb < 0.6   74 items   82.3%
  0.6 ≤ rb ≤ 1.0    3 items    3.3%

(3) The Rasch model assumes that the effect of guessing is negligible. On a k-choice item the
probability of a correct guess is 1/k, so guessing matters least when the number of alternatives is
reasonably large and the test is long, as is the case with the CET multiple-choice paper.
(4) Calibration requires adequate data. For the 1987 CET norm paper, a sample of 10,865 candidates was
used in the calibration.

In CET practice, item and ability parameters are estimated together by joint maximum likelihood
estimation (JMLE). The equating samples drawn at the observation universities have ranged from roughly
600 to 1,100 candidates. Item fit is then examined with Rasch fit statistics (Wright 1977); in the data
reported by Weir (1998), about 7% of CET items showed misfit to the Rasch model.
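A bare-bones version of JMLE for the Rasch model can be sketched as alternating damped Newton updates; this is our own illustrative implementation, not the program actually used for the CET.

```python
import math

def p(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(b - theta))

def jmle(x, iters=50):
    """Joint maximum likelihood estimation (JMLE) for the Rasch model:
    alternate damped Newton updates of abilities and difficulties.
    Assumes no person and no item has an all-0 or all-1 score."""
    N, n = len(x), len(x[0])
    thetas, bs = [0.0] * N, [0.0] * n
    for _ in range(iters):
        for i in range(N):  # solve raw score = expected score for each person
            resid = sum(x[i]) - sum(p(thetas[i], b) for b in bs)
            info = sum(p(thetas[i], b) * (1 - p(thetas[i], b)) for b in bs)
            thetas[i] += max(-1.0, min(1.0, resid / info))
        for j in range(n):  # solve item score = expected score for each item
            resid = sum(row[j] for row in x) - sum(p(t, bs[j]) for t in thetas)
            info = sum(p(t, bs[j]) * (1 - p(t, bs[j])) for t in thetas)
            bs[j] -= max(-1.0, min(1.0, resid / info))
        shift = sum(bs) / n          # identify the scale: mean difficulty 0
        bs = [b - shift for b in bs]
        thetas = [t - shift for t in thetas]
    return thetas, bs

# Five candidates, four items (no zero or perfect scores).
x = [[1, 1, 0, 0],
     [1, 0, 1, 0],
     [1, 1, 1, 0],
     [0, 1, 0, 1],
     [1, 1, 0, 1]]
thetas, bs = jmle(x)
```

At convergence, each person's expected score matches the observed raw score, and items answered correctly by fewer candidates come out as more difficult.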







Equating is carried out through true scores. If a candidate obtains raw score x on test X, the
corresponding true score is

$$T_x = \sum_{i=1}^{n} P_i(x), \qquad 0 < T_x < n,$$

where n is the number of items on X and P_i(x) is the Rasch probability of a correct answer to item i.
Likewise, for a score y on test Y,

$$T_y = \sum_{i=1}^{n} P_i(y), \qquad 0 < T_y < n.$$

A raw score x on X is then equated to the raw score on Y whose true score T_y equals T_x; since both
papers have been calibrated onto the same ability scale, the two scores correspond to the same ability.
Under the Rasch model the expected raw score is E(R_i) = T(θ), so the conversion is computed for each
of the raw scores 1 to n−1 (the extreme scores 0 and n have no finite ability estimates).
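The true-score equating described above can be sketched as follows (the difficulty values are hypothetical and the function names are ours):

```python
import math

def true_score(theta, bs):
    """T(theta): expected raw score = sum of Rasch probabilities (cf. T_x, T_y)."""
    return sum(1.0 / (1.0 + math.exp(b - theta)) for b in bs)

def ability_for(t, bs, lo=-10.0, hi=10.0):
    """Invert T(theta) = t by bisection; T is strictly increasing in theta."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if true_score(mid, bs) < t:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def equate(x_score, bs_x, bs_y):
    """True-score equating: a raw score on test X -> equivalent score on test Y."""
    return true_score(ability_for(x_score, bs_x), bs_y)

# Hypothetical difficulties: every item on Y is 0.5 logits harder than on X,
# so a raw score on X should map to a lower score on Y.
bs_x = [-1.0, -0.5, 0.0, 0.5, 1.0]
bs_y = [b + 0.5 for b in bs_x]
print(equate(3, bs_x, bs_y))
```

Equating a test to itself returns the score unchanged, and the conversion is monotone, which is a quick sanity check on any implementation.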

Equated scores are expressed on the scale of the 1987 norm paper and rounded to the nearest 0.5. Table 2
gives the conversion for paper 4GSH1.

Table 2. Score conversion table for paper 4GSH1
  A (N=634)   B (N=672)   C
      1           1        1.5
      2           2        3
      3           3        4
      4           4        5.5
      5           5        6.5
      6           6        7.5
      7           7        9
      8           8       10
      9           9       11
     10          10       12
     11          11       13
     12          12       14
     13          13       15
     14          14       15.5
     15          15       16.5
     16          16       17.5
     17          17       18
     18          18       19
     19          19       19.5
     20          20       20

Table 2 (continued). Paper 4GSH1, scores on D (N=678) equated to A (N=678)
   A     D         A     D
   1     1        26    27
   2     2        27    28
   3     3        28    29
   4     4        29    30
   5     5        30    31
   6     6        31    32
   7     7        32    33
   8     8        33    34
   9     9        34    35
  10    10.5      35    36
  11    11.5      36    37
  12    12.5      37    38
  13    13.5      38    39
  14    14.5      39    40
  15    15.5      40    41
  16    16.5      41    41.5
  17    17.5      42    42.5
  18    18.5      43    43.5
  19    19.5      44    44.5
  20    20.5      45    45.5
  21    21.5      46    46.5
  22    23        47    47.5
  23    24        48    48
  24    25        49    49
  25    26        50    50

Comparing the conversions from A to B, from A to C, and from A to D shows that scores on paper 4GSH1
differ from the 1987 norm scale by no more than about 0.5 to 1.0 points, which indicates that the
difficulty of 4GSH1 is very close to that of the 1987 norm paper and that the CET equating results are
satisfactory (Weir 1998).

The CET score-equating method also has its disadvantages. The Rasch model imposes strong assumptions
that CET data can only approximately satisfy, and the repeated use of the 1987 norm paper as the
comparative test carries a growing risk of its exposure (cf. Woods 1994).

Even so, the Rasch model remains a sound and practical basis for CET score equating. Many computer
programs now implement the Rasch model and related IRT models, among them BILOG, LPCM-WIN, MULTILOG,
PARSCALE, RASCAL, RUMMFOLDss and RUMMFOLDpp, T-Rasch, and WINMIRA 32 (see http://www.gamma.rug.nl,
http://www.rasch.org, and the internet newsgroup associated with Richard Perline, University of Chicago,
Benjamin D. Wright, University of Chicago, and Howard Wainer, Bureau of Social Science Research).

References

Birnbaum, A. 1968. Some latent trait models and their use in inferring an examinee's ability [A]. In F. M. Lord and M. R. Novick (Eds.). Statistical
Theories of Mental Test Scores [C]. Reading, MA: Addison-Wesley, 397-472.
Bock, R. C. & M. Aitkin. 1981. Marginal maximum likelihood estimation of item parameters: An application of an EM algorithm [J].
Psychometrika, 46: 443-459.
Bock, R. D. and M. Lieberman. 1970. Fitting a response model for n dichotomously scored items [J]. Psychometrika, 35: 179-197.
Baker, F. B. 1992. Item Response Theory: Parameter Estimation Techniques [M]. New York: Marcel Dekker.
Haebara, T. 1980. Equating logistic ability scales by a weighted least square method [J]. Japanese Psychological Research, 22: 144-149.
Hambleton, R. K., H. Swaminathan, and H. J. Rogers. 1991. Fundamentals of Item Response Theory [M]. Newbury Park, CA: Sage.
Lord, F. M. 1952. A theory of test scores [J]. Psychometric Monograph.(No. 7).
Lord, F. M., and M. R. Novick. 1968. Statistical Theories of Mental Test Scores [M]. Reading, MA: Addison-Wesley.
Loyd, B. H. and H. D. Hoover. 1980. Vertical equating using the Rasch model [J]. Journal of Educational Measurement, 18: 1-11.
Marco, G. L. 1977. Item characteristic curve solutions to three intractable testing problems [J]. Journal of Educational Measurement, 14: 139-160.
Mislevy, R. J., and R. D. Bock. 1992. BILOG: Maximum likelihood item analysis and test scoring with logistic models for binary items [M].
Chicago: International Educational Services.
Rasch, G. 1960. Probabilistic Models for Some Intelligence and Attainment Tests [M]. Copenhagen: Danish Institute for Educational Research.
Stocking, M. L., and F. M. Lord. 1983. Developing a common metric in item response theory [J]. Applied Psychological Measurement, 7: 201-210.
Wright, B. D. 1977. Solving measurement problems with the Rasch model [J]. Journal of Educational Measurement, 14: 97-116.
Wright, B. D., and G. A. Douglas. 1977. Conditional versus unconditional procedures for sample-free analysis [J]. Educational and Psychological
Measurement, 37: 573-586.
Wright, B. D., and R. J. Mead. 1978. BICAL: Calibrating items and scales with the Rasch model [M]. Chicago: University of Chicago, Statistical
Laboratory.
Wright, B. D. ,and N. Panchapakesan. 1969. A procedure for sample-free item analysis [J]. Educational and Psychological Measurement.
Wright, B. D., and M. H. Stone. 1979. Best Test Design [M]. Chicago: MESA Press.
Woods, A. J. 1994. Report on a consultancy visit to China under the ELT project [R].
Yang Huizhong and C. Weir. 1998. Validation Study of the National College English Test [M]. Shanghai: Shanghai Foreign Language Education Press.

ZHU Zheng-cai: School of Foreign Languages, Shanghai Jiao Tong University, Shanghai 200030 <zczhu@mail.sjtu.edu.cn>
YANG Hui-zhong: School of Foreign Languages, Shanghai Jiao Tong University, Shanghai 200030 <hzyang@mail.sjtu.edu.cn>
YANG Hao-ran: School of Foreign Languages, Shanghai Jiao Tong University, Shanghai 200030 <hryang@mail.sjtu.edu.cn>