Application of the Rasch Model to Score Equating in the CET (Zhu Zhengcai, Yang Huizhong)

Of all the models within Item Response Theory, the Rasch model for dichotomously scored items is probably the most widely recognized by practitioners. China's College English Test (CET) has used this model for 15 years and has accumulated a large body of score equating data. This paper describes in detail the CET score equating process based on the Rasch model and reports a pilot study using CET test data.

For equating purposes, a comparative test is given to a randomly sampled group of candidates at selected observation universities one week before each live CET test. The comparative test paper is the one used on the norm group (e.g., the 1987 paper). An insignificant difference between the candidates' performance on the comparative test and the performance of the norm group would indicate little change in ability level; if significant differences then appeared on the live CET test, they would indicate a change in the difficulty level of the test. Since 1990, the Rasch model has been used to equate CET scores. Because the comparative test paper and the current test paper are administered to the same group of candidates, items from both papers can be placed on the same ability scale, which makes equating possible. In this process, an equating model is built on a random sample of candidates before it is applied to the whole test population. It is essential that the comparative test be administered under quality control so that the data gathered are reliable. Because the comparative test and the live test are administered to the same group of candidates only one week apart, time and proficiency level cannot be the main factors affecting the test results. Moreover, since CET items are written by well-trained specialists under thorough quality control, and since they have been pre-tested and analyzed, with the overall difficulty level of the test rigorously controlled, no significant differences should exist between the results of the two tests. With these factors excluded, only two reasons could lead to a significant difference between the scores that the same group of candidates obtains on the comparative test and on the live test. One is their attitude towards the tests (for example, they may take the live test seriously but treat the comparative test as an exercise); the other is possible exposure of the comparative test paper (even partial leakage of the paper to candidates would make the data meaningless). In view of these two factors, two measures have been taken since 1994 to increase the precision of the equating model. First, before the comparative test is administered, the candidates who take it are told that their scores on both the comparative test and the live test will be valid and that the higher of the two will be reported as their final result. This is done to remove the influence of test attitude on test results. Second, if a candidate's score difference (Xd) between the comparative and live tests is too large, the candidate is considered unfit and his or her scores are excluded from the equating procedure.

As a fixed-θs equating design, the CET score equating process is simple and reasonable, and the score equating results are satisfactory. The final part of the paper discusses a few disadvantages of this score equating method.

Key words: Item Response Theory, Score equating, Rasch model

200030, P. R. China

Modern Foreign Languages (Quarterly), Vol. 26, No. 1, Jan. 2003


[CLC number] H030 [Document code] A [Article ID] 1003-6105(2003)01-0069-7

1. The Rasch model

40 Rasch 1960

Rasch Logistic (Birnbaum 1968) Rasch

1969

Wright & Panchapakesan 1969 Rasch BICAL

Rasch Rasch

Wright & Stone 1979 Rasch

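Let $\theta_i$ denote the ability of candidate $i$ and $b_j$ the difficulty of item $j$. The probability $P_i$ that candidate $i$ answers item $j$ correctly is governed by the difference $\theta_i - b_j$: when $\theta_i - b_j = 0$, $P_i = 0.5$; when $\theta_i - b_j > 0$, $P_i > 0.5$; when $\theta_i - b_j < 0$, $P_i < 0.5$.

The difference $\theta_i - b_j$ ranges from $-\infty$ to $+\infty$. Taking it as the exponent of $e$ ($e = 2.71828\ldots$) gives $\exp(\theta_i - b_j)$, which ranges from $0$ to $+\infty$; the ratio in (1.1) therefore lies between 0 and 1 and gives the probability $P_i$ under the Rasch model:

$$P_i = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)} \tag{1.1}$$

where $b_j$ is the difficulty of item $j$, $\theta_i$ is the ability of candidate $i$, and $P_i$ is the probability that candidate $i$ answers the item correctly.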
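Equation (1.1) can be evaluated directly. The following is a minimal Python sketch, not part of the original paper; the function name `rasch_probability` is a hypothetical choice, and the two-branch form is only a numerical safeguard against overflow for large differences between ability and difficulty.

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under equation (1.1).

    theta: candidate ability, b: item difficulty (same logit scale).
    """
    z = theta - b
    # Two algebraically identical forms of exp(z) / (1 + exp(z));
    # each branch only ever exponentiates a non-positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Ability equal to difficulty, above it, and below it.
for theta in (0.0, 1.0, -1.0):
    print(theta, rasch_probability(theta, 0.0))
```

The three printed cases correspond to the three situations described in the text: a probability of exactly one half, above one half, and below one half.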

Birnbaum 1968 Rasch

Rasch

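For $N$ candidates and $n$ items, each response $x_{ij}$ is scored 0 or 1 ($x_{ij} = 0/1$), and the probability of a response is

$$P(x_{ij} \mid \theta_i, b_j) = \frac{\exp[x_{ij}(\theta_i - b_j)]}{1 + \exp(\theta_i - b_j)} \tag{1.2}$$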

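Under the local independence assumption, the probability of the whole response matrix is

$$P(\{x_{ij}\} \mid \theta, b) = \prod_{i=1}^{N} \prod_{j=1}^{n} \frac{\exp[x_{ij}(\theta_i - b_j)]}{1 + \exp(\theta_i - b_j)} \tag{1.3}$$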
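In practice the product in (1.3) is usually handled on the log scale. The sketch below is an illustration rather than the authors' code: the function name `rasch_log_likelihood` and the list-of-lists data layout are assumptions, and the three-candidate response matrix is made up.

```python
import math

def rasch_log_likelihood(x, theta, b):
    """Logarithm of the likelihood in equation (1.3).

    x: N x n list of 0/1 responses, theta: N abilities, b: n difficulties.
    """
    total = 0.0
    for i, row in enumerate(x):
        for j, x_ij in enumerate(row):
            z = theta[i] - b[j]
            # log(1 + exp(z)) written so that exp() never overflows.
            log_denominator = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
            total += x_ij * z - log_denominator
    return total

# Hypothetical data: 3 candidates, 2 items.
responses = [[1, 0], [1, 1], [0, 0]]
print(rasch_log_likelihood(responses, [0.0, 1.0, -1.0], [-0.5, 0.5]))
```

Each term of the double sum is the logarithm of one factor of (1.3), so maximizing this function over the abilities and difficulties is equivalent to maximizing the likelihood itself.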

*

1992

Benjamin D. Wright Rasch


Rasch Rasch

Ri

$$R_i = \sum_{j=1}^{n} x_{ij} \tag{1.4}$$
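One way to see why the raw score $R_i$ carries all the information about ability is to collect the exponents in (1.3). The short derivation below is a sketch; the item score $S_j$ is notation introduced here for illustration and does not appear in the original.

$$\prod_{i=1}^{N} \prod_{j=1}^{n} \frac{\exp[x_{ij}(\theta_i - b_j)]}{1 + \exp(\theta_i - b_j)} = \frac{\exp\left(\sum_{i=1}^{N} \theta_i R_i - \sum_{j=1}^{n} b_j S_j\right)}{\prod_{i=1}^{N} \prod_{j=1}^{n} \left[1 + \exp(\theta_i - b_j)\right]}, \qquad S_j = \sum_{i=1}^{N} x_{ij}$$

The responses enter the right-hand side only through the candidate raw scores $R_i$ and the item scores $S_j$, so two candidates with the same raw score on the same items receive the same ability estimate.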

Ri

Specific objectivity

Incidental parameters

Rasch

(CET4/6)

Weir 1998

1987 Rasch

(1) Rasch

(2) Rasch

Rasch CET

0.3-0.6 (Table 1)

CET

Table 1 (Weir 1998)

rb           Number of items (total 90)   Percentage
0.0 - 0.3    13                           14.4%
0.3 - 0.6    74                           82.3%
0.6 - 1.0     3                            3.3%

(3) Rasch k

CET


k k

CET

100

50 50 50 50

(4)

CET CET

20 20 50 30

20 Calibration 0

estimation (JMLE)

CET

CET 9 60 120

600 1100

12

20

Weir

1998 2 Wright 1977

Rasch CET

7%
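The calibration step mentioned above refers to joint maximum likelihood estimation (JMLE). The following Python sketch shows one common way such an estimation can be organized: alternating Newton steps for item difficulties and candidate abilities, with the mean item difficulty fixed at zero. It is an illustration under stated assumptions, not the CET procedure itself: the function name `rasch_jmle`, the step limit of one logit, the starting values, and the small response matrix are all hypothetical.

```python
import math

def rasch_jmle(x, max_iter=500, tol=1e-8):
    """Joint ML estimates (theta, b) for a 0/1 matrix x (candidates x items).

    Assumes no candidate has a zero or perfect score and no item is answered
    correctly by nobody or by everybody; such rows and columns have no finite
    estimate and must be removed beforehand.
    """
    n_persons, n_items = len(x), len(x[0])
    r = [sum(row) for row in x]                              # raw scores R_i
    s = [sum(row[j] for row in x) for j in range(n_items)]   # item scores
    theta = [math.log(ri / (n_items - ri)) for ri in r]      # starting values
    b = [math.log((n_persons - sj) / sj) for sj in s]

    def prob(t, d):
        return 1.0 / (1.0 + math.exp(-(t - d)))

    for _ in range(max_iter):
        change = 0.0
        # One Newton step per item difficulty, abilities held fixed.
        for j in range(n_items):
            p = [prob(t, b[j]) for t in theta]
            step = (sum(p) - s[j]) / sum(q * (1.0 - q) for q in p)
            step = max(-1.0, min(1.0, step))   # limit the step for stability
            b[j] += step
            change = max(change, abs(step))
        # Fix the origin of the scale: mean item difficulty equals zero.
        mean_b = sum(b) / n_items
        b = [d - mean_b for d in b]
        # One Newton step per candidate ability, difficulties held fixed.
        for i in range(n_persons):
            p = [prob(theta[i], d) for d in b]
            step = (r[i] - sum(p)) / sum(q * (1.0 - q) for q in p)
            step = max(-1.0, min(1.0, step))
            theta[i] += step
            change = max(change, abs(step))
        if change < tol:
            break
    return theta, b

# Hypothetical responses: 6 candidates, 4 items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]
theta_hat, b_hat = rasch_jmle(responses)
print([round(t, 3) for t in theta_hat])
print([round(d, 3) for d in b_hat])
```

At convergence the expected score of each candidate equals the observed raw score, and the expected number of correct answers on each item equals the observed item score. Operational programs typically add refinements (for example bias corrections and fit statistics) that this sketch omits.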


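For a candidate with ability $\theta_x$ on test X, the true score $T_x$ on X is

$$T_x = \sum_{i=1}^{n} P_i(\theta_x), \qquad 0 < T_x < n,$$

where $n$ is the number of items in X and $i$ here indexes items. Likewise, for ability $\theta_y$ on test Y, the true score $T_y$ on Y is

$$T_y = \sum_{i=1}^{n} P_i(\theta_y), \qquad 0 < T_y < n,$$

where $n$ is the number of items in Y. When $\theta_x = \theta_y$, the true scores $T_x$ and $T_y$ are equated scores.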
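The true-score relation above can be sketched in Python as follows. This is a hedged illustration, not the CET program: it assumes the item difficulties of both papers are already on a common scale, and the function names, the search bracket of -10 to 10 logits, and the difficulty values are hypothetical.

```python
import math

def p_correct(theta, b):
    """Rasch probability of a correct response, as in equation (1.1)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def true_score(theta, difficulties):
    """True score T: sum of the item probabilities at ability theta."""
    return sum(p_correct(theta, b) for b in difficulties)

def ability_for_true_score(target, difficulties, lo=-10.0, hi=10.0, tol=1e-9):
    """Find theta whose true score equals target, by bisection.

    The true score is strictly increasing in theta, so the root is unique.
    The bracket [lo, hi] is an assumption and must contain the root.
    """
    if not (true_score(lo, difficulties) < target < true_score(hi, difficulties)):
        raise ValueError("target true score is outside the search bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if true_score(mid, difficulties) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def equate_x_to_y(score_x, b_x, b_y):
    """Score on Y equated to score_x on X: same theta, true score on Y."""
    theta = ability_for_true_score(score_x, b_x)
    return true_score(theta, b_y)

# Hypothetical difficulties in logits, already on a common scale.
b_x = [-1.0, -0.5, 0.0, 0.5, 1.0]
b_y = [b + 0.4 for b in b_x]   # paper Y is uniformly harder than paper X
for raw in range(1, len(b_x)):
    print(raw, round(equate_x_to_y(raw, b_x, b_y), 2))
```

Only scores strictly between 0 and the number of items can be equated this way, because a zero or perfect score corresponds to no finite ability.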

Rasch

n-1 X Y

Ri

$E(R_i) = T(\theta_i)$

1~n-1

n

87 (

0.5 ) 87 2

Table 2. 4GSH1

A (N=634)   B (N=672)   C
 1           1           1.5
 2           2           3
 3           3           4
 4           4           5.5
 5           5           6.5
 6           6           7.5
 7           7           9
 8           8          10
 9           9          11
10          10          12
11          11          13
12          12          14
13          13          15
14          14          15.5
15          15          16.5
16          16          17.5
17          17          18
18          18          19
19          19          19.5
20          20          20


Table 2 (continued). 4GSH1

A     D (N=678)     A     D (N=678)
 1     1            26    27
 2     2            27    28
 3     3            28    29
 4     4            29    30
 5     5            30    31
 6     6            31    32
 7     7            32    33
 8     8            33    34
 9     9            34    35
10    10.5          35    36
11    11.5          36    37
12    12.5          37    38
13    13.5          38    39
14    14.5          39    40
15    15.5          40    41
16    16.5          41    41.5
17    17.5          42    42.5
18    18.5          43    43.5
19    19.5          44    44.5
20    20.5          45    45.5
21    21.5          46    46.5
22    23            47    47.5
23    24            48    48
24    25            49    49
25    26            50    50

ABAC AD 87

4GSH1 87 4GSH1 0.5

2 4GSH1 0.5 1.0

CET

(Weir 1998)

CET

(1)

(2)

Rasch

CET

(3)

87

87


(4) Rasch

(5) Rasch

CET

Rasch

Woods 1994

(6)

Rasch CET

Rasch Rasch

(BILOG, LPCM-WIN, MULTILOG, PARSCALE, RASCAL, RUMMFOLDss and RUMMFOLDpp, T-Rasch, WINMIRA 32, http://www.gamma.rug.nl; internet newsgroup: Richard Perline, University of Chicago, Benjamin D. Wright, University of Chicago, Howard Wainer, Bureau of Social Science Research, http://www.rasch.org)

Rasch

Birnbaum, A. 1968. Some latent trait models and their use in inferring an examinee's ability [A]. In F. M. Lord and M. R. Novick (Eds.). Statistical Theories of Mental Test Scores [C]. Reading, MA: Addison-Wesley, 397-472.

Bock, R. D. and M. Aitkin. 1981. Marginal maximum likelihood estimation of item parameters: An application of an EM algorithm [J]. Psychometrika, 46: 443-459.

Bock, R. D. and M. Lieberman. 1970. Fitting a response model for n dichotomously scored items [J]. Psychometrika, 35: 179-197.

Baker, F. B. 1992. Item Response Theory: Parameter Estimation Techniques [M]. New York: Marcel Dekker.

Haebara, T. 1980. Equating logistic ability scales by a weighted least squares method [J]. Japanese Psychological Research, 22: 144-149.

Hambleton, R. K., H. Swaminathan and H. J. Rogers. 1991. Fundamentals of Item Response Theory [M]. Newbury Park, CA: Sage.

Lord, F. M. 1952. A theory of test scores [J]. Psychometric Monograph, No. 7.

Lord, F. M. and M. R. Novick. 1968. Statistical Theories of Mental Test Scores [M]. Reading, MA: Addison-Wesley.

Loyd, B. H. and H. D. Hoover. 1980. Vertical equating using the Rasch model [J]. Journal of Educational Measurement, 17: 179-193.

Marco, G. L. 1977. Item characteristic curve solutions to three intractable testing problems [J]. Journal of Educational Measurement, 14: 139-160.

Mislevy, R. J. and R. D. Bock. 1992. BILOG: Maximum likelihood item analysis and test scoring with logistic models for binary items [M]. Chicago: International Educational Services.

Rasch, G. 1960. Probabilistic Models for Some Intelligence and Attainment Tests [M]. Copenhagen: Danish Institute for Educational Research.

Stocking, M. L. and F. M. Lord. 1983. Developing a common metric in item response theory [J]. Applied Psychological Measurement, 7: 201-210.

Wright, B. D. 1977. Solving measurement problems with the Rasch model [J]. Journal of Educational Measurement, 14: 97-116.

Wright, B. D. and G. A. Douglas. 1977. Conditional versus unconditional procedures for sample-free item analysis [J]. Educational and Psychological Measurement, 37: 573-586.

Wright, B. D. and R. J. Mead. 1978. BICAL: Calibrating items and scales with the Rasch model [M]. Chicago: University of Chicago, Statistical Laboratory.

Wright, B. D. and N. Panchapakesan. 1969. A procedure for sample-free item analysis [J]. Educational and Psychological Measurement, 29: 23-48.

Wright, B. D. and M. H. Stone. 1979. Best Test Design [M]. Chicago: MESA Press.

Woods, A. J. 1994. Report on a consultancy visit to China under the ELT project [R].

Weir, C. 1998. [M]

1992 [M]

200030 <zczhu@mail.sjtu.edu.cn>

200030 <hzyang@mail.sjtu.edu.cn>

200030 <hryang@mail.sjtu.edu.cn>

