
Examining the untestable assumptions of the chained linear linking for the Livingston score adjustment with application to the 2005 MSCE mathematics paper 2

By
Chifundo Steven Azizi
Outline of Presentation
• Background information
• Statement of the problem
• Purpose of the study
• Research questions
• Significance of study
• Theoretical framework
• Methodology
• Results and Discussion
• Conclusions and Implications

Sunday, August 23, 2009 2


Background
• In the MSCE examination, mathematics has two
papers, paper 1 and paper 2, each carrying 100
marks.
• Paper 2 has two sections, A and B – section A is
compulsory and section B has six optional
questions.
• The raw scores on optional items cannot be
compared directly because they do not indicate
the same level of knowledge and skill.



Background (cont’d)
• Psychometricians, nevertheless, have tried to
find a post hoc solution to the incomparability of
optional items’ raw scores.
• Livingston (1988) developed a method for
adjusting scores of optional questions to take
away the differential in difficulty.
• Under this procedure, raw scores on optional
question i are transformed to the scale of
optional question j through scores on the
mandatory section (common portion) for the
examinees that answered question i.



Statement of the Problem
• When MANEB’s examiners compile mathematics items
onto test forms, they assume that the selected items
are equally difficult, as evidenced by the equal
allotment of 15 marks.
• However, the literature has shown that optional
items are differentially difficult.
• This brings about unfairness in grading because
raw scores from optional questions are
incomparable and yet in practice they are
matched up.



Statement of the Problem (cont’d)
• Livingston (1988) proposed a method of adjusting raw
scores of optional questions to achieve fairness in
grading examinees that take different items.
• The procedure uses rigorous assumptions of test
equating which say that two equating functions should
be identical regardless of which subpopulation has
attempted which question.
• In view of the incomparability of raw scores of choice
items, there was a need to examine the assumptions
using one of the MSCE mathematics paper 2 test forms
to ascertain if the procedure could work on it.



Purpose of the study
• General objective
• To examine the assumptions of chained linear linking for the
Livingston raw score adjustment method on optional
questions’ scores of MSCE mathematics paper 2.
• Specific objectives
• Distinguish item difficulty level of optional questions
using item difficulty indices of raw scores.
• Compare correlations between total scores of Section A
and scores of Section B.
• Establish whether linking functions of examinees that
chose a concerned optional question and for those that
selected a different choice question are group invariant.
Research questions
• To what extent do optional questions differ
in difficulty?
• How are scores on optional questions’
portion (Section B) and total scores on
common portion (Section A) correlated?
• Are linking functions of examinees that
chose a concerned optional question and
for those that selected alternate question
group invariant?
Significance of the study
• Fairness in educational measurement is of
paramount significance.
• Therefore the study provided a window of
opportunity for examining the possibility of
score equity of mathematics paper 2
optional questions.
• The study would also stimulate further
research by other researchers in the area
of score equity of different groups.
Theoretical Framework
• The study was guided by theories of equating
procedures.
• Equating can be defined as converting scores of one test
form to a scale of scores of another test form in order to
directly compare them (Angoff, 1971).
• In particular, the study used the chained linear design, which
falls under the common-item non-equivalent groups
(non-equivalent groups anchor test, NEAT) method.
• Chained linear equating is done by equalising the
standardised deviation scores (z-scores) of two test
forms via the common items’ z-scores (Livingston, 1988;
Livingston, 2004).
• However, for chained linear equating to produce
unbiased results, the two chained equating functions
should not depend on which population is used (Braun &
Holland, 1982).
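As a worked illustration of the z-score idea above, a single linear link from a form Y to a form X within one population follows from setting the two standardised deviation scores equal. The symbols $\mu_X$, $\sigma_X$, $\mu_Y$ and $\sigma_Y$ are generic means and standard deviations introduced here for illustration only; they are not the notation used later in the presentation.

```latex
\frac{x - \mu_X}{\sigma_X} = \frac{y - \mu_Y}{\sigma_Y}
\quad\Longrightarrow\quad
x = \mu_X + \frac{\sigma_X}{\sigma_Y}\,(y - \mu_Y)
```

Chaining applies a step of this form twice, each time through the scores on the common items.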
Methodology
• The Design
• The study employed a cross-sectional
quantitative approach.
• The study population was all Form 4
students in 2007.
• Systematic sampling was used to obtain
examinees from five purposively selected
secondary schools.
Methodology
• The Design
• 2005 MSCE mathematics paper 2 was
administered to 247 examinees in two
parts, section A followed by section B.
• For section B, examinees were first asked to
indicate their choice of three optional
questions and were then instructed to answer
all the questions.
Methodology (cont’d)
• Analysis
• The first research question was answered using item
difficulty indices (p-values).
$$p\text{-value} = \frac{\text{average mark obtained on a question}}{\text{maximum mark for the question}}$$
• A low p-value indicates a difficult item, whilst a high p-value
indicates an easy item.
• Pearson product-moment correlation coefficient (r) was
computed in order to answer the second research
question.
• The differences between the linking functions were
summarised using the standardised Root Mean Square
Difference (RMSD) and the standardised Root Expected
Mean Square Difference (REMSD) to answer the third
research question.
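The correlation step in the analysis plan can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the five pairs of scores are hypothetical and are not data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical section totals for five examinees (NOT data from the study).
section_a = [20, 31, 35, 42, 50]
section_b = [10, 14, 21, 19, 30]

r = pearson_r(section_a, section_b)
# r squared is the coefficient of determination.
print(round(r, 3), round(r ** 2, 3))
```

Squaring r gives the share of variability in one set of scores that is explained by the other.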
Methodology (cont’d)
• Ethical considerations
• Permission was sought from relevant institutions’
authorities to carry out the research.
• Students were also informed of the aim of the study and
were given the choice of whether or not to participate.
• Assurances of confidentiality and anonymity were given to
participants.
• Validity and reliability
• The paper was developed by MANEB, which uses a
blueprint to regulate content coverage and cognitive level
demands.



Methodology (cont’d)
• Students’ responses reflected optimal performance because
they took the test three weeks before the national examinations.
• Delimitation
• The results would not be generalised to all secondary
schools in Malawi but would be related to other schools
with similar characteristics as the sampled ones.
• Limitations
• Participant attrition.
• Time and financial constraints to cover more schools.



Results and Discussion
1. To what extent do optional questions differ?

Section A                                 Section B
Item  Max. mark  Average mark  p-value    Item  Max. mark  Average mark  p-value
1     8          5.190         0.649      7     15         6.436         0.429
2     7          4.401         0.629      8     15         5.061         0.337
3     9          5.518         0.613      9     15         5.869         0.391
4     10         5.801         0.580      10    15         5.116         0.341
5     11         6.324         0.575      11    15         3.927         0.262
6     10         1.917         0.192      12    15         7.566         0.504

Table 1: P-values for questions in section A and section B 'without choice'
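The p-values in Table 1 can be reproduced from the maximum and average marks. The sketch below copies those two columns from the table; the names `TABLE_1` and `p_value` are illustrative choices, not part of the study.

```python
# (item, maximum mark, average mark) copied from Table 1.
TABLE_1 = [
    (1, 8, 5.190), (2, 7, 4.401), (3, 9, 5.518),
    (4, 10, 5.801), (5, 11, 6.324), (6, 10, 1.917),
    (7, 15, 6.436), (8, 15, 5.061), (9, 15, 5.869),
    (10, 15, 5.116), (11, 15, 3.927), (12, 15, 7.566),
]

def p_value(average_mark, maximum_mark):
    """Item difficulty index: average mark divided by maximum mark."""
    return average_mark / maximum_mark

p_values = {item: round(p_value(avg, mx), 3) for item, mx, avg in TABLE_1}

# Items 7 to 12 are the optional questions in section B.
optional = {item: p for item, p in p_values.items() if item >= 7}
hardest = min(optional, key=optional.get)  # lowest p-value
easiest = max(optional, key=optional.get)  # highest p-value

print(p_values)
print("hardest optional question:", hardest)
print("easiest optional question:", easiest)
```

On these figures the optional questions span p-values from 0.262 (question 11) to 0.504 (question 12).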
Results and Discussion (cont’d)
• 2. How are scores on section A and
section B correlated?
• The correlation coefficient (r) was 0.80, which
indicates a high positive relationship according to
Hinkle, Wiersma & Jurs (1998).
• The coefficient of determination (r²) was 0.64,
which means the total score on section A explains
64% of the variability in the scores on section B.
• This means that the two sections were
measuring the same construct.
Results and Discussion (cont’d)
• 3. Establishing group invariance on linking functions
• 3.1 Linking functions that largely vary at the lower tail of the score scale
[Figure 1, left panel: equated/linked score on section A against score on question 7, showing the linking functions for the group that chose question 7, the group that chose question 8 and the combined group. Right panel: standardised Difference That Matters and Root Mean Square Difference against score on question 7.]
REMSD = 0.249
Figure 1: Equated scores on section A from optional question 7 that largely vary at the
lower tail of the choice question scale
Results and Discussion (cont’d)
3.1 Linking functions that largely vary at the lower tail of the score scale (cont’d)

Pair  Subgroup that chose   Subgroup that selected
      concerned question    other question
1     8                     7
2     9                     11
3     11                    7
4     11                    8
5     12                    10

Table 2: Pairs of subgroups that chose particular questions and other
questions which have the same trend as in Figure 1
Results and Discussion (cont’d)
• 3.2 Linking functions that largely vary at the upper tail of the score scale
[Figure 2, left panel: equated/linked score on section A against score on question 8, showing the linking functions for the group that chose question 8, the group that chose question 11 and the combined group. Right panel: standardised Difference That Matters and Root Mean Square Difference against score on question 8.]
REMSD = 0.234
Figure 2: Equated scores on section A from optional question 8 that
largely vary at the higher tail of the choice question scale
Results and Discussion (cont’d)
• 3.2 Linking functions that largely vary at the upper tail of the score scale
(cont’d)

Pair  Subgroup that chose   Subgroup that selected
      concerned question    other question
1     9                     7
2     9                     8
3     9                     10
4     10                    7
5     10                    9
6     10                    11
7     12                    7
8     12                    8
9     12                    9
10    12                    11

Table 2: Pairs of subgroups that chose particular questions and other questions
which have the same trend as in Figure 2
Results and Discussion (cont’d)
• 3.3 Linking functions that largely vary at the lower and first
upper tail of the score scale
[Figure 3, left panel: equated/linked score on section A against score on question 11, showing the linking functions for the group that chose question 11, the group that chose question 9 and the combined group. Right panel: standardised DTM and Root Mean Square Difference against score on question 11.]
REMSD = 0.189
Figure 3: Equated scores on section A from optional question 11 that
largely vary at the lower and first upper tail of the choice question scale
Results and Discussion (cont’d)
• 3.3 Linking functions that largely vary at the lower and second upper
tail of the score scale (cont’d)

Pair  Subgroup that chose   Subgroup that selected
      concerned question    other question
1     7                     10
2     7                     12
3     11                    10
4     11                    12

Table 3: Pairs of subgroups that chose particular questions
and other questions which have the same trend as in Figure 3



Results and Discussion (cont’d)
• 3.4 Linking functions that vary at both the lower and upper
tails of the score scale
[Figure 4, left panel: equated/linked score on section A against score on question 10, showing the linking functions for the group that chose question 10, the group that chose question 8 and the combined group. Right panel: standardised DTM and Root Mean Square Difference against score on question 10.]
REMSD = 0.092
Figure 4: Equated scores on section A from optional question 10
that largely vary at the higher tail of the choice question scale
Results and Discussion (cont’d)
• 3.5 Linking functions that constantly vary across the
entire score scale
[Figure 5, left panel: equated/linked score on section A against score on question 7, showing the linking functions for the group that chose question 7, the group that chose question 11 and the combined group. Right panel: standardised Difference That Matters and Root Mean Square Difference against score on question 7.]
REMSD = 0.149
Figure 5: Equated scores on section A that vary constantly across
the entire score scale of optional question 7
Conclusions and Implications
• Conclusions
• The questions in optional section were
differentially difficult.
• Section A and section B measured the
same construct.
• Subgroups found the same optional
question differentially difficult, which
meant the linking functions were not group
independent.



Conclusions and Implications
(cont’d)
• Implications
• Lack of group independence on this paper indicates that
the difficulty of the optional question concerned is not
consistent across the two subgroups.
• This means the Livingston score adjustment methodology
would produce biased score adjustments.
• For example, a subgroup that performed poorly on a
certain optional question would have its equated scores
on section A substantially adjusted upwards, whereas
the other subgroup that scored highly on the
same question would have its equated scores on section
A considerably adjusted downwards.
Thank you





Conclusions and Implications
(cont’d)
• Recommendation
• The literature has shown that differences in the level of
cognitive demand of choice items contribute to score
inequity.
• Thus, controlling item content dissimilarities based on
ability levels could be one way to achieve score
equity.
• When constructing optional items, item setters could
analytically match the diverse proficiencies required by
topics using the MSCE mathematics performance level
descriptors.
Formula for linking functions

$$X_i(y_i) = \mu_{X_{i1}} + \frac{\sigma_{X_{i1}}}{\sigma_{i1}}\,(y_i - \mu_{i1})$$

$$X_{ij}(y_i) = \mu_{X_{j1}} + \frac{\sigma_{X_{j1}}}{\sigma_{ij1}}\,(y_i - \mu_{ij1})$$

where

$X_i(y_i)$ denotes equating raw score ($y_i$) to the scale of X for an examinee who answers question i
$y_i$ = raw score on optional question i
$\mu_{X_{i1}}$ = mean of X for examinees selecting question i
$\sigma_{X_{i1}}$ = standard deviation of X for examinees choosing question i
$\sigma_{i1}$ = standard deviation of question i for examinees selecting it
$\mu_{i1}$ = mean of question i for examinees selecting question i
$X_{ij}(y_i)$ denotes equating raw score ($y_i$) to X for an examinee who answers question j
$\mu_{ij1}$ = mean of question i for examinees choosing question j
$\sigma_{ij1}$ = standard deviation of question i for examinees selecting question j
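The first linking function above can be sketched as a small Python helper that computes the subgroup moments and applies the linear link. The function names, the use of population standard deviations (`pstdev`) and the four pairs of scores are assumptions made for illustration; the slides do not specify them.

```python
from statistics import mean, pstdev

def linear_link(y, mean_from, sd_from, mean_to, sd_to):
    """Map score y onto another scale by matching z-scores."""
    return mean_to + (sd_to / sd_from) * (y - mean_from)

def link_to_section_a(y, question_scores, section_a_scores):
    """Link an optional-question score y to the section A (X) scale,
    using the means and standard deviations of a single subgroup."""
    return linear_link(
        y,
        mean(question_scores), pstdev(question_scores),
        mean(section_a_scores), pstdev(section_a_scores),
    )

# Hypothetical scores for one subgroup of four examinees (NOT study data).
q_i = [3, 6, 9, 12]     # marks on optional question i
x_i = [20, 30, 40, 50]  # marks on section A for the same examinees

print(link_to_section_a(9, q_i, x_i))
```

Because both standard deviations come from the same examinees, the choice between population and sample standard deviation does not change their ratio here.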



RMSD & REMSD Statistics

$$\mathrm{RMSD}(y) = \frac{\sqrt{\sum_{h=1}^{H} w_h\,[eq_{X_h}(y) - eq_X(y)]^2}}{\sigma(X_{\text{combined group}})}$$

$$\mathrm{REMSD} = \frac{\sqrt{\sum_{h=1}^{H} w_h \sum_{y=\min(y)}^{\max(y)} \upsilon_{yh}\,[eq_{X_h}(y) - eq_X(y)]^2}}{\sigma(X_{\text{combined group}})}$$

where
$eq_X$ represents transformed scores on Y to the scale of X for the combined group,
$eq_{X_h}$ represents transformed scores on Y to the scale of X for subgroup h,
$N_h$ is the sample size for subgroup h,
$N$ is the total number of examinees,
$w_h = N_h / N$ is the weight for subgroup h,
$N_{yh}$ is the number of examinees in subgroup h with a particular score (y) on Y,
$\upsilon_{yh} = N_{yh} / N_h$ is a weighting factor for subgroup h and score (y).
As can be noted, RMSD is computed at each y-value and the contribution of each subgroup
is weighted by its proportional representation in the combined group.
REMSD is a doubly weighted statistic over $\upsilon_{yh}$ and $w_h$.
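The two statistics can be sketched in Python as follows. The function names, argument layout and the toy linking functions are hypothetical; the weights follow the definitions of the subgroup weight and the score weighting factor given above, and the numbers are not study data.

```python
import math

def rmsd(y, subgroup_links, weights, combined_link, sd_x_combined):
    """Standardised RMSD at a single score y."""
    total = sum(
        w * (link(y) - combined_link(y)) ** 2
        for link, w in zip(subgroup_links, weights)
    )
    return math.sqrt(total) / sd_x_combined

def remsd(score_counts, subgroup_links, combined_link, sd_x_combined):
    """Standardised REMSD.

    score_counts[h] maps each score y to N_yh, the number of examinees
    in subgroup h who obtained that score.
    """
    sizes = [sum(counts.values()) for counts in score_counts]  # N_h
    n_total = sum(sizes)                                       # N
    total = 0.0
    for counts, link, n_h in zip(score_counts, subgroup_links, sizes):
        w_h = n_h / n_total            # subgroup weight
        for y, n_yh in counts.items():
            v_yh = n_yh / n_h          # score weight within the subgroup
            total += w_h * v_yh * (link(y) - combined_link(y)) ** 2
    return math.sqrt(total) / sd_x_combined

# Hypothetical linking functions: subgroups sit one point above and
# one point below the combined-group function at every score.
links = [lambda y: 2 * y + 1, lambda y: 2 * y - 1]
combined = lambda y: 2 * y
counts = [{0: 5, 1: 5}, {0: 5, 1: 5}]

print(rmsd(1, links, [0.5, 0.5], combined, sd_x_combined=2.0))
print(remsd(counts, links, combined, sd_x_combined=2.0))
```

When every subgroup function coincides with the combined-group function, both statistics are zero, which is the group-invariant case.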



• Under this procedure, raw scores on optional
question i are transformed to the scale of
optional question j through scores on the
mandatory section (common portion) for the
examinees that answered question i.
• The methodology makes implicit assumptions of
group invariance when imputing scores using
chained linear equating (Allen, Holland &
Thayer, 1993).



Livingston score adjustment as
reviewed by Allen et al. (1993)
Reviewing the Livingston score adjustment shows that, in step 1, the linear equation for equating $Y_i$ to
the scale of $X$ in $P_i$ is
$$X_i(y_i) = \mu_{X_{i1}} + \frac{\sigma_{X_{i1}}}{\sigma_{i1}}\,(y_i - \mu_{i1}) \tag{1}$$
and the linear equation for equating $X$ to the scale of $Y_j$ in $P_j$ is
$$Y_j(x) = \mu_{j1} + \frac{\sigma_{j1}}{\sigma_{X_{j1}}}\,(x - \mu_{X_{j1}}) \tag{2}$$
where $\mu_{X_{j1}}$ and $\sigma_{X_{j1}}$ are the mean and standard deviation of $X$ for examinees choosing
question $j$.

The essence of the word “chained” in chained linear equating is the substitution of $x$ in the $Y_j(x)$
of equation (2) with $X_i(y_i)$ from equation (1), neglecting that the two equating functions are for
different populations (Brennan, 2006). That is



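$$Y_j(X_i(y_i)) = \mu_{j1} + \frac{\sigma_{j1}}{\sigma_{X_{j1}}}\,(\mu_{X_{i1}} - \mu_{X_{j1}}) + \frac{\sigma_{j1}}{\sigma_{X_{j1}}}\,\frac{\sigma_{X_{i1}}}{\sigma_{i1}}\,(y_i - \mu_{i1}) = Y^{*}_{ij}(y_i) \tag{3}$$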
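The substitution that produces equation (3) can be checked numerically. The sketch below composes equations (1) and (2) and compares the result with the expanded form of equation (3); the parameter names mirror the subscripts in the equations, and the moments in `MOMENTS` are hypothetical values chosen for illustration, not estimates from the study.

```python
def chained_link(y_i, mu_i1, sd_i1, mu_x_i1, sd_x_i1,
                 mu_x_j1, sd_x_j1, mu_j1, sd_j1):
    """Apply equation (1) in P_i, then equation (2) in P_j."""
    x = mu_x_i1 + (sd_x_i1 / sd_i1) * (y_i - mu_i1)   # equation (1)
    return mu_j1 + (sd_j1 / sd_x_j1) * (x - mu_x_j1)  # equation (2)

def expanded_form(y_i, mu_i1, sd_i1, mu_x_i1, sd_x_i1,
                  mu_x_j1, sd_x_j1, mu_j1, sd_j1):
    """Expanded expression on the right-hand side of equation (3)."""
    slope_j = sd_j1 / sd_x_j1
    return (mu_j1
            + slope_j * (mu_x_i1 - mu_x_j1)
            + slope_j * (sd_x_i1 / sd_i1) * (y_i - mu_i1))

# Hypothetical subgroup moments (NOT estimates from the study).
MOMENTS = dict(
    mu_i1=6.0, sd_i1=3.0,       # question i, examinees who chose i
    mu_x_i1=30.0, sd_x_i1=9.0,  # section A, examinees who chose i
    mu_x_j1=34.0, sd_x_j1=8.0,  # section A, examinees who chose j
    mu_j1=8.0, sd_j1=4.0,       # question j, examinees who chose j
)

for y in range(16):
    assert abs(chained_link(y, **MOMENTS) - expanded_form(y, **MOMENTS)) < 1e-9

print(chained_link(6.0, **MOMENTS))
```

The intercept term shows how a difference between the two subgroups' section A means shifts every adjusted score, which is where the dependence on the populations enters.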
