Main analyses
Supplementary analyses
Interrater reliability
Aim of the co-judge procedure, to discern:
Consistency within coder
Consistency between coders
Interrater reliability
Categorical IV with 3 discrete scale-steps: 9 of the 12 ratings are the same, so % exact agreement = 9/12 = .75
Coder 2 \ Coder 1    1    2    3   Sum
        1            2    1    0     3
        2            0    6    2     8
        3            0    0    1     1
      Sum            2    7    3    12
K = (PO - PE) / (1 - PE)
Kappa: positive values indicate how much the raters agree over and above chance alone; negative values indicate disagreement.
PO = (2 + 6 + 1) / 12 = .75
PE = [(2)(3) + (7)(8) + (3)(1)] / 12^2 = .451
K = (.750 - .451) / (1 - .451) = .544
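A minimal sketch of this computation (the crosstab is the one above; the helper function is ours, not from the slides):

```python
import numpy as np

def kappa(table: np.ndarray) -> float:
    """Cohen's kappa from a coder-by-coder crosstab of counts."""
    n = table.sum()
    po = np.trace(table) / n                                    # observed agreement
    pe = (table.sum(axis=0) * table.sum(axis=1)).sum() / n**2   # chance-expected agreement
    return (po - pe) / (1 - pe)

# Crosstab above: PO = .75, PE = .451
table = np.array([[2, 1, 0],
                  [0, 6, 2],
                  [0, 0, 1]])
print(round(kappa(table), 3))  # 0.544
```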
(SPSS crosstabs output: Measure of Agreement, Kappa; N of Valid Cases.)
Two coders rate the same 25 studies. In the first crosstab the coders largely agree:

Coder 2 \ Coder 1    1    2    3   Sum
        1            4    3    0     7
        2            1    6    3    10
        3            0    1    7     8
      Sum            5   10   10    25

K = .51

In the second crosstab Coder 1's ratings are shifted one scale-step upward, so exact agreement almost disappears and kappa turns negative:

Coder 2 \ Coder 1    1    2    3    4   Sum
        1            0    4    3    0     7
        2            0    1    6    3    10
        3            0    0    1    7     8
        4            0    0    0    0     0
      Sum            0    5   10   10    25

K = -.16
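Both values can be checked by expanding each crosstab back into per-study rating pairs and using scikit-learn (a sketch, not part of the original slides):

```python
from sklearn.metrics import cohen_kappa_score

def expand(table):
    """Turn a crosstab of counts into the two coders' rating vectors."""
    r1, r2 = [], []
    for i, row in enumerate(table, start=1):       # i: Coder 2's rating
        for j, count in enumerate(row, start=1):   # j: Coder 1's rating
            r2 += [i] * count
            r1 += [j] * count
    return r1, r2

A = [[4, 3, 0], [1, 6, 3], [0, 1, 7]]                         # K = .51
B = [[0, 4, 3, 0], [0, 1, 6, 3], [0, 0, 1, 7], [0, 0, 0, 0]]  # K = -.16
for table in (A, B):
    r1, r2 = expand(table)
    print(round(cohen_kappa_score(r1, r2), 2))  # 0.51, then -0.16
```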
A further example, 18 studies and four scale-steps:

Coder 2 \ Coder 1    1    2    3    4   Sum
        1            4    2    1    0     7
        2            0    0    0    0     0
        3            0    1    3    1     5
        4            0    0    2    4     6
      Sum            4    3    6    5    18

K = .47
Weighted kappa:

KW = 1 - (Σ wi poi) / (Σ wi pei)

where the wi are disagreement weights (larger for cells further from the diagonal), the poi are the observed cell proportions, and the pei the chance-expected cell proportions.
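A sketch of this formula, using linear disagreement weights wi = |row - col| (the choice of weights is our assumption; the slide leaves it open). On the 18-study table above it agrees with scikit-learn's cohen_kappa_score with weights="linear":

```python
import numpy as np

def weighted_kappa(table: np.ndarray) -> float:
    """KW = 1 - (sum wi*poi) / (sum wi*pei) over all cells of the crosstab."""
    n = table.sum()
    k = table.shape[0]
    w = np.abs(np.arange(k)[:, None] - np.arange(k))             # linear disagreement weights
    po = table / n                                               # observed proportions
    pe = np.outer(table.sum(axis=1), table.sum(axis=0)) / n**2   # chance-expected proportions
    return 1 - (w * po).sum() / (w * pe).sum()

# The 18-study crosstab from the example above
table = np.array([[4, 2, 1, 0],
                  [0, 0, 0, 0],
                  [0, 1, 3, 1],
                  [0, 0, 2, 4]])
print(round(weighted_kappa(table), 2))  # 0.67
```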
Papers and macros are available for estimating kappa with unequal or misaligned rows and columns, or with multiple raters: http://www.stataxis.com/about_me.htm
(Correlation matrix for rater1, rater2, and rater3; the three pairwise correlations are .873, .879, and .866.)
Average correlation r = (.873 + .879 + .866) / 3 = .873: the coders code in the same direction!
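The average of the pairwise correlations can be computed along these lines (the ratings here are invented placeholders, so they will not reproduce the slide's values):

```python
import numpy as np
import pandas as pd

# Placeholder data: three coders rating the same studies on a continuous IV
ratings = pd.DataFrame({
    "rater1": [3.0, 5.5, 2.0, 4.5, 6.0],
    "rater2": [3.5, 5.0, 2.5, 4.0, 6.5],
    "rater3": [2.5, 5.5, 2.0, 5.0, 6.0],
})

corr = ratings.corr()                         # pairwise Pearson correlations
pairs = corr.values[np.triu_indices(3, k=1)]  # r12, r13, r23
print(corr.round(3))
print("average r =", pairs.mean().round(3))   # the slide: (.873 + .879 + .866) / 3 = .873
```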
Interrater reliability of a continuous IV (3 designs):
Design 1: one-way random-effects model, when each study is rated by a different pair of coders
Design 2: two-way random-effects model, when a random pair of coders rates all studies
Design 3: two-way mixed-effects model, when ONE pair of coders rates all studies
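These three designs correspond to ICC(1), ICC(2), and ICC(3). One way to estimate them is the pingouin package; a sketch with invented data and column names:

```python
import pandas as pd
import pingouin as pg

# Long format: one row per (study, coder) rating; values are placeholders
df = pd.DataFrame({
    "study":  [1, 1, 2, 2, 3, 3, 4, 4],
    "coder":  ["A", "B"] * 4,
    "rating": [3.0, 3.5, 5.5, 5.0, 2.0, 2.5, 4.5, 4.0],
})

icc = pg.intraclass_corr(data=df, targets="study", raters="coder", ratings="rating")
# ICC1 -> design 1 (one-way random), ICC2 -> design 2 (two-way random),
# ICC3 -> design 3 (two-way mixed)
print(icc[["Type", "ICC"]])
```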
A low kappa can go with a good agreement rate (AR) when there is little variability across items and the coders agree: observed agreement is high, but chance-expected agreement is nearly as high.
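A toy illustration of that point, with invented counts: both coders use category 1 almost exclusively, so agreement is 90% while kappa is slightly negative:

```python
import numpy as np

table = np.array([[18, 1],
                  [ 1, 0]])  # 20 items, 18 exact agreements on category 1
n = table.sum()
po = np.trace(table) / n
pe = (table.sum(axis=0) * table.sum(axis=1)).sum() / n**2  # ~ .905
print(f"AR = {po:.2f}, kappa = {(po - pe) / (1 - pe):.2f}")  # AR = 0.90, kappa = -0.05
```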
Interrater reliability in meta-analysis vs. in other contexts:
Meta-analysis: coding of independent variables
How many co-judges? How many objects to co-judge? (a sub-sample of studies, versus a sub-sample of codings)
Use of a gold standard (i.e., one master-coder)
Coder drift (cf. observer drift): are coders consistent over time?
Your qualitative analysis is only as good as the quality of your categorisation of qualitative data.