
Manuscript

Not for reproduction or citation without permission by the author

Algorithm for bootstrapping a distribution of $\alpha$


Klaus Krippendorff
kkrippendorff@asc.upenn.edu
2006.08.16
Revised 2011.6.12

In the absence of a theoretically motivated distribution for $\alpha$, and especially because reliability
data may be small and have various metrics (levels of measurement), the distribution of $\alpha$ is
obtained by bootstrapping. It provides the probabilities of the $\alpha$-values that can be expected if
very many similar samples of reliability data were coded. The bootstrapping algorithm randomly
draws a great number of samples from the cell contents of a matrix of observed coincidences and
obtains a hypothetical disagreement $D_o$ for each. Together with the original expected
disagreement $D_e$, these give rise to a probability distribution, $p_\alpha$, of likely $\alpha$-values.

Given:
• The square matrix of observed coincidences $o_{ck}$, which gave rise to the $\alpha$ as calculated,
  including the total number $n_{..}$ of values contributing to pair comparisons:

  $$n_{..} = \sum_{c=1}^{v} \sum_{k=1}^{v} o_{ck}$$

• The expected disagreement $D_e$ in the denominator of the observed $\alpha = 1 - \dfrac{D_o}{D_e}$
• The applicable metric difference ${}_{\text{metric}}\delta_{ck}^2$
• The number $X$ of resamples to be drawn, chosen by the analyst.

The bootstrapping algorithm is defined in four steps:

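First. Define the function ${}_{\text{metric}}\delta_{ck}^2 = f(R)$, where $R$ is a uniformly distributed
random number between 0 and 1 within a continuum of adequate precision. That continuum is
segmented by the probabilities

$$p_{ck} = \frac{o_{ck}}{n_{..}}, \qquad \sum_{c=1}^{v} \sum_{k=1}^{v} p_{ck} = 1,$$

so that each segment $p_{ck}$ of $R$ is associated with its corresponding ${}_{\text{metric}}\delta_{ck}^2$.
With the cells taken in a fixed order from 11 through $ck$ to $vv$, the segments run from $R = 0$
to $R = 1$. The segment for cell $ck$ ends at the cumulative sum of all $p_{gh}$ up to and including
$p_{ck}$, and begins $p_{ck}$ below that point.

[Figure: the interval from R = 0 to R = 1 divided into consecutive segments of width $p_{ck}$,
labeled ${}_{\text{metric}}\delta_{11}^2$, ..., ${}_{\text{metric}}\delta_{ck}^2$, ..., ${}_{\text{metric}}\delta_{vv}^2$.]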
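
The segmentation described in this step amounts to inverse-transform sampling over the cells of the coincidence matrix. The following is a minimal Python sketch of that idea, not the author's implementation. The name `make_f`, the row-by-row ordering of cells, and the 2x2 example matrices are assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def make_f(o, delta2):
    """Build f(R) from a coincidence matrix o and a matrix of squared differences delta2.

    Both arguments are square lists of lists. Cells are enumerated row by row,
    which is an assumption; any fixed order yields the same distribution.
    """
    v = len(o)
    # Cells with a zero count have zero-width segments and can never be drawn.
    cells = [(c, k) for c in range(v) for k in range(v) if o[c][k] > 0]
    # Upper boundary of each segment, kept on the scale of counts (0 .. n..).
    upper = list(accumulate(o[c][k] for c, k in cells))
    n_total = upper[-1]  # n..

    def f(R):
        # Segments are half-open [lower, upper); find the one containing R * n..
        i = bisect_right(upper, R * n_total)
        i = min(i, len(cells) - 1)  # guard against rounding when R is very close to 1
        c, k = cells[i]
        return delta2[c][k]

    return f


# Hypothetical example: two values, nominal metric (0 on the diagonal, 1 off it).
o = [[6, 1], [1, 2]]
delta2 = [[0, 1], [1, 0]]
f = make_f(o, delta2)
```

In this example the four segments have widths 0.6, 0.1, 0.1 and 0.2, so a value of R such as 0.65 falls into the second segment and returns the squared difference of an off-diagonal cell.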
Second. Determine the number $M$ of random draws with replacement from the data, capped by a
practical limit.
Let $Q$ = the number of non-zero c-k coincidences, $o_{ck} > 0$,

$$M = \min\left[\,25Q,\ \frac{(m-1)\,n_{..}}{2}\,\right]$$

Third. Bootstrap the distribution of $\alpha$:

Set the array $n_\alpha = 0$, where $-1 \le \alpha \le +1$ and $\alpha$ has at least 4 significant digits.
Do X times (X is chosen by the analyst; by default X = 20,000):
    SUM = 0
    Do M times:
        Pick a random number R between 0 and 1 (uniform distribution)
        Determine ${}_{\text{metric}}\delta_{ck}^2$ by means of the function f(R)
        SUM <= SUM + ${}_{\text{metric}}\delta_{ck}^2$
    $\alpha = 1 - \dfrac{\text{SUM}}{M \cdot D_e}$
    If $\alpha < -1.000$: $n_{\alpha=-1}$ <= $n_{\alpha=-1} + 1$
    Otherwise: $n_\alpha$ <= $n_\alpha + 1$
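
The loop above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than a reference implementation. `bootstrap_alpha_counts` is a hypothetical name, and $\alpha$ is rounded to four decimals to serve as a key of the frequency array. The weighted draw uses `random.choices` in place of an explicit R and f(R), which should select cells with the same probabilities $p_{ck}$. The example value of `D_e` is illustrative, since the source takes $D_e$ as given.

```python
import random
from collections import Counter


def bootstrap_alpha_counts(o, delta2, D_e, M, X=20000, seed=None):
    """Return a Counter mapping rounded alpha values to their frequencies n_alpha."""
    rng = random.Random(seed)
    v = len(o)
    cells = [(c, k) for c in range(v) for k in range(v) if o[c][k] > 0]
    values = [delta2[c][k] for c, k in cells]   # squared difference attached to each cell
    weights = [o[c][k] for c, k in cells]       # proportional to p_ck = o_ck / n..
    counts = Counter()
    for _ in range(X):
        # M draws with replacement; same distribution as drawing R and applying f(R).
        total = sum(rng.choices(values, weights=weights, k=M))
        alpha = 1 - total / (M * D_e)
        alpha = max(alpha, -1.0)                # values below -1 are counted at alpha = -1
        counts[round(alpha, 4)] += 1            # four decimals: an assumed resolution
    return counts


# Hypothetical inputs: 2x2 coincidences, nominal metric, illustrative D_e.
o = [[6, 1], [1, 2]]
delta2 = [[0, 1], [1, 0]]
counts = bootstrap_alpha_counts(o, delta2, D_e=42 / 90, M=5, X=1000, seed=1)
```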

Fourth. Correct the frequencies $n_\alpha$ for situations in which the lack of variation should cause $\alpha$
to be indeterminate ($\alpha = 1 - 0/0$):
    $n_x = 0$
    If the matrix of coincidences contains exactly one non-zero diagonal cell, $o_{cc} > 0$:
        $n_x = n_{\alpha=1}$ and $n_{\alpha=1} = 0$
    If the matrix of coincidences contains two or more non-zero diagonal cells, $o_{cc} > 0$:

        $$n_x = X \sum_{c=1}^{v} \left( \frac{o_{cc}}{n_{..}} \right)^{M} \quad \text{and} \quad n_{\alpha=1} \Leftarrow n_{\alpha=1} - n_x$$
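
A short Python sketch of this correction follows. It assumes the frequencies are held in a dictionary keyed by $\alpha$, with the key `1.0` standing for $\alpha = 1$. The name `correct_for_indeterminate` and the example numbers are hypothetical. In the second case $n_x$ is an expected count and need not be an integer.

```python
def correct_for_indeterminate(counts, o, M, X):
    """Compute n_x and adjust counts[1.0] in place; return n_x."""
    n_total = sum(sum(row) for row in o)                        # n..
    diag = [o[c][c] for c in range(len(o)) if o[c][c] > 0]      # non-zero diagonal cells
    n_x = 0
    if len(diag) == 1:
        # Exactly one non-zero diagonal cell: every alpha = 1 outcome is removed.
        n_x = counts.get(1.0, 0)
        counts[1.0] = 0
    elif len(diag) >= 2:
        # Expected number of resamples whose M draws all fall into one diagonal cell.
        n_x = X * sum((occ / n_total) ** M for occ in diag)
        counts[1.0] = counts.get(1.0, 0) - n_x
    return n_x


# Hypothetical frequencies for X = 20000 resamples of M = 5 draws each.
counts_two = {1.0: 2000, 0.5714: 18000}
n_x_two = correct_for_indeterminate(counts_two, [[6, 1], [1, 2]], M=5, X=20000)

counts_one = {1.0: 300, 0.0: 700}
n_x_one = correct_for_indeterminate(counts_one, [[8, 1], [1, 0]], M=5, X=1000)
```

In the first example the diagonal proportions are 0.6 and 0.2, so $n_x = 20000\,(0.6^5 + 0.2^5) = 1561.6$.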

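The resulting distribution of $\alpha$ is expressed in terms of the probabilities

$$p_\alpha = \frac{n_\alpha}{X - n_x}.$$

This distribution offers two important statistical properties of $\alpha$:
• The confidence interval for $\alpha$ at a chosen level $p$ of statistical significance (two-tailed):

  $$\alpha_{\text{smallest}} = \text{the smallest } \alpha \ \Big|\ \sum_{\alpha' \le \alpha} \frac{n_{\alpha'}}{X - n_x} \ge \frac{p}{2}$$

  $$\alpha_{\text{largest}} = \text{the largest } \alpha \ \Big|\ \sum_{\alpha' \le \alpha} \frac{n_{\alpha'}}{X - n_x} \le 1 - \frac{p}{2}$$

  $$\alpha_{\text{smallest}} \le \alpha \le \alpha_{\text{largest}}$$

• The probability $q$ that the reliability data fail to reach the smallest acceptable $\alpha_{\min}$:

  $$q = \sum_{\alpha < \alpha_{\min}} \frac{n_\alpha}{X - n_x}$$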
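
Both properties can be read off the cumulative distribution of $p_\alpha$. The sketch below is one way to do so in Python, under assumptions that should be checked against the original manuscript: the confidence limits are taken from cumulative probabilities compared against $p/2$ and $1 - p/2$, and $q$ sums the probabilities of all $\alpha$ strictly below $\alpha_{\min}$. The name `summarize` and the example frequencies are hypothetical.

```python
def summarize(counts, n_x, X, p, alpha_min):
    """Return (alpha_smallest, alpha_largest, q) from frequencies n_alpha.

    counts maps alpha values to frequencies; n_x is the correction from the
    fourth step; p is the two-tailed significance level.
    """
    denom = X - n_x
    cumulative = 0.0
    smallest = largest = None
    q = 0.0
    for a in sorted(counts):
        prob = counts[a] / denom            # p_alpha
        cumulative += prob
        if smallest is None and cumulative >= p / 2:
            smallest = a                    # first alpha whose cumulative mass reaches p/2
        if cumulative <= 1 - p / 2:
            largest = a                     # last alpha whose cumulative mass stays within 1 - p/2
        if a < alpha_min:
            q += prob                       # mass below the smallest acceptable alpha
    return smallest, largest, q


# Hypothetical frequencies for X = 100 resamples with no correction (n_x = 0).
example = {0.2: 5, 0.5: 35, 0.8: 45, 1.0: 15}
lo, hi, q = summarize(example, n_x=0, X=100, p=0.2, alpha_min=0.8)
```

With these numbers the cumulative probabilities are 0.05, 0.40, 0.85 and 1.00, so the interval at $p = 0.2$ runs from 0.5 to 0.8, and $q$ collects the mass at 0.2 and 0.5.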
