by Ken Black
Discrete Distributions
Chapter 16
Chi-Square and Other Nonparametric Statistics
Learning Objectives
Recognize the advantages and disadvantages of
nonparametric statistics.
Understand the χ² goodness-of-fit test and how to use it.
Analyze data using the χ² test of independence.
Understand how to use the runs test to test for
randomness.
Know when and how to use the Mann-Whitney U test,
the Wilcoxon matched-pairs signed rank test, the
Kruskal-Wallis test, and the Friedman test.
Learn when and how to measure correlation using
Spearman's rank correlation measurement.
Advantages
of Nonparametric Techniques
Sometimes there is no parametric alternative to
the use of nonparametric statistics.
Certain nonparametric tests can be used to analyze
nominal data.
Certain nonparametric tests can be used to analyze
ordinal data.
The computations on nonparametric statistics are
usually less complicated than those for parametric
statistics, particularly for small samples.
Probability statements obtained from most
nonparametric tests are exact probabilities.
Business Statistics: Contemporary Decision
Disadvantages
of Nonparametric Statistics
Nonparametric tests can be wasteful of data
if parametric tests are available for use with
the data.
Nonparametric tests are usually not as
widely available and well known as
parametric tests.
For large samples, the calculations for many
nonparametric statistics can be tedious.
χ² Goodness-of-Fit Test
The χ² goodness-of-fit test compares
expected (theoretical) frequencies
of categories from a population distribution
to the observed (actual) frequencies
from a distribution to determine whether
there is a difference between what was
expected and what was observed.
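χ² Goodness-of-Fit Test
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
df = k - 1 - c
where:
$f_o$ = frequency of observed values
$f_e$ = frequency of expected values
k = number of categories
c = number of parameters estimated from the sample data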
Month        Gallons
January        1,610
February       1,585
March          1,649
April          1,590
May            1,540
June           1,397
July           1,410
August         1,350
September      1,495
October        1,564
November       1,602
December       1,655
Total         18,447
$\chi^2_{.01,11} = 24.725$
If $\chi^2_{Cal} > 24.725$, reject Ho.
If $\chi^2_{Cal} \le 24.725$, do not reject Ho.
Calculations
for Demonstration Problem 16.1
Month        fo        fe           (fo - fe)²/fe
January       1,610     1,537.25     3.44
February      1,585     1,537.25     1.48
March         1,649     1,537.25     8.12
April         1,590     1,537.25     1.81
May           1,540     1,537.25     0.00
June          1,397     1,537.25    12.80
July          1,410     1,537.25    10.53
August        1,350     1,537.25    22.81
September     1,495     1,537.25     1.16
October       1,564     1,537.25     0.47
November      1,602     1,537.25     2.73
December      1,655     1,537.25     9.02
Total        18,447    18,447.00    74.38
$f_e = \frac{18447}{12} = 1537.25$
$\chi^2_{Cal} = 74.37$
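The calculation above can be sketched in a few lines of Python. This is a minimal illustration, assuming a uniform expected distribution across the 12 months (as the slide's fe = 18447/12 implies); the variable names are illustrative and the critical value is copied from the slide rather than computed.

```python
# Observed monthly gallons, January through December, from the table above.
observed = [1610, 1585, 1649, 1590, 1540, 1397,
            1410, 1350, 1495, 1564, 1602, 1655]

k = len(observed)                  # number of categories
expected = sum(observed) / k       # uniform expectation: 18447 / 12
chi2_cal = sum((fo - expected) ** 2 / expected for fo in observed)
df = k - 1 - 0                     # c = 0: no parameters estimated from the sample

critical = 24.725                  # chi-square(.01, 11), taken from the slide
reject_h0 = chi2_cal > critical

print(f"fe = {expected:.2f}, chi2 = {chi2_cal:.2f}, df = {df}, reject Ho: {reject_h0}")
```

Summing the unrounded terms gives about 74.38; adding the twelve rounded terms from the table gives 74.37, which explains the two figures on the slide.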
[Figure: χ² distribution with the α = 0.01 rejection region beyond the critical value 24.725; χ²_Cal is marked]
Number of Arrivals    Observed Frequencies
0                      7
1                     18
2                     25
3                     17
4                     12
5                      5
$\chi^2_{.05,4} = 9.488$
If $\chi^2_{Cal} > 9.488$, reject Ho.
If $\chi^2_{Cal} \le 9.488$, do not reject Ho.
Calculations
for Demonstration Problem 16.2:
Estimating the Mean Arrival Rate
Number of Arrivals (X)    Observed Frequencies (f)    fX
0                          7                            0
1                         18                           18
2                         25                           50
3                         17                           51
4                         12                           48
5                          5                           25
Total                     84                          192
Mean Arrival Rate: $\lambda = \frac{\sum fX}{\sum f} = \frac{192}{84} = 2.3$ customers per minute
Poisson Probabilities for λ = 2.3
$n = \sum f = 84$
Last category: probability 0.0838, expected frequency (84)(0.0838) = 7.04
χ² Calculations
for Demonstration Problem 16.2
Number of Arrivals (X)    Observed Frequencies (f)    Expected Frequencies nP(X)    (fo - fe)²/fe
0                          7                            8.42                         0.24
1                         18                           19.37                         0.10
2                         25                           22.28                         0.33
3                         17                           17.08                         0.00
4                         12                            9.82                         0.48
5                          5                            7.04                         0.59
Total                     84                           84.00                         1.74
$\chi^2_{Cal} = 1.74$
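The Poisson goodness-of-fit steps (estimate λ, build expected frequencies, sum the χ² terms) can be sketched as follows. This is an illustration under two assumptions: λ is rounded to 2.3 as on the slide, and the last category is treated as "5 or more" arrivals so that the probabilities sum to 1, which is consistent with the 0.0838 probability and the 7.04 expected frequency shown.

```python
import math

arrivals = [0, 1, 2, 3, 4, 5]          # assumption: last category means "5 or more"
observed = [7, 18, 25, 17, 12, 5]

n = sum(observed)                       # 84
# Mean arrival rate estimated from the sample, rounded to one decimal as on the slide.
lam = round(sum(x * f for x, f in zip(arrivals, observed)) / n, 1)

# Poisson probabilities for 0..4, then the remaining tail probability.
probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in arrivals[:-1]]
probs.append(1.0 - sum(probs))

expected = [n * p for p in probs]
chi2_cal = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = len(observed) - 1 - 1              # c = 1: lambda was estimated from the data

print([round(fe, 2) for fe in expected], round(chi2_cal, 2), df)
```

Because one parameter (λ) is estimated from the sample, df = 6 - 1 - 1 = 4, which matches the critical value χ²(.05, 4) = 9.488 used for this problem.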
[Figure: χ² distribution with the α = 0.05 rejection region beyond the critical value 9.488]
$\chi^2_{Cal} = 1.74 < 9.488$, do not reject Ho.
Ho: P = .10
Ha: P ≠ .10
$\chi^2_{.05,1} = 3.841$
If $\chi^2_{Cal} > 3.841$, reject Ho.
If $\chi^2_{Cal} \le 3.841$, do not reject Ho.
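              fo     fe
Defects        43     20
Nondefects    157    180
n =           200    200
Defects: $f_e = n \cdot P = (200)(.10) = 20$
Nondefects: $f_e = n(1 - P) = (200)(.90) = 180$
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} = \frac{(43 - 20)^2}{20} + \frac{(157 - 180)^2}{180} = 26.45 + 2.94 = 29.39$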
Using a χ² Goodness-of-Fit Test to Test a Population Proportion: Conclusion
[Figure: χ² distribution with df = 1; non-rejection region below the critical value 3.841, α = 0.05 rejection region above it]
$\chi^2_{Cal} = 29.39 > 3.841$, reject Ho.
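The proportion test is a two-category goodness-of-fit test, so the same sum applies. A minimal sketch, using the slide's values (n = 200, hypothesized P = .10, 43 observed defects); the dictionary keys and variable names are illustrative.

```python
n, p0 = 200, 0.10                       # sample size and hypothesized proportion
observed = {"defects": 43, "nondefects": 157}
expected = {"defects": n * p0, "nondefects": n * (1 - p0)}

chi2_cal = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
critical = 3.841                        # chi-square(.05, 1), taken from the slide

print(round(chi2_cal, 2), "reject Ho" if chi2_cal > critical else "do not reject Ho")
```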
χ² Test of Independence
Used to analyze the frequencies of two
variables with multiple categories to
determine whether the two variables
are independent.
Qualitative Variables
Nominal Data
Contingency Table
                      Type of Financial Investment
Geographic Region     E      F      G      Total
A                                   O13    nA
B                                          nB
C                                          nC
D                                          nD
Total                 nE     nF     nG     N
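If A and F are independent:
$P(A) = \frac{n_A}{N}$, $P(F) = \frac{n_F}{N}$
$P(A \cap F) = P(A) \cdot P(F) = \frac{n_A}{N} \cdot \frac{n_F}{N}$
$e_{AF} = N \cdot P(A \cap F) = \frac{n_A \cdot n_F}{N}$
Contingency Table
                      Type of Financial Investment
Geographic Region     E      F      G      Total
A                            e12           nA
B                                          nB
C                                          nC
D                                          nD
Total                 nE     nF     nG     N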
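Expected Frequencies
$e_{ij} = \frac{(n_i)(n_j)}{N}$
where: i = the row
j = the column
$n_i$ = the total of row i
$n_j$ = the total of column j
N = the total of all frequencies
Calculated (Observed) χ²
$\chi^2 = \sum\sum \frac{(f_o - f_e)^2}{f_e}$
where: df = (r - 1)(c - 1)
r = the number of rows
c = the number of columns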
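Ho : Type of gasoline is independent of income
Ha : Type of gasoline is not independent of income
r = 4 income categories: Less than $30,000; $30,000 to $49,999; $50,000 to $99,999; At least $100,000
c = 3 types of gasoline: Regular, Premium, Extra Premium
df = (4 - 1)(3 - 1) = 6
$\chi^2_{.01,6} = 16.812$
If $\chi^2_{Cal} > 16.812$, reject Ho.
If $\chi^2_{Cal} \le 16.812$, do not reject Ho.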
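                        Type of Gasoline
Income                  Regular    Premium    Extra Premium    Total
Less than $30,000        85         16          6              107
$30,000 to $49,999      102         27         13              142
$50,000 to $99,999       36         22         15               73
At least $100,000        15         23         25               63
Total                   238         88         59              385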
$e_{ij} = \frac{(n_i)(n_j)}{N}$
$e_{11} = \frac{(107)(238)}{385} = 66.15$
$e_{12} = \frac{(107)(88)}{385} = 24.46$
$e_{13} = \frac{(107)(59)}{385} = 16.40$
Observed frequencies, with expected frequencies in parentheses:
                        Type of Gasoline
Income                  Regular        Premium       Extra Premium    Total
Less than $30,000        85 (66.15)    16 (24.46)     6 (16.40)       107
$30,000 to $49,999      102 (87.78)    27 (32.46)    13 (21.76)       142
$50,000 to $99,999       36 (45.13)    22 (16.69)    15 (11.19)        73
At least $100,000        15 (38.95)    23 (14.40)    25 (9.65)         63
Total                   238            88            59               385
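$\chi^2 = \sum\sum \frac{(f_o - f_e)^2}{f_e}$
$= \frac{(85 - 66.15)^2}{66.15} + \frac{(16 - 24.46)^2}{24.46} + \frac{(6 - 16.40)^2}{16.40}$
$+ \frac{(102 - 87.78)^2}{87.78} + \frac{(27 - 32.46)^2}{32.46} + \frac{(13 - 21.76)^2}{21.76}$
$+ \frac{(36 - 45.13)^2}{45.13} + \frac{(22 - 16.69)^2}{16.69} + \frac{(15 - 11.19)^2}{11.19}$
$+ \frac{(15 - 38.95)^2}{38.95} + \frac{(23 - 14.40)^2}{14.40} + \frac{(25 - 9.65)^2}{9.65}$
$= 70.78$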
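The expected-frequency formula and the χ² sum for the gasoline example can be sketched as below. This is an illustration only: the observed counts come from the table, the names are illustrative, and the expected frequencies are left unrounded, so the result differs slightly from the slide's 70.78, which uses expected values rounded to two decimals.

```python
# Rows: income brackets; columns: Regular, Premium, Extra Premium.
observed = [
    [85, 16, 6],
    [102, 27, 13],
    [36, 22, 15],
    [15, 23, 25],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
N = sum(row_totals)

# e_ij = (row total i)(column total j) / N
expected = [[ri * cj / N for cj in col_totals] for ri in row_totals]

chi2_cal = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi2_cal, 2), df, chi2_cal > 16.812)
```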
[Figure: χ² distribution with the α = 0.01 rejection region beyond the critical value 16.812]
$\chi^2_{Cal} = 70.78 > 16.812$, reject Ho.
Runs Test
Test for randomness: is the order or sequence of
observations in a sample random or not?
Each sample item possesses one of two possible
characteristics
Run - a succession of observations which possess
the same characteristic
Example with two runs: F, F, F, F, F, F, F, F, M,
M, M, M, M, M, M
Example with fifteen runs: F, M, F, M, F, M, F,
M, F, M, F, M, F, M, F
Run:       1   2       3   4    5   6      7   8   9   10    11    12
Sequence:  D   CCCCC   D   CC   D   CCCC   D   C   D   CCC   DDD   CCC
R = 12
Since 7 ≤ R = 12 ≤ 17, do not reject H0
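Counting runs is easy to automate. A minimal sketch using the sequence above; `itertools.groupby` groups consecutive equal items, so the number of groups is R. The critical values 7 and 17 are copied from the slide, not computed.

```python
from itertools import groupby

# The D/C sequence from the slide, with the spacing between runs removed.
sequence = "D CCCCC D CC D CCCC D C D CCC DDD CCC".replace(" ", "")

runs = ["".join(group) for _, group in groupby(sequence)]
R = len(runs)
n1 = sequence.count("D")
n2 = sequence.count("C")

lower, upper = 7, 17                    # small-sample critical values from the slide
reject_h0 = R < lower or R > upper

print(R, n1, n2, "reject H0" if reject_h0 else "do not reject H0")
```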
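The distribution of R is
approximately normal.
$\mu_R = \frac{2 n_1 n_2}{n_1 + n_2} + 1$
$\sigma_R = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$
$Z = \frac{R - \mu_R}{\sigma_R}$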
[Sequence of N and F observations, with its runs numbered 1 through 13]
R = 13
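$\mu_R = \frac{2 n_1 n_2}{n_1 + n_2} + 1 = \frac{2(40)(10)}{40 + 10} + 1 = 17$
$\sigma_R = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}} = \sqrt{\frac{2(40)(10)[2(40)(10) - 40 - 10]}{(40 + 10)^2 (40 + 10 - 1)}} = 2.213$
$Z = \frac{R - \mu_R}{\sigma_R} = \frac{13 - 17}{2.213} = -1.81$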
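The large-sample runs test reduces to three formulas, sketched here as a small function. The function name `runs_test_z` is illustrative; the inputs R = 13, n1 = 40, n2 = 10 are the slide's values.

```python
import math

def runs_test_z(r, n1, n2):
    """Return (mu_R, sigma_R, Z) for the large-sample runs test."""
    mu = 2 * n1 * n2 / (n1 + n2) + 1
    sigma = math.sqrt(
        2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
        / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    )
    return mu, sigma, (r - mu) / sigma

mu_r, sigma_r, z = runs_test_z(13, 40, 10)
print(mu_r, round(sigma_r, 3), round(z, 2))
```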
Mann-Whitney U Test
Nonparametric counterpart of the t test for
independent samples
Does not require normally distributed
populations
May be applied to ordinal data
Assumptions
Independent Samples
At Least Ordinal Data
Mann-Whitney U Test:
Sample Size Consideration
Size of sample one: n1
Size of sample two: n2
If both n1 and n2 are ≤ 10, the small sample
procedure is appropriate.
If either n1 or n2 is greater than 10, the
large sample procedure is appropriate.
Mann-Whitney U Test:
Small Sample Example
H0: The health service population is identical to the educational service population on employee compensation
Ha: The health service population is not identical to the educational service population on employee compensation
Health Service    Educational Service
20.10             26.19
19.80             23.88
22.36             25.50
18.75             21.64
21.90             24.85
22.96             25.30
20.75             24.12
                  23.45
Mann-Whitney U Test:
Small Sample Example
α = .05
Compensation    Rank    Group
18.75            1      H
19.80            2      H
20.10            3      H
20.75            4      H
21.64            5      E
21.90            6      H
22.36            7      H
22.96            8      H
23.45            9      E
23.88           10      E
24.12           11      E
24.85           12      E
25.30           13      E
25.50           14      E
26.19           15      E
W1 = 1 + 2 + 3 + 4 + 6 + 7 + 8 = 31
W2 = 5 + 9 + 10 + 11 + 12 + 13 + 14 + 15 = 89
Mann-Whitney U Test:
Small Sample Example
$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - W_1 = (7)(8) + \frac{(7)(8)}{2} - 31 = 53$
$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - W_2 = (7)(8) + \frac{(8)(9)}{2} - 89 = 3$
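The ranking and U calculations for the small-sample example can be sketched as follows. This is an illustration that relies on the data having no tied values, so a value's rank is simply its position in the combined sorted list; variable names are illustrative.

```python
health = [20.10, 19.80, 22.36, 18.75, 21.90, 22.96, 20.75]
education = [26.19, 23.88, 25.50, 21.64, 24.85, 25.30, 24.12, 23.45]

# Pool both samples, tagging each value with its group, then sort by value.
combined = sorted([(v, "H") for v in health] + [(v, "E") for v in education])

# No ties in this data set, so rank = 1-based position in the sorted list.
w1 = sum(rank for rank, (_, g) in enumerate(combined, start=1) if g == "H")
w2 = sum(rank for rank, (_, g) in enumerate(combined, start=1) if g == "E")

n1, n2 = len(health), len(education)
u1 = n1 * n2 + n1 * (n1 + 1) / 2 - w1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - w2

print(w1, w2, u1, u2)
```

As a quick check, U1 + U2 always equals n1·n2, here 53 + 3 = 56.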
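Mann-Whitney U Test:
Formulas for Large Sample Case
$U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - W_1$
where: $n_1$ = number in group 1
$n_2$ = number in group 2
$W_1$ = sum of the ranks of values in group 1
$\mu_U = \frac{n_1 n_2}{2}$
$\sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$
$Z = \frac{U - \mu_U}{\sigma_U}$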
Incomes of PBS
and Non-PBS Viewers
Ho: The incomes for PBS viewers and non-PBS viewers are identical
Ha: The incomes for PBS viewers and non-PBS viewers are not identical
α = .05
If Z < -1.96 or Z > 1.96, reject Ho
n1 = 14
n2 = 13
PBS       Non-PBS
24,500    41,000
39,400    32,500
36,800    33,000
44,300    21,000
57,960    40,500
32,000    32,400
61,000    16,000
34,000    21,500
43,500    39,500
55,000    27,600
39,000    43,500
62,500    51,900
61,400    27,800
53,000
$W_1 = 4 + 7 + 11 + 12 + 13 + 14 + 18 + 19.5 + 22 + 23 + 24 + 25 + 26 + 27 = 245.5$
$U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - W_1 = (14)(13) + \frac{(14)(15)}{2} - 245.5 = 41.5$
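$\mu_U = \frac{n_1 n_2}{2} = \frac{(14)(13)}{2} = 91$
$\sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{(14)(13)(28)}{12}} = 20.6$
$Z_{Cal} = \frac{U - \mu_U}{\sigma_U} = \frac{41.5 - 91}{20.6} = -2.40$
Since $Z_{Cal} = -2.40 < -1.96$, reject Ho.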
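The large-sample Mann-Whitney arithmetic can be sketched as a function of the rank sum. This illustration takes W1 = 245.5 directly from the slide rather than re-ranking the income data; the function name `mann_whitney_z` is illustrative.

```python
import math

def mann_whitney_z(w1, n1, n2):
    """Return (U, mu_U, sigma_U, Z) from the rank sum of group 1."""
    u = n1 * n2 + n1 * (n1 + 1) / 2 - w1
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, mu, sigma, (u - mu) / sigma

u, mu_u, sigma_u, z = mann_whitney_z(245.5, 14, 13)
reject_h0 = z < -1.96 or z > 1.96       # two-tailed rule at alpha = .05

print(u, mu_u, round(sigma_u, 1), round(z, 2), reject_h0)
```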
Wilcoxon Matched-Pairs
Signed Rank Test
A nonparametric alternative to the t test for
related samples
Before and After studies
Studies in which measures are taken on the
same person or object under different
conditions
Studies of twins or other relatives
Wilcoxon Matched-Pairs
Signed Rank Test
Differences of the scores of the two matched
samples
Differences are ranked, ignoring the sign
Ranks are given the sign of the difference
Positive ranks are summed
Negative ranks are summed
T is the smaller sum of ranks
Family Pair    Pittsburgh    Oakland
1              1,950         1,760
2              1,840         1,870
3              2,015         1,810
4              1,580         1,660
5              1,790         1,340
6              1,925         1,765
Pittsburgh    Oakland    d      Rank
1,950         1,760      190    +4
1,840         1,870      -30    -1
2,015         1,810      205    +5
1,580         1,660      -80    -2
1,790         1,340      450    +6
1,925         1,765      160    +3
T+ = 4 + 5 + 6 + 3 = 18
T- = 1 + 2 = 3
T = minimum(T+, T-) = 3
T = 3 > Tcrit = 1, do not reject Ho
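The signed-rank steps (difference, rank by absolute value, sum by sign, take the smaller sum) can be sketched as below. This is a minimal illustration that assumes no zero differences and no tied absolute differences, which holds for this data; Tcrit = 1 is copied from the slide.

```python
pittsburgh = [1950, 1840, 2015, 1580, 1790, 1925]
oakland = [1760, 1870, 1810, 1660, 1340, 1765]

d = [p - o for p, o in zip(pittsburgh, oakland)]

# Rank the differences by absolute value (1 = smallest).
order = sorted(range(len(d)), key=lambda i: abs(d[i]))
ranks = [0] * len(d)
for rank, i in enumerate(order, start=1):
    ranks[i] = rank

t_plus = sum(r for r, di in zip(ranks, d) if di > 0)
t_minus = sum(r for r, di in zip(ranks, d) if di < 0)
t = min(t_plus, t_minus)

t_crit = 1                              # critical value from the slide
print(d, ranks, t_plus, t_minus, t, "reject Ho" if t <= t_crit else "do not reject Ho")
```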
$\mu_T = \frac{n(n + 1)}{4}$
$\sigma_T = \sqrt{\frac{n(n + 1)(2n + 1)}{24}}$
$Z = \frac{T - \mu_T}{\sigma_T}$
α = .05
If Z < -1.96 or Z > 1.96, reject Ho
d       Rank
-2.5     -8
 6.8     17
 4.5     13
 4.8     15
-5.3    -16
-1.6     -4
 4.7     14
 1.9      6.5
 3.1     10
-0.6     -1
-3.4    -11.5
 2.6      9
-1.9     -6.5
-0.8     -2
-1.8     -5
 1.1      3
 3.4     11.5
T+ = 17 + 13 + 15 + 14 + 6.5 + 10 + 9 + 3 + 11.5 = 99
T- = 8 + 16 + 4 + 1 + 11.5 + 6.5 + 2 + 5 = 54
T = minimum(99, 54) = 54
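$\mu_T = \frac{n(n + 1)}{4} = \frac{(17)(18)}{4} = 76.5$
$\sigma_T = \sqrt{\frac{n(n + 1)(2n + 1)}{24}} = \sqrt{\frac{(17)(18)(35)}{24}} = 21.1$
$Z = \frac{T - \mu_T}{\sigma_T} = \frac{54 - 76.5}{21.1} = -1.07$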
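For the large-sample case the differences contain ties, so ranks must be averaged. A sketch under that assumption; `average_ranks` is a hypothetical helper written for this illustration, and the differences are the 17 values of d listed in the table.

```python
import math

def average_ranks(values):
    """Rank values from smallest to largest, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1     # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

d = [-2.5, 6.8, 4.5, 4.8, -5.3, -1.6, 4.7, 1.9, 3.1,
     -0.6, -3.4, 2.6, -1.9, -0.8, -1.8, 1.1, 3.4]

ranks = average_ranks([abs(x) for x in d])
t_plus = sum(r for r, x in zip(ranks, d) if x > 0)
t_minus = sum(r for r, x in zip(ranks, d) if x < 0)
t = min(t_plus, t_minus)

n = len(d)
mu_t = n * (n + 1) / 4
sigma_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (t - mu_t) / sigma_t

print(t_plus, t_minus, mu_t, round(sigma_t, 1), round(z, 2))
```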
Kruskal-Wallis Test
A nonparametric alternative to one-way
analysis of variance
May be used to analyze ordinal data
No assumed population shape
Assumes that the C groups are independent
Assumes random selection of individual items
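Kruskal-Wallis K Statistic
$K = \frac{12}{n(n + 1)} \left( \sum_{j=1}^{C} \frac{T_j^2}{n_j} \right) - 3(n + 1)$
where: C = number of groups
n = total number of items
$T_j$ = total of ranks in group j
$n_j$ = number of items in group j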
α = 0.05
df = C - 1 = 3 - 1 = 2
$\chi^2_{.05,2} = 5.991$
Two Partners    Three or More Partners    HMO
13              24                        26
15              16                        22
20              19                        31
18              22                        27
23              25                        28
                14                        33
                17
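$K = \frac{12}{n(n + 1)} \left( \sum_{j=1}^{C} \frac{T_j^2}{n_j} \right) - 3(n + 1)$
$= \frac{12}{18(18 + 1)} \left( \frac{29^2}{5} + \frac{52.5^2}{7} + \frac{89.5^2}{6} \right) - 3(18 + 1)$
$= \frac{12}{18(18 + 1)} (1,897) - 3(18 + 1)$
$= 9.56$
$K = 9.56 > \chi^2_{.05,2} = 5.991$, reject Ho.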
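The Kruskal-Wallis calculation can be sketched end to end: pool the observations, rank them with ties averaged, total the ranks per group, and apply the K formula. `average_ranks` is a hypothetical helper written for this illustration; the group data are read column by column from the table above.

```python
def average_ranks(values):
    """Rank values from smallest to largest, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1     # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

groups = {
    "Two Partners": [13, 15, 20, 18, 23],
    "Three or More Partners": [24, 16, 19, 22, 25, 14, 17],
    "HMO": [26, 22, 31, 27, 28, 33],
}

pooled = [v for values in groups.values() for v in values]
ranks = average_ranks(pooled)

# Split the pooled ranks back into per-group rank totals T_j.
totals, start = [], 0
for values in groups.values():
    totals.append(sum(ranks[start:start + len(values)]))
    start += len(values)

n = len(pooled)
k_stat = 12 / (n * (n + 1)) * sum(
    t ** 2 / len(v) for t, v in zip(totals, groups.values())
) - 3 * (n + 1)

print(totals, round(k_stat, 2), k_stat > 5.991)
```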
Friedman Test
A nonparametric alternative to the randomized
block design
Assumptions
The blocks are independent.
There is no interaction between blocks and
treatments.
Observations within each block can be ranked.
Hypotheses
Ho: The treatment populations are equal
Ha: At least one treatment population
yields larger values than at least one
other treatment population
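Friedman Test
$\chi^2_r = \frac{12}{bC(C + 1)} \sum_{j=1}^{C} R_j^2 - 3b(C + 1)$
where: C = number of treatment levels (columns)
b = number of blocks (rows)
$R_j$ = total of ranks for treatment level j
$\chi^2_r \approx \chi^2$, with df = C - 1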
Day          Supplier 1    Supplier 2    Supplier 3    Supplier 4
Monday       62            63            57            61
Tuesday      63            61            59            65
Wednesday    61            62            56            63
Thursday     62            60            57            64
Friday       64            63            58            66
$\chi^2_{.05,3} = 7.81473$
If $\chi^2_r > 7.81473$, reject Ho.
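Ranks within each day (1 = smallest value):
Day          Supplier 1    Supplier 2    Supplier 3    Supplier 4
Monday       3             4             1             2
Tuesday      3             2             1             4
Wednesday    2             3             1             4
Thursday     3             2             1             4
Friday       3             2             1             4
$R_j$        14            13            5             18
$R_j^2$      196           169           25            324
$\sum_{j=1}^{4} R_j^2 = 714$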
$\chi^2_r = \frac{12}{bC(C + 1)} \sum_{j=1}^{C} R_j^2 - 3b(C + 1)$
$= \frac{12}{(5)(4)(4 + 1)} (714) - 3(5)(4 + 1)$
$= 10.68$
$\chi^2_r = 10.68 > 7.81473$, reject Ho.
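The Friedman statistic can be sketched as follows: rank the treatments within each block, total the ranks per treatment, and apply the formula. This illustration assumes no ties within a block, which holds for the supplier data; variable names are illustrative.

```python
# Rows are blocks (Monday..Friday); columns are treatments (Supplier 1..4).
data = [
    [62, 63, 57, 61],
    [63, 61, 59, 65],
    [61, 62, 56, 63],
    [62, 60, 57, 64],
    [64, 63, 58, 66],
]

b = len(data)           # number of blocks
c = len(data[0])        # number of treatment levels

rank_totals = [0] * c
for block in data:
    # Within a block, the smallest value gets rank 1 (no ties assumed).
    order = sorted(range(c), key=lambda j: block[j])
    for rank, j in enumerate(order, start=1):
        rank_totals[j] += rank

sum_sq = sum(r ** 2 for r in rank_totals)
chi2_r = 12 / (b * c * (c + 1)) * sum_sq - 3 * b * (c + 1)

print(rank_totals, sum_sq, round(chi2_r, 2), chi2_r > 7.81473)
```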
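$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$
where: n = number of pairs being correlated
d = the difference in the ranks of each pair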
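Spearman's rank correlation formula can be sketched as a short function. The slide gives no data for this formula, so the rank lists below are hypothetical and chosen only to show the two extreme cases; the function assumes its inputs are already ranks with no ties.

```python
def spearman_rs(x_ranks, y_ranks):
    """Spearman's rank correlation from two equal-length lists of ranks (no ties)."""
    n = len(x_ranks)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical ranks, for illustration only.
same_order = spearman_rs([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
reverse_order = spearman_rs([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])

print(same_order, reverse_order)
```

Identical rankings give r_s = 1 and exactly reversed rankings give r_s = -1, which bounds the statistic.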