
Chapter 21

Nonparametric
Statistics

Nonparametric Statistics
This chapter covers statistical techniques for
ordinal data.
Recall: when the data are ordinal, the mean is not
an appropriate measure of central location.
Instead, we will test characteristics of populations
without referring to specific parameters, hence
the term nonparametric.
Rather than testing whether the population means
differ, we will test whether the population
locations differ.

Nonparametric Statistics
The tests that we have discussed so far
can be applied only when the data are
normal or approximately normal. If
that condition is not satisfied,
we can use nonparametric statistics,
also known as distribution-free
statistics.

Nonparametric Statistics
The techniques that we are going
to discuss can also be used when the
data are interval and the required
condition of normality is
unsatisfied. In such circumstances
we will treat the interval data as if
they were ordinal.

Distribution of two populations when their locations are the same

Population Locations
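The location of population 1 is to the left of the location of
population 2.
[Figure: two distribution curves, with population 1 to the left of population 2]

The location of population 1 is to the right of the location of
population 2.
[Figure: two distribution curves, with population 2 to the left of population 1]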

Problem Objectives
When the problem objective is to compare two
populations the null hypothesis will state:
H0: The two population locations are the
same.
The alternative hypothesis can take on any one
of the following three forms:
H1: The location of population 1 is different
from the location of population 2
H1: The location of population 1 is to the
right of the location of population 2
H1: The location of population 1 is to the left
of the location of population 2

The Alternative Hypotheses


H1: The location of population 1 is
different from the location of
population 2
Used when we want to know
whether there is sufficient evidence to
infer that there is a difference between
the two populations.

The Alternative Hypotheses


H1: The location of population 1 is to
the right of the location of population 2
Used when we want to know whether
we can conclude that the random
variable in population 1 is larger in
general than the random variable in
population 2.

The Alternative Hypotheses


H1: The location of population 1 is
to the left of the location of
population 2
Used when we want to know
whether we can conclude that the
random variable in population 1 is
smaller in general than the random
variable in population 2.

NOTE:
All of our hypotheses are phrased in
terms of 1 then 2.
This is for consistency. Rather than
state:
H1: The location of population 2 is to
the left of the location of population 1,
we would want to phrase this as:
H1: The location of population 1 is to
the right of the location of population 2

Wilcoxon Rank Sum Test


The problem characteristics of this
test are:
The problem objective is to compare two
populations.
The data are either ordinal or interval (but
not normal).
The samples are independent.

Wilcoxon Rank Sum Test


Example
Example 21.1
Based on the two samples shown below,
can we infer at 5% significance level that the
location of population 1 is to the left of the
location of population 2?
Sample 1: 22, 23, 20
Sample 2: 18, 27, 26
The hypotheses are:
H0: The two population locations are the same.
H1: The location of population 1 is to the left of the
location of population 2.

Graphical Demonstration
Why use the sum of ranks to test
locations?
Two hypothetical populations and their corresponding samples are
presented, the GREEN population and the PURPLE population.
Let us rank the observations of the two samples together.
If the locations of the two populations are about the same (the null hypothesis is true),
we would expect the ranks to be evenly spread between the samples.
In this case the sums of ranks for the two samples will be close to one another.
[Figure: the two samples ranked together from 1 to 12; sum of ranks = 41 for one sample and 37 for the other]

Graphical Demonstration
Why use the sum of ranks to test
locations?

Allow the GREEN population to shift


to the left of the PURPLE population.

Graphical Demonstration
Why use the sum of ranks to test
locations?
[Figure: as the green population shifts to the left, the two rank sums change from 41 and 37, to 40 and 38, to 33 and 45]

The green sample is expected to shift to the left too.
As a result, several observations exchange location.
What happens to the sum of ranks?

Graphical Demonstration
Why use the sum of ranks to test
locations?
[Figure: the two rank sums change from 41 and 37, to 40 and 38, to 33 and 45]

The green sum decreases, and the purple sum increases.
Changing the relative location of the two populations affects the
sums of ranks of the two samples.

Wilcoxon Rank Sum Test Example


Example 21.1 continued
Test statistic
1. Rank all the six observations (1 for the
smallest).

Sample 1   Rank      Sample 2   Rank
22         3         18         1
23         4         27         6
20         2         26         5

2. Calculate the sum of ranks for each sample: 9 for sample 1 and 12 for sample 2.
3. Let T = 9 be the test statistic. (We arbitrarily define the test
statistic as the rank sum of sample 1.)
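
The ranking steps above can be sketched in a few lines of Python. This is only an illustration for Example 21.1; the variable names are ours, and the simple dictionary lookup assumes there are no tied observations (true for these six values).

```python
# Example 21.1: pool both samples, rank from smallest (1) to largest (6),
# then sum the ranks that belong to each sample.
sample1 = [22, 23, 20]
sample2 = [18, 27, 26]

pooled = sorted(sample1 + sample2)
# No ties here, so each value maps to exactly one rank.
rank_of = {value: position for position, value in enumerate(pooled, start=1)}

T1 = sum(rank_of[v] for v in sample1)  # rank sum of sample 1 (the test statistic T)
T2 = sum(rank_of[v] for v in sample2)  # rank sum of sample 2
print(T1, T2)
```

With tied observations the lookup would need to assign averaged ranks instead.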

Sampling Distribution of the Test Statistic


A small value of T indicates that most of the smaller
observations are in sample 1, which was drawn from
population 1. But how small is small? Is 9 small
enough?
We have our test statistic, T = 9. We need to compare it
to some critical value of T to know whether we're in the
rejection region for H0.
So what does the sampling distribution of the
rank sum look like?

Sampling Distribution of the Test Statistic


We can build up the sampling distribution of
the test statistic in much the same way we
built histograms for the outcomes of rolls of 2
and 3 dice
1. Enumerate all possible combinations of ranks
2. Calculate rank sums for the combinations
3. The probability of any rank sum is the number
of occurrences divided by the total number of
combinations
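
The three steps above can be carried out by brute force. The sketch below is our own illustration (the function name `rank_sum_distribution` is not from the text) and assumes no ties, so the ranks are simply 1 through n1 + n2.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def rank_sum_distribution(n1, n2):
    """Exact null distribution of T, the rank sum of sample 1 (no ties assumed)."""
    ranks = range(1, n1 + n2 + 1)
    # Step 1 and 2: enumerate every set of ranks sample 1 could receive and sum it.
    counts = Counter(sum(combo) for combo in combinations(ranks, n1))
    total = sum(counts.values())
    # Step 3: probability = occurrences / total number of combinations.
    return {t: Fraction(c, total) for t, c in sorted(counts.items())}

dist = rank_sum_distribution(3, 3)
for t, p in dist.items():
    print(t, p)

# Left-tail probability used for a one-tail test with two samples of size 3.
p_left = sum(p for t, p in dist.items() if t <= 6)
print(p_left)
```

For two samples of size 3 there are 20 equally likely combinations, and only one of them (ranks 1, 2, 3) gives T = 6.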

ENUMERATE

CALCULATE
PROBABILITIES

Table 21.2 Sampling Distribution of T with Two Samples of Size 3

Sampling Distribution of the Test Statistic


1. Enumerate   2. Calculate   3. Probabilities
[Table: each combination of ranks, its rank sum, and the probability of each rank sum]
Total of 20 combinations

INTERPRET

Example 21.1

H0 is rejected if T ≤ 6. Since T = 9,
there is insufficient evidence to
conclude that population 1 is
located to the left of population 2,
at the 5% significance level.

Sampling Distribution of T with Two Samples of Size 3

P(T ≤ 6) = 1/20 = .05, thus our critical value of T is 6.
Since T = 9 > T_critical = 6, we cannot reject H0.

Critical Values of the Wilcoxon Rank Sum Test

Table 21.3a Critical Values of the Wilcoxon Rank Sum Test
α = .025 for a one-tail test, or α = .05 for a two-tail test
[Table: TL and TU for each combination of n1 and n2; for n1 = 4 and n2 = 4, TL = 11 and TU = 25]

With n1 = 4 and n2 = 4: P(T ≤ 11) = P(T ≥ 25) ≈ .025, so TL = 11 and TU = 25
serve a one-tail test at α = .025 or a two-tail test at α = .05.

Using the table: for two samples of sizes n1 and n2, P(T ≤ TL) = P(T ≥ TU) ≈ α (one-tail α).

A similar table exists for α = .05 (one-tail test) and α = .10 (two-tail test).

Table 21.3b Critical Values of the Wilcoxon Rank Sum Test

Critical Values: Wilcoxon Rank Sum Test


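For sample sizes of 10 or fewer observations
(in each sample), refer to the critical values in
Table 9 (Appendix B).
For sample sizes larger than 10, the test
statistic is approximately normally distributed
with:
Mean: $E(T) = \frac{n_1(n_1 + n_2 + 1)}{2}$
Standard deviation: $\sigma_T = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$
Hence: $z = \frac{T - E(T)}{\sigma_T}$
where $n_i$ = size of sample $i$, $i = 1, 2$.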

Wilcoxon rank sum test for samples where n > 10

The test statistic is approximately normally
distributed with the following parameters:
$E(T) = \frac{n_1(n_1 + n_2 + 1)}{2}$
$\sigma_T = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$
Therefore,
$z = \frac{T - E(T)}{\sigma_T}$

Example 21.2
A drug company is trialing a new painkiller. 30 people
were selected at random, half were given the new drug,
half given aspirin, and all were told to rate the
effectiveness on a five point scale (hence ordinal data):
5 = The drug was extremely effective.
4 = The drug was quite effective.
3 = The drug was somewhat effective.
2 = The drug was slightly effective.
1 = The drug was not at all effective.

Example 21.2

IDENTIFY

The data were recorded. Can we conclude (at 5%
significance) that the new painkiller is perceived to be
more effective?

New painkiller: 3, 5, 4, 3, 2, 5, 1, 4, 5, 3, 3, 5, 5, 5, 4
Aspirin: 4, 1, 3, 2, 4, 1, 3, 4, 2, 2, 2, 4, 3, 4, 5

It's important to note here that 5 is a good score, so
if the drug is effective, we'd likely see its location
greater than the location of aspirin users, hence:
H1: The location of population 1 is to the right of the
location of population 2, and so:
H0: The two population locations are the
same.

Example 21.2

IDENTIFY

When the 30 ratings are pooled and sorted:

The three ones would occupy
ranks 1, 2, and 3; we average
them to (1 + 2 + 3)/3 = 2.

The five twos would occupy
ranks 4, 5, 6, 7, and 8; again, we
average them to (4 + 5 + 6 + 7 + 8)/5 = 6,
and so on for the threes, fours, and fives.
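
The averaging of tied ranks can be sketched as follows. This is a minimal illustration using the Example 21.2 ratings; the helper name `average_ranks` is ours, not part of the text.

```python
from collections import defaultdict

def average_ranks(values):
    """Map each distinct value to the average of the ranks it would occupy."""
    positions = defaultdict(list)
    for rank, value in enumerate(sorted(values), start=1):
        positions[value].append(rank)
    return {value: sum(ranks) / len(ranks) for value, ranks in positions.items()}

new_painkiller = [3, 5, 4, 3, 2, 5, 1, 4, 5, 3, 3, 5, 5, 5, 4]
aspirin = [4, 1, 3, 2, 4, 1, 3, 4, 2, 2, 2, 4, 3, 4, 5]

# Rank all 30 ratings together, then split the ranks back out by sample.
rank_of = average_ranks(new_painkiller + aspirin)
T1 = sum(rank_of[v] for v in new_painkiller)  # rank sum, new painkiller
T2 = sum(rank_of[v] for v in aspirin)         # rank sum, aspirin
print(rank_of)
print(T1, T2)
```

A quick check on the arithmetic: the two rank sums must add up to 1 + 2 + ... + 30 = 465.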

Example 21.2

IDENTIFY

New Painkiller   Rank      Aspirin   Rank
3                12        4         19.5
5                27        1         2
4                19.5      3         12
3                12        2         6
2                6         4         19.5
5                27        1         2
1                2         3         12
4                19.5      4         19.5
5                27        2         6
3                12        2         6
3                12        2         6
5                27        4         19.5
5                27        3         12
5                27        4         19.5
4                19.5      5         27

Rank Total T1 = 276.5      Rank Total T2 = 188.5

Example 21.2

COMPUTE

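The rank sum for the new painkiller is T1 = 276.5,
and the rank sum for aspirin is T2 = 188.5.
Set T = T1 = 276.5, and begin calculating:
$E(T) = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{15(31)}{2} = 232.5$
$\sigma_T = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{(15)(15)(31)}{12}} = 24.1$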

Example 21.2

COMPUTE

$z = \frac{T - E(T)}{\sigma_T} = \frac{276.5 - 232.5}{24.1} = 1.83$
The p-value of the test is:
p-value = P(Z > 1.83) = .5 - .4664 = .0336
(or z = 1.83 > z_critical = 1.645)
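
As a cross-check on the hand calculation, the normal approximation can be sketched in Python using only the standard library. The function names are ours; the upper-tail probability is obtained from `math.erfc` rather than from a printed normal table, so the p-value differs slightly from the table-based .0336 because z is not rounded first.

```python
import math

def rank_sum_z(T, n1, n2):
    """Standardized Wilcoxon rank sum statistic (large-sample approximation)."""
    expected = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (T - expected) / sigma

def upper_tail_p(z):
    """P(Z > z) for a standard normal random variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = rank_sum_z(276.5, 15, 15)   # Example 21.2
p_value = upper_tail_p(z)       # one-tail test (location to the right)
print(round(z, 2), round(p_value, 4))
```

The same `rank_sum_z` function applies to any rank sum T with sample sizes above 10; for a two-tail test the tail probability is doubled.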

Example 21.2

INTERPRET

Since z = 1.83 > z_critical = 1.645,
there is sufficient evidence to infer
that the new painkiller is perceived
to be more effective than aspirin.

Wilcoxon rank sum test for nonnormal interval data, Example


Retaining Workers
The human resource manager of a large
company wanted to compare how long business
and non-business graduates worked for the
company before quitting.
Two samples of 25 business graduates and 20
non-business graduates were randomly selected.
The data representing their time with the
company were recorded.

Duration of Employment (Months)


Business graduates
60 11 18 19 5 25 60 7 8 17 37 4 8
28 27 11 60 25 5 13 22 11 17 9 4

Nonbusiness graduates
25 60 22 24 23 36 39 15 35 16
28 9 60 29 16 22 60 17 60 32

Wilcoxon rank sum test for nonnormal interval data, Example


Retaining workers - continued
Business   Non-Bus
60         25
11         60
18         22
19         24
5          23
25         36
...        ...

Can the personnel manager


conclude at 5% significance
level that a difference in
duration of employment exists
between business and nonbusiness graduates?

Wilcoxon rank sum test for nonnormal interval data, Example


Solution
The problem objective is to compare two
populations of interval data.
The samples are independent.
The non-normality of the two populations
is apparent from the sample histograms:
Non Business graduates

Business graduates

Wilcoxon rank sum test for nonnormal interval data, Example


Solution continued
The Wilcoxon rank sum test is the correct
procedure to run.
H0: The two population locations are
the same.
H1: The location of population 1
(business graduates) is
different from the location of
population 2 (non-business graduates).

Business   Rank      Nonbusiness   Rank
60         42        25            28
11         11        60            42
18         20        22            23
19         21        24            26
5          3.5       23            25
25         28        36            36
60         42        39            38
7          5         15            14
8          6.5       35            35
17         18        16            15.5
37         37        28            31.5
4          1.5       9             8.5
8          6.5       60            42
28         31.5      29            33
27         30        16            15.5
11         11        22            23
60         42        60            42
25         28        17            18
5          3.5       60            42
13         13        32            34
22         23
11         11
17         18
9          8.5
4          1.5

T1 = 463             T2 = 572

Wilcoxon rank sum test for nonnormal interval data, Example


Solution continued
Solving by hand
The rejection region is
$|z| > z_{\alpha/2} = z_{.025} = 1.96$

After the ranking process is completed,
we have:
T = T_business graduates = 463
$E(T) = n_1(n_1 + n_2 + 1)/2 = 575$
$\sigma_T = [n_1 n_2 (n_1 + n_2 + 1)/12]^{1/2} = 43.8$
$z = \frac{T - E(T)}{\sigma_T} = \frac{463 - 575}{43.8} = -2.56$
Reject the null hypothesis.

Wilcoxon rank sum test for non-normal interval data, Example

INTERPRET
The rejection region is
$|z| > z_{\alpha/2} = z_{.025} = 1.96$
and |z| = 2.56 > 1.96.
There is strong evidence to infer that
the duration of employment is different
for business and non-business
graduates. The data cannot tell
us the reason.

Required Conditions
The Wilcoxon rank sum test actually tests to
determine whether the population distributions
are identical. This means that it tests not only
for identical locations, but for identical spreads
(variances) and shapes (distributions) as well.
The rejection of the null hypothesis may be due
instead to a difference in distribution shapes
and/or spreads.
To avoid this problem, we will require that the
two probability distributions be identical
except with respect to location.

Identifying Factors
Factors that identify the Wilcoxon Rank
Sum

Sign Test and Wilcoxon Signed Rank Sum Test


(Tests for Matched Pairs Experiments)
We will now look at two nonparametric
techniques (Sign Test and Wilcoxon Signed Rank
Sum Test) that test hypotheses in problems with
the following characteristics:
We want to compare two populations,
The data are either ordinal or interval
(nonnormal),
and the samples are matched pairs.
As before, we'll compute matched pair
differences and work from there.

The Sign Test


We can use the Sign Test when we're dealing with
two populations of ordinal data in a matched pairs
experiment.
For each matched pair, take the difference and
count up the number of positive differences and
negative differences.
If the population locations are the same, we'd
expect roughly equal numbers of positives and
negatives. If we have more positives than
negatives (or vice versa), what can we learn?
Again, how many is enough to make a difference?

Sign Test
We can think of the sign test in
terms of a binomial experiment:
getting a positive sign is like flipping
heads on a coin. We use this notion
along with previously developed
statistics to come up with our
standardized test statistic (assuming
the null hypothesis is true):

Test Statistics and Sampling


Distribution
Recall that x is binomially distributed and
that, for sufficiently large n, x is
approximately normally distributed
with mean $\mu = np$ and standard
deviation $\sigma = \sqrt{np(1-p)}$. The
standardized test statistic is
$z = \frac{x - np}{\sqrt{np(1-p)}}$

Test Statistics and Sampling Distribution


The null hypothesis
H0: the two population locations are the
same
is equivalent to:
H0: p = .5 (i.e. equal proportions of +s and -s)
Therefore the test statistic becomes
$z = \frac{x - np}{\sqrt{np(1-p)}} = \frac{x - .5n}{.5\sqrt{n}}$

Test Statistics and Sampling


Distribution
The normal approximation of the binomial
is valid when $np \ge 5$ and
$n(1-p) \ge 5$. When p = .5,
$np = n(.5) \ge 5$ and
$n(1-p) = n(1 - .5) = n(.5) \ge 5$,
which implies that n must be at least 10.
This is one of the required conditions
for the sign test.

Sign Test Hypotheses


Since our null hypothesis is:
H0: the two population locations are the
same
(i.e. p=.5)
Our research hypothesis must be:
H1: the two population locations are different
which is the same as:
H1: p ≠ .5

Example 21.3
25 people were asked to ride in a European car
(and rate the ride) then ride in a North American car
(and again, rate the ride). The ratings were ordinal,
from 1 (very uncomfortable) to 5 (very
comfortable), and it's a matched pairs experiment
since the same rider tried both cars. [Xm21-03.xls]
Can we conclude (at 5% significance) that the
European car is perceived to be more comfortable
than the North American car?

Example 21.3
Comfort Ratings
[Table: Respondent, E. Car rating, N.A. Car rating, and Difference for each of the 25 respondents]

Summary of the differences:
5 negatives
18 positives
2 same rating

Example 21.3
COMPUTE

The data were analyzed:
We had 25 pairs of data
initially; two pairs gave
identical ratings (i.e.
difference = zero), so these
data points are
dropped, hence n = 23.
We had 5 negative
responses.
We had 18 positive
responses, thus x = 18 and
$z = \frac{x - .5n}{.5\sqrt{n}} = \frac{18 - .5(23)}{.5\sqrt{23}} = 2.71$
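
The sign test arithmetic for Example 21.3 can be sketched as below. This is an illustrative check only; the function name `sign_test_z` is ours, and the tail probability comes from `math.erfc` instead of a normal table.

```python
import math

def sign_test_z(x, n):
    """Standardized sign test statistic under H0: p = .5.

    x = number of positive differences, n = number of nonzero differences.
    """
    return (x - 0.5 * n) / (0.5 * math.sqrt(n))

n = 25 - 2        # 25 pairs, 2 ties dropped
x = 18            # positive differences
z = sign_test_z(x, n)
p_value = 0.5 * math.erfc(z / math.sqrt(2))   # one-tail: P(Z > z)
print(round(z, 2), round(p_value, 4))
```

Note that n = 23 satisfies the requirement that the sample size be large enough for the normal approximation.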

Example 21.3

INTERPRET

The p-value is P(Z > 2.71) = .5 - .4966 = .0034,
hence we reject H0 in favor of H1 and conclude that
the location of population 1 (European car ratings) is to the
right of the location of population 2 (North American car ratings).
Or, in the context of this problem:
There is relatively strong evidence to indicate that
people perceive the European car to provide a more
comfortable ride than the North American car.

SPSS Output (Wilcoxon Signed Ranks Test)

Ranks: european - american
                  N       Mean Rank   Sum of Ranks
Negative Ranks    5(a)    10.70       53.50
Positive Ranks    18(b)   12.36       222.50
Ties              2(c)
Total             25
a. european < american
b. european > american
c. european = american

Test Statistics(b): european - american
Z                         -2.683(a)
Asymp. Sig. (2-tailed)    .007
a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test

Checking the Required Conditions


The sign test requires:
The populations be similar in shape and spread:

The sample size exceeds 10 (n=23).

Wilcoxon Signed Rank Sum Test


We'll use the Wilcoxon Signed Rank Sum test
when we want to compare two populations of
interval (but not normally distributed) data in a
matched pairs type experiment.
Compute paired differences, discard zeros.
Rank absolute values of differences smallest
(1) to largest (n), averaging ranks of tied
observations.
Sum the ranks of positive differences ($T^+$)
and of negative differences ($T^-$).
Use $T = T^+$ as our test statistic.

Wilcoxon Signed Rank Sum Test


Now we have a test statistic, but what do we
compare it against?
For small sample sizes, i.e. n ≤ 30, critical
values of T can be read from Table 10 in
Appendix B.
For large sample sizes, i.e. n > 30, T is
approximately normally distributed, so we
have:
$E(T) = \frac{n(n+1)}{4}$, $\sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}$, $z = \frac{T - E(T)}{\sigma_T}$

Table 21.4 Critical Values for the Wilcoxon Signed Rank Sum Test

Example 21.4

IDENTIFY

Do travel times to the office vary between an
8:00 A.M. start and a flextime start? 32
workers recorded their travel times.
We want to research this hypothesis:
H1: the two population locations are
different
Thus we require:
H0: the two population locations are the
same.

Example 21.4

Data: Travel time

Worker   Arrival at 8:00 A.M.   Flextime Program
1        34                     31
2        35                     31
3        43                     44
4        46                     44
5        16                     15
6        26                     28
7        68                     63
8        38                     39
9        61                     63
10       52                     54
11       68                     65
12       13                     12
13       69                     71
14       18                     13
15       53                     55
16       18                     19
...      ...                    ...
32       42                     38

Example 21.4

IDENTIFY

The data are interval (i.e. times) and were produced by a matched pairs
experiment (same drivers, same day of the week, Wednesday). Why
aren't we using a t-test for $\mu_D$?

A histogram of the paired differences reveals a non-normal distribution,
hence we must use a nonparametric technique.

Example 21.4

COMPUTE

Travel time

Worker   Arrival at 8:00 A.M.   Flextime Program   Difference   |Difference|   Rank
1        34                     31                 3            3              21.0
2        35                     31                 4            4              27.0
3        43                     44                 -1           1              4.5
4        46                     44                 2            2              13.0
5        16                     15                 1            1              4.5
6        26                     28                 -2           2              13.0
7        68                     63                 5            5              31.0
8        38                     39                 -1           1              4.5
9        61                     63                 -2           2              13.0
10       52                     54                 -2           2              13.0
11       68                     65                 3            3              21.0
12       13                     12                 1            1              4.5
13       69                     71                 -2           2              13.0
14       18                     13                 5            5              31.0
15       53                     55                 -2           2              13.0
16       18                     19                 -1           1              4.5
...      ...                    ...                ...          ...            ...
32       42                     38                 4            4              27.0

Example 21.4

COMPUTE

[Figure: the original data sorted ascending by |difference|, with the ranks of the positive differences and the ranks of the negative differences in separate columns, and the two rank sums]

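Example 21.4

COMPUTE

We compute our test statistic as follows. With $T = T^+ = 367.5$ and n = 32:
$E(T) = \frac{n(n+1)}{4} = \frac{32(33)}{4} = 264$
$\sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}} = \sqrt{\frac{32(33)(65)}{24}} = 53.48$
$z = \frac{T - E(T)}{\sigma_T} = \frac{367.5 - 264}{53.48} = 1.94$

Our rejection region is $|z| > z_{\alpha/2} = z_{.025} = 1.96$ (5% significance level).

Example 21.4

INTERPRET

Since z = 1.94 < 1.96, there is not enough evidence to
infer that flextime commutes are
different from the commuting times
under the current schedule.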

SPSS Output for Example 21.4 (Wilcoxon Signed Ranks Test)

Ranks: @8_00_ar - flextime
                  N       Mean Rank   Sum of Ranks
Negative Ranks    12(a)   13.38       160.50
Positive Ranks    20(b)   18.38       367.50
Ties              0(c)
Total             32
a. @8_00_ar < flextime
b. @8_00_ar > flextime
c. @8_00_ar = flextime

Test Statistics(b): @8_00_ar - flextime
Z                         -1.947(a)
Asymp. Sig. (2-tailed)    .051
a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test


Identifying Factors I
Factors that Identify the Sign Test

Identifying Factors II
Factors that Identify the Wilcoxon Signed
Rank Sum Test

Problem 21.14 (Xr21-14)


Do the ways that women dress influence the ways
that other women judge them? This question was
addressed by a researcher at Ohio State University.
The experiment consisted of asking women to rate
how professional two women looked. One woman
wore a size 6 dress and the other woman wore size
14. Suppose that the researcher asked 20 women to
rate the women wearing the size 6 dress and another
20 rate the women wearing the size 14 dress. The
ratings were as follows:
4 = Highly professional; 3 = Somewhat professional
2 = Not very professional; 1 = Not at all professional
Do these data provide sufficient evidence to infer
that women perceive another woman wearing a size 6
dress as more professional than one wearing a size
14 dress?

Size 6   Rank     Size 14   Rank
2        8        2         8
3        20.5     4         34.5
3        20.5     3         20.5
2        8        1         2
3        20.5     2         8
4        34.5     3         20.5
3        20.5     3         20.5
3        20.5     2         8
4        34.5     3         20.5
4        34.5     4         34.5
2        8        3         20.5
3        20.5     4         34.5
4        34.5     3         20.5
4        34.5     4         34.5
1        2        4         34.5
3        20.5     2         8
4        34.5     3         20.5
3        20.5     1         2
4        34.5     2         8
2        8        3         20.5

T6 = 439.5        T14 = 380.5

21.14

H0: The two population locations are the same.
H1: The location of population 1 is to the right of the
location of population 2.

Rejection region: $z > z_{\alpha} = z_{.05} = 1.645$

$E(T) = \frac{n_1(n_1 + n_2 + 1)}{2} = \frac{20(20 + 20 + 1)}{2} = 410$
$\sigma_T = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{(20)(20)(20 + 20 + 1)}{12}} = 37.0$
$z = \frac{T - E(T)}{\sigma_T} = \frac{439.5 - 410}{37.0} = .80$

p-value = P(Z > .80) = .5 - .2881 = .2119.
There is not enough evidence to infer that women perceive another
woman wearing a size 6 dress as more professional than one wearing
a size 14 dress.

Problem 21.37
In a study to determine whether gender affects
salary offers for graduating MBA students, 25
pairs of students were selected. Each pair
consisted of a male and a female student who
had almost identical grade point averages,
courses taken, ages, and previous work
experience. The highest salary offered to each
student upon graduation was recorded. Is there
sufficient evidence to allow us to conclude the
salary offers differ between men and women?

Female     Male       Difference   |Difference|   Rank (1 = smallest |Difference|)
29981.00   29233.00   748          748            7
29689.00   28733.00   956          956            9
30916.00   29541.00   1375         1375           17
30300.00   29058.00   1242         1242           15
31772.00   31149.00   623          623            6
30647.00   29141.00   1506         1506           19
30943.00   29739.00   1204         1204           14
31598.00   33529.00   -1931        1931           24
32811.00   33938.00   -1127        1127           11
32754.00   32239.00   515          515            5
32698.00   32661.00   37           37             1
32223.00   31176.00   1047         1047           10
32404.00   34375.00   -1971        1971           25
32578.00   34454.00   -1876        1876           23
34053.00   32184.00   1869         1869           22
34823.00   34570.00   253          253            3
35044.00   34097.00   947          947            8
34783.00   36458.00   -1675        1675           21
34870.00   33321.00   1549         1549           20
34806.00   34860.00   -54          54             2
35062.00   36207.00   -1145        1145           12
34905.00   33660.00   1245         1245           16
36399.00   36758.00   -359         359            4
36186.00   34800.00   1386         1386           18
36502.00   37701.00   -1199        1199           13

21.37

H0: The two population locations are the same.
H1: The location of population 1 is different from the location
of population 2.

Rejection region: $z < -z_{\alpha/2} = -z_{.025} = -1.96$ or $z > z_{\alpha/2} = z_{.025} = 1.96$

$E(T) = \frac{n(n+1)}{4} = \frac{25(25 + 1)}{4} = 162.5$
$\sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}} = \sqrt{\frac{25(25 + 1)(2[25] + 1)}{24}} = 37.2$
$z = \frac{T - E(T)}{\sigma_T} = \frac{190 - 162.5}{37.2} = .74$

p-value = 2P(Z > .74) = 2(.5 - .2704) = .4592.
There is not enough evidence of a difference
in salary offers between men and women.

SPSS Output (Wilcoxon Signed Ranks Test)

Ranks: male - female
                  N       Mean Rank   Sum of Ranks
Negative Ranks    16(a)   11.88       190.00
Positive Ranks    9(b)    15.00       135.00
Ties              0(c)
Total             25
a. male < female
b. male > female
c. male = female

Test Statistics(b): male - female
Z                         -.740(a)
Asymp. Sig. (2-tailed)    .459
a. Based on positive ranks.
b. Wilcoxon Signed Ranks Test

Two or More Populations

Kruskal-Wallis Test
So far we've been comparing locations of two
populations; now we'll look at comparing two or
more populations.
The Kruskal-Wallis test is applied to problems
where we want to compare two or more
populations of ordinal or interval (but
nonnormal) data from independent samples.
Our hypotheses will be:
H0: The locations of all k populations are
the same.
H1: At least two population locations differ.

Test Statistic
In order to calculate the Kruskal-Wallis test
statistic, we need to:
1. Rank all the observations from smallest (1)
to largest (n), and average the ranks in the
case of ties.
2. Calculate rank sums for each sample:
$T_1, T_2, \ldots, T_k$
3. Lastly, calculate the test statistic
(denoted H):
$H = \left[\frac{12}{n(n+1)} \sum_{j=1}^{k} \frac{T_j^2}{n_j}\right] - 3(n+1)$

Sampling Distribution of the Test Statistic:

For sample sizes greater than or equal to
5, the test statistic H is approximately
chi-squared distributed with k - 1 degrees
of freedom.
Our rejection region is: $H > \chi^2_{\alpha,k-1}$
And our p-value is: $P(\chi^2 > H)$

Figure 21.10

Sampling Distribution of H

Example 21.5

IDENTIFY

Can we compare customer ratings (4 = good,
1 = poor) for speed of service across three shifts
in a fast food restaurant? Our hypotheses will be:
H0: The locations of all 3 populations are
the same
(that is, there is no difference in service between shifts),
and
H1: At least two population locations differ.
Customer ratings for service were recorded.

Example 21.5
10 customers were selected at random from
each shift.

4:00 P.M. to Midnight   Midnight to 8:00 A.M.   8:00 A.M. to 4:00 P.M.
4                       3                       3
4                       4                       1
3                       2                       3
4                       2                       2
3                       3                       1
3                       4                       3
3                       3                       4
3                       3                       2
2                       2                       4
3                       3                       1

Example 21.5

COMPUTE

One way to solve the problem is to take the original data,
stack it, then sort by customer response
and rank from bottom to top.

Example 21.5

COMPUTE

Once it's in stacked format, put in straight rankings
from 1 to 30, average the rankings for the same
response, then parse them out by shift to come up with
rank sum totals.

Example 21.5

COMPUTE

H = 2.64
Our critical value of chi-squared (5% significance
and k - 1 = 2 degrees of freedom) is 5.99147, hence
there is not enough evidence to reject H0.
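
The value of H for Example 21.5 can be checked with the sketch below. The helper names are ours, and the statistic is computed without any correction for ties, which is why it can differ from what statistical packages report.

```python
from collections import defaultdict

def average_ranks(values):
    """Map each distinct value to the average of the ranks it would occupy."""
    positions = defaultdict(list)
    for rank, value in enumerate(sorted(values), start=1):
        positions[value].append(rank)
    return {value: sum(ranks) / len(ranks) for value, ranks in positions.items()}

def kruskal_wallis_h(samples):
    """Return (H, rank_sums) for a list of independent samples (no tie correction)."""
    pooled = [v for sample in samples for v in sample]
    n = len(pooled)
    rank_of = average_ranks(pooled)
    rank_sums = [sum(rank_of[v] for v in sample) for sample in samples]
    weighted = sum(t ** 2 / len(s) for t, s in zip(rank_sums, samples))
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    return h, rank_sums

evening = [4, 4, 3, 4, 3, 3, 3, 3, 2, 3]   # 4:00 P.M. to midnight
night = [3, 4, 2, 2, 3, 4, 3, 3, 2, 3]     # midnight to 8:00 A.M.
day = [3, 1, 3, 2, 1, 3, 4, 2, 4, 1]       # 8:00 A.M. to 4:00 P.M.

H, rank_sums = kruskal_wallis_h([evening, night, day])
print(rank_sums, round(H, 2))
```

Since H is compared with the chi-squared critical value 5.99147 for 2 degrees of freedom, any H below that value leaves H0 standing at the 5% level.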

Example 21.5

INTERPRET

There is not enough evidence to infer that a
difference in speed of service exists between
the three shifts. Because the data do not single out
any one shift, any action to improve service
should be applied to all three shifts.


SPSS Output

Test Statistics(a,b)
               mid_8_00   @8_00_4
Chi-Square     1.752      2.226
df             2          2
Asymp. Sig.    .416       .329
a. Kruskal Wallis Test
b. Grouping Variable: @4_00_mi


Identifying Factors
Factors that Identify the Kruskal-Wallis Test

Friedman Test
The Friedman Test is a technique used to
compare two or more populations of ordinal
or interval (nonnormal) data that are
generated from a matched pairs (blocked) experiment.
The hypotheses are the same as before:
H0: The locations of all k populations are the
same.
H1: At least two population locations differ.

Friedman Test: Test Statistic

Since this is a matched pairs experiment,
we first rank each observation within
each of b blocks from smallest to largest
(i.e. from 1 to k), averaging any ties. We
then compute the rank sums: $T_1, T_2, \ldots, T_k$.
Then we calculate our test statistic:
$F_r = \left[\frac{12}{b\,k(k+1)} \sum_{j=1}^{k} T_j^2\right] - 3b(k+1)$

Friedman Test: Test Statistic

This test statistic is approximately
chi-squared distributed with k - 1 degrees of
freedom (provided either k or b ≥ 5).
Our rejection region and p-value
are:
$F_r > \chi^2_{\alpha,k-1}$ and p-value = $P(\chi^2 > F_r)$

Sampling Distribution of the Test Statistic

The test statistic is approximately chi-squared
distributed with k - 1 degrees
of freedom provided either k or b is
greater than or equal to 5. The rejection
region is
$F_r > \chi^2_{\alpha,k-1}$
and the p-value is
$P(\chi^2 > F_r)$
The figure on the next slide depicts the
sampling distribution and p-value.

Figure 21.11

Sampling Distribution of Fr

Example 21.6

IDENTIFY

Four managers evaluate and score
job applicants on a scale from 1
(good) to 5 (not so good). There
have been complaints that the
process isn't fair. Is it the case that
all managers score the candidates
equally or not? That is:

Example 21.6

IDENTIFY

H0: The locations of all 4 populations are
the same.
(i.e. all managers score like candidates
alike)
H1: At least two population locations differ.
(i.e. there is some disagreement between
managers on scores)
The rejection region is
$F_r > \chi^2_{\alpha,k-1} = \chi^2_{.05,3} = 7.81473$

Example 21.6

COMPUTE

The data looks like this:

There are k=4 populations (managers)


and b=8 blocks (applicants) in this setup.

Example 21.6

COMPUTE

Applicant #1 for example, received a top score from


manager and next-to-top scores from the other three.
Applicant #7 received a top score from manager as
well, but the other three scored this candidate very low

Example 21.6

COMPUTE

Rank each observation within its block from
smallest to largest (i.e. from 1 to k), averaging
any ties. For example, consider the case of
candidate #2:
[Table: candidate #2's original scores from the four managers, the straight ranking (checksum 10), and the averaged ranking, in which the two tied lowest scores each receive (1 + 2)/2 = 1.5 (checksum 10)]

checksum = 1 + 2 + 3 + ... + k

Example 21.6

COMPUTE

Compute the rank sums $T_1, T_2, \ldots, T_k$ and
our test statistic.

Example 21.6 COMPUTE

The rejection region is
$F_r > \chi^2_{\alpha,k-1} = \chi^2_{.05,3} = 7.81473$

Example 21.6

INTERPRET

The value of our Friedman test statistic is 10.61,
compared to a critical value of chi-squared (at
5% significance and 3 d.f.) of 7.81473.
Thus, there is sufficient evidence to reject H0 in
favor of H1.

It appears that the managers'
evaluations of applicants do indeed differ.
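
The Friedman statistic can be sketched from the rank sums alone. The rank sums below are an assumption on our part: they are not printed in the text and are reconstructed by multiplying the mean ranks reported in the SPSS output (2.63, 1.25, 3.06, 3.06) by b = 8 and rounding to the nearest half rank. The function name is also ours, and no tie correction is applied.

```python
def friedman_statistic(rank_sums, b):
    """Friedman test statistic from k rank sums over b blocks (no tie correction)."""
    k = len(rank_sums)
    return 12 / (b * k * (k + 1)) * sum(t ** 2 for t in rank_sums) - 3 * b * (k + 1)

b = 8                                   # blocks (applicants)
rank_sums = [21.0, 10.0, 24.5, 24.5]    # assumed: mean ranks x 8, rounded to halves
k = len(rank_sums)                      # populations (managers)

# Checksum: within each block the ranks add to 1 + 2 + ... + k.
assert sum(rank_sums) == b * k * (k + 1) / 2

Fr = friedman_statistic(rank_sums, b)
print(round(Fr, 2))
```

The result agrees with the hand-computed value quoted above; the larger chi-square in the SPSS output is presumably adjusted for ties.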

SPSS Output

Ranks
             Mean Rank
manager1     2.63
manager2     1.25
manager3     3.06
manager4     3.06

Test Statistics(a)
N              8
Chi-Square     12.864
df             3
Asymp. Sig.    .005
a. Friedman Test

Identifying Factors
Factors that Identify the Friedman Test

Spearman Rank Correlation


Coefficient
Previously we looked at the t-test of the
coefficient of correlation ($\rho$). In many
situations, one or both variables may be
ordinal; or if both variables are interval, the
normality requirement may not be satisfied.
In such cases, we measure and test to
determine whether a relationship exists by
employing a nonparametric technique, the
Spearman rank correlation coefficient.

Spearman Rank Correlation


Coefficient
We are interested in whether a relationship exists between the two
variables, hence the hypotheses to be tested are:
H0: $\rho_s = 0$ (no linear pattern, hence no correlation)
H1: $\rho_s \ne 0$ (correlation; we can also do one-tail tests)
Since $\rho_s$ is a population parameter, our sample statistic is $r_s$, and is
calculated as:
$r_s = \frac{s_{ab}}{s_a s_b}$
where a and b are the ranks of x and y respectively,
$s_{ab}$ is the covariance, and
$s_a$ and $s_b$ are the standard deviations.
[$\rho_s$ is referred to as the Spearman correlation coefficient]

Spearman Rank Correlation


Coefficient
For values of n between 5 and 30, critical values of $r_s$
are available in Table 11 of Appendix B.
When n is greater than 30, $r_s$ is approximately normally
distributed with
a mean of zero, and
a standard deviation of $1/\sqrt{n-1}$.
Hence our standardized test statistic is:
$z = r_s\sqrt{n-1}$

Example 21.7
The production manager of a firm wants to examine the
relationship between aptitude test scores given prior to
hiring of production line workers and performance
ratings received by the employees 3 months after starting
work. The results of the study would allow the firm to
decide how much weight to give to these aptitude tests
relative to other work-history information obtained,
including references. The aptitude test results range from
0 to 100. The performance ratings are as follows:
1 = Employee has performed well below average
2 = Employee has performed somewhat below average
3 = Employee has performed at the average level
4 = Employee has performed somewhat above average
5 = Employee has performed well above average

Example 21.7
A random sample of 20 production workers yielded
the results listed below. Can the firm's manager infer
at the 5% significance level that aptitude test scores
are correlated with performance rating?

Employee   Aptitude test score   Performance rating
1          59                    3
2          47                    2
3          58                    4
4          66                    3
5          77                    2
6          57                    4
7          62                    3
8          68                    3
9          69                    5
10         36                    1
11         48                    3
12         65                    3
13         51                    2
14         61                    3
15         40                    3
16         67                    4
17         60                    2
18         56                    3
19         76                    3
20         71                    5
Example 21.7

IDENTIFY

We specify our hypotheses as:
H0: $\rho_s = 0$
H1: $\rho_s \ne 0$

At a 5% significance level and n = 20
observations, the rejection region (from Table
11) is:
$r_s < -.450$ or $r_s > .450$

Example 21.7

COMPUTE

As before, we rank each of the variables separately and average any ties.

Employee   Aptitude test score   Rank a   Performance rating   Rank b
1          59                    9        3                    10.5
2          47                    3        2                    3.5
3          58                    8        4                    17
4          66                    14       3                    10.5
5          77                    20       2                    3.5
6          57                    7        4                    17
7          62                    12       3                    10.5
8          68                    16       3                    10.5
9          69                    17       5                    19.5
10         36                    1        1                    1
11         48                    4        3                    10.5
12         65                    13       3                    10.5
13         51                    5        2                    3.5
14         61                    11       3                    10.5
15         40                    2        3                    10.5
16         67                    15       4                    17
17         60                    10       2                    3.5
18         56                    6        3                    10.5
19         76                    19       3                    10.5
20         71                    18       5                    19.5

Example 21.7

COMPUTE

We use ranks a and b to compute the Pearson
coefficient of correlation. We need to compute
$s_a$, $s_b$, and $s_{ab}$. They are
$s_a = 5.92$
$s_b = 5.50$
$s_{ab} = 12.34$
$r_s = \frac{s_{ab}}{s_a s_b} = \frac{12.34}{(5.92)(5.50)} = .379$
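
The rank correlation for Example 21.7 can be recomputed from the raw data with the sketch below. The helper names are ours; the code ranks each variable separately with averaged ties and then applies the sample covariance and standard deviation formulas (divisor n - 1) to the ranks.

```python
import math
from collections import defaultdict

def ranks_with_ties(values):
    """Return the rank of each value, in the original order, averaging ties."""
    positions = defaultdict(list)
    for rank, value in enumerate(sorted(values), start=1):
        positions[value].append(rank)
    average = {v: sum(r) / len(r) for v, r in positions.items()}
    return [average[v] for v in values]

def spearman_rs(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks of x and y."""
    a, b = ranks_with_ties(x), ranks_with_ties(y)
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)
    s_a = math.sqrt(sum((ai - mean_a) ** 2 for ai in a) / (n - 1))
    s_b = math.sqrt(sum((bi - mean_b) ** 2 for bi in b) / (n - 1))
    return s_ab / (s_a * s_b), s_a, s_b, s_ab

aptitude = [59, 47, 58, 66, 77, 57, 62, 68, 69, 36,
            48, 65, 51, 61, 40, 67, 60, 56, 76, 71]
performance = [3, 2, 4, 3, 2, 4, 3, 3, 5, 1,
               3, 3, 2, 3, 3, 4, 2, 3, 3, 5]

rs, s_a, s_b, s_ab = spearman_rs(aptitude, performance)
print(round(s_a, 2), round(s_b, 2), round(s_ab, 2), round(rs, 3))
```

The printed values can be compared with the hand-computed $s_a$, $s_b$, $s_{ab}$, and $r_s$ above.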

Example 21.7

COMPUTE

Compare .379 to our critical value of
$r_s = .450$.
Since .379 < .450,
there is not enough evidence to
believe that the aptitude test scores
and performance ratings are
related.

Identifying Factors
Factors that Identify the Spearman Rank
Correlation Coefficient Test
