Non-Parametric Tests for One Sample Location Problem
Simulation Study
Anirban Ray, Mriganka Aulia
27 March 2018
Introduction
Under a parametric set-up, tests for one sample location problems are usually carried out for the population mean
In contrast, under a non-parametric set-up, tests are carried out for population quantiles
This is advantageous, as such tests are more robust
Objective
In this project, we will consider the following two tests:
1. Sign Test
2. Wilcoxon Signed Rank Test
We will verify some of their properties by simulation study, namely:
Distribution Free under Null Hypothesis
Asymptotic Normality for Large Samples
Then, we will estimate the power curves
Finally, we will compare these simulated power curves with that of the corresponding parametric test
Sign Test: Assumptions
(X1, X2, ..., Xn) is a random sample from a location family with distribution function F(x − θ)
F ∈ Ω0 = {F : F is absolutely continuous, F(0) = 1/2}
Sign Test: Testing Problem
We wish to test whether the population median is some specified constant θ0
Alternatively, one can test $H_0 : \xi_p = \xi_p^0$, where $\xi_p$ is the $p$-th population quantile
In this project, we will always consider the test for the median
Tests for other quantiles can be constructed by slight modification
Null Hypothesis H0 : θ = θ0
Alternative Hypotheses
H1 : θ > θ0
H2 : θ < θ0
H3 : θ ≠ θ0
Sign Test: Test Statistic
Let $Y_i = X_i - \theta_0$, $i = 1, 2, \ldots, n$
Define $S = \sum_{i=1}^{n} I(Y_i \geq 0)$
$S$ is the Sign Statistic
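As an illustration of the definition of S, here is a minimal Python sketch. The slides themselves rely on R; the function name `sign_statistic` and the data values are hypothetical and chosen only for this example.

```python
def sign_statistic(xs, theta0):
    """Count the observations X_i with X_i - theta0 >= 0."""
    return sum(1 for x in xs if x - theta0 >= 0)


# Hypothetical data, used only to illustrate the computation
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 2.8]
print(sign_statistic(sample, theta0=3.0))  # 3
```

Three of the six values (3.4, 5.0 and 4.2) are at least 3.0, so S = 3.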
Sign Test: Testing Rule
Under H1, more observations will tend to be greater than θ0
So, we reject H0 in favour of H1 for large values of S
On the other hand, under H2, more observations will tend to be less than θ0
So, we reject H0 in favour of H2 for small values of S
Combining these two cases, we reject H0 in favour of H3 if S is too large or too small
Sign Test: Null Distribution
Under H0, S is the sum of n i.i.d. Ber(1/2) random variables
Thus, S ∼ Bin(n, 1/2) under H0
For small sample sizes, cut-off points of the Binomial distribution are used to determine the critical region
If n is large, one can use cut-off points of the Normal distribution
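The two ways of obtaining cut-off points can be compared numerically. The sketch below, in Python with only the standard library, computes the exact upper tail probability of Bin(n, 1/2) and a Normal approximation to it. The continuity correction and the function names are assumptions made for this illustration, not something stated in the slides.

```python
from math import comb, erf, sqrt


def binom_upper_tail(s, n):
    """Exact P(S >= s) when S ~ Bin(n, 1/2)."""
    return sum(comb(n, j) for j in range(s, n + 1)) / 2 ** n


def normal_upper_tail(s, n):
    """Normal approximation to P(S >= s), with a continuity correction (an assumption)."""
    z = (s - 0.5 - n / 2) / sqrt(n / 4)
    return 0.5 * (1 - erf(z / sqrt(2)))


n, s = 20, 15  # illustrative values
exact = binom_upper_tail(s, n)
approx = normal_upper_tail(s, n)
print(round(exact, 4), round(approx, 4))
```

For a one-sided test of H0 against H1, either quantity can serve as the p-value of an observed S = s.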
Sign Test: Implementation
In some of the latter sections, we will use SignTest function

from DescTools package
Verification of Distribution Free Property of Sign Test Statistic under the Null Hypothesis
We have already stated that, under H0, S has a Binomial distribution that does not depend on the underlying distribution F
Thus, S is a distribution free statistic under H0
We wish to verify this by means of a simulation study
Sign Test Statistic Distribution Free: Graphical Verification
To verify, we draw a large number (say, R) of samples of fixed sample size (say, k) from different continuous distributions (say, Beta, Gamma, Log Normal and Normal)
We compute the test statistic for all the samples
We can expect the frequency distributions of these R Sign statistics to be more or less the same for each distribution
So, the sample quantiles must agree with each other
Therefore, the Quantile-Quantile plots should be close to the 45° line
We verify this claim for different choices of k
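The simulation described above can be sketched as follows in Python (standard library only). The specific parameter values, the seed, and the choices of k and R are assumptions made for illustration. Each family is paired with its true median, which plays the role of θ0 so that H0 holds.

```python
import random
from math import log


def sign_statistic(xs, theta0):
    return sum(1 for x in xs if x - theta0 >= 0)


def simulate_sign_stats(draw, median, k, R, rng):
    """R sign statistics, each from a sample of size k tested at the true median."""
    return [sign_statistic([draw(rng) for _ in range(k)], median) for _ in range(R)]


# (sampler, true median) pairs; the parameter values are illustrative assumptions
families = {
    "Beta(2, 2)": (lambda rng: rng.betavariate(2, 2), 0.5),
    "Gamma(1, 1)": (lambda rng: rng.gammavariate(1, 1), log(2)),
    "Log Normal(0, 1)": (lambda rng: rng.lognormvariate(0, 1), 1.0),
    "Normal(0, 1)": (lambda rng: rng.gauss(0, 1), 0.0),
}

rng = random.Random(2018)
k, R = 10, 5000
results = {}
for name, (draw, median) in families.items():
    stats = sorted(simulate_sign_stats(draw, median, k, R, rng))
    # A few sample quantiles; under H0 they should agree across families
    results[name] = [stats[int(q * (R - 1))] for q in (0.1, 0.25, 0.5, 0.75, 0.9)]
    print(name, results[name])
```

If the distribution free property holds, the printed quantiles coincide across the four families, which is the numerical counterpart of a Q-Q plot lying on the 45° line.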
Graph 1: Q−Q Plots for Sign Statistics for Sample Size 5 (pairwise comparisons of Beta, Gamma, Log Normal and Normal sample quantiles)
Graph 2: Q−Q plots (axis range 0 to 10)
Graph 3: Q−Q plots (axis range 2 to 14)
Graph 4: Q−Q plots (axis range 5 to 15)
Sign Test Statistic Distribution Free: Analytical Verification
We expect the samples of sign statistics to come from the same distribution
So, we can perform Pearson's Goodness of Fit Test
For each sample from the different distributions, we compute the sign statistic
Then, we test the hypothesis that the sample of R sign statistics has come from the Binomial distribution with parameters k and 0.5
We repeat the same for all distributions
Then, we repeat for different choices of sample size
None of the hypotheses is rejected at the 1% level of significance, although three of the sixteen p-values in Table 1 fall below 0.05
Hence, the simulations give no strong evidence that the underlying distribution affects the distribution of the sign statistic
Table 1: p-values of Pearson's Goodness of Fit Test
Sample Size   Beta        Gamma       Log Normal   Normal
5             0.3348326   0.6636682   0.2463768    0.8980510
10            0.9450275   0.5397301   0.5627186    0.0284858
15            0.0424788   0.4337831   0.0154923    0.7026487
20            0.1874063   0.6886557   0.8640680    0.7976012
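A goodness of fit check of this kind can be sketched in Python as below. The slides do not say how the p-values were obtained; this sketch assumes a Monte Carlo p-value for Pearson's chi-square statistic, and the function names, the number of Monte Carlo replications B, and the seeds are all hypothetical.

```python
import random
from math import comb


def chi_square_stat(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def counts(values, k):
    """Frequency table of values in 0, 1, ..., k."""
    table = [0] * (k + 1)
    for v in values:
        table[v] += 1
    return table


def binomial_gof_pvalue(stats, k, B=500, rng=None):
    """Monte Carlo p-value for H0: stats is a sample from Bin(k, 1/2)."""
    if rng is None:
        rng = random.Random(0)
    R = len(stats)
    probs = [comb(k, s) / 2 ** k for s in range(k + 1)]
    expected = [R * p for p in probs]
    observed_stat = chi_square_stat(counts(stats, k), expected)
    exceed = 0
    for _ in range(B):
        simulated = rng.choices(range(k + 1), weights=probs, k=R)
        if chi_square_stat(counts(simulated, k), expected) >= observed_stat:
            exceed += 1
    return (exceed + 1) / (B + 1)


rng = random.Random(1)
k, R = 5, 1000
# Sign statistics from Normal(0, 1) samples, tested at the true median 0
stats = [sum(1 for _ in range(k) if rng.gauss(0, 1) >= 0) for _ in range(R)]
p_value = binomial_gof_pvalue(stats, k, rng=rng)
print(p_value)
```

A small p-value would indicate that the simulated sign statistics do not look like draws from Bin(k, 1/2).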
Verification of Asymptotic Normality of Sign Test Statistic for Large Samples
For large samples, one uses Normal cut-off points for the Sign test
We wish to verify this by simulation study
Asymptotic Normality of Sign Test Statistic: Shapiro-Wilk Normality Test
The Shapiro-Wilk test is one of the most commonly used tests for Normality
Draw a large number (say, R) of samples of fixed sample size (say, k)
The samples may or may not come from distributions satisfying H0
Generate R sign statistics from these samples and perform the test
A low p-value rejects the null hypothesis of Normality
Table 2
Situation                       pValue
H0 True (Median = Theta_0)      1.89113914884773e-15
H1 True (Median > Theta_0)      3.32390703175591e-24
H2 True (Median < Theta_0)      1.14623343298736e-24
H3 True (Median != Theta_0)     1.94129280273761e-15
Justification
It is surprising to see Normality rejected with such low p-values
Even more so, as the sample size used here is 50
This happens because of the sparseness of the possible values
Based on a sample of size k, the sign statistic can take only the non-negative integer values not greater than k
On the other hand, a Normal variable can take any value in this range
So, the sensitive parametric test rejects the hypothesis of Normality
Graphical Verification
We now seek some justification graphically
We draw a large number (say, R) of samples of fixed sample size (say, k)
Based on these, a sample of size R from the distribution of the sign statistic is formed
We plot the frequency distribution of this sample
We superimpose the expected Normal density
We repeat for different choices of k
Close similarity indicates asymptotic normality
Also, the empirical distribution functions match closely with the expected Normal CDF
So, one can use the Normal approximation for moderate (10 onwards) sample sizes
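A numerical counterpart of this graphical comparison is sketched below in Python (standard library only). It measures the largest gap between the simulated relative frequencies of the sign statistic and the Normal density with mean k/2 and variance k/4. The use of Normal(0, 1) samples tested at median 0, the seed, and the value of R are assumptions made for illustration.

```python
import random
from math import exp, pi, sqrt


def normal_density(x, mean, sd):
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))


def simulated_frequencies(k, R, rng):
    """Relative frequencies of the sign statistic over R samples of size k under H0."""
    counts = [0] * (k + 1)
    for _ in range(R):
        s = sum(1 for _ in range(k) if rng.gauss(0, 1) >= 0)
        counts[s] += 1
    return [c / R for c in counts]


rng = random.Random(50)
R = 5000
max_gap = {}
for k in (5, 10, 15, 20):
    freqs = simulated_frequencies(k, R, rng)
    mean, sd = k / 2, sqrt(k / 4)
    # Largest difference between relative frequency and Normal density over 0..k
    max_gap[k] = max(abs(f - normal_density(s, mean, sd)) for s, f in enumerate(freqs))
    print(k, round(max_gap[k], 4))
```

Small gaps for each k correspond to the close visual agreement between the column diagram and the superimposed density.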
Graph 5: Sample Size 5. Left panel: Relative Frequency against Sign Statistic. Right panel: Empirical CDF and Normal CDF, Fn(x) against x (Plot of Standardized Observations)
Graph 6: Sample Size 10. Same two panels
Graph 7: Sample Size 15. Same two panels
Graph 8: Sample Size 20. Same two panels
Sign Test: Power Function
We now calculate the power function of the Sign Test
Given a sample size k, we generate R samples from a known distribution
Then, we select some other number in a close neighbourhood as an alternative value for θ0
We compute the value of the sign statistic for each sample based on this value
We compute the p-value of the observed value of the statistic
The proportion of rejections gives the simulated power function
We repeat this process for different sample sizes
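The steps above can be sketched in Python as follows. This is one possible reading of the procedure, not the authors' code: it assumes a shifted Exponential(1) distribution so that the true median can be varied, a two-sided exact Binomial p-value, and a 5% level. All names, seeds and grid values are hypothetical.

```python
import random
from math import comb, log


def sign_test_pvalue(xs, theta0):
    """Two-sided exact p-value of the sign test for H0: median = theta0."""
    n = len(xs)
    s = sum(1 for x in xs if x - theta0 >= 0)
    tail = min(s, n - s)
    p = 2 * sum(comb(n, j) for j in range(tail + 1)) / 2 ** n
    return min(1.0, p)


def simulated_power(true_median, theta0, k, R, rng, alpha=0.05):
    """Proportion of R samples of size k for which H0: median = theta0 is rejected."""
    shift = true_median - log(2)  # Exponential(1) has median log(2)
    rejections = 0
    for _ in range(R):
        xs = [rng.expovariate(1.0) + shift for _ in range(k)]
        if sign_test_pvalue(xs, theta0) <= alpha:
            rejections += 1
    return rejections / R


rng = random.Random(9)
theta0 = log(2)
for k in (10, 20, 30):
    curve = [(m, simulated_power(m, theta0, k, 1000, rng)) for m in (theta0, 1.0, 1.5, 2.0)]
    print(k, [(round(m, 2), p) for m, p in curve])
```

At the null value the rejection proportion should stay near or below the nominal level, and it should rise towards 1 as the true median moves away from θ0.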
Graph 9: Sign Test Power Curves for Different Sample Sizes (10, 20 and 30). Power against True Median, Exponential Distribution with Parameter 1
Wilcoxon Signed Rank Test: Assumptions
(X1, X2, ..., Xn) is a random sample from a location family with distribution function F(x − θ)
F ∈ Ωs = {F ∈ Ω0 : F(x) = 1 − F(−x) for all x}
Ω0 = {F : F is absolutely continuous, F(0) = 1/2}
Here, we assume that the underlying distribution is symmetric about its median
Wilcoxon Signed Rank Test: Testing Problem
We wish to test whether the population median is some specified constant θ0
Null Hypothesis H0 : θ = θ0
Alternative Hypotheses
H1 : θ > θ0
H2 : θ < θ0
H3 : θ ≠ θ0
Wilcoxon Signed Rank Test: Test Statistic
Let $Y_i = X_i - \theta_0$, $i = 1, 2, \ldots, n$
Define $R_i = \mathrm{Rank}(|Y_i|)$, $i = 1, 2, \ldots, n$
Let $T = \sum_{i=1}^{n} R_i \, I(Y_i > 0)$
$T$ is the Wilcoxon Signed Rank Statistic
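A minimal Python sketch of this statistic is given below. It assumes no ties and no zero differences, which holds almost surely for a continuous distribution. The function name and the data are hypothetical.

```python
def signed_rank_statistic(xs, theta0):
    """Sum of the ranks of |X_i - theta0| over the observations with X_i > theta0.

    Assumes no ties among the absolute differences and no zero differences.
    """
    ys = [x - theta0 for x in xs]
    order = sorted(range(len(ys)), key=lambda i: abs(ys[i]))
    ranks = [0] * len(ys)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return sum(r for r, y in zip(ranks, ys) if y > 0)


# Hypothetical data, used only to illustrate the computation
sample = [2.1, 3.4, 1.9, 5.0, 4.3, 2.8]
print(signed_rank_statistic(sample, theta0=3.0))  # 13
```

Here the positive differences belong to 3.4, 4.3 and 5.0, whose absolute differences have ranks 2, 5 and 6, so T = 13.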
Wilcoxon Signed Rank Test: Testing Rule
Under H1, more observations will tend to be greater than θ0 and will have higher ranks
So, we reject H0 in favour of H1 for large values of T
On the other hand, under H2, more observations will tend to be less than θ0
So, we reject H0 in favour of H2 for small values of T
Combining these two cases, we reject H0 in favour of H3 if T is too large or too small
Wilcoxon Signed Rank Test: Null Distribution of Test Statistic
$T$ can be expressed as $\sum_{j=1}^{n} j W_j$, where $\{W_j\}_{j=1}^{n}$ are i.i.d. Ber(1/2) random variables under the null hypothesis
$W_j = I\big(|Y|_{(j)} \text{ corresponds to a positive } Y_i\big)$, where $|Y|_{(j)}$ is the $j$-th smallest of the $|Y_i|$
Thus, T is distribution free under H0
For small sample sizes, cut-off points of the exact distribution are used to determine the critical region
If n is large, one can use cut-off points of the Normal distribution
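Because T is a sum of the ranks 1, ..., n, each included independently with probability 1/2 under H0, its exact null distribution can be obtained by counting subsets of ranks with a given sum. The Python sketch below does this with a simple dynamic programme; the function name is hypothetical.

```python
def signed_rank_null_pmf(n):
    """Exact P(T = t) under H0, for t = 0, 1, ..., n(n+1)/2."""
    max_t = n * (n + 1) // 2
    ways = [0] * (max_t + 1)
    ways[0] = 1
    for j in range(1, n + 1):  # rank j is either included in the sum or not
        for t in range(max_t, j - 1, -1):
            ways[t] += ways[t - j]
    return [w / 2 ** n for w in ways]


n = 10
pmf = signed_rank_null_pmf(n)
mean = sum(t * p for t, p in enumerate(pmf))
var = sum((t - mean) ** 2 * p for t, p in enumerate(pmf))
print(mean, n * (n + 1) / 4)
print(var, n * (n + 1) * (2 * n + 1) / 24)
```

The computed mean and variance agree with n(n+1)/4 and n(n+1)(2n+1)/24, the constants used in the Normal approximation.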
Wilcoxon Signed Rank Test: Implementation
In some of the latter sections, we will use wilcox.test function

from stats package
Verification of Distribution Free Property of Wilcoxon Signed Rank Test Statistic under the Null Hypothesis
We have already stated that T is a distribution free statistic under H0
We will verify this by both graphical methods and formal tests
Wilcoxon Signed Rank Test Statistic Distribution Free: Graphical Verification
First, we draw a large number (say, R) of samples of fixed sample size (say, k) from different symmetric continuous distributions (say, Cauchy, Normal, t, Uniform)
We compute the test statistic for all the samples
Under the distribution free property, the frequency distributions of these R Signed Rank statistics will be nearly identical for each distribution
So, we plot the column diagrams, and expect those to be more or less the same
We repeat this process for different choices of k
Graph 10: Sample Size 10. Relative Frequency against Wilcoxon Signed Rank Statistic for Cauchy, Normal, t and Uniform samples
Graph 11: Sample Size 15. Same layout
Graph 12: Sample Size 20. Same layout
Graph 13: Sample Size 25. Same layout
Wilcoxon Signed Rank Test Statistic Distribution Free: Analytical Verification
We expect the samples of the test statistic based on different underlying distributions to come from the same distribution
So, the probability of rejecting H0 must be the same for all of those
We find the probability of rejection for different choices of level of significance
We can say that the underlying distribution has no role, provided the probabilities match for all such choices, and also for different sample sizes
Table 3
Level of Significance = 0.01
Sample Size   Cauchy   Normal   t        Uniform
10            0.0113   0.0091   0.0107   0.0088
15            0.0085   0.0083   0.0071   0.0075
20            0.0098   0.0111   0.0106   0.0089
25            0.0085   0.0102   0.0086   0.0086
Table 4
Sample Size   Cauchy   Normal   t        Uniform
10            0.0435   0.0454   0.0513   0.0477
15            0.0503   0.0465   0.0501   0.0484
20            0.0470   0.0466   0.0464   0.0507
25            0.0465   0.0489   0.0498   0.0510
Table 5
Sample Size   Cauchy   Normal   t        Uniform
10            0.0857   0.0864   0.0893   0.0824
15            0.0954   0.0952   0.0969   0.0962
20            0.0973   0.0983   0.0973   0.0991
25            0.0967   0.0909   0.0946   0.1010
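Rejection proportions of this kind can be estimated with a sketch like the one below, written in Python with the standard library only. It assumes a two-sided test based on the exact null distribution, a t distribution with 5 degrees of freedom, and a Uniform(−1, 1) distribution; these choices, the seed and the names are illustrative assumptions, not details taken from the slides.

```python
import random
from math import pi, sqrt, tan


def signed_rank_statistic(xs, theta0=0.0):
    ys = [x - theta0 for x in xs]
    order = sorted(range(len(ys)), key=lambda i: abs(ys[i]))
    return sum(rank for rank, i in enumerate(order, start=1) if ys[i] > 0)


def null_cdf(n):
    """Exact P(T <= t) under H0 for t = 0, ..., n(n+1)/2."""
    max_t = n * (n + 1) // 2
    ways = [1] + [0] * max_t
    for j in range(1, n + 1):
        for t in range(max_t, j - 1, -1):
            ways[t] += ways[t - j]
    cdf, total = [], 0
    for w in ways:
        total += w
        cdf.append(total / 2 ** n)
    return cdf


def rejection_rate(draw, k, R, alpha, rng):
    cdf = null_cdf(k)
    max_t = k * (k + 1) // 2
    rejections = 0
    for _ in range(R):
        t = signed_rank_statistic([draw(rng) for _ in range(k)])
        lower = cdf[t]
        upper = cdf[max_t - t]  # P(T >= t), by symmetry of the null distribution
        if 2 * min(lower, upper) <= alpha:
            rejections += 1
    return rejections / R


# Symmetric distributions centred at 0; parameter choices are assumptions
samplers = {
    "Cauchy": lambda rng: tan(pi * (rng.random() - 0.5)),
    "Normal": lambda rng: rng.gauss(0, 1),
    "t (5 d.f.)": lambda rng: rng.gauss(0, 1) / sqrt(rng.gammavariate(2.5, 2) / 5),
    "Uniform": lambda rng: rng.uniform(-1, 1),
}

rng = random.Random(3)
rates = {name: rejection_rate(draw, 10, 4000, 0.05, rng) for name, draw in samplers.items()}
print(rates)
```

If T is distribution free under H0, the four estimated rates should differ only by simulation noise.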
Verification of Asymptotic Normality of Wilcoxon Signed Rank Test Statistic for Large Samples
Under H0,
$$\frac{T - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}} \sim AN(0, 1)$$
So, for large samples, one uses Normal cut-off points for the Wilcoxon Signed Rank test
We wish to verify this by simulation study
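The standardisation above translates directly into code. The Python sketch below computes the standardised statistic and a two-sided p-value from the Normal approximation; the function names and the example values are hypothetical, and no continuity correction is applied.

```python
from math import erf, sqrt


def standardized_signed_rank(t, n):
    """(T - n(n+1)/4) / sqrt(n(n+1)(2n+1)/24)."""
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mean) / sd


def normal_two_sided_pvalue(t, n):
    """Two-sided p-value 2 * (1 - Phi(|z|)) for the standardised statistic."""
    z = abs(standardized_signed_rank(t, n))
    return 1 - erf(z / sqrt(2))


# Illustrative values: T = 275 observed with n = 25
print(standardized_signed_rank(275, 25))
print(normal_two_sided_pvalue(275, 25))
```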
Asymptotic Normality of Wilcoxon Signed Rank Test Statistic: Shapiro-Wilk Normality Test
Draw a large number (say, R) of samples of fixed sample size (say, k)
Generate R signed rank statistics from these samples and perform the test
Sometimes the p-value is high, so the null hypothesis of Normality is not rejected
But the p-values are very inconsistent, fluctuating widely across replications
They are consistent only for very large sample sizes
Table 6
Sample Size Replication 1 Replication 2 Replication 3

20 0.0604578 0.1477418 0.0014243
30 0.5633280 0.3140424 0.8130286
40 0.0749628 0.0303223 0.7927796
50 0.0118960 0.3181890 0.7778051
60 0.0281755 0.0840748 0.4462483
Asymptotic Normality of Wilcoxon Signed Rank Test Statistic: Graphical Verification
We draw a large number (say, R) of samples of fixed sample size (say, k)
Based on these, a sample of size R from the distribution of the signed rank statistic is formed
The empirical distribution functions match closely with the expected Normal CDF
So, one can use the Normal approximation for moderate (10 onwards) sample sizes
Graph 14: Q−Q Plot for Sample Size 10. WSRT Quantiles against Normal Quantiles
Graph 15: Q−Q plot. WSRT Quantiles against Normal Quantiles
Graph 16: Q−Q plot. WSRT Quantiles against Normal Quantiles
Graph 17: Q−Q plot. WSRT Quantiles against Normal Quantiles
Wilcoxon Signed Rank Test: Power Function
We now calculate the power function of the Wilcoxon Signed Rank Test
Given a sample size k, we generate R samples from a known distribution
Then, we select some other number in a close neighbourhood as an alternative value for θ0
We compute the value of the test statistic based on this number and find the p-value
We find the proportion of cases in which H0 is rejected
We repeat this process for different sample sizes
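A sketch of this power simulation in Python follows. It is an illustration under stated assumptions rather than the authors' procedure: samples come from a Logistic distribution with scale 1 whose location is the true median, the p-value uses the Normal approximation without continuity correction, and the level is 5%. All names, seeds and grid values are hypothetical.

```python
import random
from math import erf, log, sqrt


def signed_rank_pvalue(xs, theta0):
    """Two-sided p-value from the Normal approximation (an assumption)."""
    n = len(xs)
    ys = [x - theta0 for x in xs]
    order = sorted(range(n), key=lambda i: abs(ys[i]))
    t = sum(rank for rank, i in enumerate(order, start=1) if ys[i] > 0)
    z = (t - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return 1 - erf(abs(z) / sqrt(2))


def logistic(rng, location):
    """Draw from the Logistic distribution with the given location and scale 1."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return location + log(u / (1 - u))


def simulated_power(true_median, theta0, k, R, rng, alpha=0.05):
    rejections = sum(
        signed_rank_pvalue([logistic(rng, true_median) for _ in range(k)], theta0) <= alpha
        for _ in range(R)
    )
    return rejections / R


rng = random.Random(18)
for k in (10, 20, 30):
    curve = [(m, simulated_power(m, 0.0, k, 500, rng)) for m in (-2, -1, 0, 1, 2)]
    print(k, curve)
```

The estimated power should be lowest near the hypothesised value 0 and increase symmetrically as the true median moves away from it, more steeply for larger k.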
Graph 18: Wilcoxon Signed Rank Test Power Curves for Different Sample Sizes (10, 20 and 30). Power against True Median, Logistic Distribution with Parameters 0 and 1
Parametric Test for One Sample Location Problem
The best known test is Student's t-test
It is designed for samples from the Normal distribution
It also performs well for other continuous distributions
It tests for the population mean based on the observed sample mean
Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$
It follows the $t_{n-1}$ distribution under $H_0 : \mu = \mu_0$
t.test is implemented in the stats package
The power function can be simulated using the same steps as before
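For reference, the t statistic can be computed as in the short Python sketch below, where s is taken to be the sample standard deviation with divisor n − 1. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(xs, mu0):
    """(sample mean - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(xs)
    return (mean(xs) - mu0) / (stdev(xs) / sqrt(n))


# Hypothetical data, used only to illustrate the computation
sample = [2.1, 3.4, 1.9, 5.0, 4.3, 2.8]
print(t_statistic(sample, mu0=3.0))
```

Replacing the p-value computation in a power simulation with one based on this statistic gives the simulated power curve of the t-test.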
Comparison of Power Curves
We know that the t-test is uniformly most powerful among unbiased tests for the Normal testing problem of the population mean
So, for Normal samples, we can expect it to perform better than the other two tests
On the other hand, for samples from the Logistic distribution, we expect the Wilcoxon Signed Rank test to be the best among the trio
That is because the Logistic distribution is symmetric about its median and does not satisfy the normality assumption of the t-test
In both cases, the Sign test is expected to perform not as well as the others
However, if we consider the Exponential distribution, which is neither symmetric nor Normal, the Sign test should perform better
Graph 19: Comparison with Parametric Test (t, Sign, Signed Rank). Power against True Median, Normal Distribution with Parameters 0 and 1
Graph 20: Comparison with Parametric Test (t, Sign, Signed Rank). Power against True Median, Logistic Distribution with Parameters 0 and 1
Graph 21: Comparison with Parametric Test (t, Sign, Signed Rank). Power against True Median, Exponential Distribution with Parameter 1
Summary
We have successfully verified the distribution free properties of both the test statistics
We also graphically verified their asymptotic behaviours
The estimated power curves also confirm our intuitions based on the theoretical assumptions
