Phil.015
April 12, 2016
HYPOTHESIS TESTING IN BINOMIAL AND NORMAL
DISTRIBUTION MODELS
1. Hypotheses
Recall that in one old segment of the TV show The Odd Couple, Felix claimed
to have ESP. Oscar was skeptical and suggested testing Felix's claim. Oscar
would draw a card at random from a deck of four large cards, each with a different geometric figure on it (e.g., a circle, a square, a triangle and a cross). Without
being shown the card, Felix was asked to identify it. Felix and Oscar repeated the
basic card identification experiment 10 times. Remarkably, Felix made 6 correct
identifications. Felix's score is surprisingly good in view of the fact that the average number of correct identifications is only 2.5 for anyone who does not have
ESP. Well, does Felix's score prove that he actually has ESP? Of course not.
This may have been his lucky day. But Felix's high score may provide some
evidence for his ESP capability. Just how much evidence? It depends
on how strict Oscar chooses to be about the discrepancy between the gathered data (i.e., 6 correct identifications) and the predictions of his no-ESP
statistical model (i.e., on average one can make only 2.5 correct identifications).
(ii) A parametrized family of seriously possible probability distribution functions of $X_n$. Because (a) the probability that Felix correctly identifies a
card drawn by Oscar in any trial is always the same, namely $p = \tfrac{1}{4}$ if he is only guessing, (b)
there are only two possible outcomes for each trial, called success (correct
identification) and failure (wrong identification), (c) there are $n = 10$ trials, where $n$ is fixed, and (d) all $n$ trials are statistically independent
of each other, the binomial probability distribution is an appropriate model
for the experiment.
Remember that there are 4 cards and without ESP Felix will guess correctly
any card drawn by Oscar with probability $p = \tfrac{1}{4}$. If Felix has ESP, then the
probability could be much higher, but we may not know exactly how much
higher. Given the foregoing problem description, it is most appropriate to consider
as a model the following parametrized family of binomial probability distribution
functions:
$$\mathrm{Bin}_{X_{10}}(k \mid p) = P(X_{10} = k \mid p) = \binom{10}{k}\, p^{k} (1 - p)^{10 - k},$$
where the parameter that parametrizes the possible binomial statistical models
is the probability p. Of course, we do not know the exact value of p. The
business of hypothesis testing is to generate statistical inferences about the
likely values of p.
Statisticians often write $X_n \sim \mathrm{Bin}(k \mid p)$ to indicate that the random variable
(i.e., the test statistic) $X_n$ has a pdf specified by $\mathrm{Bin}(k \mid p)$ with parameter
$p$ whose value is unknown. This is their way to introduce a parametric family
of statistical models that hopefully includes the correct model with a specific
value for $p$. Because this value is not known, the next move is to hypothesize a
specific value of $p$ and then let the experimental outcome decide whether that
hypothesis about $p$'s value is acceptable.
Because Oscar does not believe that Felix has ESP, he starts pessimistically
with the so-called null hypothesis
$$H_0 : p = \tfrac{1}{4},$$
stating plausibly that Felix has no ESP and is only guessing. In other
words, Oscar believes that the binomial statistical model
$$X_{10} \sim \mathrm{Bin}(k \mid \tfrac{1}{4})$$
correctly characterizes Felix's ESP capabilities. To emphasize the extant hypothesis $H_0$, it is common to write
$$X_{10} \sim \mathrm{Bin}(k \mid H_0).$$
Specification of $\mathrm{Bin}_{X_{10}}(k \mid p)$ under $H_0$: probability assignment

x              0      1      2      3      4      5      6      7      8
p_{X_10}(x)    0.056  0.188  0.282  0.250  0.146  0.058  0.016  0.003  0.000
The graphical representation of the pdf $p_{X_{10}}$ is displayed next. Note that the diagram has its highest values at $X_{10} = 2$ and $X_{10} = 3$, in agreement with the mean value
$\mu_{X_{10}} = 2.5$. As alluded to above, with no ESP the correct identification scores
will typically be close to 2 or 3.
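As a rough check on the probability assignment, the binomial formula can be evaluated directly. The sketch below assumes only the model stated in the text ($n = 10$, $p = \tfrac{1}{4}$); the helper name `binom_pmf` is introduced here for illustration and is not part of the notes.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X_n = k | p) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p0 = 10, 0.25  # Felix's experiment under H0: p = 1/4

# Probability assignment for x = 0, ..., 10, rounded to three decimals
table = {x: round(binom_pmf(x, n, p0), 3) for x in range(n + 1)}

# Mean of the distribution: sum of x * P(X = x), which equals n * p
mean = sum(x * binom_pmf(x, n, p0) for x in range(n + 1))

for x, prob in table.items():
    print(f"{x:2d}  {prob:.3f}")
print("mean =", round(mean, 3))
```

The rounded values for $x = 0, \ldots, 8$ agree with the tabulated ones, and the computed mean is $10 \cdot \tfrac{1}{4} = 2.5$.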
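[Figure: bar graph of $p_X(x) = \mathrm{Bin}_{X_{10}}(x \mid H_0) = P(X_{10} = x \mid p)$ for $x = 0, \ldots, 10$, followed by a second binomial bar graph over the same range.]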
This model would indeed explain Felix's results much better, but Oscar does
not accept it! Oscar is a skeptic! Be that as it may, we should definitely consider
the so-called alternative hypothesis
$$H_a : p > \tfrac{1}{4},$$
stating that in the case of Felix's ESP performance the probability of correctly
identifying a card drawn by Oscar is actually greater than the guessing-type
probability. In other words, the correct model for the experiment is somewhere
in the binomial family
$$X_{10} \sim \mathrm{Bin}(k \mid H_a)$$
of seriously possible statistical models. The next problem is how to decide which
hypothesis we should accept, $H_0$ or $H_a$, in the face of the fresh observation $X_{10} = 6$.
Before moving on, note also that if Oscar were a true believer in Felix's ESP
capabilities, he might as well consider yet another hypothesis, say $H_0'' : p = 0.75$,
that leads to the binomial probability distribution
[Figure: bar graph of the binomial distribution for $n = 10$ and $p = 0.75$.]
giving the mean $\mu = 10 \cdot 0.75 = 7.5$, which treats Felix's ESP performance far
too optimistically. This is something Oscar is not prepared to do.
What hypotheses $H_0'$ and $H_0''$ indicate is that there are many hypotheses that
perhaps explain Felix's experimental result much better than $H_0$. But Oscar is
not yet convinced that this might be the case.
(i) If the P-value satisfies
$$P(X_n \ge x_{obs} \mid H_0) < 0.01,$$
then $x_{obs}$ shows strong evidence against $H_0$ (or simply the test result is
highly statistically significant), prompting a rejection of $H_0$ at the 1% significance level. Because the P-value 0.019 in Felix's case is greater than
0.01, Oscar can still retain $H_0$ at the 1% significance level!
(ii) However, if the P-value satisfies
$$0.01 \le P(X_n \ge x_{obs} \mid H_0) < 0.05,$$
then $x_{obs}$ shows moderate evidence against $H_0$ (or simply the test result is
statistically significant), still prompting a rejection of $H_0$ but this time
only at the 5% significance level. Since the P-value 0.019 (obtained
for Felix) is strictly less than 0.05, Oscar must give up his pessimistic
hypothesis $H_0$ at the 5% significance level!
(iii) Finally, a P-value satisfying
$$0.10 \le P(X_n \ge x_{obs} \mid H_0)$$
indicates no evidence against $H_0$.
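The P-value comparison for Felix's score can be sketched numerically as follows. The function names are illustrative assumptions; the only inputs taken from the text are $n = 10$, $p = \tfrac{1}{4}$ under $H_0$, and $x_{obs} = 6$.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X_n = k | p) for a binomial model."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def right_tail_p_value(x_obs, n, p0):
    """P(X_n >= x_obs | H0: p = p0), the P-value of a right-tailed test."""
    return sum(binom_pmf(k, n, p0) for k in range(x_obs, n + 1))

p_value = right_tail_p_value(6, 10, 0.25)

# Compare with the two significance levels discussed in the text
reject_at_1_percent = p_value < 0.01
reject_at_5_percent = p_value < 0.05

print(f"P-value = {p_value:.4f}")
print("reject H0 at 1%:", reject_at_1_percent)
print("reject H0 at 5%:", reject_at_5_percent)
```

The unrounded tail probability is about 0.0197; the value 0.019 quoted above is what one obtains by adding the rounded table entries 0.016 + 0.003 + 0.000. Either way the conclusion is the same: $H_0$ is retained at the 1% level and rejected at the 5% level.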
In addition to the P-value approach described above, there is also a dual critical
value approach, according to which a designated critical value $x_{cr}$ of the test
statistic $X_n$ determines when $H_0$ ought to be rejected. Specifically, in a
right-tailed test (a prime example is Felix's ESP experiment) $H_0$ is rejected
precisely when $x_{cr} \le x_{obs}$. In a right-tailed test, the set of values of $X_n$ that are
equal to or larger than the critical value is called the rejection region. All other
values of $X_n$ belong to the nonrejection region.
Question: How do we find the critical value? Answer: The statistician specifies
it prior to the experiment or calculates it from the equation
$$P(X_n \ge x_{cr} \mid H_0) = 0.01$$
at the 1% significance level. Since $P(X_n \ge x_{cr} \mid H_0) = 1 - P(X_n \le x_{cr} - 1 \mid H_0)$
and hence
$$P(X_n \le x_{cr} - 1 \mid H_0) = 0.99,$$
we can look up the value of $x_{cr} - 1$ in the table for the binomial cumulative
probability distribution for sample size $n$ and the probability specified by $H_0$.
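At the 5% significance level, we instead solve
$$P(X_n \ge x_{cr} \mid H_0) = 0.05$$
for $x_{cr}$. Equivalently, we look up the value of $x_{cr} - 1$ in the table for the binomial
cumulative probability distribution for sample size $n$ and the probability specified by $H_0$,
satisfying the formula
$$P(X_n \le x_{cr} - 1 \mid H_0) = 0.95.$$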
In Felix's ESP example, we find that the critical value at the 5% significance
level is approximately $x_{cr} - 1 = 4.5$, i.e., we have $x_{cr} = 5.5$. The value is only approximate because the cumulative probability jumps from about 0.92 at 4 to about 0.98 at 5, so no integer score gives exactly 0.95. What this means is
that any test score above 5.5 leads to the rejection of $H_0$ at the 5% significance
level. Thus the rejection region for Felix's ESP experiment is given by the
set {6, 7, 8, 9, 10}.
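Because the test statistic takes only integer values, one way to make the table lookup concrete is to search for the smallest integer score whose right-tail probability does not exceed the significance level. This integer convention is an assumption made for the sketch below (the text works with the interpolated value 5.5 instead), and the function names are illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X_n = k | p) for a binomial model."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def right_tail(x, n, p0):
    """P(X_n >= x | H0: p = p0)."""
    return sum(binom_pmf(k, n, p0) for k in range(x, n + 1))

def critical_value(n, p0, alpha):
    """Smallest integer x with P(X_n >= x | H0) <= alpha (assumed convention)."""
    for x in range(n + 1):
        if right_tail(x, n, p0) <= alpha:
            return x
    return None  # no score is extreme enough at this level

x_cr_5 = critical_value(10, 0.25, 0.05)
x_cr_1 = critical_value(10, 0.25, 0.01)
rejection_region_5 = list(range(x_cr_5, 11))

print("5% level: reject H0 when X >=", x_cr_5, "->", rejection_region_5)
print("1% level: reject H0 when X >=", x_cr_1)
```

Under this convention the 5% cutoff is 6, which yields the same rejection region {6, 7, 8, 9, 10} as the interpolated value 5.5, while the 1% cutoff is 7, so Felix's score of 6 falls just short of it.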
The binomial graph for a biased coin that leads to more likely heads (i.e., the
alternative hypothesis has the form $H_a : p(\text{head}) > \tfrac{1}{2}$) in $n = 16$ tosses has the form
[Figure: bar graph of $p_X(x) = \mathrm{Bin}_{X_{16}}(x \mid H_0) = P(X_{16} = x \mid p)$ for $x = 0, \ldots, 16$, with the rejection region marked.]
The rejection region is given by head numbers $X = 12, 13, 14, 15, 16$.
[Figure: a second bar graph over $x = 0, \ldots, 16$, with a rejection region marked.]
In the graph the cutoff point (critical value) $x_{cr}$ is indicated by a vertical line.
In this setting hypothesis testing is really quite simple. In a given experiment
of 16 coin tosses, simply count the total number of heads and compare it with
the cutoff point $x_{cr}$. In a left-tailed test, where the rejection region lies in the lower tail, $H_0$ is retained if the count is above the cutoff and rejected otherwise; in a right-tailed test the roles are reversed.
Of course, we can calculate the P-value $P(X_n \le x_{obs} \mid H_0)$ of an observed head count $x_{obs}$. If it turns out to
be smaller than 0.01, then $H_0$ is rejected at the 1% significance level, and
similarly for 0.05. Specifically, the binomial table gives $P(X_{16} \le 5 \mid H_0) = 0.1051$,
which is a bit too weak for rejecting $H_0$. However, because
$P(X_{16} \le 3 \mid H_0) = 0.010$, only 3 heads in total in 16 coin tosses would be on
the border of a highly significant test at the 1% significance level.
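The tail probabilities quoted for the 16-toss coin experiment can be recomputed directly, assuming a fair coin under $H_0$ (so every sequence of 16 tosses has probability $2^{-16}$). The helper names below are illustrative.

```python
from math import comb

n = 16  # number of coin tosses; H0: p(head) = 1/2

def lower_tail(x):
    """P(X_16 <= x | H0): count sequences with at most x heads."""
    return sum(comb(n, k) for k in range(0, x + 1)) / 2**n

def upper_tail(x):
    """P(X_16 >= x | H0): count sequences with at least x heads."""
    return sum(comb(n, k) for k in range(x, n + 1)) / 2**n

print(f"P(X <= 5)  = {lower_tail(5):.4f}")
print(f"P(X <= 3)  = {lower_tail(3):.4f}")
print(f"P(X >= 12) = {upper_tail(12):.4f}")
print(f"P(X >= 11) = {upper_tail(11):.4f}")
```

The lower-tail values come out as roughly 0.1051 and 0.0106, the latter sitting just above 0.01, which is why 3 heads is only on the border of significance at the 1% level. On the upper side, $P(X_{16} \ge 12 \mid H_0)$ is about 0.038 while $P(X_{16} \ge 11 \mid H_0)$ is about 0.105, consistent with a 5% right-tailed rejection region starting at 12 heads.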
[Figure: bar graph over $x = 0, \ldots, 16$ with two rejection regions marked.]
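[Figure: graph of the normal density
$$p_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
with $\sigma = 0.5$, $\mu = 0$.]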
Because it is often easier and far more universal to work with the normal distribution,
for large binomial samples hypothesis testing is performed in a normal distribution
setting. There is one important technical problem, however. Tables for the normal
probability distribution are available only for the special case where the mean is $\mu = 0$
and the standard deviation is $\sigma = 1$. In order to be able to use this rather specialized
table also for the other normal probability distributions (in which in general $\mu \ne 0$ and
$\sigma \ne 1$), we must transform the original test statistic $X_n$ into its so-called Z-statistic
or Z-score (or standardized random variable), defined by
$$Z =_{df} \frac{X_n - \mu_X}{\sigma_X}.$$
[Figures: normal density curves with rejection regions marked, with cutoff value 1.64 for the one-tailed tests and cutoff values 1.96 for the two-tailed test.]
Once again, hypothesis $H_0$ is rejected provided that the observed result $Z = z_{obs}$ in a
pertinent experiment falls into the rejection region.
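A minimal sketch of the Z-score test, under stated assumptions: the binomial statistic is standardized with $\mu_X = np$ and $\sigma_X = \sqrt{np(1-p)}$ (the standard mean and standard deviation of a binomial variable), no continuity correction is applied, and the observation of 12 heads in 16 tosses is a hypothetical value chosen for illustration. With $n = 16$ the normal approximation is rough at best.

```python
from math import sqrt
from statistics import NormalDist

def z_score(x_obs, n, p0):
    """Standardize a binomial count: (x_obs - mu_X) / sigma_X under H0: p = p0."""
    mu = n * p0                       # binomial mean
    sigma = sqrt(n * p0 * (1 - p0))   # binomial standard deviation
    return (x_obs - mu) / sigma

# Hypothetical observation: 12 heads in 16 tosses, H0: p(head) = 1/2
z_obs = z_score(12, 16, 0.5)

# Right-tailed cutoff at the 5% level from the standard normal distribution
z_cr = NormalDist().inv_cdf(0.95)

# Approximate right-tail P-value under the normal model
p_approx = 1 - NormalDist().cdf(z_obs)

print(f"z_obs = {z_obs:.2f}, z_cr = {z_cr:.3f}")
print("reject H0 at 5%:", z_obs >= z_cr)
print(f"approximate P-value = {p_approx:.4f}")
```

Here $\mu_X = 8$ and $\sigma_X = 2$, so $z_{obs} = 2$, which lies beyond the one-tailed 5% cutoff of about 1.64 and therefore in the rejection region.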