
Topic 6
Estimates and their accuracy:
Confidence Intervals & Hypothesis Testing
Error Types, Power & Sample size

MBB: Chapters 8 & 9

1. Why do inference? (8.1, 8.2)
2. Confidence interval for $\mu$ (8.3, 8.4, 8.5)
3. Test of significance for $\mu$ (9.1, 9.2, 9.3)
4. p-value and level of significance (9.3)
5. Type I and Type II errors (9.3)
6. One tailed tests & CIs (8.8)
7. Statistical power (9.3)
8. Power of a z-test (one & two sided)
9. Practical versus statistical significance
10. Obtaining the sample size (8.9)

Up until now we have assumed we know the population parameters $\mu$ and $\sigma^2$.
We have made predictions about the next observation or the sample mean etc.
Not very realistic!
Usually we have a sample and want to make predictions about the population, or some population parameter (e.g. $\mu$ or $\sigma^2$).

Example:
In past years, the height of male STAT171 students has followed a Normal distribution with $\mu = 175$ cm and $\sigma = 15$ cm.
We want to investigate whether the average height of this year's male STAT171 students is any different.

We take a random sample of 10 males from this year and obtain their heights.
We obtain: 172 178 185 190 175 163 184 176 173 197
Sample statistics: $\bar{x} = 179.3$, $s = 9.84$, $n = 10$
What conclusions can we make about the average height of all the male STAT171 students for this year?

The estimate of the population average height $\mu$ for this year is the sample average $\bar{x} = 179.3$.
But it is very unlikely that $\mu$ is 179.3! Could it be 175? 180? 190?
If we take a new sample we will almost certainly get a new $\bar{x}$, i.e. a different estimate of $\mu$.
Therefore, it makes sense to think in terms of a range of values that $\mu$ might be,
i.e. what values of $\mu$ (for this year) are believable based on the sample we obtained.
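
The sample statistics quoted for the height data can be checked directly. This is a minimal sketch using Python's standard library; the variable names are illustrative only.

```python
from statistics import mean, stdev

# Heights (cm) of the 10 sampled male students, as listed in the slides.
heights = [172, 178, 185, 190, 175, 163, 184, 176, 173, 197]

n = len(heights)
x_bar = mean(heights)   # sample mean
s = stdev(heights)      # sample standard deviation (divisor n - 1)

print(f"n = {n}, x_bar = {x_bar:.1f}, s = {s:.2f}")
```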

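We know from theory that:
$P(-1.96 \le Z \le 1.96) = 0.95$

If we can assume that $\bar{X}$ has a Normal (or approximately Normal) distribution, we can assume that:
$\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim Z$
So, we can rewrite $P(-1.96 < Z < +1.96) = 0.95$ in terms of $\bar{X}$.

Start with:
$P(-1.96 \le Z \le +1.96) = 0.95$
substitute for Z:
$P\left(-1.96 \le \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le +1.96\right) = 0.95$
mult by $\sigma/\sqrt{n}$:
$P\left(-1.96\,\dfrac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le +1.96\,\dfrac{\sigma}{\sqrt{n}}\right) = 0.95$
subtract $\bar{X}$:
$P\left(-\bar{X} - 1.96\,\dfrac{\sigma}{\sqrt{n}} \le -\mu \le -\bar{X} + 1.96\,\dfrac{\sigma}{\sqrt{n}}\right) = 0.95$
mult by (-1):
$P\left(\bar{X} + 1.96\,\dfrac{\sigma}{\sqrt{n}} \ge \mu \ge \bar{X} - 1.96\,\dfrac{\sigma}{\sqrt{n}}\right) = 0.95$
write with smaller value at left hand end:
$P\left(\bar{X} - 1.96\,\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\dfrac{\sigma}{\sqrt{n}}\right) = 0.95$

This is called the 95% C.I. for $\mu$. All we need is $\bar{X}$, n and $\sigma$.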
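
The final line of the derivation translates directly into a calculation. The sketch below is illustrative only: the function name `z_interval` is made up for this example, and it assumes $\bar{X}$ is (approximately) Normal with $\sigma$ known. It is applied to the height example values ($\bar{x} = 179.3$, $\sigma = 15$, $n = 10$).

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, conf=0.95):
    """Two-sided z confidence interval for the mean (sigma assumed known)."""
    # Critical value leaving (1 - conf)/2 in each tail; about 1.96 for 95%.
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z_crit * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

lower, upper = z_interval(179.3, 15, 10)
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```

Changing `conf` changes only the critical value, so the same function gives intervals at other confidence levels.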

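95% confidence interval for $\mu$ (when $\bar{X}$ is normally distributed and $\sigma$ is known):
There is a 95% chance that the interval
$\left(\bar{X} - 1.96\,\dfrac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\,\dfrac{\sigma}{\sqrt{n}}\right)$
includes the true population mean $\mu$.
But there is a 5% chance that it doesn't.
Note: $\bar{x}$ is always in the centre of the interval - it is still our best single point estimate of $\mu$.
The interval can be written as $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$.

General case of a C.I.:
sample estimate $\pm$ crit value * se(est)

By using a different z critical value, we can change the level of confidence of the interval. We can have:
- a 99% confidence interval (zcrit = 2.575),
- a 90% confidence interval (zcrit = 1.645),
etc.
But 95% is by far the most commonly used (see why later).

In general terms, a $100(1-\alpha)\%$ confidence interval for $\mu$ is:
$\bar{X} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$
[Figure: standard Normal curve with central area $1-\alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$, and area $\alpha/2$ in each tail.]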

$\bar{x} = 179.3$, $n = 10$, $\sigma = 15$ (assumed)
We assume $\sigma$ is unchanged from previous years but $\mu$ may be different (often quite a valid assumption).

95% confidence interval for $\mu$:
$(\bar{x} \pm 1.96\,\sigma/\sqrt{n}) = (179.3 \pm 1.96 \times 15/\sqrt{10}) = (179.3 \pm 9.297) = (170.0,\ 188.6)$
We are 95% confident that the interval 170.0 to 188.6 includes the population mean $\mu$, where $\mu$ is the true mean height for all the male STAT171 students for this year.
Recall, $\mu$ is a constant (does not change) for any specific population. It is the interval which changes from sample to sample.

Note: the more confident that we want to be that the interval includes $\mu$, the wider the interval needs to be.
Level of confidence is $100(1-\alpha)\%$.
$\alpha$ (alpha) has a distinct meaning - done later.

99% confidence interval for $\mu$:
$(\bar{x} \pm z_{0.005}\,\sigma/\sqrt{n}) = (179.3 \pm 2.575 \times 15/\sqrt{10}) = (179.3 \pm 12.2) = (167.1,\ 191.5)$
There is a 1% chance this interval does not contain $\mu$.

90% confidence interval for $\mu$:
$(\bar{x} \pm 1.645\,\sigma/\sqrt{n}) = (179.3 \pm 1.645 \times 15/\sqrt{10}) = (179.3 \pm 7.8) = (171.5,\ 187.1)$
There is a 10% chance this interval does not contain $\mu$.

Remember that $\mu$ is fixed for a given population.
It is the confidence interval that is the variable.
Different samples from the same population will give different sample means and hence different confidence intervals.
Therefore:
The statement should be made in terms of the confidence interval being the variable.
Do say: There is a 95% chance that the calculated confidence interval includes $\mu$.
Don't say: There is a 95% chance that $\mu$ lies in the confidence interval - it either does or it doesn't.

The assumption is that:
$\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim Z$
This requires:
- X to be Normally distributed; OR
- n to be large enough to assume that $\bar{X}$ is approximately Normally distributed (using CLT) as covered in Topic 5;
AND $\sigma$ is known.
There are occasions when $\sigma$ is known and $\mu$ is not. But it is much more common for $\sigma$ to be unknown as well.
(we will come back to this in Topic 7)
For the height example, s = 9.84, so it is reasonable, on this evidence, to assume $\sigma$ is still 15 (same as in the past).
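
The point that the interval, not $\mu$, is the random quantity can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: heights are drawn from a Normal population with $\mu = 175$ and $\sigma = 15$ (the past-years values), and the function name, number of repetitions and seed are arbitrary choices made here.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def coverage(mu=175.0, sigma=15.0, n=10, conf=0.95, reps=10_000, seed=1):
    """Fraction of simulated z-intervals that contain the fixed true mean mu."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z_crit * sigma / sqrt(n)
    hits = 0
    for _ in range(reps):
        # A fresh sample gives a fresh sample mean and hence a fresh interval.
        x_bar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / reps

rate = coverage()
print(f"Proportion of intervals containing mu: {rate:.3f}")
```

The proportion should come out close to 0.95: $\mu$ never moves, but roughly 5% of the samples produce an interval that misses it.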

Hypothesis testing (test of significance)
The other approach to making conclusions about the population mean is to use the sample mean to test for a specific hypothesised possible value of $\mu$.
i.e. Is the value of $\bar{x}$ evidence that $\mu$ could be or could not be a particular value?
⇒ hypothesis testing
Whereas the approach that we have just used is:
what does $\bar{x}$ tell us about the likely values that $\mu$ could be?
⇒ confidence interval
Both approaches are equally valid (and use the same statistical theory).

We have observed: $\bar{x} = 179.3$, $n = 10$, $\sigma = 15.0$ (assumed)
The population mean for previous years was 175 cm.
There are only two possibilities for this year's male STAT171 population:
- $\mu$ is still 175 and we have observed a sample mean of 179.3, due purely to chance variation.
- $\mu$ is some other value (unknown, but not 175), and as a result, we have observed a sample mean of 179.3 due to chance variation about this different (but unknown) $\mu$.
⇒ carry out a test of significance

In general, $\bar{X} \sim N(\mu,\ \sigma^2/n)$ and
$\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim Z$
In this case $\bar{X} \sim N(\mu,\ 15^2/10)$ so
$\dfrac{\bar{X} - \mu}{15/\sqrt{10}} \sim Z$
Hence, if $\mu = 175$ (i.e. the mean height of this year's STAT171 males is the same as in the past), then
$\dfrac{\bar{X} - 175}{15/\sqrt{10}} \sim Z$

If $\mu$ is not 175, then
$\dfrac{\bar{X} - 175}{15/\sqrt{10}}$ is NOT $\sim Z$
because we have subtracted the wrong $\mu$.
So we can test whether $\mu$ is 175 by testing whether
$\dfrac{\bar{x} - 175}{15/\sqrt{10}}$
is a z value.
If the test statistic is a z value, then $\mu$ could be 175.
If the test statistic is not a z value, then we have evidence that $\mu$ is not 175.
So is it a z value ??

Problem:
A z value can be any value in the range $-\infty$ to $+\infty$.
So, we can only determine if it is a reasonable (or believable by chance) z value.
We can calculate how likely it is to observe by chance a z value such as we have calculated.
The higher the probability of obtaining the observed mean value (or one even more extreme), the more willing we will be to believe that $\mu$ for this year is still 175.

z-value believable (high prob of obtaining)
⇒ our sample was believably from a population with the specified mean.
z-value not believable (low prob of obtaining)
⇒ our sample was NOT believably from a population with the specified mean.

For our example the test statistic is:
$z_{obs} = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{179.3 - 175}{15/\sqrt{10}} \approx 0.91$
We determine whether 0.91 is a reasonable z value by calculating the probability of obtaining a random z value equal to the observed z, or a more unusual one (i.e. a z value less favourable to a $\mu$ of 175).
Using z-tables we get
$p = P(|Z| \ge 0.91) \approx 2 \times 0.1814 \approx 0.3628$

That is, if the true mean were 175, and we take a sample of size 10, we would expect to get a sample mean of 179.3 or a sample mean further away from 175 (at least 4.3 cm in either direction) in 36% of the samples, just due to chance variation.

Therefore, with a probability this high (~36%) it is quite reasonable to assume that $\mu$ this year is still 175 and that the deviation we have observed is explained merely by chance variation.
That is, there is no substantial (or statistically significant) evidence in our sample to be able to conclude that $\mu$ is not 175.

Another way of thinking about it:
If $\mu = 175$ is true, we can expect to observe variation in the sample mean at least as big as that obtained (4.3 cm) in 36% of our samples.
This indicates that we do not have an unusual sample for a population with $\mu = 175$.
A sample as extreme as the one we have would happen more than one-third of the time by pure chance (due to random variability).
This probability is called the p-value.
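
The p-value calculation for the height example can be reproduced without tables. This is a minimal sketch; the function name `two_sided_z_test` is illustrative and not from the slides. It uses the unrounded $z_{obs}$, so the result differs slightly in the third decimal place from the table-based 0.3628.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z_test(x_bar, mu_0, sigma, n):
    """Observed z statistic and two-sided p-value, sigma assumed known."""
    z_obs = (x_bar - mu_0) / (sigma / sqrt(n))
    # P(|Z| >= |z_obs|): both tails at least as extreme as the observed value.
    p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))
    return z_obs, p_value

z_obs, p_value = two_sided_z_test(179.3, 175, 15, 10)
print(f"z_obs = {z_obs:.2f}, p-value = {p_value:.3f}")
```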

However it is possible that $\mu$ is some other value (e.g. 178) and that the sample mean this year is randomly varying about that population mean.
There is no way we can be sure of either conclusion (that $\mu$ is or is not 175), since we have based our conclusion on a probability.
A probability of 0.363 is reasonably easy to make a decision on.
But what if the probability were
0.12 (i.e. 12%)
or 0.018
or 0.003 etc
⇒ We have to declare a cut-off!

The level of significance, denoted by $\alpha$ (alpha)
$\alpha$ is the maximum probability that will allow the hypothesised value to be rejected as not likely or not believable.
It (the significance level $\alpha$) is the cut-off between what is considered:
reasonably likely through pure random variation
versus
not believable through pure random variation alone

if $p > \alpha$
We conclude that the observed z-value is reasonably likely.
Hence the sample mean is close enough to the hypothesised value to conclude that the difference can be treated as due to chance.
i.e. Not reject the hypothesised value.
if $p \le \alpha$
We conclude that the observed z-value is reasonably unlikely.
Hence the sample mean is so far away from the hypothesised value that we can conclude that it is not reasonable that the difference could have occurred due to chance alone.
i.e. Reject the hypothesised value.

The level of significance determines the extent of evidence that is required before the hypothesised value can be rejected as unlikely and is the choice of the experimenter.
An $\alpha$ of 0.05 indicates that the test statistic would have to occur, by chance, less than 5% of the time, before the test statistic and hence the hypothesised value could be classified as unlikely.
An $\alpha$ of 0.01 indicates that the test statistic would have to occur less than 1% of the time, by chance, before the hypothesised value could be rejected as unlikely.
The most common level of significance used is $\alpha = 0.05$ (but it is your choice).

(all tests of significance follow these steps)

1. Set up a null hypothesis H0
and an alternative hypothesis H1 (some texts use Ha - very American)
and a level of significance $\alpha$ (alpha)
The H0 is always for no effect or no change or no difference.
H0: $\mu = \mu_0$
H1: $\mu \ne \mu_0$
$\mu_0$ is the generic notation for the numeric value of the population mean - it is a number!
⇒ if we reject H0 we will choose to believe H1
In this case: H0: $\mu = 175$
H1: $\mu \ne 175$
$\alpha = 0.05$
Here $\mu_0 = 175$, but we still don't know what $\mu$ actually is, and most likely never will know.

2. Assume that H0 is true and calculate the observed test statistic value, using $\mu_0$ = the hypothesised (numeric) value for $\mu$.
- in this case, we have the z-test with test statistic:
$Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
- with observed value:
$z_{obs} = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
The $z_{obs}$ is a number.

3. Calculate the p-value, i.e. the probability of obtaining a test statistic at least as extreme as the observed value, just due to chance (if the null hypothesis is true).
In this case:
p-value $= P(|Z| \ge |z_{observed}|)$

4. Compare the p-value with $\alpha$, the level of significance and hence either reject or don't reject H0.
if $p > \alpha$ ⇒ Retain H0
there is insufficient evidence to be able to conclude that the true mean is not $\mu_0$.
i.e. the sample mean is not significantly different from the hypothesised mean
if $p \le \alpha$ ⇒ Reject H0
there is sufficient evidence (at that level of statistical significance) to be able to conclude that the true mean is not $\mu_0$.
i.e. the sample mean is significantly different from the hypothesised mean

For $\alpha = 0.05$ (a significance level of 5%)
We reject H0 if $p \le 0.05$:
This is equivalent to obtaining an observed z greater than 1.96 or less than -1.96 since $P(-1.96 \le Z \le 1.96) = 0.95$
We retain H0 if $p > 0.05$:
A $z_{obs}$ between -1.96 and 1.96 will result in a p-value > 0.05 and will mean H0 is retained.
1.96 is called the 5% critical value

So for the general case, it doesn't matter if:
we compare p with $\alpha$ and reject if $p \le \alpha$
or
we compare $z_{obs}$ with $z_{critical}$ and reject if $|z_{obs}| \ge z_{\alpha/2}$

For the general case, testing at level of significance $\alpha$:
[Figure: standard Normal curve with cut-offs at $-z_{\alpha/2}$ and $z_{\alpha/2}$.]
Values of z in this range are deemed reasonably likely to occur by chance
⇒ Retain H0 with Prob $= 1 - \alpha$
Values of z outside this range are deemed reasonably unlikely to occur by chance
⇒ Reject H0 with prob $= \alpha$
Both of these probabilities are only these values in the case of H0 being true
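
The equivalence of the two decision rules (p-value against $\alpha$, or $|z_{obs}|$ against the critical value) can be checked numerically. The sketch below is illustrative: the function `decide` and the z values other than 0.91 and -1.94 are made up for the demonstration.

```python
from statistics import NormalDist

def decide(z_obs, alpha=0.05):
    """Return (reject by p-value rule, reject by critical-value rule)."""
    std_normal = NormalDist()
    p_value = 2 * (1 - std_normal.cdf(abs(z_obs)))
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    reject_by_p = p_value <= alpha
    reject_by_z = abs(z_obs) >= z_crit
    return reject_by_p, reject_by_z

# A few illustrative observed z values; both rules should always agree.
results = [decide(z) for z in (0.91, -1.94, 2.5, -3.0)]
print(results)
```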

Again, in carrying out the z-test we are assuming that:
$\bar{X}$ is Normally distributed. That is:
- X is Normal or
- n is large enough for the CLT;
and $\sigma$ is known.
and that we have a representative sample

Basically the z test is in the form:
$z_{obs} = \dfrac{\bar{x} - \mu_0}{\text{s.e.}(\bar{x})} = \dfrac{\text{statistic} - E[\text{statistic} \mid H_0 \text{ true}]}{\text{s.e.}(\text{statistic})}$
This is comparing the distance the observed sample mean is away from $\mu_0$ relative to the inherent variability of the $\bar{X}$ random variable.

NEVER say that $\mu$ is 175 or $\mu$ is not 175.
Whether we reject or retain H0, we could be making a mistake.
For the above example, if we tested that $\mu = 176$, $\mu = 175.5$, $\mu = 180$ etc, we would retain H0 in every case.
This does not mean we believe each of these values to be correct - it means that we do not have sufficient evidence to disbelieve that value as the true population mean height for this year.

If we retain H0:
The possibilities are we could be doing so because:
1. we used the correct value for $\mu$;
OR
2. we used an incorrect value for $\mu$, and because of an unusual sample, we obtained an unlikely z value.

If we reject H0:
The possibilities are we could be doing so because:
1. the value for $\mu$ is in fact different from that hypothesised;
OR
2. we used the right value for $\mu$, and because of an unusual sample, we obtained an unlikely z value.

For the decision Retain H0
⇒ The sample mean is not significantly different from the hypothesised population mean value;
(If referring to the sample mean ... talk about statistical significance)
OR
⇒ There is insufficient evidence to conclude the population mean is different from $\mu_0$
(If referring to the population mean ... talk about statistical evidence)
⇒ etc.

For the decision Reject H0
⇒ The sample mean is significantly different from the hypothesised population mean value;
(If referring to the sample mean ... talk about statistical significance)
OR
⇒ There is sufficient evidence to conclude the population mean is different from $\mu_0$
(If referring to the population mean ... talk about statistical evidence)
If H0 is rejected, we can see the direction of the difference by looking back at the data:
* $\bar{X}$ can be sig. less than $\mu_0$
* $\bar{X}$ can be sig. greater than $\mu_0$
Remember that all decisions are made at a pre-specified significance level.

Always state clearly the level of significance.
For example, if we obtained a p-value of 0.03 - we would:
- reject H0 if we are testing at the 5% level of significance.
- not reject H0 if we are testing at the 1% level of significance.
Not a problem if we are aware of our level of significance.
In effect here, the sample mean was far enough away from the hypothesised mean to:
confidently reject it at the 5% level,
but not far enough away to reject it at the 1% level of significance.

The choice of the level of significance is very important.
$\alpha = 5\%$ is usually the best level - by far the most commonly used.
$\alpha = 1\%$ is also often used.
Whatever choice is made, the level of significance should be determined in advance of the test.
i.e. determine in advance (of seeing the data) the level of evidence required.
General terminology:
If you reject H0 at the:
5% level: result is said to be significant
1% level: result is said to be highly significant

There are two types of errors possible when making a decision in a hypothesis test.
Type I error
This occurs if H0 is true and we incorrectly reject it.
Type II error
This occurs if H0 is false and we incorrectly retain it.
These are both the result of obtaining an unusual sample.
The problem is, we NEVER know if we have an unusual sample or not.

Type I error
This occurs if H0 is true and we incorrectly reject it.
(because we have observed an unusual sample, but we don't know that)
The probability of this error type IS the significance level $\alpha$.
$\alpha$ = P(Type I Error)
= P(Reject H0 | H0 true)
because the test is designed so that if H0 is true, we will reject a proportion $\alpha$ of z-values
Some professions (medical, psychology) call the probability of the complement of this error type the SPECIFICITY of the test.
Specificity $= 1 - \alpha$ = P(Retain H0 | H0 true)
The probability of CORRECTLY retaining H0

Type II error
This occurs if H0 is false and we incorrectly retain it.
(because we have observed an unusual sample, but we don't know this)
The probability of this error type is given the notation $\beta$:
$\beta$ = P(Type II Error)
= P(Retain H0 | H0 false)
Evaluating Type II error rates is messy - they depend on $\alpha$ and the true value of $\mu$. The further the true $\mu$ is from the hypothesised value $\mu_0$, the easier it will be to reject H0
e.g. P(Type II error | $\mu = 180$) > P(Type II error | $\mu = 185$)
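
To make the dependence of $\beta$ on the true mean concrete, the sketch below computes the Type II error probability for the two-sided height test. It is only an illustration under stated assumptions: $\mu_0 = 175$, $\sigma = 15$, $n = 10$ and $\alpha = 0.05$ as in the height example, with the function name `type_ii_error` invented here.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu_true, mu_0=175.0, sigma=15.0, n=10, alpha=0.05):
    """P(retain H0 | true mean = mu_true) for a two-sided z-test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # If the true mean is mu_true, the test statistic is N(shift, 1), not N(0, 1).
    shift = (mu_true - mu_0) / (sigma / sqrt(n))
    # H0 is retained when the statistic falls between -z_crit and +z_crit.
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

beta_180 = type_ii_error(180)
beta_185 = type_ii_error(185)
print(f"beta(mu=180) = {beta_180:.3f}, beta(mu=185) = {beta_185:.3f}")
```

The further the true mean is from 175, the smaller $\beta$ becomes; at `mu_true = 175` the function returns $1 - \alpha$, the probability of correctly retaining H0.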

A Type II error occurs when H0 is false, and we incorrectly retain it.
The complement of a Type II error occurs when H0 is false, and we correctly reject it.
Statisticians call the probability of making this correct decision the POWER of the test
Power = P(Reject H0 | H0 false)
$= 1 - \beta$
The probability of CORRECTLY rejecting H0
Some professions call this the SENSITIVITY of the test
(we will call it power)

There is always the possibility of making a wrong decision!
Legal Analogy:

Significance Test
                      Situation in reality
Decision              H0 true            H0 false
H0 not rejected       correct decision   Type II error
H0 rejected           Type I error       correct decision

Jury trial
                      Situation in reality
Decision              Person innocent    Person guilty
Declared not guilty   correct decision   Error
Declared guilty       Error              correct decision

Minimising error probabilities
Unfortunately, we can't make both error probabilities ($\alpha$ and $\beta$) low at the same time.
The compromise between the two error rates depends on the cost of each of the error types (cost in various terms - money, human lives, environmental pollution etc.)
The choice of $\alpha$ determines the relative importance of the two types of error.
For most tests, if in doubt, the best general compromise is an $\alpha$ of 5% - this will usually give a reasonably low $\beta$ (chance of missing the difference if one exists).
However in some circumstances, there will be reasons to have a very low $\alpha$ or a very low $\beta$.

Relationship between $\alpha$ and $\beta$ (one-tailed test)
Picture 1: standard $\alpha = 0.05$ ⇒ $\beta = 0.7405$
Picture 2: lower $\alpha = 0.01$ ⇒ $\beta = 0.9074$
Picture 3: higher $\alpha = 0.20$ ⇒ $\beta = 0.4371$
We can see:
As $\alpha$ decreases, $\beta$ increases
and
As $\alpha$ increases, $\beta$ decreases
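
The trade-off between the two error probabilities can be sketched numerically for a one-tailed z-test. The slides do not say how far the true mean is from $\mu_0$ in the pictures; the code below ASSUMES a shift of exactly one standard error, which happens to give values close to the three quoted $\beta$ figures. The function name is illustrative.

```python
from statistics import NormalDist

def beta_one_tailed(alpha, shift_in_se):
    """P(retain H0) for an upper-tailed z-test when the true mean lies
    shift_in_se standard errors above the hypothesised mean."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)      # one-tailed critical value
    # H0 is retained when the statistic, now N(shift, 1), falls below z_alpha.
    return std_normal.cdf(z_alpha - shift_in_se)

# Assumed shift of 1 standard error (not stated in the slides).
betas = {alpha: beta_one_tailed(alpha, 1.0) for alpha in (0.05, 0.01, 0.20)}
for alpha, beta in betas.items():
    print(f"alpha = {alpha:.2f}  ->  beta = {beta:.4f}")
```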

In trialing drugs to lower blood pressure, the null hypothesis H0 would be that the drug has no effect.
If doing a preliminary screening (lab testing) of a number of drugs:
a Type I error
- would occur if there is evidence the drug is useful (when it isn't)
- would lead to more intense testing ⇒ only a little bit costly (time and money)
a Type II error
- would occur if there is insufficient evidence the drug is useful (when it is)
- would stop further testing ⇒ very costly (in terms of lost potential benefits)

So, at the preliminary screening stage, we want to continue testing all drugs which may end up being useful ⇒
a Type I error would not be costly
(doing a bit more testing on a useless drug)
a Type II error would be costly
(throwing out a useful drug)
So, set a very high P(Type I error), say $\alpha = 0.10$
⇒ gives a low P(Type II error)

In the final stages, if trialing a new drug to lower blood pressure (for example):
a Type I error would be saying the drug is useful (when it isn't)
⇒ a LOT costly!
Spending masses of money on manufacturing and advertising etc.
a Type II error would occur if we missed a possible useful drug and don't invest in it
⇒ somewhat costly
(in terms of lost potential benefits to humanity)
but more drugs are always being tested.

So, in the final stages the error types have a very different cost than in the initial stages.
For the blood pressure drug example:
a Type I error would be costly
(manufacturing and marketing a useless drug)
a Type II error would not be so costly
(not developing a useful drug)
We want P(Type I error) to be very low at the expense of a high P(Type II error).
We want the probability of marketing an ineffective drug to be close to zero.
So, set a very low P(Type I error), say $\alpha = 0.005$

Decision based on z-test
+1.96 and -1.96 are the cut-off (critical) values for the z-score in a 5% test of significance.
retain H0: $|z_{obs}| = \left|\dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right| < 1.96$
reject H0: $|z_{obs}| = \left|\dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right| \ge 1.96$
The significance level $\alpha$ is the maximum chance of false rejection of H0 we are prepared to put up with.
For a 95% confidence interval for $\mu$, we used
$P\left(\left|\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right| < 1.96\right) = 0.95$
confidence level = 1 - significance level $= 1 - \alpha$

Decision based on Confidence Interval
The 95% confidence interval is the range of values of $\mu_0$ that would be retained if we carried out a 5% test of significance.
CI contains $\mu_0$ ⇒ retain H0
CI does not contain $\mu_0$ ⇒ reject H0
In general terms: The $100(1-\alpha)\%$ C.I. is the range of values of $\mu_0$ that would be retained if we carried out a test of significance at level $\alpha$.
In both cases (the z-test and the confidence interval) we used exactly the same rejection regions.
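
The link between the two decision rules can be checked numerically with the height example values ($\bar{x} = 179.3$, $\sigma = 15$, $n = 10$). This is a minimal sketch; the function names and the list of candidate $\mu_0$ values are illustrative choices made here.

```python
from math import sqrt
from statistics import NormalDist

def retained_by_test(mu_0, x_bar, sigma, n, alpha=0.05):
    """True if a two-sided z-test at level alpha retains H0: mu = mu_0."""
    z_obs = (x_bar - mu_0) / (sigma / sqrt(n))
    return abs(z_obs) < NormalDist().inv_cdf(1 - alpha / 2)

def inside_interval(mu_0, x_bar, sigma, n, alpha=0.05):
    """True if mu_0 lies inside the 100(1 - alpha)% z confidence interval."""
    margin = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    return x_bar - margin < mu_0 < x_bar + margin

candidates = [165, 170, 175, 180, 185, 190]
agreement = [
    (m, retained_by_test(m, 179.3, 15, 10), inside_interval(m, 179.3, 15, 10))
    for m in candidates
]
for mu_0, retained, inside in agreement:
    print(mu_0, "retain" if retained else "reject", "in CI" if inside else "not in CI")
```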

The 95% C.I. for $\mu$ (the true mean height this year) was (170.0, 188.6)
Testing H0: $\mu = 175$ versus H1: $\mu \ne 175$
⇒ 175 lies in the 95% confidence interval for $\mu$.
Retain H0 at the 5% level of significance.
⇒ There is insufficient evidence to be able to conclude that the true mean height is different from 175 cm.
Testing H0: $\mu = 165$ versus H1: $\mu \ne 165$
⇒ 165 lies outside the 95% confidence interval.
Reject H0 at the 5% level of significance.
⇒ There is sufficient evidence to conclude that the true mean height is different from 165 cm.
Based on this sample, there is evidence the true mean height for this year's male STAT171 students is greater than 165 cm.

Example:
The drying time of paint that is being marketed varies depending on the conditions (temperature, humidity, air movement, etc.).
Without the additive, the drying time of the paint follows a Normal distribution with a mean of 75 minutes, and a standard deviation of 9 minutes.
A new additive is being assessed as to whether it decreases drying time (a good thing!).

The new drying agent is added to see if it improves (lowers) the drying time.
If there is a sufficient decrease in (average) drying time, the new paint will be marketed.
An experiment is carried out to test the new paint.
For this sample of drying times:
$n = 25$, $\bar{x} = 71.5$, $\sigma = 9$ (assumed)
Q: Has the additive improved the drying time?
i.e. Do we market the new paint?

In this case we are only interested in the new paint (with the additive) if it DECREASES (typical) drying time.
If $\mu$ is still 75 (no change in mean) or $\mu$ is greater than 75 (increase in average drying time), we are NOT interested.
Hence our test becomes:
H0: $\mu = 75$
H1: $\mu < 75$
We carry out a one tailed test instead of a two tailed test, since we are only interested in more deviant outcomes that support H1.

Our test statistic is still
$Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
but now, we have to take into account the sign.
[In two tailed tests we could ignore the sign.]
The observed value of the test statistic here is
$z_{obs} = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{71.5 - 75}{9/\sqrt{25}} \approx -1.94$

To evaluate the p-value, we must assume the requirements for the z-test are satisfied:
$\bar{X}$ is normally distributed
BUT: What is the distn of X?
⇒ no data here, so can't assess
⇒ told in question X is normal
⇒ sample size = 25, so even without knowing the X distribution is normal, the CLT should apply ... so it should be safe to assume the distribution of $\bar{X}$ is approx normal
$\sigma = 9$ ⇒ variability of drying time is the same with and without the additive

p-value $= P(Z \le -1.94) \approx 0.0262$
Reject H0 at the 5% level of significance.
We can conclude that there is evidence at the 5% level of significance that the additive does decrease the average drying time of the paint.

We could also have compared $z_{obs}$ with the 5% one-tailed critical value (-1.645)
The RR is the left hand tail of the z-distn, since the alternative is < (less than)
Reject H0 since $z_{obs} < -z_{0.05}$
(i.e. $z_{obs}$ lies in the rejection region)
⇒ Same conclusion.

Comparison of rejection regions
[Figures: rejection regions under the standard Normal curve for each alternative.]
Two tailed test H1: $\mu \ne \mu_0$
One tailed tests H1: $\mu < \mu_0$
H1: $\mu > \mu_0$
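
The one-tailed calculation for the paint example can be reproduced as follows. This is a minimal sketch with an illustrative function name; it keeps the sign of $z_{obs}$ and uses only the lower tail. With the unrounded $z_{obs}$ the p-value is about 0.026, matching the table-based 0.0262 to two significant figures.

```python
from math import sqrt
from statistics import NormalDist

def lower_tailed_z_test(x_bar, mu_0, sigma, n):
    """z statistic and p-value for H1: mu < mu_0 (sigma assumed known)."""
    z_obs = (x_bar - mu_0) / (sigma / sqrt(n))
    p_value = NormalDist().cdf(z_obs)   # P(Z <= z_obs): lower tail only
    return z_obs, p_value

z_obs, p_value = lower_tailed_z_test(71.5, 75, 9, 25)
z_crit = NormalDist().inv_cdf(0.05)     # 5% one-tailed critical value, about -1.645

print(f"z_obs = {z_obs:.2f}, p-value = {p_value:.3f}, reject H0: {z_obs < z_crit}")
```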

Examples of one tailed situations:
- new drug to decrease blood pressure
- new fertilizer to increase yield
- new packaging to improve sales
- new diet to reduce weight
- new technique to increase reading speed
etc, etc
In each case, we are only interested in whether the new situation is better.
We don't care to distinguish between the new situation having no effect and being worse.

For a one tailed test:
H0: $\mu = \mu_0$
H1: $\mu < \mu_0$
If $\bar{x} > \mu_0$ ⇒ then there is absolutely no evidence that $\mu < \mu_0$.
Obviously we will retain H0 with a very strong belief (no matter how large $\bar{x}$).
There is no use in formally doing the test.
For the paint example:
If $\bar{x} = 76.2$ ⇒ Retain H0 at any level of significance.
Similarly for H1: $\mu > \mu_0$
We will retain H0 automatically if $\bar{x} < \mu_0$

BUT: There is often no right or wrong answer to the one versus two tailed test.
It depends on what your research question is.
e.g. In testing the effect of a drug on blood pressure:
The medico would almost certainly do a one tailed test - is the drug worth using to lower blood pressure?
A physiologist or a chemist might want to do a two tailed test - is the drug affecting blood pressure in any way?
The only real difference in carrying out a one or two tailed test is the calculation of the p-value.
Everything else is the same.

By choosing between a one tailed or a two tailed test you are changing the rejection and non-rejection regions (and hence the p-value).
So you require a reason before you can do a one tailed test - the reason would be based on the question of interest, or the research hypothesis, as this dictates the H1.
Don't let the data suggest a one tailed test.
If you don't have a reason ⇒ do a two tailed test.
Warning (2): So far, we have only seen confidence intervals which are two tailed: ($\bar{x} \pm$ something)
!! You can't do a one tailed test using a two-tailed confidence interval.

One-sided confidence intervals
From standard normal tables, we know that
$P(Z > -1.645) = 0.95$
But if $\bar{X}$ is normally (or approx) distributed: $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
So, we can make one directional statements:
$P\left(\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} > -1.645\right) = 0.95$
$P\left(-\mu > -\bar{X} - 1.645\,\dfrac{\sigma}{\sqrt{n}}\right) = 0.95$
$P\left(\mu < \bar{X} + 1.645\,\dfrac{\sigma}{\sqrt{n}}\right) = 0.95$

This gives us an upper limit on the believable value of $\mu$.
We get an unbounded 95% confidence interval for $\mu$, being
$\left(-\infty,\ \bar{X} + 1.645\,\dfrac{\sigma}{\sqrt{n}}\right)$
This interval can be used to test the alternative hypothesis H1: $\mu < \mu_0$
If $\mu_0$ is in the interval, retain H0
⇒ not enough evidence to conclude $\mu < \mu_0$
If $\mu_0$ is not in the interval, reject H0
⇒ enough evidence to conclude $\mu < \mu_0$
If the value of $\mu_0$ is outside the 95% one-sided CI for $\mu$, we would have evidence that the population from which the sample was taken has a mean less than the value $\mu_0$.

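A greater than alternative hypothesis can also be tested:
For H1: $\mu > \mu_0$ and a general significance level $\alpha$,
$P(Z < z_\alpha) = 1 - \alpha$
$P\left(\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_\alpha\right) = 1 - \alpha$
$P\left(-\mu < -\bar{X} + z_\alpha\,\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
$P\left(\mu > \bar{X} - z_\alpha\,\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
$z_\alpha$ is positive - it is the value which cuts off an area of $\alpha$ in the upper tail

This gives us a lower limit on the believable value of $\mu$.
We get an unbounded $100(1-\alpha)\%$ confidence interval for $\mu$, being
$\left(\bar{X} - z_\alpha\,\dfrac{\sigma}{\sqrt{n}},\ +\infty\right)$
This interval can be used to test the alternative hypothesis H1: $\mu > \mu_0$:
If $\mu_0$ is in the interval, H0 is retained
If $\mu_0$ is not in the interval, H0 is rejected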

Summary of one-tailed C.I.s
Testing H0: $\mu = \mu_0$ versus H1: $\mu < \mu_0$
e.g. H0: $\mu = 100$, H1: $\mu < 100$
P-value $= P(Z < z_{obs})$ ⇒ reject if p-val $\le$ 0.05
$100(1-\alpha)\%$ CI for $\mu$ $= (-\infty,\ \bar{X} + z_\alpha\,\sigma/\sqrt{n})$
⇒ retain H0 if $\mu_0$ is in CI, eg $(-\infty,\ 102)$
⇒ reject H0 if $\mu_0$ not in CI, eg $(-\infty,\ 96.7)$

Testing H0: $\mu = \mu_0$ versus H1: $\mu > \mu_0$
e.g. H0: $\mu = 100$, H1: $\mu > 100$
P-value $= P(Z > z_{obs})$ ⇒ reject if p-val $\le$ 0.05
$100(1-\alpha)\%$ CI for $\mu$ $= (\bar{X} - z_\alpha\,\sigma/\sqrt{n},\ +\infty)$
⇒ retain H0 if $\mu_0$ is in CI, eg $(98.3,\ +\infty)$
⇒ reject H0 if $\mu_0$ not in CI, eg $(102,\ +\infty)$

All the theory of two-tailed tests and confidence intervals holds.
The only difference is all the Type I error probability is being put into one tail, not two.

The experiment on the new paint gave the following summary data for drying time:
$n = 25$, $\bar{x} = 71.5$, $\sigma = 9$ (assumed)
Has the additive improved the drying time?
H0: $\mu = 75$
H1: $\mu < 75$
We need a CI with an upper limit, so
95% CI for $\mu$ $= (-\infty,\ \bar{x} + z_{0.05}\,\sigma/\sqrt{n})$
$= (-\infty,\ 71.5 + 1.645 \times 9/\sqrt{25}) = (-\infty,\ 74.461)$
75 is not in this interval, so we can reject H0.
⇒ there is evidence that the true mean is less than 75 minutes
⇒ market the new paint
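
The upper confidence bound for the paint example can be computed directly. This is a minimal sketch; the function name `upper_confidence_bound` is illustrative and assumes $\bar{X}$ is (approximately) Normal with $\sigma$ known.

```python
from math import sqrt
from statistics import NormalDist

def upper_confidence_bound(x_bar, sigma, n, conf=0.95):
    """Upper limit of the one-sided interval (-infinity, x_bar + z_alpha * sigma / sqrt(n))."""
    z_alpha = NormalDist().inv_cdf(conf)   # about 1.645 for 95%
    return x_bar + z_alpha * sigma / sqrt(n)

upper = upper_confidence_bound(71.5, 9, 25)
mu_0 = 75
reject_h0 = not (mu_0 < upper)   # reject when mu_0 lies outside (-inf, upper)

print(f"95% upper bound = {upper:.3f}, reject H0: {reject_h0}")
```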

Using Minitab for Height example:
> Stat
> Basic Stats
> 1-Sample Z

MTB > OneZ Height;
SUBC>   Sigma 15;
SUBC>   Test 175.

Test of mu = 175 vs mu not = 175
The assumed standard deviation = 15

Variable   N    Mean  StDev  SE Mean           95.0% CI     Z      P
Height    10  179.30   9.84     4.74  (170.00, 188.60)  0.91  0.365

Using Minitab for Paint example:
Minitab can also be used when you only have the summary statistics.
> Stat
> Basic Stats
> 1-Sample Z

Paint drying time example

Correct one tailed z test:
One-Sample Z
Test of mu = 75 vs < 75
The assumed standard deviation = 9

                     95% Upper
 N   Mean  SE Mean       Bound      Z      P
25  71.50     1.80       74.46  -1.94  0.026

For a two tailed z test:
One-Sample Z
Test of mu = 75 vs not = 75
The assumed standard deviation = 9

 N   Mean  SE Mean          95% CI      Z      P
25  71.50     1.80  (67.97, 75.03)  -1.94  0.052

For the other one tailed z test:
One-Sample Z
Test of mu = 75 vs > 75
The assumed standard deviation = 9

                     95% Lower
 N   Mean  SE Mean       Bound      Z      P
25  71.50     1.80       68.54  -1.94  0.974

Using the wrong alternative can lead to a different conclusion.

How to choose the direction of the alternative hypothesis
(From a discussion forum in 2015)
This is not always a simple choice.
What the null hypothesis direction is and what the alternative hypothesis direction is DEPENDS on what the purpose of the study is ... and sometimes it is not purely determined by that!
But ... Think of the idea behind inference using hypothesis tests:
H0 is the "status quo" ... nothing is different from before, or from what is being claimed. You need to get evidence against this.
H1 is the "action hypothesis" ... That something has changed or is not as claimed. You need to get evidence for this.
Examples will be done in tutorials.

Summary:
All confidence intervals require the following:
- statement of confidence level and parameter involved: eg a 99% c.i. for $\mu_D$ is ...
- correct application of the CI calculations
- a meaningful conclusion (only if asked for one)
For confidence intervals, use a confidence level of 95% ($1 - \alpha$) unless otherwise specified.

All hypothesis tests require the following:
- the null hypothesis: H0: parameter = value
- alternative hypothesis H1: parameter $\ne$ or < or > value
- significance level $\alpha$ = value
- a brief statement of any necessary distributional assumptions and justification (if required)
- reference to a CI if already evaluated; or application of the correct test (many more to come)
  [one or two sample z, t (pooled or unpooled), paired t, etc.]
- evaluation of the test statistic value (and df if appropriate)
- evaluation of the p-value (or an interval in which the p-value lies)
then
- statement of decision (Reject or Not reject)
- a meaningful conclusion (in terms of the question of interest, including direction of change)
For hypothesis tests, use a significance level of 5% ($\alpha = 0.05$) unless otherwise specified.

A full hypothesis testing example

Consider the case of the loaded die as dealt with in Tutorial Exercises.

\[ P(Y = y) = \frac{y}{21}; \quad y = 1, 2, 3, 4, 5, 6 \]

It was shown that the expected value and variance of the value (Y) on the uppermost face of the die were:

\[ E(Y) = \frac{13}{3} = 4\tfrac{1}{3}; \quad \mathrm{Var}(Y) = \frac{20}{9} = 2\tfrac{2}{9}; \quad \text{and} \quad \sigma_Y = \sqrt{\frac{20}{9}} \approx 1.490712 \]

A sceptic (Eleanor) does not believe that the die is loaded in this particular way.
Eleanor tosses the die 240 times and records her results.
The total of the uppermost face values was equal to 993.

This information is to be used to test the null hypothesis that the die is loaded in the specified way.
That is, we will test the hypothesis that the average result on the uppermost face for the die is 4.33.
Assumptions are to be stated (with a brief justification, if required).
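The moments quoted for the loaded die can be checked directly from its probability function. The following is a minimal sketch using exact fractions; the variable names (`pmf`, `mean`, `var`, `sd`) are illustrative and not part of the notes.

```python
from fractions import Fraction

# Loaded die from the notes: P(Y = y) = y/21 for y = 1, ..., 6.
pmf = {y: Fraction(y, 21) for y in range(1, 7)}
assert sum(pmf.values()) == 1  # the probabilities must sum to one

# E(Y) = sum of y * P(Y = y); Var(Y) = E(Y^2) - E(Y)^2
mean = sum(y * p for y, p in pmf.items())
var = sum(y**2 * p for y, p in pmf.items()) - mean**2
sd = float(var) ** 0.5

print(mean, var, round(sd, 6))
```

Working in fractions keeps 13/3 and 20/9 exact, so the rounding only enters when the standard deviation is taken.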


Example
Hypothesis test: H0: $\mu = 4.333$ versus H1: $\mu \neq 4.333$ at $\alpha = 0.05$

Given: Distribution as stated with $\mu = 4\tfrac{1}{3}$, $\sigma^2 = \tfrac{20}{9}$

Assumption: $\bar{X}$ is approximately normally distributed.

Justification:
Here, X is discrete and skewed, so $\bar{X}$ will also be discrete and skewed.
However, the approximate normality for the sampling distribution of $\bar{X}$ is a reasonable assumption: the sample size n = 240 should be sufficiently large for the Central Limit Theorem to work here.

Dealing with the sample mean, the test statistic is

\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]

The sample mean is $\bar{x} = \frac{993}{240} = 4.1375$

We need to use the normal approximation to the exact sampling distribution of $\bar{X}$, which has

\[ \text{Mean} = E(\bar{X}) = 4\tfrac{1}{3}, \qquad se(\bar{X}) = \frac{\sigma_X}{\sqrt{n}} = \frac{\sqrt{20/9}}{\sqrt{240}} = \sqrt{\frac{1}{108}} \approx 0.096225 \]

The sample mean $\bar{X}$ is discrete, so we can obtain a more accurate approximation from the normal if we use a continuity correction.

For the total, the cc would be $\pm\tfrac{1}{2}$, but we are dealing with the mean = total/240
→ here, the cc is $(\pm\tfrac{1}{2})/240 = \pm 1/480$.

To obtain the p-value, we need the area in both tails ...
→ but the sample mean (4.1375) is less than the hypothesised mean (4.3333) ...
→ and we want to include our observed mean in the tail area ...
→ so the cc here is to add 1/480
→ the observed value of the test statistic is

\[ z_{obs} = \frac{\bar{x} + \frac{1}{2n} - \mu}{\sigma / \sqrt{n}} = \frac{4.1375 + \frac{1}{480} - 4.3\dot{3}}{\sqrt{2.\dot{2} / 240}} = \frac{-0.19375}{0.096225} \approx -2.0135 \]

p-value $\approx 2 \times P(Z \le -2.0135) \approx 2 \times 0.0222 = 0.0444$

Reject the null hypothesis (at $\alpha$ = 5%).

The mean from the sample is significantly different from the hypothesized mean for the biased die.

Hence, there is sufficient evidence to refute the claim that this die is biased in the way claimed.

Evidence suggests the die has a smaller mean than stated.
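The loaded-die test can be reproduced numerically. This is a sketch only, assuming Python's standard-library `statistics.NormalDist` in place of normal tables; the helper names `abs_z` and `two_sided_p` are made up for illustration. It uses the absolute-value form of the continuity correction, so the direction of the difference does not need to be checked first.

```python
from math import sqrt
from statistics import NormalDist

# Values from the notes: 240 tosses, total 993, hypothesised mean 13/3, variance 20/9.
n, total = 240, 993
mu0, var = 13 / 3, 20 / 9
xbar = total / n
se = sqrt(var / n)


def abs_z(xbar, mu0, se, n, continuity=True):
    """|z_obs| for a two-sided test; the correction pulls |xbar - mu0| in by 1/(2n)."""
    correction = 1 / (2 * n) if continuity else 0.0
    return (abs(xbar - mu0) - correction) / se


def two_sided_p(z_abs):
    """Two-tailed p-value from the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(z_abs))


z_cc = abs_z(xbar, mu0, se, n)
z_raw = abs_z(xbar, mu0, se, n, continuity=False)
print(f"with cc:    |z| = {z_cc:.4f}, p = {two_sided_p(z_cc):.4f}")
print(f"without cc: |z| = {z_raw:.4f}, p = {two_sided_p(z_raw):.4f}")
```

The p-values may differ from the table-based values in the notes in the fourth decimal place, because tables round z to two decimals before the lookup.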

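Some points to note:

(i) Without cc, we would get:

\[ z_{obs} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{4.1375 - 4.3\dot{3}}{\sqrt{2.\dot{2} / 240}} = \frac{-0.195833}{0.096225} \approx -2.035 \]

p-value $\approx 2 \times P(Z \le -2.035) \approx 2 \times \tfrac{1}{2} \times (0.0212 + 0.0207) \approx 0.0419$

The p-value evaluated using the continuity correction is higher than the p-value evaluated without it. In this case the decision is exactly the same, THANKFULLY!

(ii) Working with the total, we get the same result as with the mean:

\[ z_{obs} = \frac{\sum_{i=1}^{n} x_i + \frac{1}{2} - n\mu}{\sqrt{n \sigma^2}} = \frac{993 + \frac{1}{2} - 1040}{\sqrt{240 \times 2.\dot{2}}} = \frac{-46.5}{23.094011} \approx -2.0135 \]

(iii) We can avoid the need to take account of whether $\bar{x}$ is larger or smaller than $\mu$ by using absolute values:

\[ z_{obs} = \frac{|\bar{x} - \mu| - \frac{1}{2n}}{\sigma / \sqrt{n}} = \frac{|4.1375 - 4.3\dot{3}| - \frac{1}{480}}{\sqrt{2.\dot{2} / 240}} = \frac{0.19375}{0.096225} \approx 2.0135 \]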

From a 2015 discussion forum

Q1: For the continuity correction (dealing with the mean $\bar{X}$ and not the sum of the observations), ... how do you know when to add or subtract 1/(2n)?

A1: This requires a bit more thought ... but it is fairly simple.
For 'a' integer, to find the area:
- strictly greater than 'a' (> a), the cutoff is 'a+0.5'
- greater than or equal to 'a' (≥ a), the cutoff is 'a-0.5'
- strictly less than 'a' (< a), the cutoff is 'a-0.5'
- less than or equal to 'a' (≤ a), the cutoff is 'a+0.5'

For a mean from a discrete distribution (which will be an integer divided by n), simply divide the value above by n.

For hypothesis testing, with a $\neq$ alternative we want the tail area (including our observed mean) DOUBLED. Using the absolute value saves having to look beforehand to see if $\bar{x} - \mu_0$ is < 0 or > 0.

Example: confidence interval:

The relevant confidence interval for the true mean of this die is:

\[ \text{95\% c.i. for } \mu = 4.1375 \pm 1.96 \sqrt{\frac{2.\dot{2}}{240}} = (4.1375 \pm 0.18860) = (3.949, 4.326) \]

From Minitab, we get:

One-Sample Z
Test of mu = 4.33333 vs not = 4.33333
The assumed standard deviation = 1.49071
  N    Mean  SE Mean            95% CI      Z      P
240  4.1375   0.0962  (3.9489, 4.3261)  -2.04  0.042
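The confidence interval for the die mean can be computed in a few lines. This is a sketch, assuming the standard library's `NormalDist.inv_cdf` for the critical value rather than the rounded 1.96; `z_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, level=0.95):
    """Two-sided z confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Loaded-die example: x-bar = 4.1375, sigma = sqrt(20/9), n = 240.
low, high = z_interval(4.1375, sqrt(20 / 9), 240)
print(round(low, 4), round(high, 4))
```

The hypothesised mean 4.3333 lies just outside this interval, which agrees with rejecting H0 at the 5% level in the test without continuity correction.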

From a 2015 discussion forum

Q2: Is the continuity correction for t-values the same as it is for z-values?

A2: It is the same in application, but is rarely done, as we usually look at t-tables ... but this is a good point!

The accuracy of the point estimate for a population mean is measured by the margin of error (which is half the length of the two-sided confidence interval). The $100(1-\alpha)\%$ two-tailed confidence interval for a population mean is

\[ \bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]

This requires knowing $\sigma$, the confidence level, and that X is normally distributed (or n is large enough to apply the CLT and assume an approximate normal distribution for $\bar{X}$).
If we wish to set the maximum distance of the estimate $\bar{X}$ from the target $\mu$ to be B (boundary), we get the inequality

\[ z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le B \quad \Rightarrow \quad n \ge \left( \frac{z_{\alpha/2}\, \sigma}{B} \right)^2 \]

Recall that in past years, the height of male STAT171 students has followed a Normal distribution with $\mu$ = 175 cm & $\sigma$ = 15 cm.

If we wish to have a margin of error of no more than 5 cm in a 95% confidence interval for $\mu$ (this year's population average height), we must take a sample of at least size n, where

\[ n \ge \left( \frac{z_{\alpha/2}\, \sigma}{B} \right)^2 = \left( \frac{1.96 \times 15}{5} \right)^2 = 5.88^2 = 34.5744 \]

Hence, n ≥ 35.
A sample size of 34 or less will cause the margin of error in estimating $\mu$ to be greater than 5 cm.

The $100(1-\alpha)\%$ one-tailed confidence interval for a population mean is either

\[ \left( -\infty,\ \bar{X} + z_{\alpha} \frac{\sigma}{\sqrt{n}} \right) \quad \text{or} \quad \left( \bar{X} - z_{\alpha} \frac{\sigma}{\sqrt{n}},\ +\infty \right) \]

Here, setting a limit on the one-sided bound results in the inequality

\[ n \ge \left( \frac{z_{\alpha}\, \sigma}{B} \right)^2 \]

For the paint drying example, if we want the limit of the CI to be within 4 minutes of the true average drying time, we would need

\[ n \ge \left( \frac{z_{\alpha}\, \sigma}{B} \right)^2 = \left( \frac{2.33 \times 9}{4} \right)^2 = 5.2425^2 \approx 27.48 \]

that is, a sample of at least 28 paint trials.
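Both sample-size calculations follow the same pattern: square the ratio of (critical value × σ) to the bound B, then round up. A minimal sketch, with `n_for_margin` as an illustrative name and the critical values 1.96 and 2.33 taken from the worked examples:

```python
from math import ceil


def n_for_margin(z_crit, sigma, bound):
    """Smallest integer n with z_crit * sigma / sqrt(n) <= bound."""
    return ceil((z_crit * sigma / bound) ** 2)


# Heights: two-sided 95% interval, sigma = 15 cm, margin of error B = 5 cm.
n_heights = n_for_margin(1.96, 15, 5)
# Paint drying: one-sided bound with z = 2.33, sigma = 9 min, B = 4 min.
n_paint = n_for_margin(2.33, 9, 4)
print(n_heights, n_paint)
```

Rounding up rather than to the nearest integer matters: n = 34 would leave the margin of error slightly above 5 cm.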



When designing an experiment or survey, the most common question asked is ...

What sample size should be used?

Two issues must be considered when determining a required sample size:

1) What is the minimum difference you'd be interested in finding?
For example, the population average used to be 0.75 and we'd be interested if we knew it was 0.73 or smaller, but not anything more (anything above 0.73 will not be actioned, such as manufacturing a new drug).

2) How sure do you need to be that you've found such a difference?
We usually want this to be fairly high, such as 95% or 99% etc.

The Orion Nebula

1300 light years away, 24 light years across.

To the naked eye (not a powerful tool) a dot, but with a good (powerful) telescope much can be distinguished.

[Figures: the nebula with the naked eye; with a fantastic telescope]

You are using a statistical telescope when running a hypothesis test ... need to know how strong it is.

The tool we need is power and this should be borne in mind whenever doing hypothesis testing!

Recall from Topic 6 there are two types of error which can occur when testing a hypothesis:

Type I error: this occurs when H0 is true, and we wrongly declare it to be false:
$\alpha$ = P(Type I error)
= P(Reject H0 | H0 true)

Type II error: this occurs when H0 is false, and we wrongly declare it to be true:
$\beta$ = P(Type II error)
= P(Retain H0 | H0 false)

→ Power is the probability of NOT making a Type II error.

That is → Power is the probability we reject H0 when H0 is false.

Power is the conditional probability we make the right decision when there truly is something different from the null happening.

Power = P(Reject H0 | H0 false)
= 1 - $\beta$

We would prefer power to be 100%, but due to inherent variability, that can only happen if we always reject, no matter what (even when H0 is true).

→ This would mean our Type I error rate ($\alpha$) was also 100%.

We need to be able to evaluate power in a given situation to see how we can increase the probability of finding a true difference without also increasing the probability of finding a false difference.

That is, for a fixed Type I error rate, how can we improve the power?

We will start by investigating how the hypothesis test itself is affected by things we can control such as sample size and significance level.

Returning to the paint example:
H0: $\mu$ = 75
H1: $\mu$ < 75
$\bar{x}$ = 71.5 minutes, $\sigma$ = 9, n = 25

\[ z_{obs} = \frac{71.5 - 75}{9 / \sqrt{25}} \approx -1.94 \]

p-value = $P(Z \le -1.94)$ = 0.0262 → Reject H0 at the 5% level.

We concluded that there was evidence that the new additive reduced the average drying time of the paint.

What would have happened if we had obtained the same $\bar{x}$ of 71.5, but from a different sample size?
Would it affect the conclusions?

The sample size can affect your conclusions because the larger the sample size, the more confident we can be about the sample mean as an estimate of $\mu$.

e.g. For n = 5:

\[ z_{obs} = \frac{71.5 - 75}{9 / \sqrt{5}} \approx -0.870 \]

p-value = $P(Z \le -0.870) \approx 0.1922$ → Don't reject H0
Now (with n=5 rather than n=25) we cannot conclude that the additive has a significant effect on the average drying time.

e.g. For n = 10,000:

\[ z_{obs} = \frac{71.5 - 75}{9 / \sqrt{10000}} \approx -38.89 \]

p-value = $P(Z \le -38.89) \approx 0.00$ → Reject H0 at any significance level.
Now we are very confident that the additive decreases the average drying time of the paint.

For very small samples it is often difficult to obtain a significant result even for large observed differences.
For very large samples you can quite often obtain a statistically significant result even for very small, even trivial, observed differences.

For n = 10,000 what is the minimum difference that we can declare significant at the 5% level of significance (for the paint drying example)?

For n = 10,000, the test statistic

\[ z_{obs} = \frac{\bar{x} - 75}{9 / \sqrt{10000}} \]

will be declared significant if $z_{obs}$ is less than (or equal to) -1.645 (recall, H1 is $\mu$ < 75).

So the cut-off value of $\bar{x}$ is given by:

\[ \bar{x} - 75 \le -1.645 \times \frac{9}{\sqrt{10000}} \]
\[ \bar{x} \le 75 - 1.645 \times 0.09 = 75 - 0.148 = 74.85 \]

For n = 10,000, any sample mean of 74.85 minutes or less will be declared (statistically) significantly less than 75 at the 5% level of significance.
i.e. a 9 second (or more) difference will be declared significant at the 5% level.

Practical significance vs Statistical significance

If a result is statistically significant, we are saying that we are reasonably confident that the actual population mean is different from the hypothesized value.

That difference may or may not be of interest or importance to us.

We need to decide what difference is meaningful in terms of our experiment.
For the example:
We need to think in terms of what difference in average drying time of the paint is marketable.
For example, we might decide to market the paint only if the decrease in average drying time is more than 2 minutes.
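The effect of sample size on the same observed mean can be tabulated directly. This is a sketch under the paint example's assumptions (σ = 9, H1: μ < 75); `lower_tail_test` is an illustrative name and `NormalDist` stands in for normal tables.

```python
from math import sqrt
from statistics import NormalDist


def lower_tail_test(xbar, mu0, sigma, n):
    """z statistic and one-sided p-value for H1: mu < mu0 with sigma known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return z, NormalDist().cdf(z)


# Same observed mean (71.5) under three different sample sizes.
results = {n: lower_tail_test(71.5, 75, 9, n) for n in (5, 25, 10_000)}
for n, (z, p) in results.items():
    print(f"n = {n:>6}: z = {z:8.3f}, p-value = {p:.4f}")

# Largest sample mean still declared significant at the 5% level when n = 10,000.
cutoff = 75 + NormalDist().inv_cdf(0.05) * 9 / sqrt(10_000)
print(f"cut-off for n = 10,000: {cutoff:.2f} minutes")
```

With n = 10,000 a difference of roughly 0.15 minutes is already statistically significant, which is why practical significance has to be judged separately.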

If we observe a difference of two minutes (or more), and it is significant, we will market the paint.
If we observe a difference of less than two minutes, we won't care if it is significant or not because we won't market the paint anyway.
So, if we observe a difference of two minutes or more, we want the test to be significant at the 5% level.

How large does n have to be to be able to declare such a result significant?
(later we may want to increase the sample size further, for other reasons)

Practical sig (diff = 2); Statistical sig = p-val ≤ 5%

We want:

\[ z_{obs} = \frac{73 - 75}{9 / \sqrt{n}} \le -1.645 \]

Just a matter of solving for n →

\[ \frac{73 - 75}{9 / \sqrt{n}} \le -1.645 \]
\[ \frac{-2\sqrt{n}}{9} \le -1.645 \]
\[ \sqrt{n} \ge 1.645 \times \frac{9}{2} = 7.4025 \]
\[ n \ge (7.4025)^2 \approx 54.80 \]

Hence n ≥ 55 (as n is integer)

If we have a sample size of 55 (or more) and we observe a decrease in average drying time of at least 2 minutes, we know in advance that the result will be significant at the 5% significance level.
If we have a sample size of less than 55 and we observe a decrease of exactly 2 minutes (or anything less) the result will not be significant at the 5% level.

If we are able to apply a z-test ($\sigma$ is known and X is normal or approx normal) for testing

H0: $\mu = \mu_0$ versus H1: $\mu < \mu_0$

at the 5% level, we will reject H0 if:

\[ \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \le -1.645 \]

where:
- $\bar{x} - \mu_0$ is the minimum difference required for action;
- $z_{crit}$ comes from the tolerance $\alpha$ (how often we are prepared to be wrong when there is no difference);
- $\sigma$ needs to be known (or estimated from past work).

So, once we have decided on what values for these are to be used, we solve the equation to obtain the minimum sample size that will achieve this.
→ Solve for n:

\[ \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \le -1.645 \]
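Solving the inequality for n can be sketched as a small function. The names `min_n_for_significance` and `z_obs` are illustrative; the critical value comes from `NormalDist.inv_cdf` rather than the rounded 1.645, which does not change the answer here.

```python
from math import ceil, sqrt
from statistics import NormalDist


def min_n_for_significance(diff, sigma, alpha=0.05):
    """Smallest n for which an observed difference `diff` is significant
    in a one-sided z-test at level alpha (sigma assumed known)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return ceil((z_crit * sigma / abs(diff)) ** 2)


def z_obs(diff, sigma, n):
    """Observed z statistic for a given difference x-bar minus mu0."""
    return diff / (sigma / sqrt(n))


# Paint example: a 2-minute decrease (73 versus 75) with sigma = 9.
n_min = min_n_for_significance(diff=-2, sigma=9)
print(n_min, round(z_obs(-2, 9, n_min), 3), round(z_obs(-2, 9, n_min - 1), 3))
```

The two printed z values straddle -1.645, confirming that n_min is the first sample size at which a 2-minute decrease is declared significant.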

→ In general, for a one sided z-test, the minimum sample size needed is:

\[ n \ge \left( \frac{z_{\alpha}\, \sigma}{\bar{x} - \mu_0} \right)^2 \]

If we do not know that X is normal, we have to assume (hope) that n will be large enough for the CLT (Central Limit Theorem) to work, so that the distribution of the sample mean will be approximately normal. Check this after the calculations are done.

Note: $\sigma$ will probably be unknown and an estimate will have to be used instead.
We can use the t-distribution to get the critical value, but this depends on the degrees of freedom, which depends on the sample size n, which is what we are trying to determine.

To solve the dilemma, use the $z_{crit}$ as this will give a rough idea in helping to plan the experiment.

We should plan what sample size would be required to achieve significance, before setting up the experiment.
Requirements:
1. An idea of what practical significance is desired - i.e. the minimum difference $\bar{x} - \mu_0$ that would be meaningful.
2. An idea of the likely value of chance variation $\sigma$ (from previous experiments, the literature, etc).
3. Use of the appropriate formula to determine the minimum sample size required to achieve significance.

Warning: We may observe sample statistics that are entirely different from what we are hoping/expecting to observe.

For any test of significance, there are two possible correct outcomes, and two incorrect outcomes. Recall:

              H0 true        H0 false
H0 retained   correct        Type II ERROR
H0 rejected   Type I ERROR   correct

The errors are the outcomes of making a wrong decision.
They have CONDITIONAL probabilities:
$\alpha$ = P(Type I error) = Prob(Reject H0 | H0 true)
$\beta$ = P(Type II error) = Prob(Retain H0 | H0 false)

The probability of making the correct decision when H0 is false is called the POWER of the test.

Power = Prob(reject H0 | H0 false)
= 1 - Prob(retain H0 | H0 false)
= 1 - $\beta$
So, minimising $\beta$ and maximising power are the same thing.

The aim is to minimise $\beta$ for a fixed $\alpha$ (usually 5%).

$\beta$ (and hence Power) depends on:
- sample size (larger n → higher power)
- $\alpha$ (higher $\alpha$ → higher power) → BUT this makes the Type I error rate higher
- the true value of the parameters
- whether the test is one or two tailed

We cannot make $\alpha$ and $\beta$ smaller together, so we choose $\alpha$ as the maximum Type I error rate we are prepared to put up with.

Before carrying out the experiment, we decide we will be doing a one-sided test at the 5% significance level for testing:
H0: $\mu$ = 75
versus H1: $\mu$ < 75
we know:
$\sigma$ = 9 and n = 25

We can determine the rejection region in terms of the value of the sample mean.
When we carry out the test we know we will reject H0 if $z_{obs} \le -1.645$
That is reject if:

\[ \frac{\bar{x} - 75}{9 / \sqrt{25}} \le -1.645 \]
\[ \bar{x} \le 75 - 1.645 \times \frac{9}{\sqrt{25}} \]
\[ \bar{x} \le 75 - 2.961 = 72.039 \]

We know that if H0: $\mu$ = 75 is true, the sampling distribution of the sample mean is:

\[ \bar{X} \sim N\left( 75, \frac{9^2}{25} \right) \sim N\left( 75, 1.8^2 \right) \]

[Figure: sampling distribution of the sample mean under H0.] The red area is 0.05, giving the rejection region in terms of values of the sample mean.

i.e. If we carry out a one-sided z-test at the 5% level of significance to test H0: $\mu$ = 75 (vs <) with a sample size of 25, we will:
- reject H0 if $\bar{X} \le 72.039$
- not reject H0 if $\bar{X} > 72.039$

So we can calculate the power of the test in advance by calculating the probability that $\bar{X} \le 72.039$ for different values that $\mu$ (the true average drying time) might take.

Note that we have no idea:
- what the true value of $\mu$ is, or
- what value of $\bar{X}$ will be observed.

\[ \text{Power} = P(\text{Reject } H_0 \mid H_0 \text{ false}) = P\left( \bar{X} \le 72.039 \mid \mu = 70 \right) \]
\[ = P\left( \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \le \frac{72.039 - 70}{9 / \sqrt{25}} \;\middle|\; \mu = 70 \right) = P\left( Z \le \frac{2.039}{1.8} \right) \approx P(Z \le 1.133) \approx 0.8708 \]

We can drop the conditioning here, as the Z has a standard normal distribution (if we have conditioned on the correct value for $\mu$).

That is, if the true mean is 70 and we carry out a one-tailed test for $\mu$ = 75 at the 5% level of significance:
- there is a 5% chance that we would reject H0 when we shouldn't (we have set this rate).
- there is an 87% chance that we would reject H0 when we should - i.e. make the right decision.
→ BUT there is a 13% chance that we would retain H0 - i.e. make a Type II error.

We can plot the distribution of the sample mean:
- under H0: $\mu$ = 75,
- or when true $\mu$ = 70 (or any other specific value).
The test statistic will be significant if $\bar{X}$ is less than (or equal to) 72.039.

[Figure: the two sampling distributions with the "Reject H0" region to the left of 72.04; area 5% under the H0 curve and 87% under the curve for $\mu$ = 70.]

What would happen to the power (area under the black density curve to the left of 72.04) if:
- the true mean is less than 70?
- the true mean is greater than 70 (but still less than 75)?

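When $\mu$ = 65:

\[ \text{Power} = P(\bar{X} < 72.039 \mid \mu = 65) = P\left( \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < \frac{72.039 - 65}{9 / \sqrt{25}} \;\middle|\; \mu = 65 \right) \approx P(Z < 3.911) \approx 1.0000 \text{ (close enough)} \]

When $\mu$ = 71:

\[ \text{Power} = P(\bar{X} < 72.039 \mid \mu = 71) = P\left( \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < \frac{72.039 - 71}{9 / \sqrt{25}} \;\middle|\; \mu = 71 \right) \approx P(Z < 0.577) \approx 0.7190 \]

If $\mu$ is 71, there is a 72% chance of (correctly) concluding it is less than 75.

When $\mu$ = 73:

\[ \text{Power} = P(\bar{X} < 72.039 \mid \mu = 73) = P\left( \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < \frac{72.039 - 73}{9 / \sqrt{25}} \;\middle|\; \mu = 73 \right) \approx P(Z < -0.534) \approx 0.2946 \]

If $\mu$ is 73, there is only a 30% chance of (correctly) concluding it is less than 75.

When $\mu$ = 74.9:

\[ \text{Power} = P(\bar{X} < 72.039 \mid \mu = 74.9) = P\left( \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < \frac{72.039 - 74.9}{9 / \sqrt{25}} \;\middle|\; \mu = 74.9 \right) \approx P(Z < -1.589) \approx 0.0559 \]

Power does not exist when H0 is true, that is when $\mu$ = 75 here, but the power approaches 0.05 (the significance level) as $\mu$ approaches 75.

Recall,
\[ P(\text{Type I error}) = P(\text{Reject } H_0 \mid H_0 \text{ true}) = P(\bar{X} < 72.039 \mid \mu = 75) \approx P(Z < -1.645) \approx 0.05 \]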
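The power calculations for the different true means all follow one recipe: find the cut-off for the sample mean under H0, then find the probability of falling below it under the true mean. A sketch, with `power_lower_tail` as a hypothetical helper and defaults taken from the paint example:

```python
from math import sqrt
from statistics import NormalDist


def power_lower_tail(mu_true, mu0=75, sigma=9, n=25, alpha=0.05):
    """P(reject H0) for a one-sided z-test of H0: mu = mu0 vs H1: mu < mu0."""
    se = sigma / sqrt(n)
    cutoff = mu0 + NormalDist().inv_cdf(alpha) * se  # about 72.039 here
    # Under the true mean, X-bar ~ N(mu_true, se^2).
    return NormalDist(mu_true, se).cdf(cutoff)


powers = {mu: power_lower_tail(mu) for mu in (65, 70, 71, 73, 74.9)}
for mu, p in powers.items():
    print(f"true mean {mu:>5}: power = {p:.3f}")
```

These values agree with the hand calculations to about two decimal places; small differences in the third decimal arise because the hand calculations round z to two decimals before using normal tables.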

Plotting Power versus possible values of $\mu$

Table of calculated powers:

$\mu$    $\mu$ - 75   Power
65        -10     1.000
70         -5     0.871
71         -4     0.719
73         -2     0.295
74.9     -0.1     0.056

and the plot of the points (the power curve would join them).

In Minitab:
MTB > Stat > Power and Sample Size > 1-Sample Z..

[Figure: Plot of Power versus difference = $\bar{x} - \mu_0$]

Power increases as the sample size gets larger
→ we are more likely to pick up a difference with a larger sample size than a smaller sample size.

Power increases as the true $\mu$ gets further away from the hypothesised mean $\mu_0$
→ we are more likely to pick up a bigger true difference in population means than a smaller difference.

We decided to market the paint if there was a difference of 2 minutes or more.
So we can calculate the sample size required to achieve a power of 20%, 30% etc, assuming that the true difference is 2 minutes less (i.e. true $\mu$ = 73).

We can calculate the Power for various sample sizes for a particular $\mu$.

From the Session window

Power and Sample Size
1-Sample Z Test
Testing mean = null (versus < null)
Calculating power for mean = null + difference
Alpha = 0.05  Assumed standard deviation = 9

      Sample  Target
Diff    Size   Power  Actual Power
  -2       3    0.10      0.103843
  -2      14    0.20      0.208002
  -2      26    0.30      0.304417
  -2      40    0.40      0.405399
  -2      55    0.50      0.501273
  -2      73    0.60      0.600180
  -2      96    0.70      0.702800
  -2     126    0.80      0.802222
  -2     174    0.90      0.900859
  -2     220    0.95      0.950655
  -2     320    0.99      0.990107

With a sample size of 96, we get a power of 70.28%.
If n was 95 (or less), the power would be less than the pre-specified value of 70%.
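The sample sizes in the Minitab output can be reproduced with the usual closed-form expression for a one-sided z-test, n ≥ ((z_α + z_power) σ / |difference|)². This formula is not stated in the notes, and it is an assumption that Minitab uses an equivalent calculation; `n_for_power` is an illustrative name.

```python
from math import ceil
from statistics import NormalDist


def n_for_power(target_power, diff, sigma, alpha=0.05):
    """Smallest n giving at least `target_power` for a one-sided z-test
    when the true mean differs from the null value by `diff`."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)      # critical value of the test
    z_power = z.inv_cdf(target_power)   # quantile matching the target power
    return ceil(((z_alpha + z_power) * sigma / abs(diff)) ** 2)


targets = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]
sizes = [n_for_power(t, diff=-2, sigma=9) for t in targets]
for t, n in zip(targets, sizes):
    print(f"target power {t:.2f}: n = {n}")
```

At a target power of 0.50 the second quantile is zero, so the expression reduces to the "minimum n for significance" calculation and again gives n = 55.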

From the Session window

Repeating the output from the previous slide:

      Sample  Target
Diff    Size   Power  Actual Power
  -2       3    0.10      0.103843
  -2      14    0.20      0.208002
  -2      26    0.30      0.304417
  -2      40    0.40      0.405399
  -2      55    0.50      0.501273
  -2      73    0.60      0.600180
  -2      96    0.70      0.702800
  -2     126    0.80      0.802222
  -2     174    0.90      0.900859
  -2     220    0.95      0.950655
  -2     320    0.99      0.990107

To get the graph, cut and paste these values for Size and Actual Power into columns in the worksheet, then do
Graph > Scatterplot > With Connect Line

Different Example: z-test power curve for different $\alpha$

[Figure: Power (1 - $\beta$) versus True Value of $\mu$ (118 to 128) for H0: $\mu$ = 120, Ha: $\mu$ > 120, with curves for $\alpha$ = 0.10, $\alpha$ = 0.05 and $\alpha$ = 0.01.]
[Note: this chart was NOT created in Minitab]

As the significance level is decreased, the Power decreases → the problem with making the Type I error rate low is that we are also making the power low: there is LESS chance of picking up a true difference.

z-test power curve for different n

[Figure: Power (1 - $\beta$) versus True Value of $\mu$ (118 to 128) for H0: $\mu$ = 120, Ha: $\mu$ > 120, with curves for n = 45, n = 90, n = 180 and n = 360.]
[Note: this chart was NOT created in Minitab]

Power increases as n increases.
Here, the power does not exist for values of $\mu$ < 120: what is actually plotted on the vertical axis is P(Reject), not power.

Paracetamol tablets are marketed as having exactly 500 mg per tablet. However, the actual amount per tablet varies.

Suppose it is known that the amount of paracetamol in each tablet follows a normal distribution with a population standard deviation of $\sigma$ = 4 mg.

We would be interested in a two tailed test here, as a difference in either direction would be a problem. That is,
H0: $\mu$ = 500
H1: $\mu \neq$ 500

At a 5% significance level, we would reject H0 for $z_{obs} \le -1.96$ or $z_{obs} \ge +1.96$

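(cont ...)
We now need to convert this rejection region for $z_{obs}$ into one involving the sampling distribution of the sample average; we will first deal with a sample size of 25.
When we carry out the test we will reject H0 if $z_{obs} \le -1.96$ or $z_{obs} \ge 1.96$:

\[ \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \le -1.96 \quad \text{OR} \quad \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \ge +1.96 \]
\[ \frac{\bar{X} - 500}{4 / \sqrt{25}} \le -1.96 \quad \text{OR} \quad \frac{\bar{X} - 500}{4 / \sqrt{25}} \ge +1.96 \]
\[ \bar{X} \le 500 - 1.96 \times \frac{4}{\sqrt{25}} \quad \text{OR} \quad \bar{X} \ge 500 + 1.96 \times \frac{4}{\sqrt{25}} \]
\[ \bar{X} \le 498.432 \quad \text{OR} \quad \bar{X} \ge 501.568 \]

Obtaining the power for $\mu$ = 499 (say)

\[ \text{Power} = P\left( \bar{X} < 498.432 \mid \mu = 499 \right) + P\left( \bar{X} > 501.568 \mid \mu = 499 \right) \]
\[ = P\left( \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < \frac{498.432 - 499}{4 / \sqrt{25}} \;\middle|\; \mu = 499 \right) + P\left( \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} > \frac{501.568 - 499}{4 / \sqrt{25}} \;\middle|\; \mu = 499 \right) \]
\[ \approx P(Z < -0.71) + P(Z > +3.21) \approx 0.2389 + (\sim 0) \approx 0.2389 \]

[Figure: sampling distributions with "Reject H0" regions in both tails.] Power = area under the black curve outside the two blue lines.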
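For the two-tailed case the power is the sum of two tail areas under the true sampling distribution. A sketch with defaults from the paracetamol example; `power_two_sided` is an illustrative name, and `NormalDist` replaces table lookups.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided(mu_true, mu0=500, sigma=4, n=25, alpha=0.05):
    """P(reject H0) for a two-sided z-test of H0: mu = mu0 with sigma known."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)           # about 1.96
    lower, upper = mu0 - z_crit * se, mu0 + z_crit * se    # about 498.432 and 501.568
    sampling = NormalDist(mu_true, se)                     # X-bar under the true mean
    return sampling.cdf(lower) + (1 - sampling.cdf(upper))


p_499 = power_two_sided(499)
print(f"power at mu = 499: {p_499:.4f}")
print(f"power at mu = 501: {power_two_sided(501):.4f}")
print(f"n = 50, mu = 499:  {power_two_sided(499, n=50):.4f}")
```

The result at μ = 499 is slightly above the hand value of 0.2389 because the hand calculation treats the upper-tail area as zero. The equal values at 499 and 501 reflect the symmetry of the two-sided power curve about the hypothesised mean.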

Specify a two tailed alternative under Options.

The power curve is symmetric around a difference of zero, and asymptotes to 1 the further the distance the true mean is from $\mu_0$ = 500 here.

The power curve for n=50 is above that for n=25.
The power curve for n=10 is below that for n=25.

We can increase the power by increasing the sample size n (keeping $\alpha$, $\sigma$, $\mu$ the same)
→ n is under our control

The power curve for $\alpha$=0.10 is above that for $\alpha$=0.05.
The power curve for $\alpha$=0.01 is below that for $\alpha$=0.05.

We can increase the power by increasing the significance level $\alpha$ (keeping n, $\sigma$, $\mu$ the same)
→ $\alpha$ is under our control

We have also seen that power increases as the difference between the true mean $\mu$ and the hypothesised mean $\mu_0$ increases, but the true value of $\mu$ is NOT under our control!

Power of a test: Conclusions

1. The larger the significance level, $\alpha$, the higher the power of the test (but the higher the Type I error rate).
This is our decision, but it does have a cost.
2. The larger the sample size, the higher the power of the test.
This is our decision, but it does have a cost.
3. The larger the difference between the hypothesized value and the true value of the population parameter (mean or proportion, to be seen later), the higher the power.
We have no control over this!!

Often the true value of the parameter is not known, and we calculate the power of the test for a number of possible true values of the parameter under study. From this we can sketch a power curve.

The text book uses different terminology from the lecture notes.
For a z-test, it uses a large sample (n ≥ 30) and estimates the population standard deviation $\sigma$ with the sample standard deviation s.

This is technically incorrect.
In order to apply a z-test, we must have:
1. independent observations
2. sigma known (population st.dev.)
3. either:
(a) X normally distributed; or
(b) n sufficiently large that the sample mean $\bar{X}$ is approx normally distributed (application of CLT)

\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}, \qquad z_{obs} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]

If sigma is not known (the more realistic scenario), we have a t-test, which leads us on to Topic 7.