
z, t, and F tests

Making inferences from experimental sample to population using statistical tests
CONTENTS OF TALK
Distributions and probability: what is a statistical test?
The normal distribution
Inferences from sample to population: hypothesis testing
Central limit theorem
z tests and example
t tests and example
F tests/ANOVA and example
... and finally:

Statistical tests and SPM


Distribution & Probability
If we know something about the distribution of events, we know something about the
probability that any one of these events will occur.
e.g. I know that 75% of people have brown eyes, therefore there is a probability of .75 that the next
person I meet will have brown eyes.

We can use information about distributions to decide how probable it is that the results of
an experiment looking at variable x support a particular hypothesis about the distribution of
variable x in the population.
= central aim of experimental science
This is how statistical tests work: test a sample distribution (our experimental results)
against a hypothesised distribution, resulting in a p value: the probability of obtaining
results at least as extreme as ours if the null hypothesis were true (null hypothesis = there is
no effect or difference between conditions), i.e. how likely it is that results like ours would arise as a fluke!

e.g. in an experiment I measure RT in two different conditions and find a difference between
conditions. I want to know whether my data can statistically support the hypothesis that
there is a genuine difference in RT between these two conditions in the population as a
whole, i.e. that the data come from a population where the means of the two conditions are
different. The null hypothesis is therefore that the means are the same, and I reject it only if the
probability of getting results like mine under the null hypothesis is less than .05

A statistical test allows me to test how likely it is that the sample data
come from a parent population with a particular characteristic
The Normal Distribution
Many continuous variables follow a normal distribution, and it plays a special role
in the statistical tests we are interested in:
[Figure: normal curve, P(x) plotted against x; 68% of the distribution lies within 1 s.d. either side of the mean]
The x-axis represents the values of a particular variable
The y-axis represents the proportion of members of the population that have each value of the variable
The area under the curve represents probability, e.g. the area under the curve between two values on
the x-axis represents the probability of an individual having a value in that range

Mean and standard deviation tell you the basic features of a distribution
mean = average value of all members of the group
standard deviation = a measure of how much the values of individual members vary
in relation to the mean
The normal distribution is symmetrical about the mean
68% of the normal distribution lies within 1 s.d. of the mean
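The 68% figure can be checked numerically. This is a minimal sketch using Python's standard library; the helper name prob_within is made up for illustration and is not part of the talk.

```python
from statistics import NormalDist

def prob_within(k_sd, mean=0.0, sd=1.0):
    """Area under a normal curve within k_sd standard deviations of its mean."""
    dist = NormalDist(mean, sd)
    return dist.cdf(mean + k_sd * sd) - dist.cdf(mean - k_sd * sd)

# The proportion does not depend on which mean and s.d. we pick.
print(round(prob_within(1), 4))           # 0.6827
print(round(prob_within(1, 50, 10), 4))   # 0.6827
```

The same function gives the area for any other width, e.g. prob_within(2) for 2 s.d. either side of the mean.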
Sample to Population
Testing Hypotheses

t, z, and F tests mathematically compare
the distribution of an experimental sample (i.e. the mean and standard deviation of your results)
to
a normal distribution whose parameters represent some hypothesised feature of the population, which you think your results support

How does this work? (without going through the derivation of the equations!)
CENTRAL LIMIT THEOREM
Central Limit Theorem
Special feature of normal distribution which underlies its use in statistical tests
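Take k samples, each of n observations, from a population, and calculate the mean of each
sample. The distribution of those means will approximate a normal distribution (for certain
variable types), even if the population itself is not normal. As the sample size n tends to
infinity, the distribution of sample means tends to a normal distribution; taking more samples
(larger k) only shows that distribution more clearly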
Because the means of samples tend towards a normal distribution in this way,
we can convert the mean of our sample distribution (the experimental results)
into a value from a standardised normal distribution.
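A small simulation can make this concrete. The sketch below is an illustration only: the exponential population, the sample size of 30, the 2000 samples and the seed are all arbitrary assumptions, chosen because the exponential distribution is clearly not normal.

```python
import random
from statistics import mean, stdev

def sample_means(k, n, rng):
    """Draw k samples of n observations each and return the k sample means."""
    return [mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(k)]

rng = random.Random(0)            # fixed seed so the run is repeatable
n = 30                            # observations per sample
means = sample_means(2000, n, rng)

centre = mean(means)              # should sit near the population mean (1.0)
spread = stdev(means)             # should sit near population s.d. / sqrt(n)
within_1sd = sum(abs(m - centre) <= spread for m in means) / len(means)

print(f"mean of sample means: {centre:.3f}")
print(f"s.d. of sample means: {spread:.3f} (theory: {1 / n ** 0.5:.3f})")
print(f"share within 1 s.d.:  {within_1sd:.2f} (normal curve: 0.68)")
```

Although single exponential scores are strongly skewed, the sample means cluster symmetrically around the population mean, with roughly 68% of them within 1 s.d.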

A z-test achieves this conversion by performing a linear transformation; the equation is given on the next slide
This can be thought of as expressing your results and your hypothesis in the same units.
So the z-statistic represents a value on the x-axis of the standard normal distribution, for which we know all the p-values
[Figure: normal curve P(X) against X, with 68% of the distribution within 1 s.d. of the population mean μ; the sample mean is marked on the X-axis]
z-tests:
What are they?
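formula:
$$z = \frac{\bar{x} - \mu}{\sigma}$$
x̄ = sample mean
μ = population mean
σ = population standard deviation (when x̄ is the mean of n observations rather than a single score, divide by the standard error σ/√n instead)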

Plug in the values, and get a z-value which corresponds to a location on the x-axis
of a standardised normal distribution (μ = 0, σ = 1)
For the standardised normal distribution we know the probability of any
particular value coming from it (area under the curve)
this is what you read off from a table of z-values
Because we are dealing with probabilities, there is always a chance you are wrong. Choosing
the significance level sets how big a chance of wrongly rejecting the null hypothesis you are
willing to accept
p < .05 = less than a 5% chance that you would obtain a result at least this extreme under the
null hypothesis; rejecting a null hypothesis that is actually true is a Type 1 error
z-tests:
Worked Example
Battery of psychological tests to judge IQ from which we have obtained
distribution:
Mean = 50
S.D. = 10
Represents distribution of entire population
We would like to find out the probability of various scores, e.g. which scores are
so high that they can only be obtained by 10% of the population?
Need to transform the distribution to a STANDARD NORMAL
DISTRIBUTION:
$$z = \frac{X - \mu}{\sigma} = \frac{X - 50}{10}$$
Thus we now have a z distribution
No change in the data, since the new distribution has the same shape and the observations
stand in the same relation to each other (same as converting inches to
centimeters): we have performed a LINEAR TRANSFORMATION
Now, a score that was 60 is 1, i.e. the score is 1 S.D. above the mean
A z score represents the number of S.D. that observation Xi is above or
below the mean.
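The worked example can be reproduced in a few lines. This is a sketch using Python's standard library in place of a printed z table; the names z_score, z_cut and x_cut are illustrative assumptions.

```python
from statistics import NormalDist

POP_MEAN, POP_SD = 50, 10          # population values from the example

def z_score(x, mean=POP_MEAN, sd=POP_SD):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

standard = NormalDist()            # standardised normal: mean 0, s.d. 1

# z-value with 90% of the distribution below it, i.e. the top 10% above it
z_cut = standard.inv_cdf(0.90)
# undo the linear transformation to get back to the original score scale
x_cut = POP_MEAN + z_cut * POP_SD

print(z_score(60))                 # 1.0
print(round(z_cut, 2))             # 1.28
print(round(x_cut, 1))             # 62.8
```

So, under these population values, only about 10% of people would score above roughly 62.8.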
t-tests:
Testing Hypotheses About Means
formula:
$$t = \frac{(\bar{x} - \mu)\sqrt{n}}{s_x}$$
x̄ = sample mean
μ = population mean
s_x = sample standard deviation
n = size of sample

For a z-test you need to know the population mean and s.d. Often you don't know the s.d. of the
hypothesised or comparison population, and so you use a t-test. This uses the sample s.d. instead.
This introduces a source of error, which decreases as your sample size increases
Therefore, the t statistic is distributed differently depending on the size of the sample, like a family of
normal-like curves with heavier tails. The degrees of freedom (d.f. = sample size - 1) tell you which of these
curves you are relating your t-value to. There are different tables of p-values for different degrees of freedom.
larger sample = more squashed t-statistic distribution (thinner tails, closer to the normal curve) = easier to get significance

Kinds of t-tests (formula is slightly different for these different kinds):


Single-sample: tests whether a sample mean is significantly different from a hypothesised value (often 0)
Independent-samples: tests whether the means of two independent samples differ
Paired-samples: tests whether two linked sets of measurements differ, for example means obtained in
two conditions by a single group of participants
t-tests:
Worked Example of Single Sample t-test
We know that finger tapping speed in normal population:
Mean=100ms per tap
Finger tapping speed in 8 subjects with caffeine addiction:
Mean = 89.4ms
Standard deviation = 20ms
Does this prove that caffeine addiction has an effect on tapping speed?
Null Hypothesis H0: tapping speed not faster after caffeine
Preselected significance level was 0.05
Calculate the t value:
$$t(7) = \frac{\sqrt{8}\,(89.4 - 100)}{20} = -1.5$$
Find area below t(7) = -1.5, get about 0.09: i.e. about 9% of the time we would expect a
score as low as this under H0
This value is above 0.05 => We could NOT reject H0!
We can't conclude that caffeine addiction has an effect on tapping speed
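The calculation can be sketched in code. Python's standard library has no t distribution, so this illustration integrates the t density numerically instead of using a table; the helper names t_pdf and t_cdf are assumptions made for the sketch. The lower-tail area comes out near 0.09, well above 0.05, so H0 is not rejected.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """Area below x, by Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps                       # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

n, sample_mean, sample_sd, pop_mean = 8, 89.4, 20.0, 100.0
t_value = math.sqrt(n) * (sample_mean - pop_mean) / sample_sd
p_lower = t_cdf(t_value, df=n - 1)      # one-tailed: area below the observed t

print(f"t({n - 1}) = {t_value:.2f}, one-tailed p = {p_lower:.3f}")
print("reject H0" if p_lower < 0.05 else "cannot reject H0")
```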
F-tests / ANOVAs:
What are they?
ANOVA = analysis of variance
involves calculating an F value whose significance is tested (similarly to a z or t value)

Like t-tests, F-tests deal with differences between or among sample means, but with any
number of means (each mean corresponding to one level of a factor, i.e. one condition)
Q/ do k means differ? A/ yes, if the F value is significant
Q/ how do the k factors influence each other? A/ look at the interaction effects

ANOVA calculates F values by comparing the variability between conditions
with the variability within each condition (this is what the formula does)
e.g. we give a drug that we believe will improve memory to a group of people and
give a placebo to another group. We then take dependent measures of their
memory performance, e.g. mean number of words recalled from memorised lists.
An ANOVA compares the variability that we observe between the two conditions to
the variability observed within each condition. Variability is measured as the sum of
the squared differences of each score from the mean.
Thus, when the variability that we predict (between the two groups) is much greater
than the variability we don't predict (within each group) then we will conclude that
our treatments produce different results.
F-tests / ANOVAs:
What are they?
ANOVA calculates an F value, which has a distribution related to the sample size and
number of conditions (degrees of freedom)
$$F = \frac{MS_{\text{factors}}}{MS_{\text{error}}}$$

The formula compares the variance between and within conditions or factors as
discussed above; we won't worry about the derivation! (n.b. MS = mean squares)
If the F statistic is significant, this tells us that the means of the factors differ significantly
=> are not likely to have come from the same population = our variable is having an effect
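The ratio of mean squares can be sketched for the drug vs placebo memory example. The recall scores below are invented purely for illustration, and the function name one_way_anova is an assumption; only the between/within logic comes from the slides.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    # variability between conditions: group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # variability within conditions: each score around its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    ms_between = ss_between / df_between    # MS factors
    ms_within = ss_within / df_within       # MS error
    return ms_between / ms_within, df_between, df_within

# Hypothetical words-recalled scores, 5 participants per group
drug = [12.0, 14.0, 11.0, 15.0, 13.0]
placebo = [9.0, 10.0, 8.0, 11.0, 12.0]

f_value, df1, df2 = one_way_anova([drug, placebo])
print(f_value, df1, df2)   # 9.0 1 8
```

The F value would then be looked up against the F distribution with (1, 8) degrees of freedom to get a p-value.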

When can we use ANOVAs?


The formula is based on a model of what contributes to the value of any particular data
point, and how the variance in the data is composed. This model makes a number of
assumptions that must be met in order to allow us to use ANOVA
homogeneity of variance
normality
independence of observations
Remember: when you get a significant F value, this just tells you that there is a significant
difference somewhere between the means of the factors in the ANOVA. Therefore, you
often need to do planned or post-hoc comparisons in order to test more specific
hypotheses and probe interaction effects
ANOVAs:
Worked Example
Testing differences between sample means: following rTMS
over the right parietal cortex, are the incorrectly cued trials in a cued RT
task slowed down compared to the correctly cued trials?
Repeated measures ANOVA:
1 group of 14 healthy volunteers
Perform 100 trials pre- and 100 trials post- stimulation
Real vs Sham rTMS on two separate days
Within-session factors:
Correct vs Incorrect trials
Pre vs Post
Between-session factors:
Real vs Sham rTMS
Null Hypothesis H0: there is no difference in the RTs of incorrectly cued trials
Many possibilities if H0 is rejected:
All means are different from each other: meanICpreR vs. meanICpostR vs. meanICpreS vs. meanICpostS
Means in the Real condition are different from means in the Sham condition
There might be an interaction (the pre-post difference in Real differs from the pre-post difference in Sham)
Why do we care?
Statistical tests in SPM
Example in a simple block design of the effect of a
drug on right hand movement versus rest:

Factorial Design: 2x2
DRUG (Real vs Placebo) by movement (move vs rest)

fMRI:
Acquired 8 measurements, 2 of each condition

Subjects:
12 healthy volunteers
Counterbalanced order
Why do we care?
Statistical tests in SPM
We perform ANOVAs, t-tests, and F-tests when we create a design matrix and specify
contrasts
Reminder: GLM equation to explain our data y
$$y = X\beta + e$$
X is the design matrix: enter this into SPM to tell the program how to divide up the
imaging data into the different conditions
Each column in the matrix represents one condition (one regressor)
β are the parameters: one weight per regressor, estimated from the data. We allocate specific
contrast weights to the parameters to test specific hypotheses (i.e. CONTRASTS) between conditions
e = error
In this case:
Column 1 = right movement with drug (β1)
Column 2 = rest with drug (β2)
Column 3 = right movement with placebo (β3)
Column 4 = rest with placebo (β4)
$$y = (\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4) + e$$
Why do we care?
t-tests in SPM
A t-contrast is a linear combination of parameters: $c^T \beta$
If we think that 1 regressor in our design matrix (e.g. the one weighted by β1) could lead to an
interesting activation, we compute:
$$1 \times \beta_1 + 0 \times \beta_2 + 0 \times \beta_3 + 0 \times \beta_4$$
and divide by the standard error of this combination

Our question: Is the mean activity in condition 1 significantly different from zero
(i.e. from the implicit baseline)?
Why do we care?
t-tests in SPM
In SPM, we make the weights sum to 0
when testing specific hypotheses
T-tests in our study would include:
Main effects of movement across all sessions:
1 -1 1 -1
Main effects of the drug:
Increases: 1 1 -1 -1
Decreases: -1 -1 1 1
Interaction increases: 1 -1 -1 1
Interaction decreases: -1 1 1 -1
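These contrasts can be sketched on a toy version of the 2x2 design. This is not SPM code. The signal values are invented, and the sketch assumes each column of the design matrix is a simple indicator for one condition, so each parameter estimate is just that condition's mean.

```python
from math import sqrt
from statistics import mean

# Hypothetical signal values: 8 measurements, 2 per condition (column order as in the talk)
data = {
    "move_drug":    [10.0, 12.0],
    "rest_drug":    [4.0, 6.0],
    "move_placebo": [7.0, 9.0],
    "rest_placebo": [3.0, 5.0],
}

# With one indicator column per condition, each beta is that condition's mean.
betas = [mean(v) for v in data.values()]
n_per = [len(v) for v in data.values()]
ss_resid = sum((y - b) ** 2 for v, b in zip(data.values(), betas) for y in v)
df_resid = sum(n_per) - len(betas)
sigma2 = ss_resid / df_resid            # residual (error) variance

def t_contrast(c):
    """t = (c . beta) / standard error of (c . beta), for indicator columns."""
    effect = sum(ci * bi for ci, bi in zip(c, betas))
    se = sqrt(sigma2 * sum(ci ** 2 / ni for ci, ni in zip(c, n_per)))
    return effect / se

contrasts = {
    "main effect of movement": [1, -1, 1, -1],
    "drug increases":          [1, 1, -1, -1],
    "interaction increases":   [1, -1, -1, 1],
}
for name, c in contrasts.items():
    assert sum(c) == 0                  # differential contrasts: weights sum to 0
    print(f"{name}: t({df_resid}) = {t_contrast(c):.2f}")
```

Flipping the sign of every weight (e.g. "drug decreases") simply flips the sign of t.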
Why do we care?
F-tests in SPM
An F-test models multiple linear hypotheses: does the design matrix X
model anything?
F-contrasts in our previous example
Are there any differences between drug and placebo at all? (i.e. increases
AND decreases)
1 0 -1 0
0 1 0 -1
Used if we want to make more general inferences about the data:
1) to find effects that might not be found with simple averaging (e.g. because of cancellations)
2) to test for effects that are jointly expressed by several one-dimensional
contrasts, e.g. 'all effects of interest' to check whether there is any effect at all
3) in case data is modelled in a more complex way (hrf & deriv)
4) when you have multiple regressors and think that the effect expresses itself in
some of them, not only one
5) if you do not have a very clear hypothesis: might be useful to derive more
hypotheses to be tested with t-contrasts
=> more details will be given later in the course
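The two-row F-contrast from this slide can be sketched on the same kind of toy 2x2 data. This is not SPM code. The signal values are invented, and the sketch assumes one indicator column per condition, so that (X'X)^-1 is diagonal with entries 1/n. The statistic is the usual GLM form F = (Cβ)' [C (X'X)^-1 C']^-1 (Cβ) / (q σ²), with q the number of contrast rows.

```python
from statistics import mean

# Hypothetical signal values: 2 measurements per condition
data = {
    "move_drug":    [10.0, 12.0],
    "rest_drug":    [4.0, 6.0],
    "move_placebo": [7.0, 9.0],
    "rest_placebo": [3.0, 5.0],
}
betas = [mean(v) for v in data.values()]    # condition means (indicator columns)
n_per = [len(v) for v in data.values()]
ss_resid = sum((y - b) ** 2 for v, b in zip(data.values(), betas) for y in v)
df_resid = sum(n_per) - len(betas)
sigma2 = ss_resid / df_resid

# F-contrast from the slide: any drug vs placebo difference, in either direction
C = [[1, 0, -1, 0],
     [0, 1, 0, -1]]

def f_contrast(C, betas, n_per, sigma2):
    """F statistic for a two-row contrast matrix C under an indicator design."""
    cb = [sum(c * b for c, b in zip(row, betas)) for row in C]      # C beta
    # M = C (X'X)^-1 C', where (X'X)^-1 = diag(1/n) for indicator columns
    m = [[sum(r1[i] * r2[i] / n_per[i] for i in range(len(betas))) for r2 in C]
         for r1 in C]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det],
           [-m[1][0] / det, m[0][0] / det]]                         # 2x2 inverse
    quad = sum(cb[i] * inv[i][j] * cb[j] for i in range(2) for j in range(2))
    return quad / (len(C) * sigma2)

f_value = f_contrast(C, betas, n_per, sigma2)
print(f"F({len(C)}, {df_resid}) = {f_value:.2f}")
```

Unlike a t-contrast, the result is unsigned: it is large whenever the drug changes activity in either direction in either condition.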
