
The purpose of statistics

Global warming

Temperature will
increase with CO2
Mean height of men between
30 and 35 in Cheltenham and
Gloucester
Assume a representative sample was taken
You should be asking questions
- how many people were measured?
- how was it representative?
Quantitative methods
•  Mainly you will want to know
–  About relationships between 2 variables
•  Can be predictor (independent) and predicted
(dependent), but may not be
–  About differences in the average between
groups
•  e.g. The mean height is shorter in group X
Quantitative methods
•  Three things you can find out

1.  Obvious differences or clear relationship

2.  No differences or no relationship

3.  Who knows???


Who knows?
•  This is very subjective. Different people
will draw different conclusions

•  Statistics are used to make clear-cut decisions

•  They give a number to indicate the certainty
of results (a probability, or p value)
Statistics and the null hypothesis
•  The null hypothesis is that there is no
difference between averages, or no
relationship
•  Null hypotheses can be rejected;
hypotheses for significant differences or
relationships can only be supported
•  p values indicate the probability of getting
results at least as extreme as yours if the
null hypothesis were correct
Important - I highly recommend you never speak of null
hypotheses ever again - base hypotheses on logic, not stats
P values
•  Range from 0 to 1.
•  Close to 1 = the data are fully consistent with the
null hypothesis
–  No evidence of a difference between
means or a relationship between
variables
•  Close to 0 = the data are very unlikely under the
null hypothesis, so it is rejected
–  The means are taken to be different, or the
relationship to exist
P values
•  For most sciences a cut off p value of
0.05 is used for significance
•  p < 0.05 = significant
•  p ≥ 0.05 = not significant

•  A p value of 0.05 means that, if there were
really no difference or relationship, results
this extreme would turn up only 5% of the
time (1 in 20).
Some examples
Mass of tomatoes from 10 plants: 5 grown with tomato feed, 5 without

With: 1.2, 1.3, 1.1, 0.9, 1.2 (kg)
Without: 0.5, 0.7, 0.9, 0.7, 0.6 (kg)

[Bar chart: mean mass (kg), feed vs no feed]

p = 0.00127
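The slides do not say which test produced this p value. As a sketch, assuming it was a two-sample Student's t test with equal variances (an assumption, though it appears to give a value close to the one quoted), the calculation can be reproduced with the Python standard library alone. The function names below are illustrative, and the p value is obtained by numerically integrating the t distribution rather than by calling a statistics package.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value: 1 minus the area between -|t| and +|t|.

    The area from 0 to |t| is found with Simpson's rule (steps must be even).
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3
    return 1 - 2 * half_area


def student_t_test(a, b):
    """Equal-variance two-sample t test (assumed test); returns (t, df, p)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se_diff = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (mean(a) - mean(b)) / se_diff
    return t, df, two_tailed_p(t, df)


with_feed = [1.2, 1.3, 1.1, 0.9, 1.2]      # kg, from the slide
without_feed = [0.5, 0.7, 0.9, 0.7, 0.6]   # kg, from the slide

t, df, p = student_t_test(with_feed, without_feed)
print(f"t = {t:.2f}, df = {df}, p = {p:.5f}")
```

In practice a statistics package would do this in one call; the long form is only meant to show that the p value depends on both the difference between the means and the spread within each group.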
Some examples
Mass of tomatoes from 10 plants: 5 grown with tomato feed, 5 without

With: 1.2, 1.3, 1.1, 0.9, 1.2 (kg)
Without: 0.9, 1.3, 0.9, 1.2, 1.2 (kg)

[Bar chart: mean mass (kg), feed vs no feed]

p = 0.720
Some examples
Uplift of rock against time:

Time (mya)    0   20   40   60   80   100   120
Height (m)    8    6    5    3    2     0     0

[Scatter plot: Height (m) against Time (mya)]

p = 0.000034
Some examples
Uplift of rock against time:

Time (mya)    0   20   40   60   80   100   120
Height (m)    8    8    9    3    9     3     8

[Scatter plot: Height (m) against Time (mya)]

p = 0.53
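For the two uplift examples, the slides again do not name the test. A minimal sketch, assuming a Pearson correlation (equivalently a simple linear regression) was used: it computes r for each data set, converts it to a t statistic with n - 2 degrees of freedom, and compares that against the standard two-tailed 5% critical value for 5 degrees of freedom (2.571) instead of computing an exact p value. The variable names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


time_mya = [0, 20, 40, 60, 80, 100, 120]
steady = [8, 6, 5, 3, 2, 0, 0]       # first uplift example
scattered = [8, 8, 9, 3, 9, 3, 8]    # second uplift example

T_CRIT = 2.571  # two-tailed 5% critical value of t for 5 degrees of freedom

results = {}
for label, heights in (("steady", steady), ("scattered", scattered)):
    r = pearson_r(time_mya, heights)
    t = t_for_r(r, len(heights))
    results[label] = (r, t)
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"{label}: r = {r:.3f}, t = {t:.2f} -> {verdict} at p < 0.05")
```

The same time points give a clear relationship in one case and none in the other; only the scatter of the heights around a line has changed.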
Some examples
Mass of tomatoes from 10 plants: 5 grown with tomato feed, 5 without

With: 1.2, 1.3, 1.1, 0.9, 1.2 (kg)
Without: 1.5, 0.7, 0.4, 0.2, 0.6 (kg)

[Bar chart: mean mass (kg), feed vs no feed]

p = 0.0831
Variability, precision and sample size
An average you calculate is the average of a
sample - it is hopefully a good guide to the
actual average of the population you are
measuring
The closer together the values in a sample, the smaller the
variance or standard deviation
5, 4, 5, 4, 5 - mean = 4.6. S.D. = 0.55
1, 9, 1, 9, 3 - mean = 4.6. S.D. = 4.10
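The two samples above can be checked with Python's standard `statistics` module. This is only a sketch; it assumes the sample standard deviation (n - 1 in the denominator), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

tight = [5, 4, 5, 4, 5]    # values close together
spread = [1, 9, 1, 9, 3]   # same mean, values far apart

for sample in (tight, spread):
    # stdev() is the sample standard deviation (divides by n - 1)
    print(sample, "mean =", round(mean(sample), 2), "S.D. =", round(stdev(sample), 2))
```

Both samples have the same mean, so the mean alone says nothing about how variable the data are.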
Variability, precision and sample size
Precision of a mean value is given by the use of
Standard Error or Confidence Intervals

S.E. = S.D. / √n        95% C.I. = mean ± 1.96 × S.E.

Therefore precision of the mean decreases with
variability and increases with sample size.

More samples = more likely to get a significant
difference (if a real difference exists)
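To see the effect of sample size on precision, the sketch below applies the S.E. and 95% C.I. formulas to a mean of 4.6 with an S.D. of 4.10. The sample sizes 20 and 80 are hypothetical, chosen only for illustration, and the function names are not from the slides. The multiplier 1.96 is the one given above; for very small samples a larger, t-based multiplier is often used instead.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: S.D. divided by the square root of n."""
    return sd / math.sqrt(n)


def ci95(mean_value, sd, n):
    """95% confidence interval as mean +/- 1.96 standard errors."""
    half_width = 1.96 * standard_error(sd, n)
    return mean_value - half_width, mean_value + half_width


mean_value, sd = 4.6, 4.10   # the more variable sample from the slides

for n in (5, 20, 80):        # 20 and 80 are hypothetical sample sizes
    low, high = ci95(mean_value, sd, n)
    print(f"n = {n:2d}: S.E. = {standard_error(sd, n):.2f}, "
          f"95% C.I. = {low:.2f} to {high:.2f}")
```

Because n sits under a square root, four times as many samples are needed to halve the standard error.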
Plotting graphs of means - plot precision as error bar

[Two bar charts: mean mass (kg), feed vs no feed, with error bars]

Left: high precision - means likely to be significantly different
Right: low precision - means unlikely to be significantly different
Inferential Statistics vs Descriptive Statistics
•  Calculations of means, medians, SD,
IQR, skewness etc are descriptive
statistics

•  Calculations of p values are inferential
statistics

•  Which are best?

Descriptive statistics are best
•  A good graph explains a lot
•  Do 95% C.I. overlap?

•  Never conduct inferential stats without
first calculating the descriptives - these
will explain (or help to explain) the
results of the test to you
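The "do 95% C.I. overlap?" check can be tried on the tomato data from the earlier examples. This is a rough sketch using the mean ± 1.96 × S.E. interval; the helper names are illustrative, and overlap of intervals is a quick visual guide rather than a formal test.

```python
import math
from statistics import mean, stdev


def ci95(sample):
    """95% confidence interval of the mean as mean +/- 1.96 * S.E."""
    m = mean(sample)
    half_width = 1.96 * stdev(sample) / math.sqrt(len(sample))
    return m - half_width, m + half_width


def overlap(a, b):
    """True if intervals a and b (low, high) share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


with_feed = [1.2, 1.3, 1.1, 0.9, 1.2]          # kg
no_feed_tight = [0.5, 0.7, 0.9, 0.7, 0.6]      # example with p = 0.00127
no_feed_variable = [1.5, 0.7, 0.4, 0.2, 0.6]   # example with p = 0.0831

for label, group in (("tight", no_feed_tight), ("variable", no_feed_variable)):
    a, b = ci95(with_feed), ci95(group)
    print(f"{label}: feed {a[0]:.2f}-{a[1]:.2f}, no feed {b[0]:.2f}-{b[1]:.2f}, "
          f"overlap = {overlap(a, b)}")
```

The two "no feed" groups have the same mean, but only the more variable one has an interval that overlaps the "feed" interval, which matches its non-significant p value.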
A few other things…
•  Do data show what you expect?
–  Chi squared tests

•  Are there differences or relationships
between a whole range of values?
–  Multivariate stats - not covered in the
module, but possible extra session for
those interested
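For the chi squared bullet above, a minimal sketch of the test statistic: the sum of (observed - expected)² / expected over all categories. The counts are hypothetical (60 heads and 40 tails from 100 tosses of a coin expected to be fair) and are not from the slides. The result is compared against the standard 5% critical value for 1 degree of freedom, 3.841.

```python
def chi_squared(observed, expected):
    """Chi squared statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, for illustration only
observed = [60, 40]   # heads, tails
expected = [50, 50]   # what a fair coin would give on average

CRITICAL_5PCT_DF1 = 3.841  # 5% critical value of chi squared, 1 degree of freedom

stat = chi_squared(observed, expected)
significant = stat > CRITICAL_5PCT_DF1
print(f"chi squared = {stat:.2f}, significant at p < 0.05: {significant}")
```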
