
Modifyuse bio.

IB book
IB Biology Topic 1:
Statistical Analysis
http://www.patana.ac.th/Secondary/Science/c4b/1/stat1.htm

An investigation of shell length variation in a mollusc species
A marine gastropod (Thersites bipartita) has
been sampled from two different locations:
Sample A: Shells found in full marine conditions
Sample B: Shells found in brackish water
conditions.

sample size = 10 shells


length of the shell measured as shown

Analysis of Gastropod Data

Measured height of shells (ruler)
Units: mm ± 1 mm (error)
Significant digits must be consistent.
Uncertainty applies to all measuring devices! It reflects the precision of the measurement.
There should be no variation in the precision of raw data.

1.1.1 Error bars and the representation of variability in data.
Biological systems are subject to a genetic program and environmental variation.
When you collect a set of data, it shows variation.
Graphs: show variation using error bars.
Error bars show the range of the data or the standard deviation.

Mean & Range for each group


Marine
Brackish

Graph Mean & Range for each group


Quick comparison of the 2 data sets

1.1.2 Calculation of Mean and Std Dev

3 classes of data
Mean
arithmetic mean (avg): a measure of central tendency (the sum of the values divided by the number of values)

Std Dev

Measures spread around the mean
Measure of variation, or of the precision of measurement

1.1.2 Calculation of Mean and Std Dev


Std Dev of sample = s (s is for the sample, not the total population)
Pop 1. Mean = 31.4, s = 5.7
Pop 2. Mean = 41.6, s = 4.3
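
As a minimal sketch of the calculation, the Python below computes a mean and a sample standard deviation (s) with the standard library. The shell lengths are hypothetical values made up for illustration; they are not the gastropod data from the investigation.

```python
import statistics

# Hypothetical shell lengths in mm (illustrative only, not the real samples)
lengths_mm = [28, 35, 31, 40, 25, 33, 30, 38, 27, 33]

mean_length = statistics.mean(lengths_mm)  # arithmetic mean
s = statistics.stdev(lengths_mm)           # sample std dev (divides by n - 1)

print(f"mean = {mean_length:.1f} mm, s = {s:.1f} mm")
```

`statistics.stdev` is the sample version (s); `statistics.pstdev` would treat the values as the total population instead.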

Graphing Mean and Std Dev: Error Bars

Mean +/- 1 std dev


no overlap between
these two populations
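
The "no overlap" statement can be checked with simple arithmetic on the values quoted for the two populations. A short sketch follows; the dictionary layout and variable names are just illustrative choices.

```python
# Summary values quoted in the slides: (mean, s) for each population
populations = {"Pop 1": (31.4, 5.7), "Pop 2": (41.6, 4.3)}

# Error bar for each population: from mean - 1 s up to mean + 1 s
bars = {name: (mean - s, mean + s) for name, (mean, s) in populations.items()}

for name, (low, high) in bars.items():
    print(f"{name}: {low:.1f} to {high:.1f}")

# The bars overlap only if the top of the lower bar reaches the bottom of the upper bar
overlap = bars["Pop 1"][1] >= bars["Pop 2"][0]
print("overlap:", overlap)
```

Pop 1 spans 25.7 to 37.1 and Pop 2 spans 37.3 to 45.9, so the two bars just miss each other.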

The question being considered is:
Is there a significant difference between the two samples from different locations?
or
Are the differences in the two samples just due to chance selection?

Graphing Mean and Std Dev: Error Bars

The StdDev graph compares the middle 68% of each population and begins to show that they look different.

The range graph misleads us into thinking the data may be similar.

1.1.3 Standard deviation and the spread of values around the mean.
1. StdDev is a measure of how spread out the
data values are from the mean.
2. Assume:
1. normal distribution of values around the
mean
2. data not skewed to either end
3. About 68% of all the data values in a sample can
be found between the mean +/- 1 standard
deviation

http://www.patana.ac.th/Secondary/Science/c4b/1/stat1.htm#gastro
Animation of mean and standard deviation

1.1.3 Standard deviation and the spread of values around the mean.
4. 95% of all the data values in a sample can
be found between the mean + 2s and the
mean -2s.
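
The 68% and 95% figures follow from the normal distribution assumed above. As a rough check, this sketch uses Python's standard `statistics.NormalDist`; the helper name `fraction_within` is made up for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution lying within k std devs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1s = fraction_within(1)
within_2s = fraction_within(2)

print(f"within 1 s: {within_1s:.1%}")
print(f"within 2 s: {within_2s:.1%}")
```

The fractions come out close to 68% and 95%. They apply only when the data are roughly normally distributed and not skewed.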

1.1.4 Comparing means and standard deviations of 2 or more samples.
Sample w/ small StdDev suggests narrow variation
Sample w/ larger StdDev suggests wider variation
Example: molluscs
Pop 1. Mean = 31.4, standard deviation (s) = 5.7
Pop 2. Mean = 41.6, standard deviation (s) = 4.3

1.1.4 Comparing means and standard deviations of 2 or more samples.
Pop 2 has a greater mean shell length but
slightly narrower variation.
Why this is the case would require further
observation and experiment on
environmental and genetic factors.

http://www.patana.ac.th/Secondary/Science/c4b/1/stat1.htm#gastro

1.1.5 Comparing 2 samples with t-Test


Null Hypothesis:
There is no significant difference between
the two samples except as caused by
chance selection of data.
OR
Alternative hypothesis:
There is a significant difference between
the height of shells in sample A and sample B.
http://www.patana.ac.th/Secondary/Science/c4b/1/stat1.htm#gastro

1.1.5 Comparing 2 samples with t-Test

For the examples you'll use in biology, tails is always 2, and type can be:
1, paired
2, two samples, equal variance
3, two samples, unequal variance
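
To illustrate how types 2 and 3 differ, here is a sketch that computes the t statistic and degrees of freedom from summary values using the standard textbook formulas (pooled variance for type 2, Welch's version for type 3). The function names are invented for this example, and n = 10 per sample is an assumption. Turning t into P needs the t distribution (a table or software), which this sketch does not do.

```python
import math

def t_equal_variance(m1, s1, n1, m2, s2, n2):
    """Type 2: two samples, equal variance (pooled variance)."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    return t, df

def t_unequal_variance(m1, s1, n1, m2, s2, n2):
    """Type 3: two samples, unequal variance (Welch's t-test)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Means and s values from the slides; n = 10 per sample is an assumption
t2, df2 = t_equal_variance(41.6, 4.3, 10, 31.4, 5.7, 10)
t3, df3 = t_unequal_variance(41.6, 4.3, 10, 31.4, 5.7, 10)

print(f"type 2: t = {t2:.2f}, df = {df2}")
print(f"type 3: t = {t3:.2f}, df = {df3:.1f}")
```

With equal sample sizes the two versions give the same t; only the degrees of freedom differ. A paired test (type 1) would instead work on the differences within each pair, so it needs the raw paired values rather than summary statistics.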

Good idea to graph it

Bar chart
Error bars
Stats

T-test: Are the mollusc shells from the two locations significantly different?
The t-test gives the probability (P) of getting a difference at least this large if the 2
sets are basically the same (null hypothesis).

P varies from 0 (not likely) to 1 (certain).

higher P = more likely that any differences between the two sets are just due to
random chance.
lower P = more likely that the two sets are
significantly different, and that any differences are
real.

T-test: Are the mollusc shells from the two locations significantly different?
In biology the critical P is usually 0.05 (5%)
(biology experiments are expected to
produce quite varied results)
If P > 5% then the two sets are not significantly different
(i.e. accept the null hypothesis).

If P < 5% then the two sets are significantly different
(i.e. reject the null hypothesis).

For the t-test, the number of replicates should be as large as possible, and at least greater than 5.

Drawing Conclusions
1. State the null hypothesis & alternative hypothesis
(based on the research question)
2. Set critical P level at P=0.05 (5%)
3. Write the decision rule
If P > 5% then the two sets are not significantly different (i.e. accept
the null hypothesis).
If P < 5% then the two sets are significantly different (i.e. reject
the null hypothesis).
4. Write a summary statement based on the decision.
The null hypothesis is rejected since calculated
P = 0.003 (< 0.05; two-tailed test).
5. Write a statement of results in standard English.
There is a significant difference between the height
of shells in sample A and sample B.
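
The decision rule in step 3 is simple enough to write as a small function. This is only a sketch: the function name and return strings are made up, and because the slides do not say what to do when P is exactly 0.05, the code arbitrarily treats that case as "accept".

```python
def t_test_decision(p_value, critical_p=0.05):
    """Apply the decision rule: reject the null hypothesis when P < critical P."""
    if p_value < critical_p:
        return "reject the null hypothesis"
    # P equal to the critical level is not covered by the slides; treated as accept here
    return "accept the null hypothesis"

# P = 0.003 is the calculated value quoted in step 4
decision = t_test_decision(0.003)
print(decision)
```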

1.1.6 Correlation & Causation


Sometimes you're looking for an association
between variables.
Correlations see if 2 variables vary together
+1 = perfect positive correlation
0 = no correlation
-1 = perfect negative correlation

Relations see how 1 variable affects another

Pearson correlation (r)
Data are continuous & normally distributed

Spearman's rank-order correlation (r_s)
Data are not both continuous & normally distributed
Usually a scatterplot is used for either type of correlation.
Both correlation coefficients indicate a strong + corr.:
large females pair with large males.
We don't know why, but it shows there is a correlation to investigate further.
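
To show how the two coefficients are calculated, here is a sketch in plain Python. Pearson's r uses the values themselves; Spearman's r_s is Pearson's r applied to the ranks. The female and male sizes are hypothetical numbers invented for illustration, not the data behind the plot, and the function names are likewise made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_rs(xs, ys):
    """Spearman rank-order correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical sizes (mm) of paired females and males, for illustration only
female_mm = [10.1, 11.4, 12.0, 12.9, 13.5, 14.8]
male_mm = [9.0, 9.8, 11.1, 10.9, 12.3, 13.0]

r = pearson_r(female_mm, male_mm)
rs = spearman_rs(female_mm, male_mm)
print(f"Pearson r = {r:.2f}, Spearman r_s = {rs:.2f}")
```

Both values come out close to +1 for these made-up numbers, which is what a strong positive correlation looks like.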

Causative: Use linear regression


Fits a straight line to data
Gives slope & intercept: m and c in the equation y = mx + c
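
As a sketch of what the fit does, the function below finds m and c by ordinary least squares, the usual way of fitting a straight line. The data points and the function name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares straight line: returns slope m and intercept c of y = mx + c."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x  # the fitted line passes through (mean_x, mean_y)
    return m, c

# Hypothetical data points, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

m, c = fit_line(xs, ys)
print(f"y = {m:.2f}x + {c:.2f}")
```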

Doesn't PROVE causation, but suggests it... needs further investigation!
