
Statistical tools for the Honours course

Professor D B Hibbert

Tools

- Normal probability plots by the rankit method
- ANOVA

David Brynn Hibbert 1999

Rankit

- Sort the data (x1 ... xn) in order of increasing magnitude.
- Write down the cumulative frequency for each value (i.e. how many data points have a lesser or equal value).
- Calculate cumulative % frequency = 100 × cumulative frequency / (n + 1).
- Plot the Z-score associated with the cumulative % frequency against x [in Excel: =NORMSINV(% frequency/100)].
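The rankit steps can also be sketched outside Excel. The following is a minimal Python illustration, not part of the original slides; the function name `rankit_scores` is hypothetical, and `statistics.NormalDist().inv_cdf` from the standard library (Python 3.8+) is used in place of NORMSINV.

```python
from statistics import NormalDist

def rankit_scores(values):
    """Return (x, cumulative frequency, cumulative %, Z) for each sorted value."""
    xs = sorted(values)
    n = len(xs)
    rows = []
    for x in xs:
        # Cumulative frequency: how many points have a lesser or equal value.
        cum_freq = sum(1 for v in xs if v <= x)
        pct = 100 * cum_freq / (n + 1)
        # Z-score for the cumulative fraction (same role as Excel's NORMSINV).
        z = NormalDist().inv_cdf(pct / 100)
        rows.append((x, cum_freq, pct, z))
    return rows

# Effects from the experimental-design example in these slides.
effects = [-0.375, 0.025, 0.125, 0.125, 0.825, 1.925]
for x, cum_freq, pct, z in rankit_scores(effects):
    print(f"{x:7.3f} {cum_freq:2d} {pct:6.2f} {z:6.2f}")
```

Dividing by n + 1 rather than n keeps the largest cumulative fraction below 1, so its Z-score stays finite.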


Example: experimental design

Effect   Value    Cumulative frequency   %       Z
A        -0.375   1                      14.29   -1.07
AMC       0.025   2                      28.57   -0.57
AM        0.125   4                      57.14    0.18
C         0.125   4                      57.14    0.18
MC        0.825   5                      71.43    0.57
M         1.925   6                      85.71    1.07

[Figure: normal probability plot of Z-score against value of effect; the points for M and MC are labelled.]


ANOVA (Analysis of Variance)

ANOVA partitions the total variability in a set of data, taken under different conditions, among the sources of difference to which it can be attributed. It is often used to decide whether the variance due to a factor is significant compared with random uncertainty.


Types of problem solved by ANOVA

- Which of many variables are important for a method?
- Is a linear relationship between variables significant?
- For a round-robin analysis between several laboratories, what is the inter-laboratory precision (reproducibility) and what is the intra-laboratory precision (repeatability)? Is the inter-laboratory precision significantly greater than the intra-laboratory precision?


One-way classification

For repeated measurements under one set of different conditions (labs, analysts, methods).

                  Variables (k)
                  A        B        C
Replicates (nj)   x1,1     x1,2     x1,j
                  x2,1     x2,2     x2,j
                  xi,1     xi,2     xi,j

- Different variables of interest in COLUMNS
- Replicates (or whatever groups each variable) in ROWS


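Calculation

1. Subtract the global mean from each value: $(x_{i,j} - \bar{x})$, where $\bar{x} = \sum_j \sum_i x_{i,j} \big/ \sum_j n_j$
2. Total sum of squares: $SS_T = \sum_i \sum_j (x_{i,j} - \bar{x})^2$
3. Between-column sum of squares:
3.1 Average each column of mean-subtracted values
3.2 Square the average and multiply by the number of rows
3.3 Sum: $SS_c = \sum_j n_j \left( \frac{\sum_i (x_{i,j} - \bar{x})}{n_j} \right)^2 = \sum_j n_j (\bar{x}_j - \bar{x})^2$
4. Residual: $SS_R = SS_T - SS_c$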
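As an illustration of the four calculation steps, here is a short Python sketch. It is not from the original slides; the function name `one_way_sums_of_squares` is hypothetical, and each column is assumed to be a plain list of replicate values.

```python
def one_way_sums_of_squares(columns):
    """Return (SST, SSc, SSR) for a one-way layout given as a list of columns."""
    values = [x for col in columns for x in col]
    grand_mean = sum(values) / len(values)

    # Step 2: total sum of squares about the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in values)

    # Step 3: for each column, square the mean deviation and weight by its row count.
    ss_between = sum(
        len(col) * (sum(col) / len(col) - grand_mean) ** 2 for col in columns
    )

    # Step 4: the residual is what remains.
    ss_residual = ss_total - ss_between
    return ss_total, ss_between, ss_residual

# Methanol data for the two drums used in Example #1 of these slides.
drum_a = [49, 44, 70, 50, 58]
drum_b = [44, 57, 34, 48, 50]
print(one_way_sums_of_squares([drum_a, drum_b]))
```

Columns of unequal length are handled because each column is weighted by its own number of rows.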


Names

- SST is the total sum of squares. Also called the corrected sum of squares.
- SSc is the sum of squares due to the factor studied. Also called the treatment sum of squares, the heterogeneity sum of squares, or the between-column sum of squares.
- SSR is the residual sum of squares. Also called the within-column sum of squares.


ANOVA Table

Source              Sum of squares   Degrees of freedom   Mean square    Expected mean square
Between variables   SSc              k - 1                SSc/(k - 1)    $\sigma^2 + n_j\sigma_c^2$
Within variables    SSR              N - k                SSR/(N - k)    $\sigma^2$
TOTAL               SST              N - 1
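The table entries follow mechanically from the two sums of squares. A minimal Python sketch, with the hypothetical name `anova_table` and a plain dictionary as an assumed output format:

```python
def anova_table(ss_between, ss_residual, k, n_total):
    """Build the one-way ANOVA table from SSc, SSR, k columns and N values."""
    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_residual / df_within
    return {
        "between": {"SS": ss_between, "df": df_between, "MS": ms_between},
        "within": {"SS": ss_residual, "df": df_within, "MS": ms_within},
        "total": {"SS": ss_between + ss_residual, "df": n_total - 1},
        # Ratio of mean squares, to be compared with a tabulated F value.
        "F": ms_between / ms_within,
    }

# Values from Example #1 in these slides: SSc = 144.4, SSR = 700, k = 2, N = 10.
table = anova_table(144.4, 700.0, k=2, n_total=10)
print(table["between"], table["within"], table["total"], round(table["F"], 2))
```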


Decisions

Use a one-tailed F-test to compare the mean squares and decide whether the variance between columns is significantly greater than the residual variance.

$F = \frac{SS_c/(k-1)}{SS_R/(N-k)}$, which estimates $\frac{\sigma^2 + n_j\sigma_c^2}{\sigma^2}$

Compare with $F_{0.05',\,k-1,\,N-k}$


F-test

- The Fisher F distribution is used to compare variances.
- For two sets of data with standard deviations s1 and s2, where s1 > s2:

$F = \frac{s_1^2}{s_2^2}$

The tabulated F value is taken at a given probability level (e.g. 0.05, i.e. 95% confidence) and at the relevant numbers of degrees of freedom (n - 1) for the numerator and the denominator. Because we know s1 > s2, a one-tailed test is used.
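Where F tables or Excel are not to hand, the upper-tail probability and the critical value can be computed numerically. The sketch below is an assumption-laden illustration rather than the method of the slides: it evaluates the regularized incomplete beta function with a standard continued-fraction expansion and finds the critical value by bisection. All function names are hypothetical.

```python
import math

def _beta_cf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd terms of the continued fraction, applied in turn.
        for aa in (
            m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
            -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2)),
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def regularized_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_upper_tail(f, df_num, df_den):
    """One-tailed probability P(F > f) for the given degrees of freedom."""
    x = df_den / (df_den + df_num * f)
    return regularized_beta(df_den / 2.0, df_num / 2.0, x)

def f_critical(alpha, df_num, df_den):
    """F value whose upper-tail probability is alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, df_num, df_den) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_upper_tail(mid, df_num, df_den) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(f_critical(0.05, 1, 8), f_upper_tail(1.65, 1, 8))
```

Bisection is slower than a dedicated inverse, but it needs only the tail probability and is easy to check against a printed table.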


Example for ANOVA #1

The following percentages of methanol in a distillate were determined by replicate analyses from 2 drums:

DRUM A   49   44   70   50   58
DRUM B   44   57   34   48   50

Is the mean concentration in Drum A significantly different from that in Drum B?


ANOVA calculations for #1

Grand mean = 50.4

Values after subtracting the grand mean:

        DRUM A   DRUM B
         -1.4     -6.4
         -6.4      6.6
         19.6    -16.4
         -0.4     -2.4
          7.6     -0.4
Mean      3.8     -3.8

SST = 844.4

SSc = 5 × 3.8² + 5 × (-3.8)² = 144.4


ANOVA Table #1

Source              Sum of squares   Degrees of freedom   Mean square   Expected mean square
Between variables   144.4            1                    144.4         $\sigma^2 + 5\sigma_c^2$
Within variables    700              8                    87.5          $\sigma^2$
TOTAL               844.4            9


F test #1

$F = \frac{144.40}{87.50} = 1.65$

$F_{0.05',1,8} = 5.3$

F < F(table). Therefore the difference is NOT significant at 95% probability.


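Calculation of σ

$\overline{SS}_c = SS_c/(k-1) = \sigma^2 + n_j\sigma_c^2$
$\overline{SS}_R = SS_R/(N-k) = \sigma^2$

$\sigma = \sqrt{\overline{SS}_R}$
$\sigma_c = \sqrt{\dfrac{\overline{SS}_c - \overline{SS}_R}{n_j}}$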
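The two standard deviations follow directly from the mean squares. A minimal Python sketch with the hypothetical name `variance_components`; setting a negative between-column variance estimate to zero is an assumption made here and is not stated in the slides.

```python
import math

def variance_components(ms_between, ms_within, n_per_column):
    """Estimate sigma (within columns) and sigma_c (between columns) from mean squares."""
    sigma = math.sqrt(ms_within)
    var_c = (ms_between - ms_within) / n_per_column
    # Assumption: a negative estimate (between < within) is reported as zero.
    sigma_c = math.sqrt(max(var_c, 0.0))
    return sigma, sigma_c

# Mean squares from Example #1 in these slides: 144.4 and 87.5, with 5 rows per column.
sigma, sigma_c = variance_components(144.4, 87.5, 5)
print(round(sigma, 2), round(sigma_c, 2))
```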


In Excel use the Data Analysis add-in: One-way ANOVA. Check: Headers in first row.

DRUM A   DRUM B
49       44
44       57
70       34
50       48
58       50

Anova: Single Factor

SUMMARY
Groups   Count   Sum   Average   Variance
DRUM A   5       271   54.2      103.2
DRUM B   5       233   46.6      71.8

ANOVA
Source of Variation   SS      df   MS      F          P-value    F crit
Between Groups        144.4   1    144.4   1.650286   0.234867   5.317645
Within Groups         700     8    87.5
Total                 844.4   9

Notes on the output: F = 144.4/87.5; P-value is the probability of 144.4 being greater than 87.5 by chance; F crit is the F value at 95% probability.


ANOVA Example #2

The following data show the stability of a fluorescent molecule under different storage conditions. Determine which storage conditions lead to significantly different signals.

Storage method               Fluorescence
A: Freshly prepared          102, 100, 101
B: 1 hr in dark              101, 101, 104
C: 1 hr in subdued light     97, 95, 99
D: 1 hr in bright light      90, 92, 94


ANOVA calculations #2

k = 4, nj = 3

Values after subtracting the grand mean:

        A    B    C    D
        4    3   -1   -8
        2    3   -3   -6
        3    6    1   -4
Mean    3    4   -1   -6

Sum of squares = 210

SSc = 186


ANOVA Table #2

Source              Sum of squares   Degrees of freedom   Mean square   Expected mean square
Between variables   186              3                    62            $\sigma^2 + 3\sigma_c^2$
Within variables    24               8                    3             $\sigma^2$
TOTAL               210              11


F test #2

$F = \frac{62}{3} = 20.7$

$F_{0.05',3,8} = 4.07$

F > F(table). Therefore there is a SIGNIFICANT difference at 95% probability.


Excel Single factor ANOVA

A     B     C     D
102   101   97    90
100   101   95    92
101   104   99    94

Anova: Single Factor

SUMMARY
Groups   Count   Sum   Average   Variance
A        3       303   101       1
B        3       306   102       3
C        3       291   97        4
D        3       276   92        4

ANOVA
Source of Variation   SS    df   MS   F          P-value   F crit
Between Groups        186   3    62   20.66667   0.0004    4.06618
Within Groups         24    8    3
Total                 210   11


Fixed and Random Models

- If the columns are unique variables (e.g. light regimes, analysts), the model determines whether any of the variables is different. These are fixed effects.
- If the columns are random examples of a normally distributed variable (e.g. male analysts), the model is concerned only with the effect of all columns taken together. This is a random effect.


Least Significant Difference

LSD is used to test which variables are significantly different in a fixed-effect ANOVA, by comparing the differences between means with a calculated least significant difference.

- Arrange the means of the variables in ascending order.
- Calculate $s\,t\sqrt{2/n}$, where s is the within-variables estimate of σ, n is the number of rows, and t is the Student's t value for the degrees of freedom of this s at the 95% confidence level.
- If the difference between means is > the calculated LSD, it is SIGNIFICANT.
- If the difference between means is < the calculated LSD, it is NOT SIGNIFICANT.
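The LSD procedure can be sketched in a few lines of Python. This is an illustration only; the names `lsd` and `compare_adjacent_means` are hypothetical, the Student's t value is assumed to be looked up from tables and passed in, and only neighbouring means in the sorted order are compared.

```python
import math

def lsd(s, t, n):
    """Least significant difference: s * t * sqrt(2 / n)."""
    return s * t * math.sqrt(2 / n)

def compare_adjacent_means(means, s, t, n):
    """Sort the means and test each neighbouring pair against the LSD."""
    threshold = lsd(s, t, n)
    ordered = sorted(means.items(), key=lambda item: item[1])
    results = []
    for (low_name, low_mean), (high_name, high_mean) in zip(ordered, ordered[1:]):
        diff = high_mean - low_mean
        results.append((low_name, high_name, diff, diff > threshold))
    return threshold, results

# Example #2 from these slides: s = sqrt(3.0), t = 2.31 (8 degrees of freedom), n = 3.
means = {"A": 101, "B": 102, "C": 97, "D": 92}
threshold, results = compare_adjacent_means(means, math.sqrt(3.0), 2.31, 3)
print(round(threshold, 2))
for low, high, diff, significant in results:
    print(low, high, diff, "significant" if significant else "not significant")
```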


LSD for example #2

$t_{0.05'',8} = 2.31$, $s = \sqrt{3.0}$, $n = 3$

$s\,t_{0.05'',8}\sqrt{2/n} = 3.26$

Means in order   D = 92      C = 97      A = 101      B = 102
Difference              5           4            1
Significant?            Yes         Yes          No


Interpretation

- D and C differ significantly from each other and from A and B.
- A and B do not differ significantly from each other.

Therefore: the amount of exposure to light is important.

