Statistical Tools

Statistical tools for the Honours course
Professor D B Hibbert
Tools
- Normal probability plots by the rankit method
- ANOVA
David Brynn Hibbert 1999
Rankit
1. Sort the data (x1, ..., xn) in order of increasing magnitude.
2. Write down the cumulative frequency for each value (i.e. how many data points have a lesser or equal value).
3. Calculate the cumulative % frequency = 100 × cumulative frequency / (n + 1).
4. Plot the Z-score associated with the cumulative % frequency against x [in Excel: =NORMSINV(% frequency/100)].
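The four rankit steps can be sketched in Python with only the standard library. This is a minimal sketch, not the author's procedure: `statistics.NormalDist().inv_cdf` stands in for Excel's NORMSINV, and the function name `rankit_z_scores` is hypothetical. Because the cumulative frequency counts values less than or equal to x, tied values share the same frequency.

```python
from statistics import NormalDist


def rankit_z_scores(values):
    """Return (x, cumulative frequency, cumulative %, Z) for each value, in ascending order of x."""
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    rows = []
    for x in ordered:
        # Cumulative frequency: number of points with a value <= x (ties share a count)
        cum_freq = sum(1 for v in ordered if v <= x)
        # Dividing by n + 1 keeps the largest point below 100 %, so its Z is finite
        cum_pct = 100.0 * cum_freq / (n + 1)
        # Inverse standard normal CDF, the equivalent of Excel's NORMSINV(cum_pct/100)
        z = std_normal.inv_cdf(cum_pct / 100.0)
        rows.append((x, cum_freq, cum_pct, z))
    return rows


effects = [-0.375, 0.025, 0.125, 0.125, 0.825, 1.925]  # A, AMC, AM, C, MC, M
for x, freq, pct, z in rankit_z_scores(effects):
    print(f"{x:7.3f} {freq:2d} {pct:6.2f} {z:6.2f}")
```

Run on the six effects of the experimental-design example, it reproduces the cumulative frequency, cumulative % and Z columns of that table.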
Example: experimental design

Effect   Value    Cumulative frequency   Cumulative %   Z
A        -0.375   1                      14.29          -1.07
AMC       0.025   2                      28.57          -0.57
AM        0.125   4                      57.14           0.18
C         0.125   4                      57.14           0.18
MC        0.825   5                      71.43           0.57
M         1.925   6                      85.71           1.07

[Figure: normal probability plot of Z-score against value of effect; the points for M and MC are labelled.]
ANOVA
(Analysis of Variance)
ANOVA partitions the total variability in a set of data, taken under different conditions, into the variability attributable to each source of difference. It is often used to decide whether the variance due to a factor is significant compared with the random (measurement) uncertainty.
Types of problem solved by ANOVA

- Which of many variables are important for a method.
- Whether a linear relationship between variables is significant.
- For a round-robin analysis between several laboratories: what is the inter-laboratory precision (reproducibility), and what is the intra-laboratory precision (repeatability)? Is the inter-laboratory precision significantly greater than the intra-laboratory precision?
One-way classification

For repeated measurements made under one set of different conditions (e.g. labs, analysts, methods):

              A (j = 1)   B (j = 2)   ...   column j
replicate 1   x1,1        x1,2        ...   x1,j
replicate 2   x2,1        x2,2        ...   x2,j
...           ...         ...               ...
replicate i   xi,1        xi,2        ...   xi,j

There are k variables (columns), and column j holds nj replicates. The different variables of interest are in COLUMNS; the replicates (or whatever groups each variable) are in ROWS.
Calculation
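1. Subtract the global mean from each value, where
$$\bar{x} = \frac{\sum_{i,j} x_{i,j}}{\sum_j n_j}$$
2. Square the deviations and sum them to give the total sum of squares:
$$SS_T = \sum_i \sum_j \left(x_{i,j} - \bar{x}\right)^2$$
3. Between-column sum of squares:
   3.1 Average each column of deviations.
   3.2 Square each average and multiply by the number of rows in that column, nj.
   3.3 Sum over the columns to give SSc:
$$SS_C = \sum_j n_j \left(\frac{\sum_i x_{i,j}}{n_j} - \bar{x}\right)^2 = \sum_j n_j \left(\bar{x}_j - \bar{x}\right)^2$$
4. Residual sum of squares: SSR = SST − SSc.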
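Steps 1 to 4 translate directly into code. A minimal Python sketch, assuming the data are held as a list of columns (one list of replicates per variable); `one_way_sums_of_squares` is a hypothetical helper name, and the drum figures are those of Example #1 below.

```python
def one_way_sums_of_squares(columns):
    """Return (SST, SSc, SSR) for a one-way layout.

    columns: one list per variable (column) holding its replicates (rows);
    the columns may have different numbers of replicates n_j.
    """
    all_values = [x for column in columns for x in column]
    grand_mean = sum(all_values) / len(all_values)

    # Step 1: subtract the global mean from each value
    deviations = [[x - grand_mean for x in column] for column in columns]

    # Step 2: total sum of squares
    ss_total = sum(d * d for column in deviations for d in column)

    # Step 3: average each column of deviations, square it, multiply by n_j, then sum
    ss_columns = sum(len(column) * (sum(column) / len(column)) ** 2 for column in deviations)

    # Step 4: residual (within-column) sum of squares
    ss_residual = ss_total - ss_columns
    return ss_total, ss_columns, ss_residual


drum_a = [49, 44, 70, 50, 58]
drum_b = [44, 57, 34, 48, 50]
sst, ssc, ssr = one_way_sums_of_squares([drum_a, drum_b])
print(round(sst, 1), round(ssc, 1), round(ssr, 1))  # 844.4 144.4 700.0
```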
Names
- SST is the total sum of squares, also called the corrected sum of squares.
- SSc is the sum of squares due to the factor studied, also called the treatment sum of squares, the heterogeneity sum of squares, or the between-column sum of squares.
- SSR is the residual sum of squares, also called the within-column sum of squares.
ANOVA Table
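Source              Sum of squares   Degrees of freedom   Mean square    Expected mean square
Between variables   SSc              k − 1                SSc/(k − 1)    σ² + nj σc²
Within variables    SSR              N − k                SSR/(N − k)    σ²
TOTAL               SST              N − 1

Here N = Σ nj is the total number of measurements, σ² is the random (within-column) variance and σc² is the variance due to the columns.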
Decisions
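Use a one-tailed F-test on the mean squares to decide whether the variance between columns is significantly greater than the residual variance:
$$F = \frac{SS_C/(k-1)}{SS_R/(N-k)}, \quad \text{which estimates} \quad \frac{\sigma^2 + n_j\sigma_c^2}{\sigma^2}$$
If the columns do not differ (σc² = 0), F is expected to be close to 1. Compare F with the one-tailed critical value $F_{0.05,\,k-1,\,N-k}$.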
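The table and the decision rule can be combined in a short sketch. It assumes the sums of squares are already known and that the critical value F(0.05; k − 1, N − k) has been looked up in a table; `anova_decision` is a hypothetical name, and the figures used are those of Example #2 below.

```python
def anova_decision(ss_between, ss_within, k, n_total, f_critical):
    """One-way ANOVA table entries and the one-tailed F decision.

    k: number of variables (columns); n_total: N, the total number of measurements;
    f_critical: tabulated F(0.05; k - 1, N - k), supplied by the user.
    """
    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between   # SSc / (k - 1)
    ms_within = ss_within / df_within      # SSR / (N - k)
    f_ratio = ms_between / ms_within
    return {
        "df": (df_between, df_within, n_total - 1),
        "MS": (ms_between, ms_within),
        "F": f_ratio,
        "significant": f_ratio > f_critical,
    }


# Example #2 figures: SSc = 186, SSR = 24, k = 4, N = 12, F(0.05; 3, 8) = 4.07
result = anova_decision(186, 24, k=4, n_total=12, f_critical=4.07)
print(result["MS"], round(result["F"], 1), result["significant"])  # (62.0, 3.0) 20.7 True
```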
F-test
- The Fisher F distribution is used to compare variances. For two sets of data with standard deviations s1 and s2, where s1 > s2:
$$F = \frac{s_1^2}{s_2^2}$$
- F is compared with the F distribution at a given probability level (e.g. 0.05, i.e. 95%) and at the relevant numbers of degrees of freedom (n − 1) for the numerator and the denominator. Because s1 > s2 is known in advance, a one-tailed test is used.
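Tables (or Excel) supply the critical F value and the P-value. As a cross-check that needs no tables, the upper-tail probability can be written as a regularized incomplete beta function, P(F ≥ f) = I_x(ν2/2, ν1/2) with x = ν2/(ν2 + ν1 f). The sketch below evaluates it with the standard continued-fraction expansion (modified Lentz method) and finds the critical value by bisection. This is a swapped-in numerical technique, not the method used in these notes, and all function names are hypothetical.

```python
import math


def _beta_cont_frac(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even term of the continued fraction
        num = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction
        num = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        d = 1.0 + num * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + num / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges quickly, otherwise the symmetry relation
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cont_frac(a, b, x) / a
    return 1.0 - front * _beta_cont_frac(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F >= f) for df1 (numerator) and df2 (denominator) degrees of freedom."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_critical(alpha, df1, df2):
    """One-tailed critical value: the f with P(F >= f) = alpha, found by bisection."""
    low, high = 0.0, 1.0
    while f_upper_tail(high, df1, df2) > alpha:
        high *= 2.0
    for _ in range(200):
        mid = (low + high) / 2.0
        if f_upper_tail(mid, df1, df2) > alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


print(round(f_upper_tail(1.650286, 1, 8), 4))  # Example #1 P-value
print(round(f_critical(0.05, 1, 8), 2))        # Example #1 F crit
print(round(f_critical(0.05, 3, 8), 2))        # Example #2 F crit
```

The three printed values should agree with the Excel output for the two examples (P-value 0.2349, F crit 5.32 and 4.07) to the precision shown.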
Example for ANOVA #1

The following percentages of methanol in a distillate were determined by replicate analyses of samples from two drums:
DRUM A: 49, 44, 70, 50, 58
DRUM B: 44, 57, 34, 48, 50
Is the mean concentration in Drum A significantly different from that in Drum B?
ANOVA calculations for #1

Grand mean = 50.4
Deviations from the grand mean:
DRUM A: −1.4, −6.4, 19.6, −0.4, 7.6    (mean 3.8)
DRUM B: −6.4, 6.6, −16.4, −2.4, −0.4   (mean −3.8)

SST = 844.4
SSc = 5 × 3.8² + 5 × (−3.8)² = 144.4
ANOVA Table #1
Source              Sum of squares   Degrees of freedom   Mean square   Expected mean square
Between variables   144.4            1                    144.4         σ² + 5σc²
Within variables    700              8                    87.5          σ²
TOTAL               844.4            9
F test #1
$$F = \frac{144.40}{87.50} = 1.65, \qquad F_{0.05,\,1,\,8} = 5.3 \text{ (one-tailed)}$$
F < Ftable; therefore the difference is NOT significant at 95% probability.
Calculation of s
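$$\overline{SS}_C = \frac{SS_C}{k-1} = \sigma^2 + n_j\sigma_c^2, \qquad \overline{SS}_R = \frac{SS_R}{N-k} = \sigma^2$$
$$\sigma = \sqrt{\overline{SS}_R}, \qquad \sigma_c = \sqrt{\frac{\overline{SS}_C - \overline{SS}_R}{n_j}}$$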
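Rearranging the two expected mean squares gives the estimates directly. A minimal sketch, assuming equal numbers of replicates nj in every column; `variance_components` is a hypothetical name, and truncating a negative estimate of σc² to zero is an assumption of the sketch, not something stated in these notes.

```python
import math


def variance_components(ms_columns, ms_residual, n_j):
    """Estimate sigma and sigma_c from the two mean squares (equal n_j assumed).

    ms_residual estimates sigma^2; ms_columns estimates sigma^2 + n_j * sigma_c^2.
    A negative estimate of sigma_c^2 is truncated to zero (an assumption of this sketch).
    """
    sigma = math.sqrt(ms_residual)
    sigma_c_sq = (ms_columns - ms_residual) / n_j
    sigma_c = math.sqrt(max(sigma_c_sq, 0.0))
    return sigma, sigma_c


# Example #1: mean squares 144.4 (between) and 87.5 (within), n_j = 5
sigma, sigma_c = variance_components(144.4, 87.5, 5)
print(round(sigma, 2), round(sigma_c, 2))  # 9.35 3.37
```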
DRUM A   DRUM B
49       44
44       57
70       34
50       48
58       50
In Excel, use the Data Analysis add-in and choose "Anova: Single Factor". Check "Labels in first row", because the drum names are headers in the first row.
Anova: Single Factor

SUMMARY
Groups    Count   Sum   Average   Variance
DRUM A    5       271   54.2      103.2
DRUM B    5       233   46.6      71.8

ANOVA
Source of Variation   SS      df   MS      F          P-value    F crit
Between Groups        144.4   1    144.4   1.650286   0.234867   5.317645
Within Groups         700     8    87.5
Total                 844.4   9

F = 144.4/87.5. P-value: the probability of 144.4 being greater than 87.5 by chance. F crit: the F value at 95% probability.
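The SUMMARY block of the Excel output is a set of per-group descriptive statistics. A short sketch using Python's `statistics` module (sample variance with an n − 1 denominator, which matches the Variance column above); the helper name `summary_table` is hypothetical. It also checks that the within-groups SS equals Σ (nj − 1) × variance = 4 × 103.2 + 4 × 71.8 = 700.

```python
from statistics import mean, variance


def summary_table(groups):
    """Per-group Count, Sum, Average and sample Variance, as in Excel's SUMMARY block."""
    return [
        (name, len(values), sum(values), mean(values), variance(values))
        for name, values in groups.items()
    ]


drums = {"DRUM A": [49, 44, 70, 50, 58], "DRUM B": [44, 57, 34, 48, 50]}
rows = summary_table(drums)
for name, count, total, average, var in rows:
    print(f"{name:8s} {count:5d} {total:5d} {average:8.1f} {var:8.1f}")

# Within-groups SS recovered from the variances: sum of (n_j - 1) * s_j^2
ss_within = sum((count - 1) * var for _, count, _, _, var in rows)
print(round(ss_within, 1))
```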
ANOVA Example #2
The following data show the stability of a fluorescent molecule under different storage conditions. Determine which storage conditions lead to significantly different signals.

Storage method                 Fluorescence
A: Freshly prepared            102, 100, 101
B: 1 hr in dark                101, 101, 104
C: 1 hr in subdued light       97, 95, 99
D: 1 hr in bright light        90, 92, 94
ANOVA calculations #2
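k = 4, nj = 3, grand mean = 98
Deviations from the grand mean:
A: 4, 2, 3       (mean 3)
B: 3, 3, 6       (mean 4)
C: −1, −3, 1     (mean −1)
D: −8, −6, −4    (mean −6)

Total sum of squares SST = 210
SSc = 3 × (3² + 4² + (−1)² + (−6)²) = 186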
ANOVA Table #2
Source              Sum of squares   Degrees of freedom   Mean square   Expected mean square
Between variables   186              3                    62            σ² + 3σc²
Within variables    24               8                    3             σ²
TOTAL               210              11
F test #2
$$F = \frac{62}{3} = 20.7, \qquad F_{0.05,\,3,\,8} = 4.07 \text{ (one-tailed)}$$
F > Ftable; therefore there is a SIGNIFICANT difference at 95% probability.
Excel single factor ANOVA

A   102   100   101
B   101   101   104
C    97    95    99
D    90    92    94

Anova: Single Factor

SUMMARY
Groups   Count   Sum   Average   Variance
A        3       303   101       1
B        3       306   102       3
C        3       291   97        4
D        3       276   92        4

ANOVA
Source of Variation   SS    df   MS   F          P-value   F crit
Between Groups        186   3    62   20.66667   0.0004    4.06618
Within Groups         24    8    3
Total                 210   11
Fixed and Random Models

- If the columns are unique variables (e.g. light regimes, particular analysts), the model determines whether any of the variables differs from the others. These are fixed effects.
- If the columns are random examples of a normally distributed variable (e.g. male analysts), the model is concerned only with the effect of all the columns taken together. This is a random effect.
Least Significant Difference

The least significant difference (LSD) is used in a fixed-effect ANOVA to find which variables differ significantly, by comparing each difference between means with a calculated smallest significant difference.
- Arrange the means of the variables in ascending order.
- Calculate LSD = s t √(2/n), where s is the within-variables estimate of σ, n is the number of rows, and t is the two-tailed Student's t value at 95% confidence for the degrees of freedom of s (N − k).
- If the difference between two means is greater than the LSD, it is SIGNIFICANT; if it is less than the LSD, it is NOT SIGNIFICANT.
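The LSD procedure can be sketched as follows, comparing each mean with its neighbour in the ordered list, as the example for #2 does. A minimal sketch: `lsd_comparisons` is a hypothetical name, and t = 2.306 is the two-tailed 95% value for 8 degrees of freedom (the 2.31 of the example, to more figures).

```python
import math


def lsd_comparisons(means, s, t, n):
    """Compare adjacent ordered means with LSD = s * t * sqrt(2 / n).

    means: dict of variable name -> mean; s: within-variables estimate of sigma;
    t: two-tailed Student's t at 95 % for the degrees of freedom of s; n: rows per column.
    """
    lsd = s * t * math.sqrt(2.0 / n)
    ordered = sorted(means.items(), key=lambda item: item[1])
    results = []
    for (low_name, low), (high_name, high) in zip(ordered, ordered[1:]):
        diff = high - low
        results.append((low_name, high_name, diff, diff > lsd))
    return lsd, results


# Example #2: s = sqrt(3.0) (within mean square 3), t(0.05, 8) = 2.306, n = 3
lsd, pairs = lsd_comparisons({"A": 101, "B": 102, "C": 97, "D": 92}, math.sqrt(3.0), 2.306, 3)
print(round(lsd, 2))  # 3.26
for low_name, high_name, diff, significant in pairs:
    print(f"{high_name} - {low_name}: {diff} {'Yes' if significant else 'No'}")
```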
LSD for example #2

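$$t_{0.05,\,8} = 2.31 \text{ (two-tailed)}, \quad s = \sqrt{3.0} = 1.73, \quad n = 3: \qquad \mathrm{LSD} = s\,t_{0.05,\,8}\sqrt{2/n} = 3.26$$

Means in order: D = 92, C = 97, A = 101, B = 102

Pair     Difference   Significant?
C − D    5            Yes
A − C    4            Yes
B − A    1            No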
Interpretation
- D and C differ significantly from each other and from A and B.
- A and B do not differ significantly from each other.

Therefore, the amount of exposure to light is important.
