
Chapter 4 Statistics

4-1 The Gaussian Distribution

Population: N, μ, σ
Sample: n, x̄, s (data x_1, x_2, ..., x_n)

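Mean and Standard Deviation

Population vs. Sample
population (true) mean (μ)
sample mean (x̄): the mean of a limited sample drawn from a population of data

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

population standard deviation (σ)

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}$$

sample standard deviation (s)

$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$$

Degrees of freedom: n − 1
Once x̄ is known, x_n can be calculated from the other (n − 1) data points and x̄, so only n − 1 of the values are independent.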
Ex: Mean and standard deviation of (7, 18, 10, 15)

$$\bar{x} = \frac{7 + 18 + 10 + 15}{4} = 12.5$$

$$s = \sqrt{\frac{(7 - 12.5)^2 + (18 - 12.5)^2 + (10 - 12.5)^2 + (15 - 12.5)^2}{4 - 1}} = 4.9$$

x̄ and s should end at the same decimal place.
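A minimal Python sketch of the two formulas, checked against this example. `mean` and `std_dev` are hypothetical helper names written for illustration; the `ddof` ("delta degrees of freedom") argument selects the n − 1 or the n denominator.

```python
import math

def mean(xs):
    """Sample mean: (x_1 + x_2 + ... + x_n) / n."""
    return sum(xs) / len(xs)

def std_dev(xs, ddof=1):
    """Standard deviation of xs.

    ddof=1 divides the sum of squared deviations by n - 1 (sample s);
    ddof=0 divides by n (the population formula applied to the sample).
    """
    x_bar = mean(xs)
    sum_sq = sum((x - x_bar) ** 2 for x in xs)   # sum of squared deviations
    return math.sqrt(sum_sq / (len(xs) - ddof))

data = [7, 18, 10, 15]
print(f"x_bar = {mean(data):.1f}, s = {std_dev(data):.1f}")   # x_bar = 12.5, s = 4.9
```

The standard-library functions `statistics.stdev` and `statistics.pstdev` implement the n − 1 and n versions, respectively.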


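Median
(7, 18, 10, 15) → sorted: (7, 10, 15, 18)
With an even number of points, the median is the average of the two middle values: (10 + 15)/2 = 12.5

Range
(7, 18, 10, 15)
largest − smallest: 18 − 7 = 11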
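The same median and range in Python, using the standard-library `statistics.median` (which sorts the data and averages the two middle values when n is even); `spread` is just an illustrative variable name for the range:

```python
import statistics

data = [7, 18, 10, 15]
ordered = sorted(data)              # [7, 10, 15, 18]
median = statistics.median(data)    # even n: average of the two middle values
spread = max(data) - min(data)      # range = largest - smallest
print(ordered, median, spread)      # [7, 10, 15, 18] 12.5 11
```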
Replicate Data on the Calibration of a 10-mL Pipet

Trial  Vol, mL    Trial  Vol, mL    Trial  Vol, mL    Trial  Vol, mL    Trial  Vol, mL
 1     9.988      11     9.980      21     9.992      31     9.985      41     9.986
 2     9.973      12     9.989      22     9.984      32     9.977      42     9.982
 3     9.986      13     9.978      23     9.981      33     9.976      43     9.977
 4     9.980      14     9.971      24     9.987      34     9.983      44     9.977
 5     9.975      15     9.982      25     9.978      35     9.976      45     9.986
 6     9.982      16     9.983      26     9.983      36     9.990      46     9.978
 7     9.986      17     9.988      27     9.982      37     9.988      47     9.983
 8     9.982      18     9.975      28     9.991      38     9.971      48     9.980
 9     9.981      19     9.980      29     9.981      39     9.986      49     9.983
10     9.990      20     9.994      30     9.969      40     9.978      50     9.979

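x̄ = 9.982 mL, s = 0.0056 mL

Exercise

Calculate 10 values of σ using n (divide by n).
Calculate 10 values of s using n − 1 (divide by n − 1).

Group 1 (σ, using n): trials 1-5, 6-10, 11-15, 16-20, ...
Group 2 (s, using n − 1): trials 1-5, 6-10, 11-15, 16-20, ...
Repeat with 5 sets of 10, 2 sets of 25, and 1 set of 50 measurements.

                        Mean of σ (using n)    Mean of s (using n − 1)
10 × 5 measurements     0.0051                 0.0056
5 × 10 measurements     0.0054                 0.0057
2 × 25 measurements     0.0055                 0.0056
1 × 50 measurements     0.0056                 0.0056
(all values in mL)

Dividing by n systematically underestimates the standard deviation of small groups; dividing by n − 1 removes most of this bias.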
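A sketch of the exercise in Python, assuming each subgroup is a run of consecutive trials (1-5, 6-10, ...; then 1-10, 11-20, ...) with the volumes taken from the table in trial order. `VOLUMES`, `std_dev`, and `mean_of_std` are hypothetical names for this illustration.

```python
import math

# Pipet calibration volumes (mL), trials 1-50 in order (one row = 10 trials).
VOLUMES = [
    9.988, 9.973, 9.986, 9.980, 9.975, 9.982, 9.986, 9.982, 9.981, 9.990,
    9.980, 9.989, 9.978, 9.971, 9.982, 9.983, 9.988, 9.975, 9.980, 9.994,
    9.992, 9.984, 9.981, 9.987, 9.978, 9.983, 9.982, 9.991, 9.981, 9.969,
    9.985, 9.977, 9.976, 9.983, 9.976, 9.990, 9.988, 9.971, 9.986, 9.978,
    9.986, 9.982, 9.977, 9.977, 9.986, 9.978, 9.983, 9.980, 9.983, 9.979,
]

def std_dev(xs, ddof):
    """ddof=0 divides by n (sigma formula); ddof=1 divides by n - 1 (s)."""
    x_bar = sum(xs) / len(xs)
    sum_sq = sum((x - x_bar) ** 2 for x in xs)
    return math.sqrt(sum_sq / (len(xs) - ddof))

def mean_of_std(values, size, ddof):
    """Split values into consecutive groups of `size`; average their std devs."""
    groups = [values[i:i + size] for i in range(0, len(values), size)]
    return sum(std_dev(g, ddof) for g in groups) / len(groups)

for size in (5, 10, 25, 50):
    n_groups = len(VOLUMES) // size
    sigma_n = mean_of_std(VOLUMES, size, ddof=0)    # divide by n
    s_n1 = mean_of_std(VOLUMES, size, ddof=1)       # divide by n - 1
    print(f"{n_groups} x {size}: sigma(n) = {sigma_n:.4f}, s(n-1) = {s_n1:.4f}")
```

It prints one line per grouping, σ (using n) first; the 1 × 50 line is simply the overall standard deviation of all 50 trials.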

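Standard Deviation & Probability

Normalized Gaussian Curve

$$y = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$$

Normal Error Curve (z = (x − μ)/σ, dz = dx/σ)

$$\int_{-\infty}^{\infty} y\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-z^2/2}\,dz = 1$$

Max at z = 0 (y = 0.399); symmetric about z = 0
Decaying exponentially on either side of the maximum
Inflection points at z = ±1 (y = 0.242)
Probability = area under the curve

Range     Probability (Area)
μ ± 1σ    68.3%
μ ± 2σ    95.5%
μ ± 3σ    99.7%

Width at half-height: w_1/2 = 2.355σ
Width at the baseline (between the intercepts of the tangents through the inflection points): w = 4σ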
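A short check of the numbers quoted for the normal error curve, using `math.erf` from the Python standard library; the area between z = −k and z = +k equals erf(k/√2). `normal_error_curve` and `area_within` are hypothetical helper names.

```python
import math

def normal_error_curve(z):
    """y(z) = exp(-z**2 / 2) / sqrt(2*pi): the Gaussian with mu = 0, sigma = 1."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def area_within(k):
    """Area under the curve between z = -k and z = +k (a probability)."""
    return math.erf(k / math.sqrt(2))

print(f"peak y(0) = {normal_error_curve(0):.3f}")          # 0.399
print(f"inflection y(1) = {normal_error_curve(1):.3f}")    # 0.242
for k in (1, 2, 3):
    print(f"mu +/- {k} sigma: {100 * area_within(k):.2f}%")  # 68.27, 95.45, 99.73

# Full width at half maximum: exp(-z**2 / 2) = 1/2 gives z = sqrt(2 ln 2); double it
fwhm = 2 * math.sqrt(2 * math.log(2))
print(f"w_1/2 = {fwhm:.3f} sigma")                          # 2.355
```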

4-2 Student's t
"Student": W. S. Gosset (1908)

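Confidence Intervals
confidence level (%): the probability that the stated interval contains the true value

From x̄, s, and a finite number of observations (n), we "predict" an interval for the true mean μ.

(Figure: a 90% confidence interval, with a 90% chance that the true value lies in the interval, compared with a narrower 50% confidence interval, with a 50% chance that the true value lies in it.)

Confidence interval of μ:

$$\mu = \bar{x} \pm \frac{ts}{\sqrt{n}}$$
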

Table 4-2 Values of Student's t

Degrees of    Confidence level
freedom       50%     90%     95%     98%     99%     99.5%    99.9%
1             1.000   6.314   12.706  31.821  63.657  127.32   636.619
2             0.816   2.920   4.303   6.965   9.925   14.089   31.598
3             0.765   2.353   3.182   4.541   5.841   7.453    12.924
4             0.741   2.132   2.776   3.747   4.604   5.598    8.610
5             0.727   2.015   2.571   3.365   4.032   4.773    6.869
∞             0.674   1.645   1.960   2.326   2.576   2.807    3.291

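Calculating Confidence Intervals

Data: 12.6, 11.9, 13.0, 12.7, 12.5
x̄ = 12.54, s = 0.40
degrees of freedom = n − 1 = 4

$$\mu\,(50\%) = \bar{x} \pm \frac{ts}{\sqrt{n}} = 12.54 \pm \frac{(0.741)(0.40)}{\sqrt{5}} = 12.54 \pm 0.13$$

$$\mu\,(90\%) = \bar{x} \pm \frac{ts}{\sqrt{n}} = 12.54 \pm \frac{(2.132)(0.40)}{\sqrt{5}} = 12.54 \pm 0.38$$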
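The same calculation sketched in Python. The t values are copied by hand from Table 4-2 (no statistics package with t quantiles is assumed), and s is left unrounded, which gives the same ± values to two decimals. `confidence_interval` and `T_DF4` are hypothetical names.

```python
import math
import statistics

# Two-sided Student's t for 4 degrees of freedom, copied from Table 4-2.
T_DF4 = {50: 0.741, 90: 2.132}

def confidence_interval(data, t):
    """Return (x_bar, half_width) for mu = x_bar +/- t*s/sqrt(n)."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)           # sample standard deviation (n - 1)
    return x_bar, t * s / math.sqrt(n)

data = [12.6, 11.9, 13.0, 12.7, 12.5]
for level in (50, 90):
    x_bar, half = confidence_interval(data, T_DF4[level])
    print(f"mu({level}%) = {x_bar:.2f} +/- {half:.2f}")
# mu(50%) = 12.54 +/- 0.13
# mu(90%) = 12.54 +/- 0.38
```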

Values of t/√n (n = degrees of freedom + 1)

Degrees of    Confidence level
freedom       50%     90%     95%     98%     99%     99.5%    99.9%
1             0.707   4.465   8.984   22.501  45.012  90.029   450.158
2             0.471   1.686   2.484   4.021   5.730   8.134    18.243
3             0.382   1.176   1.591   2.270   2.920   3.726    6.462
4             0.331   0.953   1.241   1.676   2.059   2.504    3.851
5             0.297   0.823   1.050   1.374   1.646   1.949    2.804

By making enough measurements, we can make the uncertainty in the mean (ts/√n) much smaller than the standard deviation!
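Each entry of the t/√n table is the corresponding Table 4-2 value divided by √n, with n = degrees of freedom + 1. A small Python sketch that regenerates the rows (`T_TABLE` and `t_over_root_n` are hypothetical names):

```python
import math

# Two-sided Student's t from Table 4-2, keyed by degrees of freedom.
# Column order: 50%, 90%, 95%, 98%, 99%, 99.5%, 99.9%
T_TABLE = {
    1: (1.000, 6.314, 12.706, 31.821, 63.657, 127.32, 636.619),
    2: (0.816, 2.920, 4.303, 6.965, 9.925, 14.089, 31.598),
    3: (0.765, 2.353, 3.182, 4.541, 5.841, 7.453, 12.924),
    4: (0.741, 2.132, 2.776, 3.747, 4.604, 5.598, 8.610),
    5: (0.727, 2.015, 2.571, 3.365, 4.032, 4.773, 6.869),
}

def t_over_root_n(dof):
    """Divide each t for this dof by sqrt(n), where n = dof + 1 measurements."""
    n = dof + 1
    return tuple(t / math.sqrt(n) for t in T_TABLE[dof])

for dof in sorted(T_TABLE):
    print(dof, " ".join(f"{v:.3f}" for v in t_over_root_n(dof)))
```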
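Comparison of Means with Student's t
95% confidence → 5% chance of a wrong conclusion

Case 1: Comparing a Measured Result to a Known Value
μ = 0.0319% Ni ("true" value)
Using a new method: 0.0329%, 0.0322%, 0.0330%, 0.0323%

x̄ = 0.03260% Ni, s = 0.00041%, n = 4, t = 3.182 (95% confidence, 3 degrees of freedom)

$$\mu = \bar{x} \pm \frac{ts}{\sqrt{n}} = 0.03260 \pm \frac{(3.182)(0.00041)}{\sqrt{4}} = 0.03260 \pm 0.00065\,\%$$

95% confidence interval: 0.03195 to 0.03325%
The known value, 0.0319%, lies outside this interval, so the new method produces a value different from the known one. The chance that they are the same is < 5%.

(Figure: the 95% confidence interval plotted on a scale from 0.0320 to 0.0330% Ni.)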
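Case 1 as a Python sketch: build the 95% confidence interval from the four new-method results and check whether the known value falls inside it. The t value (3.182 for 3 degrees of freedom) is taken from Table 4-2; the variable names are illustrative only.

```python
import math
import statistics

KNOWN = 0.0319                            # known ("true") value, % Ni
data = [0.0329, 0.0322, 0.0330, 0.0323]   # new method, % Ni
T_95_DOF3 = 3.182                         # Table 4-2: 95%, 3 degrees of freedom

n = len(data)
x_bar = statistics.mean(data)
s = statistics.stdev(data)                # sample standard deviation (n - 1)
half_width = T_95_DOF3 * s / math.sqrt(n)
low, high = x_bar - half_width, x_bar + half_width
inside = low <= KNOWN <= high
print(f"95% CI: {low:.5f} to {high:.5f} % Ni; known value inside: {inside}")
# 95% CI: 0.03195 to 0.03325 % Ni; known value inside: False
```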

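Case 2: Comparing Replicate Measurements

t-test: If t_calc > t_table, the difference is significant.
If σ1 = σ2 (both sets share the same population standard deviation),

$$t_{\text{calc}} = \frac{|\bar{x}_1 - \bar{x}_2|}{s_{\text{pooled}}}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}$$

$$s_{\text{pooled}} = \sqrt{\frac{\sum_{\text{set 1}}(x_i - \bar{x}_1)^2 + \sum_{\text{set 2}}(x_j - \bar{x}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\frac{s_1^2(n_1 - 1) + s_2^2(n_2 - 1)}{n_1 + n_2 - 2}}$$

degrees of freedom = n1 + n2 − 2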
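Ex: Lord Rayleigh's Nobel Prize in 1904 for the discovery of Ar

Table 4-3 Masses of N2 from air and chemically generated nitrogen

From dry air, with O2 removed by Cu(s) + ½O2(g) → CuO(s):
x̄1 = 2.310109 g, s1 = 0.000143 g, n1 = 7
From decomposition of N2O:
x̄2 = 2.299472 g, s2 = 0.001379 g, n2 = 8

$$s_{\text{pooled}} = \sqrt{\frac{(0.000143)^2(7 - 1) + (0.001379)^2(8 - 1)}{7 + 8 - 2}} = 0.001017$$

$$t_{\text{calc}} = \frac{2.310109 - 2.299472}{0.001017}\sqrt{\frac{7 \times 8}{7 + 8}} = 20.2$$

degrees of freedom = 7 + 8 − 2 = 13
t_table(99.9%, 13 degrees of freedom) < 4.587 (the tabulated value for 10 degrees of freedom) << 20.2

The difference is significant at the 99.9% confidence level: N2 from air is denser (it also contains the heavier Ar).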
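A minimal Python sketch of the pooled-standard-deviation t test, assuming σ1 = σ2 as above, applied to Rayleigh's summary statistics. `pooled_t` is a hypothetical helper name.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Return (s_pooled, t_calc, dof) for two means, assuming sigma1 = sigma2."""
    dof = n1 + n2 - 2
    s_pooled = math.sqrt((s1 ** 2 * (n1 - 1) + s2 ** 2 * (n2 - 1)) / dof)
    t_calc = abs(mean1 - mean2) / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
    return s_pooled, t_calc, dof

# Table 4-3: N2 from air (set 1) vs. N2 from decomposition of N2O (set 2)
s_p, t, dof = pooled_t(2.310109, 0.000143, 7, 2.299472, 0.001379, 8)
print(f"s_pooled = {s_p:.6f}, t_calc = {t:.1f}, dof = {dof}")
# s_pooled = 0.001017, t_calc = 20.2, dof = 13
```

Because t_calc = 20.2 is far above the tabulated t at any listed confidence level for 13 degrees of freedom, the conclusion does not depend on the exact table entry used.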

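4-3 Q Test for Bad Data

When n ≥ 4,

$$Q = \frac{\text{gap}}{\text{range}}$$

gap: difference between the questionable value and its nearest neighbor; range: spread of all the data.
If Q(observed) > Q_rej, discard the data point.

Ex: 12.47, 12.48, 12.53, 12.56, 12.67
Gap = 12.67 − 12.56 = 0.11
Range = 12.67 − 12.47 = 0.20
Q = 0.11/0.20 = 0.55 < 0.64 (Q_rej for n = 5), so 12.67 is retained.

Table 4-4 Q for rejection

n    Q_rej (90% confidence)
4    0.76
5    0.64
6    0.56
7    0.51
8    0.47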
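The Q test written as a small Python function. This sketch tests whichever end value (lowest or highest) has the larger gap and uses the 90% Q_rej values of Table 4-4, so it only handles n = 4 to 8. `q_test` and `Q_REJ_90` are hypothetical names.

```python
# Q_rej at 90% confidence, keyed by the number of observations n (Table 4-4).
Q_REJ_90 = {4: 0.76, 5: 0.64, 6: 0.56, 7: 0.51, 8: 0.47}

def q_test(data):
    """Q test (Dixon's Q) on the more extreme of the lowest/highest values.

    Returns (suspect, Q, reject), where reject is True if Q > Q_rej.
    """
    xs = sorted(data)
    spread = xs[-1] - xs[0]                        # range of all the data
    gap_low, gap_high = xs[1] - xs[0], xs[-1] - xs[-2]
    if gap_high >= gap_low:
        suspect, gap = xs[-1], gap_high
    else:
        suspect, gap = xs[0], gap_low
    q = gap / spread
    return suspect, q, q > Q_REJ_90[len(xs)]

suspect, q, reject = q_test([12.53, 12.56, 12.47, 12.67, 12.48])
print(suspect, round(q, 2), reject)   # 12.67 0.55 False
```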

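4-4 Finding the Best Straight Line

Method of Least Squares
Assumptions: Errors are in the y values only (the x values are taken as exact), and the uncertainties in all of the y values are similar.

For each point (x_i, y_i), the vertical deviation from the line y = mx + b is

$$d_i = y_i - (mx_i + b)$$

Seek the m and b that minimize the sum of the squares of the deviations, Σd_i²:

$$\frac{\partial \sum d_i^2}{\partial m} = 0 \;\Rightarrow\; m\sum x_i^2 + b\sum x_i = \sum x_i y_i$$

$$\frac{\partial \sum d_i^2}{\partial b} = 0 \;\Rightarrow\; m\sum x_i + nb = \sum y_i$$

Solution by Cramer's rule:

$$m = \frac{\begin{vmatrix} \sum x_i y_i & \sum x_i \\ \sum y_i & n \end{vmatrix}}{D} = \frac{n\sum(x_i y_i) - \sum x_i \sum y_i}{D}$$

$$b = \frac{\begin{vmatrix} \sum(x_i^2) & \sum x_i y_i \\ \sum x_i & \sum y_i \end{vmatrix}}{D} = \frac{\sum(x_i^2)\sum y_i - \sum(x_i y_i)\sum x_i}{D}$$

where

$$D = \begin{vmatrix} \sum(x_i^2) & \sum x_i \\ \sum x_i & n \end{vmatrix} = n\sum(x_i^2) - \left(\sum x_i\right)^2$$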
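The Cramer's-rule solution translated directly into Python, a sketch checked with the data of Table 4-5 (`least_squares` is a hypothetical helper name):

```python
def least_squares(xs, ys):
    """Slope m, intercept b, and determinant D for y = m*x + b (Cramer's rule)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)                  # sum of x_i^2
    sxy = sum(x * y for x, y in zip(xs, ys))      # sum of x_i * y_i
    d = n * sxx - sx ** 2
    m = (n * sxy - sx * sy) / d
    b = (sxx * sy - sxy * sx) / d
    return m, b, d

# Data of Table 4-5
m, b, d = least_squares([1, 3, 4, 6], [2, 3, 4, 5])
print(d, round(m, 5), round(b, 5))   # 52 0.61538 1.34615
```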

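How Reliable Are Least-Squares Parameters?

Propagate the uncertainty σ_y (assumed the same for every y_k) through m:

$$\frac{\partial m}{\partial y_k} = \frac{\partial}{\partial y_k}\left[\frac{n\sum x_i y_i - \sum x_i \sum y_i}{D}\right] = \frac{n x_k - \sum x_i}{D}$$

$$\sigma_m^2 = \sum_k \sigma_y^2\left(\frac{\partial m}{\partial y_k}\right)^2 = \frac{\sigma_y^2}{D^2}\sum_k\left(n x_k - \sum_i x_i\right)^2$$

$$= \frac{\sigma_y^2}{D^2}\left[n^2\sum_k x_k^2 - 2n\sum_i x_i\sum_k x_k + n\left(\sum_i x_i\right)^2\right] = \frac{\sigma_y^2}{D^2}\,n\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right] = \frac{n\sigma_y^2}{D}$$

Similarly,

$$\sigma_b^2 = \frac{\sigma_y^2\sum(x_i^2)}{D}$$

where σ_y is estimated by

$$s_y = \sqrt{\frac{\sum(d_i - \bar{d})^2}{n - 2}} = \sqrt{\frac{\sum(d_i^2)}{n - 2}}$$

(the deviations sum to zero, so $\bar{d} = 0$)

n − 2 degrees of freedom: 2 degrees of freedom were used up in determining m and b.

$$y\,(\pm\sigma_y) = [m\,(\pm\sigma_m)]\,x + [b\,(\pm\sigma_b)]$$

$$\sigma_m = \sigma_y\sqrt{\frac{n}{D}}, \qquad \sigma_b = \sigma_y\sqrt{\frac{\sum(x_i^2)}{D}}$$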
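The least-squares fit extended with the uncertainty formulas, as a self-contained Python sketch on the Table 4-5 data; s_y uses n − 2 degrees of freedom. `least_squares_with_errors` is a hypothetical name.

```python
import math

def least_squares_with_errors(xs, ys):
    """Return m, b, s_y, sigma_m, sigma_b for y = m*x + b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    d = n * sxx - sx ** 2
    m = (n * sxy - sx * sy) / d
    b = (sxx * sy - sxy * sx) / d
    # Vertical deviations d_i = y_i - (m x_i + b); n - 2 degrees of freedom
    sum_sq_dev = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    s_y = math.sqrt(sum_sq_dev / (n - 2))
    sigma_m = s_y * math.sqrt(n / d)
    sigma_b = s_y * math.sqrt(sxx / d)
    return m, b, s_y, sigma_m, sigma_b

m, b, s_y, sigma_m, sigma_b = least_squares_with_errors([1, 3, 4, 6], [2, 3, 4, 5])
print(f"s_y = {s_y:.4f}, sigma_m = {sigma_m:.4f}, sigma_b = {sigma_b:.4f}")
# s_y = 0.1961, sigma_m = 0.0544, sigma_b = 0.2141
```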

Linear correlation coefficient

$$r = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum(x_i - \bar{x})^2\sum(y_i - \bar{y})^2}}$$

r = 0: no correlation
r = ±1: complete correlation
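The correlation coefficient in Python, a direct transcription of the formula (the helper name `correlation` is hypothetical), applied to the Table 4-5 data:

```python
import math

def correlation(xs, ys):
    """Linear correlation coefficient r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = correlation([1, 3, 4, 6], [2, 3, 4, 5])
print(round(r, 4))   # 0.9923
```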

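Table 4-5

x_i    y_i    x_i y_i    x_i²
1      2      2          1
3      3      9          9
4      4      16         16
6      5      30         36
Σx_i = 14    Σy_i = 14    Σx_i y_i = 57    Σ(x_i²) = 62

y = mx + b

D = 4 × 62 − 14² = 52

$$m = \frac{4 \times 57 - 14 \times 14}{52} = 0.61538$$

$$b = \frac{62 \times 14 - 57 \times 14}{52} = 1.34615$$

Σ(d_i²) = 0.076923, σ_y = 0.19612

$$\sigma_m = \sigma_y\sqrt{\frac{n}{D}} = 0.054394, \qquad \sigma_b = \sigma_y\sqrt{\frac{\sum(x_i^2)}{D}} = 0.21415$$

$$r = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum(x_i - \bar{x})^2\sum(y_i - \bar{y})^2}} = \frac{8}{\sqrt{13 \times 5}} = 0.9923$$

y (±0.20) = [0.615 (±0.054)] x + [1.35 (±0.21)]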


The first digit of the uncertainty is the last significant figure.
4-5 Constructing a Calibration Curve

Table 4-6 Spectrophotometer readings for protein analysis

Sample (μg)    Absorbance
0              0.099   0.099   0.100    (reagent blank)
5              0.185   0.187   0.188    (standard solutions,
10             0.282   0.272   0.272     5 to 25 μg)
15             0.392   0.345   0.347
20             0.425   0.425   0.430
25             0.483   0.488   0.496

(Figure: calibration curve of absorbance versus protein (μg), marking the linear range, the points out of the linear range, and the absorbance of the unknown.)

Least-squares line through the linear range:
m = 0.01630, σ_m = 0.00022
b = 0.1040, σ_b = 0.0026
σ_y = 0.0059
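A Python sketch that refits the calibration line. ASSUMPTION: the linear range is taken as 0 to 20 μg with the 0.392 reading at 15 μg discarded as an outlier; the slide does not state this, but this choice reproduces the quoted m, b, and their uncertainties.

```python
import math

# Table 4-6 data (protein in ug, absorbance).
# ASSUMPTION: linear range = 0-20 ug, and the 0.392 reading at 15 ug is
# discarded; the 25 ug standards are outside the linear range.
points = [
    (0, 0.099), (0, 0.099), (0, 0.100),
    (5, 0.185), (5, 0.187), (5, 0.188),
    (10, 0.282), (10, 0.272), (10, 0.272),
    (15, 0.345), (15, 0.347),
    (20, 0.425), (20, 0.425), (20, 0.430),
]
xs = [p[0] for p in points]
ys = [p[1] for p in points]

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
d = n * sxx - sx ** 2
m = (n * sxy - sx * sy) / d
b = (sxx * sy - sxy * sx) / d
s_y = math.sqrt(sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
sigma_m = s_y * math.sqrt(n / d)
sigma_b = s_y * math.sqrt(sxx / d)
print(f"m = {m:.5f} ({sigma_m:.5f}), b = {b:.4f} ({sigma_b:.4f}), s_y = {s_y:.4f}")
# m = 0.01630 (0.00022), b = 0.1040 (0.0026), s_y = 0.0059
```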

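Finding the Protein in an Unknown

A Practical Example: Determination of x from the calibration curve

y (±σ_y) = [m (±σ_m)] x + [b (±σ_b)]

$$x = \frac{y\,(\pm\sigma_y) - b\,(\pm\sigma_b)}{m\,(\pm\sigma_m)}$$

$$\sigma_x^2 = \frac{\sigma_y^2}{m^2}\left[\frac{1}{k} + \frac{x^2 n}{D} + \frac{\sum(x_i^2)}{D} - \frac{2x\sum x_i}{D}\right]$$

k: number of replicate measurements of y for the unknown

For y = 0.373, x = (0.373 − 0.1040)/0.01630 = 16.5 (±0.4) μg of protein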
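The unknown's protein content and its uncertainty from the σ_x equation, as a Python sketch with k = 1 (a single reading of the unknown). It assumes the linear range is 0 to 20 μg with the 0.392 reading excluded, which is an inference rather than something the slide states; `unknown_from_calibration` is a hypothetical name.

```python
import math

def unknown_from_calibration(xs, ys, y_unknown, k=1):
    """Invert y = m*x + b; return (x, sigma_x) for k replicate readings of y."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    d = n * sxx - sx ** 2
    m = (n * sxy - sx * sy) / d
    b = (sxx * sy - sxy * sx) / d
    s_y = math.sqrt(sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    x = (y_unknown - b) / m
    var_x = (s_y / m) ** 2 * (1 / k + (x * x * n + sxx - 2 * x * sx) / d)
    return x, math.sqrt(var_x)

# ASSUMPTION: linear range 0-20 ug with the 0.392 reading excluded (Table 4-6).
xs = [0, 0, 0, 5, 5, 5, 10, 10, 10, 15, 15, 20, 20, 20]
ys = [0.099, 0.099, 0.100, 0.185, 0.187, 0.188, 0.282, 0.272, 0.272,
      0.345, 0.347, 0.425, 0.425, 0.430]
x, sigma_x = unknown_from_calibration(xs, ys, 0.373, k=1)
print(f"x = {x:.1f} (+/- {sigma_x:.1f}) ug")   # x = 16.5 (+/- 0.4) ug
```

Measuring the unknown k times shrinks only the 1/k term, so replicates help until the calibration terms dominate.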

4-6 A Spreadsheet for Least Squares


Getting Started
1. Fight the activation energy.
2. Sit in front of a computer.
Homework
Derive Eqs. (4-13) and (4-14).
3, 6, 8, 10, 11, 15 (calculate r), 16
