02 - Statistical Analysis - Chem32 PDF

9/1/2016
Statistical Treatment of Data
Chem 32
1st Sem 2016-2017

Overview
accuracy and precision
errors in chemical analysis
measures of location (measures of central tendency)
measures of dispersion
confidence interval of the mean
outliers
- Q-test, Grubbs test
significance testing
- t-test, F-test
Accuracy vs. Precision
Accuracy
- the closeness of a measurement to its true or accepted value
Precision
- the agreement between multiple measurements made in the same way

Precision
Repeatability: replicate measurements done on the same sample using the same conditions (same measurement procedure, same operators, same measuring system, same operating conditions and same location) over a short period of time
Intermediate precision: replicate measurements on the same or similar samples employing the same measurement procedure, same location over an extended period of time, but may include other conditions involving changes
Reproducibility: different locations (different laboratories), different operators, different measuring systems, different time

Absolute and relative error
absolute error (E) in the measurement of quantity x
$E = x_i - x_t$
$x_t$ = true or accepted value
relative error (Er) (expressed as percent)
$E_r = \frac{x_i - x_t}{x_t} \times 100$
Types of Error
systematic (or determinate) error
- affects the accuracy of results
random (or indeterminate) error
- affects the precision of results
gross error
- leads to outliers

Sample problem
An analyst found 2.62 g calcium in a sample which actually contains 2.52 g calcium. Calculate the (a) absolute error and (b) relative error in percent.
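The calcium sample problem can be worked as a short sketch. The function names below are illustrative and not part of the slides; the arithmetic follows the definitions of absolute error (measured minus true) and relative error (absolute error divided by the true value, times 100).

```python
def absolute_error(measured, true_value):
    """Absolute error E: measured value minus the true or accepted value."""
    return measured - true_value


def relative_error_percent(measured, true_value):
    """Relative error Er in percent: E divided by the true value, times 100."""
    return (measured - true_value) / true_value * 100


# Values from the calcium sample problem (grams).
abs_err = absolute_error(2.62, 2.52)
rel_err = relative_error_percent(2.62, 2.52)

print(f"E  = {abs_err:.2f} g")   # E  = 0.10 g
print(f"Er = {rel_err:.2f} %")   # Er = 3.97 %
```

The positive sign indicates that the analyst's result is higher than the true value.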
Sources of systematic errors
instrumental error
- caused by nonideal instrument
behavior, by faulty calibrations, or by
use under inappropriate conditions
- usually corrected by calibration
method error
- arises from nonideal chemical or
physical behavior of analytical systems
e.g. interferences
slowness of reactions
incomplete reaction
species instability
nonspecificity of reagents
side reaction
personal error
- results from personal limitations of the analyst
e.g. insensitivity to color changes
tendency to estimate scale readings to improve precision

Effect of systematic error
Constant error
- the magnitude of error does not depend on the size of the quantity measured
Effect of systematic error
proportional error
- it may increase or decrease in proportion to the size of the sample taken for analysis
Ex.
- the concentration of interfering species increases as the concentration of an analyzed substance increases

Detection of systematic errors
blanks
analysis of a standard (reference material)
analysis by a second, independent method
Random errors
- represent random fluctuations in
procedures and measuring devices
(including the human observer) that
are beyond the control of the analyst
near the performance limit of the
instrument
instrument noise
drift in electronic circuit
vibrations
temperature, etc.
Type of error | Qualitative description | Quantitative measure
systematic error | trueness | bias
(total) error | accuracy | uncertainty
random error | precision | measures of spread (s, RSD) under specified conditions
Illustration of the links between some fundamental concepts used to describe the quality of measurement results
Absolute and relative uncertainty
absolute uncertainty: margin of uncertainty associated with a measurement
Ex.
If a buret is calibrated to read within ±0.02 mL, the absolute uncertainty for measuring 12.35 mL is ±0.02 mL
relative uncertainty: compares the size of the absolute uncertainty with the size of its associated measurement
Ex.
For a buret reading of 12.35 ± 0.02 mL, the relative uncertainty is:
$\text{Relative uncertainty} = \frac{\text{Absolute uncertainty}}{\text{Measured value}} = \frac{0.02\ \text{mL}}{12.35\ \text{mL}} = 0.002$
Properties of the normal distribution
[Figure: normal distribution curve with the areas 68.3%, 95.4% and 99.7% marked]
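The three percentages on the normal distribution slide correspond to the fraction of a normal population lying within 1, 2 and 3 standard deviations of the mean. A minimal sketch that reproduces them from the error function follows; the helper name `fraction_within` is an assumption made for illustration.

```python
import math


def fraction_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


coverage = {k: fraction_within(k) for k in (1, 2, 3)}
for k, frac in coverage.items():
    print(f"within ±{k}σ: {frac * 100:.1f}%")
# within ±1σ: 68.3%
# within ±2σ: 95.4%
# within ±3σ: 99.7%
```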
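Measures of central tendency
Arithmetic mean: $\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$
Median
- middle value of the ordered data (mean of the two middle values when N is even)
Mode
- value that occurs most frequently

Measures of dispersion
Range, R: $R = x_{max} - x_{min}$
Deviation, d: $d_i = x_i - \bar{x}$
Relative average deviation, in %: $\frac{\sum |d_i| / N}{\bar{x}} \times 100$
Standard deviation: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N - 1}}$
Relative standard deviation: $RSD = \frac{s}{\bar{x}}$
Variance, V: $V = s^2$
RSD expressed as parts per hundred (pph) or %: coefficient of variation (CV)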
Sample problem
An analyst reported the following amount (mg/L) of Cl- in a given sample:
15.67  15.69  16.03
Calculate the mean, average deviation and standard deviation

Sample problem
A student found the following values for the copper content (g/100 g) in an ore sample.
39.17, 39.99, 39.21, 39.54, 39.43 and 37.72
Calculate the mean, range, standard deviation, coefficient of variation.
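The measures of central tendency and dispersion can be computed for the copper data with Python's standard `statistics` module. This is a sketch only; the variable names are illustrative, and `statistics.stdev` is used because it divides by N - 1 (the sample standard deviation).

```python
import statistics

# Copper content (g/100 g) from the sample problem.
copper = [39.17, 39.99, 39.21, 39.54, 39.43, 37.72]

mean = statistics.mean(copper)
median = statistics.median(copper)
data_range = max(copper) - min(copper)

# Average deviation: mean of the absolute deviations from the mean.
avg_dev = sum(abs(x - mean) for x in copper) / len(copper)

s = statistics.stdev(copper)            # sample standard deviation (N - 1)
variance = statistics.variance(copper)  # sample variance, equal to s squared
cv = s / mean * 100                     # coefficient of variation, in %

print(f"mean     = {mean:.2f}")
print(f"median   = {median:.2f}")
print(f"range    = {data_range:.2f}")
print(f"avg dev  = {avg_dev:.2f}")
print(f"s        = {s:.2f}")
print(f"variance = {variance:.3f}")
print(f"CV       = {cv:.2f} %")
```

The same calls work for the chloride data by swapping in that list.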
Confidence interval
CI for the mean is the range of values within which the population mean is expected to lie with a certain probability
Confidence level is the probability that the true mean lies within a certain interval and is often expressed as a percentage

Confidence interval
If σ is unknown and s can be calculated:
$\text{CI for } \mu = \bar{x} \pm \frac{t s}{\sqrt{N}}$
$\bar{x}$ = sample mean
t = Student's t, taken from the Table
Sample problem
Analysis of an insecticide gave the following values for % of the chemical lindane:
7.47  6.98  7.27
Calculate the CI for the mean value at the 90% confidence level.
If σ is unknown and s can be calculated:
90% CI for µ = 7.24 ± 0.42

Sample problem
Determination of the cadmium level of a blood sample by ion-selective measurement gave the following results (mq/L).
139.2  139.8  140.1  139.4
Determine the 95% confidence interval of the mean for the cadmium concentration measurements.
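The lindane result (7.24 ± 0.42) can be checked with a short sketch of the t-based confidence interval, mean ± t·s/√N. The function name is illustrative, and the value t = 2.920 is an assumption taken from a standard two-tailed Student's t table for 90% confidence and 2 degrees of freedom; confirm it against the table used in class.

```python
import math
import statistics

lindane = [7.47, 6.98, 7.27]  # % lindane, from the sample problem

# Assumption: two-tailed Student's t for 90% confidence, N - 1 = 2 degrees of freedom.
T_90_DF2 = 2.920


def t_confidence_interval(values, t_value):
    """Return (mean, half-width) of the confidence interval mean ± t*s/sqrt(N)."""
    n = len(values)
    mean = statistics.mean(values)
    half_width = t_value * statistics.stdev(values) / math.sqrt(n)
    return mean, half_width


mean, half_width = t_confidence_interval(lindane, T_90_DF2)
print(f"90% CI for the mean = {mean:.2f} ± {half_width:.2f}")
# 90% CI for the mean = 7.24 ± 0.42
```

The cadmium problem follows the same pattern with four values and the 95% t value for 3 degrees of freedom.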
If σ is known:
$\text{CI for } \mu = \bar{x} \pm \frac{z\sigma}{\sqrt{N}}$

Sample Problem
Atomic absorption analysis for copper concentration in aircraft engine oil gave a value of 8.53 µg Cu/mL. Pooled results of many analyses showed s = 0.32 µg Cu/mL (s is taken as a good estimate of σ). Calculate 90% and 99% confidence limits if the above result were based on (a) 1, (b) 4, (c) 16 measurements.
[Table: 90% and 99% confidence limits for N = 1, 4 and 16; values not recovered]
At N = 16: There is a 90% chance that the true mean lies within the range 8.53 ± 0.13 (8.40 to 8.66)
How many replicate measurements are needed to decrease the 90% confidence interval to 8.53 ± 0.05?
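When σ is known, the confidence interval is mean ± z·σ/√N, and the number of replicates needed for a target half-width follows by solving that expression for N and rounding up. The sketch below applies this to the copper-in-oil problem, with z values taken from the z table in these slides (1.64 for 90%, 2.58 for 99%). Function and variable names are illustrative.

```python
import math

Z = {90: 1.64, 99: 2.58}  # z values for the 90% and 99% confidence levels
SIGMA = 0.32              # µg Cu/mL, pooled s taken as an estimate of sigma
MEAN = 8.53               # µg Cu/mL


def z_half_width(z, sigma, n):
    """Half-width of the confidence interval, z*sigma/sqrt(N)."""
    return z * sigma / math.sqrt(n)


def replicates_needed(z, sigma, target_half_width):
    """Smallest N for which z*sigma/sqrt(N) does not exceed the target half-width."""
    return math.ceil((z * sigma / target_half_width) ** 2)


for n in (1, 4, 16):
    for level, z in Z.items():
        print(f"N = {n:2d}, {level}%: {MEAN} ± {z_half_width(z, SIGMA, n):.2f}")

n_needed = replicates_needed(Z[90], SIGMA, 0.05)
print(f"replicates needed for ±0.05 at 90%: {n_needed}")
```

For N = 16 at 90% this reproduces the ±0.13 quoted on the slide.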
Seatwork
sample mean = 13.77
N = 30
σ = 5.88
Use the 95% confidence level to calculate the confidence limit.
Calculate the number of replicates needed to decrease the confidence interval to 13.77 ± 0.50
Conf. Level, % | z
50 | 0.67
68 | 1.00
80 | 1.28
90 | 1.64
95 | 1.96
95.4 | 2.00
99 | 2.58
99.7 | 3.00
99.9 | 3.29

Sample Problem:
The population standard deviation for the amount of aspirin in a batch of analgesic tablets is known to be 7 mg of aspirin. What is the 95% confidence interval for the analgesic tablets if an analysis of five tablets yields a mean of 245 mg of aspirin?
There is a 95% probability that the population mean is between 239 mg and 251 mg of aspirin
How many replicate measurements are needed to decrease the 95% confidence interval to 245 ± 3?
Ans.: 21

Test for outliers
- to show whether the outlying value/s could reasonably arise from chance variation or are so extreme as to indicate some other causes
- to provide objective criteria for taking investigative or corrective action
Dixon's Q-test - for small data set
Grubbs test - ISO recommended
Q-test
$Q_{calc} = \frac{|\text{suspect value} - \text{nearest value}|}{\text{largest value} - \text{smallest value}}$
If Qcalc > Qcritical : suspect value is rejected
Table 3. Qcritical values for Q-test for outliers [table not recovered]

Grubbs test
$G_{calc} = \frac{|\text{suspect value} - \bar{x}|}{s}$
If Gcalc > Gcritical : suspect value is rejected
Table 4. Gcritical values for Grubbs test for outliers [table not recovered]

Sample problem
Apply Q-test and Grubbs test (at 90% confidence level) to the following data and decide whether a value should be rejected
0.403  0.410  0.401  0.380  0.400  0.413  0.408
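The two outlier statistics can be sketched for the seven-value data set as follows. The Q statistic is taken as gap over range and the Grubbs statistic as the suspect's distance from the mean divided by s; both function names are illustrative. The critical value 0.507 for the Q-test (n = 7, 90%) is an assumption from a commonly used table and should be checked against Table 3. No Grubbs critical value is hard-coded, since it depends on the table in use.

```python
import statistics

data = [0.403, 0.410, 0.401, 0.380, 0.400, 0.413, 0.408]


def dixon_q(values):
    """Return (suspect, Q_calc) for whichever end value has the larger gap."""
    ordered = sorted(values)
    spread = ordered[-1] - ordered[0]
    gap_low = ordered[1] - ordered[0]
    gap_high = ordered[-1] - ordered[-2]
    if gap_low >= gap_high:
        return ordered[0], gap_low / spread
    return ordered[-1], gap_high / spread


def grubbs_g(values):
    """Return (suspect, G_calc) for the value farthest from the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    suspect = max(values, key=lambda x: abs(x - mean))
    return suspect, abs(suspect - mean) / s


q_suspect, q_calc = dixon_q(data)
g_suspect, g_calc = grubbs_g(data)

Q_CRIT_90_N7 = 0.507  # assumption: tabulated Q critical value, n = 7, 90% confidence

print(f"Q-test: suspect {q_suspect}, Qcalc = {q_calc:.3f}, reject = {q_calc > Q_CRIT_90_N7}")
print(f"Grubbs: suspect {g_suspect}, Gcalc = {g_calc:.3f} (compare with Gcritical from Table 4)")
```

Both tests single out 0.380 as the suspect value.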
Sample problem
A student found the following values for the % iron content of an ore sample:
39.17  39.54  39.99  39.43  39.21  38.72
Which is the most probable outlier? Can you legitimately discard it?

Sample problem
You titrate 0.1000 N HCl vs. NaOH and find that for 4 trials the N of the NaOH is:
0.0968, 0.0979, 0.1020 and 0.0985
Should you drop the value of 0.1020 as an outlier? [use Grubbs test at 95%CL]
Cautious approach to rejection
1. Reexamine the data to see whether a gross error has been made
- importance of a properly kept lab notebook
2. If possible, study the precision of the procedure
3. Repeat the analysis
4. Apply the Q-test
5. If the tests indicate retention, also report the median and range (where there is no influence from the outlying result)
Significance testing
t-test
- testing for significant difference between
(1) a mean and a reference value
(2) two data sets (difference of means)
(3) pairs of measurements
F-test
- testing for significant difference between the spreads of two data sets (difference of s)
Null and alternative hypothesis
Null hypothesis
$H_0: \mu_A = \mu_B$  The means are equal
Alternative hypothesis
$H_a: \mu_A \neq \mu_B$  The means are not equal (two-tailed test)
$H_a: \mu_A > \mu_B$  Mean A is greater than mean B (one-tailed test)
$H_a: \mu_A < \mu_B$  Mean A is less than mean B (one-tailed test)

Steps in significance testing
1. State the null hypothesis
2. State the alternative hypothesis
3. Select the appropriate test
4. Choose the level of significance for the test
5. Calculate the test statistic
6. Obtain the critical value for the test
7. Compare the test statistic with the critical value
One-sided/two-sided probabilities
a) one-tailed test with a significance level of 0.05 and 3 degrees of freedom
b) two-tailed test with a significance level of 0.05 and 3 degrees of freedom

t-test: comparison of experimental mean with a reference value (one-sample t-test)
To determine whether the difference between the experimental mean and the accepted value is due to random error or to an actual systematic error
Null hypothesis
$H_0: \mu = \mu_0$
Test statistic
$t_{calc} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
$\bar{x}$ = sample mean
$\mu_0$ = reference value or stated value
s = sample standard deviation
n = sample size
Alternative hypothesis and rejection region
If Ha: $\mu \neq \mu_0$, reject H0 if tcalc ≥ ttab or tcalc ≤ -ttab
If Ha: $\mu > \mu_0$, reject H0 if tcalc ≥ ttab
If Ha: $\mu < \mu_0$, reject H0 if tcalc ≤ -ttab

Sample problem
The following results (%K) were obtained from the AAS analysis of a standard reference material containing 38.90% K:
38.92  37.40  37.11
Is there a significant difference (at 95%CL) between the mean of the results and the certified value?
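The one-sample t-test can be sketched for the potassium data, following the seven steps of significance testing: compute tcalc from the mean, s and n, then compare it with ttab. The statistic used here is (mean - reference) / (s/√n). The critical value 4.303 is an assumption taken from a standard two-tailed t table (95%, 2 degrees of freedom), and the function name is illustrative.

```python
import math
import statistics

potassium = [38.92, 37.40, 37.11]  # %K found by AAS
REFERENCE = 38.90                  # certified %K of the reference material

# Assumption: two-tailed Student's t for 95% confidence, n - 1 = 2 degrees of freedom.
T_TAB = 4.303


def one_sample_t(values, reference):
    """t statistic for comparing an experimental mean with a reference value."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return (mean - reference) / (s / math.sqrt(n))


t_calc = one_sample_t(potassium, REFERENCE)
significant = abs(t_calc) >= T_TAB  # two-tailed rejection rule

print(f"tcalc = {t_calc:.2f}, ttab = {T_TAB}")
print("significant difference" if significant else "no significant difference")
```

With these numbers |tcalc| is well below ttab, so the difference is attributed to random error.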
Sample problem
A new procedure for the rapid determination of sulfur in kerosene was tested on a sample known to contain 0.123% S. The results were 0.112, 0.118, 0.115 and 0.119 % S. Do the data indicate that there is a bias in the method?

Sample problem
Analysis of five replicates of a vitamin preparation known to contain 500.0 mg of vit C gives 502.0, 500.0, 505.0, 501.0 and 504.0 mg. Is the difference between the experimental mean and the true value due to random error or is there a determinate error in the method?
Sample problem
The manufacturer claims that the mean fat content of his burger is around 20%. Shown below is the result of fat analysis for his sample. Was the manufacturer's claim true at the 90% confidence level?
[data table not recovered]

t-test: comparison of two experimental means
Steps:
(1) Formulate a null hypothesis that the two means are identical
$H_0: \mu_a = \mu_b$
(2) Perform t-test
a. calculate tcalc (note: sa & sb NOT significantly different [F-test first])
$t_{calc} = \frac{|\bar{x}_a - \bar{x}_b|}{s_{pooled}} \sqrt{\frac{n_a n_b}{n_a + n_b}}$, where $s_{pooled} = \sqrt{\frac{(n_a - 1) s_a^2 + (n_b - 1) s_b^2}{n_a + n_b - 2}}$
b. look for ttab (deg. of freedom = na + nb - 2)
(3) Compare tcalc and ttab:
a. if tcalc > ttab: No to H0 (the means are significantly different)
b. if tcalc < ttab: Yes to H0 (the means are not significantly different)

F-test: comparison of two standard dev.
* establishes if there is a significant difference between standard deviations (or variances)
* answers the question: are the spreads different, i.e. do the two sets of data come from two separate populations?
Two forms:
1. Is the precision of procedure A higher than the precision of procedure B (a one-tailed test)?
2. Is the precision of procedure A significantly different from the precision of procedure B (a two-tailed test)?
F-test: comparison of two standard dev.
(1) Calculate Fcalc
$F_{calc} = \frac{s_1^2}{s_2^2}$, where s1 > s2
(2) Look for Ftab
[Table: F values at the 95% probability level; values not recovered]
(3) Compare Fcalc and Ftab
a. if Fcalc > Ftab: sa and sb are significantly different
b. if Fcalc < Ftab: sa and sb are NOT significantly different
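The F-test steps can be sketched with the standard deviations from the chemical oxygen demand sample problem (3.31 and 1.51 mg/L, 8 determinations each). The function name is illustrative. The critical value 3.787 is an assumption from a standard one-tailed F table at the 95% level with 7 and 7 degrees of freedom; check it against the F table used in class.

```python
def f_statistic(s_a, s_b):
    """F = (larger s)^2 / (smaller s)^2, so that F is always at least 1."""
    larger, smaller = max(s_a, s_b), min(s_a, s_b)
    return larger ** 2 / smaller ** 2


S_STANDARD = 3.31  # mg/L, standard (mercury salt) method
S_PROPOSED = 1.51  # mg/L, proposed method

# Assumption: one-tailed F critical value, 95% level, 7 and 7 degrees of freedom.
F_TAB = 3.787

f_calc = f_statistic(S_STANDARD, S_PROPOSED)
significantly_different = f_calc > F_TAB

print(f"Fcalc = {f_calc:.2f}, Ftab = {F_TAB}")
print("spreads differ significantly" if significantly_different
      else "spreads do not differ significantly")
```

A one-tailed value is used because the question asks whether one method is more precise than the other, not merely whether the precisions differ.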
Sample problem
A proposed method for the determination of chemical oxygen demand of wastewater was compared with the standard (mercury salt) method. The following results were obtained for a sewage effluent sample:
Method | Mean (mg/L) | std dev (mg/L)
Standard method | 72 | 3.31
Proposed method | 72 | 1.51
For each method 8 determinations were made. Is the precision of the proposed method significantly greater than that of the standard method?

Sample problem
The amount of 14CO2 in a plant sample is measured to be:
28, 32, 27, 39 & 40 counts/min (mean = 33.2).
The amount of radioactivity in a blank is found to be:
28, 21, 28, & 20 counts/min (mean = 24.2).
Are the mean values significantly different at a 95% confidence level?

Sample problem
A new colorimetric method for determining the glucose content of blood serum was compared with the standard Folin-Wu method. The results were as follows:
New method (mg/dL glucose): 125 123 130 131 126 127 129
Folin-Wu method (mg/dL glucose): 130 128 131 129 127 125
Are the mean values significantly different at the 95% CL?

Sample problem
Before determining the amount of Na2CO3 in an unknown sample, a student decided to check her procedure by analyzing a sample known to contain 98.76% w/w Na2CO3. Five replicate determinations of the %w/w Na2CO3 in the standard were made with the following results:
98.71  98.59  98.62  98.44  98.58
Is the mean for these five trials significantly different from the accepted value at the 95% confidence level (α = 0.05)?
Sample problem
Two barrels of wine were analyzed for their alcohol content in order to determine whether they were from different sources. On the basis of six analyses, the average content of the first barrel was established to be 12.61% ethanol. Four analyses of the second barrel gave a mean of 12.53% alcohol. The ten analyses yielded a pooled value of s of 0.070%. Do the data indicate a difference between the wines?
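The wine problem is a comparison of two experimental means with a pooled standard deviation already given. A minimal sketch follows, using tcalc = |mean_a - mean_b| / s_pooled × √(na·nb/(na + nb)) with na + nb - 2 degrees of freedom. The helper `pooled_sd` is included for cases where only the individual standard deviations are known; it is not needed for this problem. The critical value 2.306 is an assumption from a standard two-tailed t table (95%, 8 degrees of freedom), and a 95% level is itself an assumption since the problem does not state one.

```python
import math


def pooled_sd(s_a, n_a, s_b, n_b):
    """Pooled standard deviation of two data sets with similar spreads."""
    return math.sqrt(((n_a - 1) * s_a ** 2 + (n_b - 1) * s_b ** 2) / (n_a + n_b - 2))


def two_sample_t(mean_a, mean_b, s_pooled, n_a, n_b):
    """t statistic for the difference between two experimental means."""
    return abs(mean_a - mean_b) / s_pooled * math.sqrt(n_a * n_b / (n_a + n_b))


# Wine barrels: 6 analyses (12.61 %) and 4 analyses (12.53 %), pooled s = 0.070 %.
t_calc = two_sample_t(12.61, 12.53, 0.070, 6, 4)

# Assumption: two-tailed Student's t, 95% confidence, 6 + 4 - 2 = 8 degrees of freedom.
T_TAB = 2.306
different = t_calc > T_TAB

print(f"tcalc = {t_calc:.2f}, ttab = {T_TAB}")
print("means differ significantly" if different else "no significant difference")
```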
Sample problem
Method used for routine analysis gives s = 0.06. Modification of the method gives a pooled estimate of s = 0.04 for a statistical sample with 12 degrees of freedom. Has the modification improved the precision of the analysis?
Paired t-test
Uses the same type of procedure as the normal t test except that we analyze pairs of data.
Procedure:
- Calculate the difference, $d_i$, between the paired values for each sample
- Calculate the average difference, $\bar{d}$, and the standard deviation of the differences, $s_d$
$H_0: \bar{d} = 0$ (there is no difference between the two samples)
$H_a: \bar{d} \neq 0$
$t_{calc} = \frac{|\bar{d}| \sqrt{n}}{s_d}$
n = no. of paired samples

Sample problem
Marecek et al. developed a new electrochemical method for rapidly determining the concentration of the antibiotic monensin in fermentation vats. The standard method for the analysis, a test for microbiological activity, is both difficult and time consuming. Samples were collected from the fermentation vats at various times during production and analyzed for the concentration of monensin using both methods. The results, in parts per thousand (ppt), are reported in the following table.
Sample | Microbiological | Electrochemical
1 | 129.5 | 132.3
2 | 89.6 | 91.0
3 | 76.6 | 73.6
4 | 52.2 | 58.2
5 | 110.8 | 104.2
6 | 50.4 | 49.9
7 | 72.4 | 82.1
8 | 141.4 | 154.1
9 | 75.0 | 73.4
10 | 34.1 | 38.1
11 | 60.3 | 60.1
Is there a significant difference between the methods at α = 0.05?
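The paired t-test procedure can be sketched for the monensin data: take the difference for each sample, then compute the mean difference, the standard deviation of the differences, and tcalc = |mean difference| × √n / sd. Names are illustrative. The critical value 2.228 is an assumption from a standard two-tailed t table (α = 0.05, 10 degrees of freedom).

```python
import math
import statistics

# Monensin concentrations (ppt) for the 11 paired samples.
microbiological = [129.5, 89.6, 76.6, 52.2, 110.8, 50.4, 72.4, 141.4, 75.0, 34.1, 60.3]
electrochemical = [132.3, 91.0, 73.6, 58.2, 104.2, 49.9, 82.1, 154.1, 73.4, 38.1, 60.1]


def paired_t(first, second):
    """Return (mean difference, s of differences, t) for paired measurements."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    d_mean = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)
    return d_mean, s_d, abs(d_mean) * math.sqrt(n) / s_d


d_mean, s_d, t_calc = paired_t(microbiological, electrochemical)

# Assumption: two-tailed Student's t, alpha = 0.05, 11 - 1 = 10 degrees of freedom.
T_TAB = 2.228
different = t_calc > T_TAB

print(f"mean difference = {d_mean:.2f}, sd = {s_d:.2f}, tcalc = {t_calc:.2f}")
print("methods differ significantly" if different else "no significant difference")
```

Pairing removes the large sample-to-sample variation in concentration, which would otherwise swamp the difference between the methods.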
z-test
for a large no. of results so that s is a good estimate of σ
Null hypothesis
$H_0: \mu = \mu_0$
Test statistic
$z_{calc} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{N}}$
Alternative hypothesis and rejection region
If Ha: $\mu \neq \mu_0$, reject H0 if zcalc ≥ ztab or zcalc ≤ -ztab
If Ha: $\mu > \mu_0$, reject H0 if zcalc ≥ ztab
If Ha: $\mu < \mu_0$, reject H0 if zcalc ≤ -ztab
Sample problem
A class of 30 students determined the
activation energy of a chemical reaction to
be 27.7 kcal/mol (mean value) with a
standard deviation of 5.2 kcal/mol. Are
the data in agreement with the literature
value of 30.8 kcal/mol at
(1) the 95% confidence level and
(2) the 99% confidence level?
Estimate the probability of obtaining a mean
equal to the literature value.
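A sketch of the z-test for the activation energy problem follows, using zcalc = (mean - literature value) / (s/√N) with s taken as an estimate of σ because N = 30. The critical values 1.96 and 2.58 are the 95% and 99% entries from the z table in these slides. Reading the last question as a two-tailed p-value is an interpretation, not something the slide states; it is computed here from the complementary error function.

```python
import math

MEAN = 27.7        # kcal/mol, class mean
LITERATURE = 30.8  # kcal/mol, literature value
S = 5.2            # kcal/mol, taken as an estimate of sigma
N = 30

Z_TAB = {95: 1.96, 99: 2.58}  # two-tailed critical values


def z_statistic(mean, reference, sigma, n):
    """z statistic for comparing a mean with a reference value."""
    return (mean - reference) / (sigma / math.sqrt(n))


def two_tailed_p(z):
    """Probability of a deviation at least as large as |z| under the null hypothesis."""
    return math.erfc(abs(z) / math.sqrt(2))


z_calc = z_statistic(MEAN, LITERATURE, S, N)
p_value = two_tailed_p(z_calc)

for level, z_tab in Z_TAB.items():
    rejected = abs(z_calc) >= z_tab
    print(f"{level}% level: zcalc = {z_calc:.2f}, ztab = {z_tab}, reject H0 = {rejected}")
print(f"two-tailed probability = {p_value:.4f}")
```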