Minitab Tutorial

Statistics For Analytical Chemistry
by using MINITAB program
By Mohamed Salama
Digitally signed by Mohamed Salama
DN: cn=Mohamed Salama, o, ou,
email=msalama78@gmail.com, c=EG
Date: 2009.12.28 03:45:02 +03'00'
Introduction
Graph in MINITAB:
y A pictorial gallery from which to choose a graph type.
y Flexibility in customizing graphs, from subsetting of
data to specifying titles and footnotes.
y Ability to change most graph elements, such as fonts,
symbols, lines, placement of tick marks and data
display, after the graph is created.
y Ability to automatically update graphs.
Introduction
Enhancements to Specific Graphs:
y Multiple levels of categorical variables.
y Contour plots use color ramps and label contour lines.
y Use summarized data in making a bar chart.
y Fit regression lines and distributions to selected
graphs.
y You can use Empirical CDF (cumulative distribution
function) graphs to evaluate the fit of a distribution to
your data or to compare different sample distributions.
Introduction
Data, Limits and Details:
y A worksheet can contain up to 4000 columns, 1000
constants and up to 10,000,000 rows, depending on
how much memory your computer has.
y Three stored constants have default values (you can
change them if you wish):
y K998 = ∗ (missing), K999 = 2.71828 (e) and K1000 =
3.14159 (pi).
Introduction
Report Pad:
In the Report Pad you can:
y Store MINITAB results and graphs in a single
document.
y Add comments and headings.
y Rearrange your output.
y Change font sizes.
y Print entire output from an analysis.
y Create Web‐ready reports.
Creating a data set using MINITAB
y To create a data set you must know the variables and
records.
y The variables are stored in columns.
y The individual records go in rows.
y Variable names are inserted at the top of the column, in the
row with no numerical designation.
Creating a patterned data column
y Use this when there is more than one level for the variable.
y Make a sequence for each level and its replications.
y Know the first value, the last value and the increment
units.
y Know the levels for labeling, the replicates in each level
and the replicates for each level (the whole sequence).
y You can also do all of this by manual data entry.
Data manipulation and creation of
new variables
Data Coding:
y Variable coding is done by creating a new variable
whose categories correspond to the levels.
y A numerical variable can be coded into a categorical
variable (Numeric to Text).
Data manipulation and creation of
new variables
Transformation of variables:
y Arithmetic operations such as addition, multiplication,
division, exponentiation and log-transformation.
y Logarithmic transformation: sometimes used in
statistical analysis for normalizing data or for stabilizing
variances.
y Square root transformation.
y Others: many more transformations are available under
the Functions box (antilog, arccosine, cosine, etc.).
Graphical display of data
y Histograms.
y Pie Chart.
y Scatter Plots.
All of these graph types are often used for data
visualization.
Some graphs may help in assessing the shape of the
distribution of the data, whereas other types may help in
summarizing the data at hand or in describing
relationships between variables of interest.
Producing a report output
y There are many different ways to produce a report
using a MINITAB project.
y Use Microsoft Word, by copying and pasting the data that
will be reported.
y Use MINITAB Word.
Conclusion
Minitab Statistical Software:
y Easy to use.
y State of the Art, Graphs and Graph Editing.
y Regression Analysis.
y Statistical Process Control.
y Measurement Systems Analysis.
y Reliability/Survival Analysis.
Conclusion
Minitab Statistical Software (continued):
y Multivariate Analysis.
y Nonparametric.
y Simulations and Distributions.
y Data and File Management.
y General statistics.
y Analysis of variance.
Conclusion
Minitab Statistical Software (continued):
y Quality Tools.
y Design of experiments.
y Power and sample size.
y Time Series and Forecasting.
y Tables.
y Macros and Customizability.
Statistics Introduction
Statistics is the science of:
y Collecting, describing and interpreting data.
To make:
y Predictions and decisions.
Includes:
y Describing the problem, gathering data, summarizing
data, analyzing data and communicating meaningful
conclusions.
y Lab chemists are concerned with the chemical analysis
processes that quantify analytes in different matrices.
y All these processes are subject to:
y Systematic variation (e.g. instrument effects, matrix
effects).
y Random variation (e.g. measurement errors).
y Statistics is a tool to help us understand the effects of
random variation.
y Probability:
y Usually assumes knowledge of the population.
y Distributions are known.
y The theory behind statistics.
y Principle of inferential statistics:
y We only have a sample.
y Use this to infer details about the population.
Definition according to ISO 5725‐1:1994
y Accuracy.
y Trueness.
y Bias.
y Laboratory bias (total bias).
y Bias of measurement method.
y Laboratory component of bias.
y Precision.
y Repeatability (conditions & limit).
y Reproducibility (conditions & limit).
[Figure: Accuracy divides into trueness (expressed as bias) and precision.
Precision is assessed under repeatability conditions (r: within-lab
variation) and reproducibility conditions (R: within-lab variation +
between-lab variation).]
Accuracy:
y The closeness of agreement between a test result and
the accepted reference value.
Trueness:
y The closeness of agreement between the average value
obtained from a large series of test results and an
accepted reference value.
Bias:
y The difference between the expectation of the test
results and an accepted reference value (systematic error).
Laboratory Bias (Total Bias):
y The difference between the expectation of the test
results from a particular lab and an accepted reference
value.
Bias of the measurement method:
y The difference between the expectation of the test results
obtained from all laboratories using that method and
an accepted reference value.
Laboratory component of bias:
y The difference between the lab bias and the bias of the
measurement method.
Precision:
y The closeness of agreement between independent test
results obtained under stipulated conditions (random error).
Repeatability:
y Precision under repeatability conditions.
y Conditions where independent test results are
obtained with the same method on identical test
items in the same lab by the same operator using the
same equipment within short intervals of time.
Reproducibility:
y Precision under reproducibility conditions.
y Conditions where test results are obtained with the
same method on identical test items in different labs
with different operators using different equipment.
Example problems
y Example problem I:
y A reference material known to contain 1.00% by weight
of a particular component is studied by four analysts
(A–D), each analyst performing 5 replicate
measurements. The results are as follows:
A 1.03 1.05 1.03 1.07 1.07
B 0.98 0.88 1.09 1.06 0.94
C 1.06 1.11 1.13 1.04 1.26
D 0.98 1.03 1.02 0.99 1.03
Comment on the random and systematic errors in the
results A–D.
Solution
y Enter the data in MINITAB.
Bias:
y From the main menu select Stat > Quality Tools > Gage
Study > Gage Linearity and Bias Study.
y In the dialog box that appears, choose:
y Part Numbers: from variables A, B, C or D.
y Reference Value: from the reference column.
y Measurement Data: the same selection as for Part Numbers.
y Options: to select the method of estimating the repeatability
standard deviation from:
y Sample Range.
y Sample Standard Deviation.
Solution
Precision:
y From the main menu select Stat > Basic Statistics >
Display Descriptive Statistics.
y In the dialog box that appears, choose:
y Variables: all variables A, B, C and D.
y Statistics: to select the parameters to be calculated:
y Mean.
y Standard Deviation. (Precision)
y Minimum.
y Maximum.
y Range.
y Graphs: to select which graphs describe the data.
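As a cross-check outside MINITAB, the bias and precision figures for Example problem I can be sketched in a few lines of Python. This is only an illustration using the standard library; the names `REFERENCE`, `results` and `summary` are chosen here and are not part of the tutorial.

```python
from statistics import mean, stdev

REFERENCE = 1.00  # certified content of the reference material, % by weight

# Replicate results for analysts A-D, as given in Example problem I.
results = {
    "A": [1.03, 1.05, 1.03, 1.07, 1.07],
    "B": [0.98, 0.88, 1.09, 1.06, 0.94],
    "C": [1.06, 1.11, 1.13, 1.04, 1.26],
    "D": [0.98, 1.03, 1.02, 0.99, 1.03],
}

summary = {}
for analyst, values in results.items():
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    # Bias reflects systematic error; the SD reflects random error.
    summary[analyst] = {"mean": m, "bias": m - REFERENCE, "sd": s}
    print(f"{analyst}: mean={m:.3f}  bias={m - REFERENCE:+.3f}  sd={s:.3f}")
```

A large bias with a small SD points to systematic error, while a small bias with a large SD points to random error.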
Exercises
y Exercise I:
y A standard sample of pooled human blood serum
contains 42.0 g of albumin per litre.
y Five laboratories (A–E) each do six determinations of the
albumin concentrations (on the same day), with the
following results (g/l):
A 42.5 41.6 42.1 41.9 41.1 42.2
B 39.8 43.6 42.1 40.1 43.9 41.9
C 43.5 42.8 43.8 43.1 42.7 43.3
D 35.0 43.0 37.1 40.5 36.8 42.2
E 42.2 41.6 42.0 41.8 42.6 39.0
y Comment on the random and systematic errors in the
results A–E.
Fundamentals of Statistics
y Measures of Central Tendency:
y Average (M): $M = \frac{1}{n}\sum_{i=1}^{n} x_i$
y Median (Md):
which is simply the middle value of the sample when the
measurements are arranged in numerical order (if n is
even, then the median is the average of the two middle
values of the ordered sample).
y Mode:
which is defined as the most frequent value in a
frequency distribution.
Fundamentals of Statistics
y Measures of Dispersion:
y Range (R): $R = x_{\max} - x_{\min}$
y Standard Deviation (S): $S = \sqrt{\frac{\sum_{i=1}^{n}(x_i - M)^2}{n-1}}$
y Coefficient of Variation (CV): $CV = 100 \times S / M$ (%)
y Variance (S2): $S^2 = \frac{\sum_{i=1}^{n}(x_i - M)^2}{n-1}$
y Pooled Standard Deviation (Sp): $S_p = \sqrt{\frac{\sum_j (n_j - 1) S_j^2}{\sum_j (n_j - 1)}}$
Example problems
y Example problem II:
y Ten measurements of the ratio of two peak areas in an LC
experiment gave the following values:
0.921 0.2898 0.2923 0.302 0.3

0.296 0.2947 0.2986 0.29 0.288
Calculate the average, standard deviation and relative
standard deviation.
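Solution
y From the main menu select Stat > Basic Statistics >
Display Descriptive Statistics.
y Variables: ratio column.
y Statistics: to select the parameters to be calculated:
y Mean.
y Standard Deviation.
y Variance.
y Coefficient of variation.
y N total.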
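The same three quantities can be sketched in Python as a check on the MINITAB output. The values are used exactly as printed in Example problem II. Note that the first value, 0.921, lies far from the other nine, so it is worth checking against the original data before trusting the result. The variable names are illustrative.

```python
from statistics import mean, stdev

# Peak-area ratios exactly as printed in Example problem II.
ratios = [0.921, 0.2898, 0.2923, 0.302, 0.3,
          0.296, 0.2947, 0.2986, 0.29, 0.288]

avg = mean(ratios)       # average
sd = stdev(ratios)       # sample standard deviation (n - 1)
rsd = 100 * sd / avg     # relative standard deviation (CV), in percent

print(f"n={len(ratios)}  mean={avg:.4f}  sd={sd:.4f}  RSD={rsd:.1f}%")
```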
Exercises
y Exercise II:
y To investigate the reproducibility of a method for the
determination of selenium in food, nine measurements
were made on a single batch of brown rice, with the
results listed below:
No.       1    2    3    4    5    6    7    8    9
Se (µg/g) 0.07 0.07 0.08 0.07 0.07 0.08 0.08 0.09 0.08
Calculate the average, median, mode, range, SD and RSD.
Errors in Classical Analysis
y Distribution of Errors:
y Continuous random variable: takes values from an interval
of numbers, with no gaps.
y Continuous probability distribution: the distribution of a
continuous random variable.
y Most important continuous probability distribution: the
normal distribution.
Normal Distribution:
y Bell shaped.
y Mean, median and mode are equal.
y The random variable has an infinite range.
[Figure: bell-shaped density f(X) plotted against X, with the mean,
median and mode coinciding at μ.]
Normal Distribution
$f(X) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}}$
f(X): density of random variable X
π = 3.14159; e = 2.71828
μ: population mean
σ: population standard deviation
X: value of random variable (−∞ < X < ∞)
Normal Distribution
(a) Changing μ shifts the curve along the axis.
[Figure: two curves on an axis from 140 to 200, with σ1 = σ2 = 6,
μ1 = 160 and μ2 = 174.]
(b) Increasing σ increases the spread and flattens the curve.
[Figure: two curves on an axis from 140 to 200, with σ1 = 6, σ2 = 12
and μ1 = μ2 = 170.]
Normal Distribution
(c) Probabilities and numbers of standard deviations
y 68% chance of falling between μ − σ and μ + σ (shaded area = 0.683).
y 95% chance of falling between μ − 2σ and μ + 2σ (shaded area = 0.954).
y 99.7% chance of falling between μ − 3σ and μ + 3σ (shaded area = 0.997).
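The three shaded areas can be reproduced from the standard normal cumulative distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `prob_within` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

areas = {k: prob_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} SD: {area:.3f}")
```

Because any normal variable can be standardized, these areas hold for every μ and σ.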
Normal Distribution
Probability is the area under the curve!
P(c ≤ X ≤ d) = ?
[Figure: density f(X) plotted against X, with the area between c and d shaded.]
Normal Distribution
An infinite number of normal distributions
means an infinite number of tables to look up!
Example problems
y Example problem III:
Part 1:
y An analytical chemist wishing to evaluate a new method
carries out a preliminary investigation in which he makes
six replicate determinations of the Dimethoate content
of a standard solution which is known to have a
Dimethoate content of 60 ppb. Each determination
requires the preparation of a 10 ml sample, and the
determinations using GC-NPD are:
58.2 61 56.6 61.5 53.8 56.9
Can the analyst draw any conclusions from this limited
amount of data?
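Solution
Part I:
y From the main menu select Stat > Basic Statistics >
Display Descriptive Statistics.
y In the dialog box that appears, choose:
y Variables: Determination column.
y Statistics: to select the parameters to be calculated:
y Mean.
y Standard Deviation.
y Variance.
y Coefficient of variation.
y N total.
y Graphs: to select which graphs describe the data.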
Example problems
Part 2:
y In addition to the six determinations we have just
considered, the analytical chemist has many
determinations of the Dimethoate content of his
standard solution by his old method; this particular
solution has been used for QC purposes over a period of
weeks.
Q 1: Estimate the % of determinations which would be
greater than 65.0.
Q 2: Estimate which value is likely to be exceeded by the
highest 10% of determinations.
Example problems
Part 2 (continued):
y The sixty most recent determinations are:
61.0 65.4 60.0 59.2 57.0 62.5 57.7 56.2 62.9 62.5
56.5 60.2 58.2 56.5 64.7 54.5 60.5 59.5 61.6 60.8
58.7 54.4 62.2 59.0 60.3 60.8 59.5 60.0 61.8 63.8
64.5 66.3 61.1 59.7 57.4 61.2 60.9 58.2 63.0 59.5
56.0 59.4 60.2 62.9 60.5 60.8 61.5 58.5 58.9 60.5
61.2 57.8 63.4 58.9 61.5 62.3 59.8 61.7 64.0 62.7
Solution
Part II:
y From the main menu select Stat > Tables > Tally
Individual Variables.
y In the dialog box that appears, choose:
y Variables: R. Dete. column.
y Display: to select the parameters to be calculated:
y Counts.
y Percents.
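A different route to Q 1 and Q 2 is to fit a normal distribution to the sixty determinations and read the answers from it. This is not the Tally procedure described above. It is a sketch that assumes the determinations are approximately normally distributed, and the names `fitted`, `pct_above_65` and `top_10_cutoff` are illustrative.

```python
from statistics import NormalDist, mean, stdev

# The sixty most recent determinations from Part 2.
determinations = [
    61.0, 65.4, 60.0, 59.2, 57.0, 62.5, 57.7, 56.2, 62.9, 62.5,
    56.5, 60.2, 58.2, 56.5, 64.7, 54.5, 60.5, 59.5, 61.6, 60.8,
    58.7, 54.4, 62.2, 59.0, 60.3, 60.8, 59.5, 60.0, 61.8, 63.8,
    64.5, 66.3, 61.1, 59.7, 57.4, 61.2, 60.9, 58.2, 63.0, 59.5,
    56.0, 59.4, 60.2, 62.9, 60.5, 60.8, 61.5, 58.5, 58.9, 60.5,
    61.2, 57.8, 63.4, 58.9, 61.5, 62.3, 59.8, 61.7, 64.0, 62.7,
]

# Assumption: the data are well described by a normal distribution.
fitted = NormalDist(mean(determinations), stdev(determinations))

pct_above_65 = 100 * (1 - fitted.cdf(65.0))   # Q 1
top_10_cutoff = fitted.inv_cdf(0.90)          # Q 2: value exceeded by the highest 10%

print(f"mean={fitted.mean:.2f}  sd={fitted.stdev:.2f}")
print(f"Estimated % above 65.0: {pct_above_65:.1f}")
print(f"Value exceeded by the top 10%: {top_10_cutoff:.1f}")
```

Comparing these model-based estimates with the raw tally percentages is a quick way to see whether the normal assumption is reasonable.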
Normal Distribution
$Z = \frac{X - \mu}{\sigma} = \frac{6.2 - 5}{10} = 0.12$
[Figure: a normal distribution with μ = 5 and σ = 10, where X = 6.2,
maps to the standardized normal distribution with μ_Z = 0 and σ_Z = 1,
where Z = 0.12.]
Confidence Intervals
y The problem:
y How large are the error bounds when we use data from a
sample to estimate a parameter of the underlying
population?
y Compute confidence intervals for µ:
when σ2 is known.
when σ2 is unknown.
Suppose an estimate for the mean ($\bar{X}$) is given, and we
want to describe the precision of the estimate.
We do this by giving a range of likely values for the
parameter. Such a range is called a Confidence Interval.
Elements of CI Estimation
A probability that the population parameter falls
somewhere within the interval.
[Figure: a confidence interval centered on the sample statistic, running
from the lower confidence limit to the upper confidence limit.]
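Confidence limits for Mean
Parameter = Statistic ± Its Error, so $\mu = \bar{X} \pm \text{Error}$
$\text{Error} = \bar{X} - \mu$ or $\mu - \bar{X}$
$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\text{Error}}{\sigma_{\bar{X}}}$
$\text{Error} = Z\,\sigma_{\bar{X}}$
$\mu = \bar{X} \pm Z\,\sigma_{\bar{X}}$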
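Confidence Intervals
$\bar{X} \pm Z\,\sigma_{\bar{X}} = \bar{X} \pm Z\,\frac{\sigma}{\sqrt{n}}$
[Figure: sampling distribution of $\bar{X}$ with standard error $\sigma_{\bar{X}}$.]
y 90% of samples: from $\mu - 1.645\,\sigma_{\bar{X}}$ to $\mu + 1.645\,\sigma_{\bar{X}}$.
y 95% of samples: from $\mu - 1.96\,\sigma_{\bar{X}}$ to $\mu + 1.96\,\sigma_{\bar{X}}$.
y 99% of samples: from $\mu - 2.58\,\sigma_{\bar{X}}$ to $\mu + 2.58\,\sigma_{\bar{X}}$.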
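The multipliers 1.645, 1.96 and 2.58 are the two-sided critical values of the standard normal distribution. A short sketch with the Python standard library recovers them; the function name `z_critical` is illustrative.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical Z value for a given confidence level (e.g. 0.95)."""
    alpha = 1 - confidence
    # Leave alpha/2 in each tail of the standard normal distribution.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {z_critical(level):.3f}")
```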
Factors affecting interval width
y Data variation: measured by σ.
y Sample size: $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.
y Level of confidence: (1 − α).
Intervals extend from $\bar{X} - Z\,\sigma_{\bar{X}}$ to $\bar{X} + Z\,\sigma_{\bar{X}}$.
Confidence Intervals
Concluding Remark:
The smaller we choose α, the more confident we are
that the interval contains the parameter µ. But at the
same time the confidence interval gets wider and is
therefore less precise.
CI (σ known – Hardly True)
y Assumptions:
y Population standard deviation is known.
y Population is normally distributed.
y If not normal, use large samples.
y Confidence interval estimate: $\bar{X} \pm Z\,\frac{\sigma}{\sqrt{n}}$
CI (σ unknown)
y Assumptions:
y Population standard deviation is unknown.
y Sample size must be large enough for the central limit
theorem, or the population must be normally distributed.
y Use Student’s t distribution.
y Confidence interval estimate: $\bar{X} \pm t_{n-1}\,\frac{S}{\sqrt{n}}$
Example problems
y Example problem IV:
y The DDT content of a fish sample in ppb was
determined using GC-ECD.
y The following values were obtained:
102 97 99 98 101 106
What are the 95% and 99% confidence limits for the DDT
concentration?
Solution
Standardized normal distribution:
y From the main menu select Stat > Basic Statistics >
1-Sample Z (test and confidence interval).
y In the dialog box that appears, choose:
y Samples in columns: data column.
y Standard deviation: enter the previously calculated
SD value.
y Options: to choose the confidence level.
Solution
T distribution:
y From the main menu select Stat > Basic Statistics >
1-Sample T (test and confidence interval).
y In the dialog box that appears, choose:
y Samples in columns: data column.
y Options: to choose the confidence level.
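The t-based limits for Example problem IV can be sketched by hand as a check. The Python standard library has no t distribution, so the critical values below are taken from a standard t table for 5 degrees of freedom, which is an assumption of this sketch and not output from MINITAB. Variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

ddt = [102, 97, 99, 98, 101, 106]  # ppb, from Example problem IV

# Two-tailed critical t values for n - 1 = 5 degrees of freedom (from a t table).
T_CRIT = {0.95: 2.571, 0.99: 4.032}

n = len(ddt)
x_bar = mean(ddt)
std_err = stdev(ddt) / sqrt(n)  # s / sqrt(n)

intervals = {
    level: (x_bar - t * std_err, x_bar + t * std_err)
    for level, t in T_CRIT.items()
}
for level, (low, high) in intervals.items():
    print(f"{level:.0%} CI: {low:.1f} to {high:.1f} ppb")
```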
Example problems
y Example problem V:
y The absorbance scale of a spectrometer is tested at a
particular wavelength with a standard solution which
has an absorbance given as 0.47.
y Ten measurements of the absorbance with the
spectrometer give average absorbance 0.461 and
standard deviation 0.003.
Find the 95% confidence interval for the mean absorbance
as measured by the spectrometer and hence decide
whether a systematic error is present.
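Solution
y Use the 1-sample procedure (test and confidence interval)
under Stat > Basic Statistics, entering the summarized
values previously calculated.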
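Because only summary statistics are given in Example problem V, the interval can be sketched directly from them. This version assumes a t-based interval, since the standard deviation is a sample estimate, with the critical value for 9 degrees of freedom taken from a standard t table. The names are illustrative.

```python
from math import sqrt

n, x_bar, s = 10, 0.461, 0.003   # summary data from Example problem V
CERTIFIED = 0.470                # stated absorbance of the standard solution
T_CRIT_95_DF9 = 2.262            # two-tailed t, 95%, 9 degrees of freedom (t table)

half_width = T_CRIT_95_DF9 * s / sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)

# If the certified value lies outside the interval, a systematic error is indicated.
systematic_error_indicated = not (ci[0] <= CERTIFIED <= ci[1])

print(f"95% CI: {ci[0]:.4f} to {ci[1]:.4f}")
print("Systematic error indicated:", systematic_error_indicated)
```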
Exercises
y Exercise III:
y A 0.1M solution of acid was used to titrate 10 ml of a 0.1M
solution of alkali, and the following volumes of acid (ml) were
recorded:
9.88 10.18 10.23 10.39 10.25
Calculate the 95% confidence limits of the mean and use
them to decide if there is any evidence of systematic
error.
Student’s t Test
y When solving probability problems for the sample
mean, one of the steps was to convert the sample mean
values to z-scores using the following formula:
$z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}}$ where $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$
y What happens if we do not know the population
standard deviation σ? If we substitute the sample
standard deviation s for the population standard
deviation σ, can we still use the standard normal table?
Answer: No
Student’s t Test
y This question was addressed in 1908, when W.S. Gosset
found that if we replace σ with the sample standard
deviation s, the distribution becomes a t-distribution.
If:
$T = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
then T has a t-distribution with n − 1 degrees of
freedom. The t-distribution is similar to the z-curve in that
it is bell shaped, but the shape of the t-distribution
changes with the degrees of freedom. We will use the
t-tables to get the critical t-values at different levels of
α and degrees of freedom.
Student’s t Test
y One sample t-test:
y When using the t-test, TAKE CARE: is it
one-tailed or two-tailed?
Student’s t Test
y Independent sample t-test (equal variances):
y Degrees of freedom = (n1 + n2) − 2
Student’s t Test
y Independent sample t-test (unequal variances):
y In such a complicated case the degrees of freedom are
calculated from:
Student’s t Test
y Paired sample t-test:
y The sign of the difference is very important.
Example problems
y Example problem VI:
y In a method for determination of mercury by cold-vapor
atomic absorption, the following values were obtained
for a standard reference material containing 38.9%
mercury:
38.9% 37.4% 37.1%
Is there any evidence of systematic error?
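Solution
y From the main menu select Stat > Basic Statistics >
Graphical Summary.
y Variables: data column.
y Confidence level: to choose the confidence level.
y From the main menu select Stat > Basic Statistics >
1-Sample T (test and confidence interval).
y Perform hypothesis test: enter the hypothesized value of the mean.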
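The one-sample t statistic behind this test can be sketched by hand. The critical value is taken from a standard t table for 2 degrees of freedom, which is an assumption of this sketch, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

mercury = [38.9, 37.4, 37.1]   # % mercury, from Example problem VI
REFERENCE = 38.9               # certified mercury content, %
T_CRIT = 4.303                 # two-tailed t, 95%, n - 1 = 2 degrees of freedom (t table)

n = len(mercury)
# One-sample t statistic: (sample mean - reference) / (s / sqrt(n))
t_stat = (mean(mercury) - REFERENCE) / (stdev(mercury) / sqrt(n))
significant = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.3f}, critical t = {T_CRIT}")
print("Evidence of systematic error:", significant)
```

With only three results the critical value is large, so even a visible shortfall in the mean is not significant at the 95% level.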
Example problems
y Example problem VII:
y In a comparison of LC and GC methods for the
determination of Primicarb in vegetables, the following
results (µg/g) were obtained:
                 Mean    SD
HPLC method      28.00   0.30
GC-NPD method    26.25   0.23
For each method 10 determinations were made.
Assuming that the two samples have standard deviations
which are not significantly different, can you conclude
that the two methods give results whose means
differ significantly?
Solution
y From the main menu select Stat > Basic Statistics >
2-Sample T (test and confidence interval).
y Summarized Data:
y First:
y Sample size, Mean and Standard Deviation.
y Second:
y Sample size, Mean and Standard Deviation.
y Assume equal variances.
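The pooled two-sample t statistic can be sketched from the summarized data. The critical value for 18 degrees of freedom comes from a standard t table, which is an assumption of this sketch, and the names are illustrative.

```python
from math import sqrt

# Summarized data from Example problem VII: (n, mean, SD) for each method.
n1, mean1, sd1 = 10, 28.00, 0.30   # HPLC
n2, mean2, sd2 = 10, 26.25, 0.23   # GC-NPD
T_CRIT = 2.101                     # two-tailed t, 95%, n1 + n2 - 2 = 18 df (t table)

# Pooled variance, valid under the equal-variances assumption.
pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
t_stat = (mean1 - mean2) / sqrt(pooled_var * (1 / n1 + 1 / n2))
significant = abs(t_stat) > T_CRIT

print(f"pooled SD = {sqrt(pooled_var):.4f}, t = {t_stat:.2f}")
print("Means differ significantly:", significant)
```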
Example problems
y Example problem VIII:
y The pesticide levels (ppm) in four thoroughly
homogenized apple samples are determined once by
each of (a) a chromatographic method and (b) an
immunoassay method. The results are:
Apple   Pesticide Level
        Chrom.   Immuno.
1       7.1      7.6
2       6.1      6.8
3       5.0      4.8
4       6.0      5.7
Is there any evidence that the two analytical methods give
significantly different results?
Solution
y From the main menu select Stat > Basic Statistics > Paired
T (test and confidence interval).
y Samples in Columns:
y First Sample:
y The column holding the first set of paired values.
y Second Sample:
y The column holding the second set of paired values.
y Options: to choose the confidence level.
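A paired t-test is a one-sample t-test on the differences, which can be sketched as follows. The critical value for 3 degrees of freedom is taken from a standard t table, which is an assumption of this sketch, and the names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

chrom = [7.1, 6.1, 5.0, 6.0]    # chromatographic method, ppm
immuno = [7.6, 6.8, 4.8, 5.7]   # immunoassay method, ppm
T_CRIT = 3.182                  # two-tailed t, 95%, n - 1 = 3 degrees of freedom (t table)

# Keep the sign of each difference: chromatographic minus immunoassay.
diffs = [c - i for c, i in zip(chrom, immuno)]
n = len(diffs)
t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
significant = abs(t_stat) > T_CRIT

print(f"mean difference = {mean(diffs):.3f}, t = {t_stat:.3f}")
print("Methods differ significantly:", significant)
```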
Example problems
y Example problem IX:
y It is suspected that an acid-base titrimetric method has
a significant indicator error and thus tends to give
results with a positive systematic error (i.e. positive bias).
y To test this, an exactly 0.1M solution of acid is used to
titrate 25.0 ml of an exactly 0.1M solution of alkali, with
the following results (ml):
25.06 25.18 24.87 25.51 25.34 25.41
Test for positive bias in these results.
Solution
Using 1-Sample Z:
y In the dialog box that appears, choose:
y Standard deviation: enter the previously calculated
SD value.
y Options: to choose the confidence level and whether the
test is one-tailed or two-tailed.
Solution
Using 1-Sample T:
y In the dialog box that appears, choose:
y Options: to choose the confidence level and whether the
test is one-tailed or two-tailed.
y Perform hypothesis test: enter the hypothesized value of the mean.
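The one-tailed version of the t-test can be sketched by hand for Example problem IX. The only change from a two-tailed test is the critical value, here the one-tailed 5% value for 5 degrees of freedom from a standard t table, which is an assumption of this sketch. The names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

titres = [25.06, 25.18, 24.87, 25.51, 25.34, 25.41]  # ml
EXPECTED = 25.00          # volume expected if there is no bias, ml
T_CRIT_ONE_TAILED = 2.015  # one-tailed t, P = 0.05, n - 1 = 5 df (t table)

n = len(titres)
t_stat = (mean(titres) - EXPECTED) / (stdev(titres) / sqrt(n))

# One-tailed test for positive bias: only a large positive t counts as evidence.
positive_bias = t_stat > T_CRIT_ONE_TAILED

print(f"mean = {mean(titres):.3f} ml, t = {t_stat:.3f}")
print("Evidence of positive bias:", positive_bias)
```

The same t value would not exceed the two-tailed 95% critical value of 2.571, which is why the choice of tails matters here.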
Solution
Using Gage Linearity and Bias Study:
y From the main menu select Stat > Quality Tools > Gage
Study > Gage Linearity and Bias Study.
y In the dialog box that appears, choose:
y Part Numbers: from variables A, B, C or D.
y Reference Value: from the reference column.
y Measurement Data: the same selection as for Part Numbers.
y Options: to select the method of estimating the repeatability
standard deviation from:
y Sample Range.
y Sample Standard Deviation.
F-Test
y The F-test is used to compare the standard deviations of
two samples and to test whether the populations from
which they come have equal variances.
Example problems
y Example problem X:
y A proposed method for the determination of TCDD in
water was compared with the standard method. The
following results were obtained for a river water sample:
                   Mean (µg/l)   SD (µg/l)
Standard method    72.0          3.31
Proposed method    72.0          1.51
For each method 8 determinations were made. Is the
precision of the proposed method significantly greater
than that of the standard method?
Solution
y From the main menu select Stat > Basic Statistics >
2 Variances.
y Summarized Data:
y First:
y Sample size and Variance.
y Second:
y Sample size and Variance.
y Storage: the results can be calculated and recorded in the
MINITAB worksheet.
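The F statistic for Example problem X is simply the ratio of the two variances, with the larger on top. In the sketch below the one-tailed critical value for (7, 7) degrees of freedom is taken from a standard F table, which is an assumption of this sketch, and the names are illustrative.

```python
# Summarized data from Example problem X.
n_standard, sd_standard = 8, 3.31   # standard method, µg/l
n_proposed, sd_proposed = 8, 1.51   # proposed method, µg/l
F_CRIT = 3.787  # one-tailed F, P = 0.05, (7, 7) degrees of freedom (F table)

# Ratio of variances, larger variance in the numerator.
f_stat = sd_standard**2 / sd_proposed**2

# One-tailed question: is the proposed method MORE precise than the standard one?
more_precise = f_stat > F_CRIT

print(f"F = {f_stat:.3f}, critical F = {F_CRIT}")
print("Proposed method significantly more precise:", more_precise)
```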
Hypothesis Testing
y A hypothesis is a claim (assumption) about the
population parameter.
y Examples of parameters are the population mean or
proportion.
y The parameter must be identified before analysis.
y Example of a claim: "I claim the mean GPA of this class is μ = 3.5!"
Hypothesis Testing
y Assume the population mean age is 50 (H0: μ = 50).
y Identify the population and take a sample ($\bar{X}$ = 20).
y Is $\bar{X}$ = 20 likely if μ = 50? No, not likely!
y REJECT the null hypothesis.
Hypothesis Testing
[Figure: sampling distribution of $\bar{X}$ if H0 is true, centered at μ = 50,
with the observed sample mean of 20 far out in the tail.]
It is unlikely that we would get a sample mean of this value
if in fact this were the population mean. Therefore, we reject
the null hypothesis that μ = 50.
Hypothesis Testing
Jury Trial (H0: Innocent)
Verdict     The Truth: Innocent   The Truth: Guilty
Innocent    Correct               Error
Guilty      Error                 Correct
Hypothesis Test
Decision            H0 True             H0 False
Do Not Reject H0    1 − α               Type II Error (β)
Reject H0           Type I Error (α)    Power (1 − β)
Hypothesis Testing
If you reduce the probability of one error, the
probability of the other one increases, provided
everything else is unchanged.
Hypothesis Testing
y True value of the population parameter:
y β increases when the difference between the hypothesized
parameter and its true value decreases.
y Significance level:
y β increases when α decreases.
y Population standard deviation:
y β increases when σ increases.
y Sample size:
y β increases when n decreases.
Exercises
y Exercise IV:
y A new operator is given a sample containing a known
concentration of 50 µg/kg of profenofos and instructed to
make eight determinations.
y He obtained the following results:
49.4 49.8 50.8 49.3 51.3 50.0 50.8 51.8
a) Is there evidence that the operator is biased?
b) Calculate a 95% confidence interval and hence state
the maximum possible bias for the operator.
c) What sample size is needed to estimate the operator bias
to within ± 0.3 µg/kg?
Analysis of Variance (ANOVA)
y The statistical tests described previously are used to
compare two sets of data, or to compare a
single sample of measurements with a standard or
reference value.
y Frequently, however, it is necessary to compare three or
more sets of data, and in that case we can make use of a very
powerful statistical method with a great range of
applications: Analysis of Variance (ANOVA).
Analysis of Variance (ANOVA)
y If there is only one source of variation apart from
measurement error, a one-way ANOVA calculation is
appropriate.
y If there are two sources of variation we use a two-way
ANOVA calculation.
y And so on.
Example problems
y Example problem XI:
y A sample of fruit is analyzed for its pesticide content by
a liquid chromatographic procedure, but four different
extraction procedures A–D are used, the concentration
in each case being measured three times.
y The results are shown in the table:
Results   A      B      C     D
1         10.5   9.9    9.9   9.2
2         11.5   10.8   9.1   8.5
3         10.7   10.8   8.9   9.0
y Is there any evidence that the four different sample
preparation methods yield different results?
Solution
y From the main menu select Stat > ANOVA > One-Way
Analysis of Variance (unstacked).
y Responses (in separate columns):
y All variables A, B, C and D.
y Confidence level.
Note: If the p-value is below a specified significance level, you
can declare the result to be statistically significant
and reject the test's null hypothesis.
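The one-way ANOVA table that MINITAB reports can be sketched by hand to show where the F statistic comes from. The critical value for (3, 8) degrees of freedom is taken from a standard F table, which is an assumption of this sketch, and the variable names are illustrative.

```python
from statistics import mean

# Pesticide results for the four extraction procedures (Example problem XI).
groups = {
    "A": [10.5, 11.5, 10.7],
    "B": [9.9, 10.8, 10.8],
    "C": [9.9, 9.1, 8.9],
    "D": [9.2, 8.5, 9.0],
}
F_CRIT = 4.066  # F, P = 0.05, (3, 8) degrees of freedom (F table)

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)

# Between-group and within-group sums of squares.
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
significant = f_stat > F_CRIT

print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}")
print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) df")
print("Procedures differ significantly:", significant)
```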
Chi-Squared (χ2) Test
y The significance tests so far described in this course, in
general, assume that the data analyzed:
y Are continuous, interval data comprising a whole
population or sampled randomly from a population.
y Have a normal distribution.
y Have sample sizes that do not differ hugely between the
groups.
y In contrast, the chi-squared test is concerned with
frequency, i.e. the number of times a given event occurs:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where O is the observed frequency and E is the expected frequency.
Example problems
y Example problem XII:
y The numbers of glassware breakages reported by four
laboratory workers over a given period are given below:
24 27 11 9
Is there any evidence that the workers differ in their
reliability?
Solution:
H0: the same number of breakages by each worker.
Ha: a different number of breakages by each worker.
Solution
y From the main menu select Stat > Tables > Chi-Square
Goodness-of-Fit Test (One Variable).
y Responses (in separate columns):
y All variables A, B, C and D.
y Confidence level.
Note: If the p-value is below a specified significance level, you
can declare the result to be statistically significant
and reject the test's null hypothesis.
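The chi-squared statistic for the breakage data can be sketched directly from χ2 = Σ (O − E)2/E. Under H0 every worker has the same expected count. The critical value for 3 degrees of freedom is taken from a standard chi-squared table, which is an assumption of this sketch, and the names are illustrative.

```python
observed = [24, 27, 11, 9]   # breakages per worker, Example problem XII
CHI2_CRIT = 7.815            # chi-squared, P = 0.05, 4 - 1 = 3 degrees of freedom (table)

# Under H0 each worker is expected to have the same number of breakages.
expected = sum(observed) / len(observed)

chi_squared = sum((o - expected) ** 2 / expected for o in observed)
significant = chi_squared > CHI2_CRIT

print(f"expected per worker = {expected:.2f}")
print(f"chi-squared = {chi_squared:.2f}, critical value = {CHI2_CRIT}")
print("Workers differ in reliability:", significant)
```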
Conclusion from Significance Test
y As we have just explained, a significance test at, for
example, the P = 0.05 level involves a 5% risk that a null
hypothesis will be rejected even though it is true.
y This type of error is known as a type 1 error (the risk of
such an error can be reduced by altering the
significance level of the test to P = 0.01 or even P = 0.001).
y This, however, is not the only possible type of error: it is also
possible to retain a null hypothesis even when it is false.
This is called a type 2 error. In order to calculate the
probability of a type 2 error it is necessary to postulate an
alternative to the null hypothesis, known as the alternative
hypothesis.
Conclusion from Significance Test
y Consider the situation where a certain chemical
product is meant to contain 3% of phosphorus by
weight.
y It is suspected that this proportion has increased.
y To test for such an increase, the composition is analyzed by a
standard method with a known standard deviation of
0.036%.
y Suppose 4 measurements are made and a significance
test is performed at the level of P = 0.05. A one-tailed test is
required, as we are interested only in an increase.
Conclusion from Significance Test
H0: µ = 3.0%
y The solid line shows the sampling distribution of the
mean if the null hypothesis is true. This sampling
distribution has mean 3.0% and standard
deviation 0.036/√4 %. If the sample mean lies above the
indicated critical value Xc, the null hypothesis is rejected;
thus the shaded region, with area 0.05, represents the
probability of a type 1 error.
Conclusion from Significance Test
y The only way in which both errors can be reduced is by
increasing the sample size. The effect of increasing n to 9,
for example, is illustrated in the figure: the resultant
decrease in the standard error of the mean produces a
decrease in both types of error for a given value of Xc.
Conclusion from Significance Test
y The probability that a false null hypothesis is rejected
is known as the power of the test. That is, the power of
a test is (1 − the probability of a type 2 error).
y In the studied example the power is a function of the
mean specified in the alternative hypothesis and
depends on the sample size, the significance level of
the test and whether the test is one- or two-tailed.
y If two or more tests are available to test the same
hypothesis, it may be useful to compare the powers of
the tests to decide which is more appropriate.
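The phosphorus example can be turned into numbers with a short sketch. The null mean (3.0%), the standard deviation (0.036%) and the sample sizes (4 and 9) come from the text. The alternative mean of 3.05% is a hypothetical value chosen here purely for illustration, since the text does not state one. The function name `error_rates` is also illustrative.

```python
from math import sqrt
from statistics import NormalDist

MU_0 = 3.0      # mean under the null hypothesis, % phosphorus
SIGMA = 0.036   # known standard deviation of the method, %
MU_ALT = 3.05   # HYPOTHETICAL alternative mean, assumed for illustration only

# Critical value Xc for a one-tailed test at P = 0.05 with n = 4.
x_c = NormalDist(MU_0, SIGMA / sqrt(4)).inv_cdf(0.95)

def error_rates(n):
    """Type 1 and type 2 error probabilities for sample size n at the fixed Xc."""
    std_err = SIGMA / sqrt(n)
    alpha = 1 - NormalDist(MU_0, std_err).cdf(x_c)   # reject H0 although it is true
    beta = NormalDist(MU_ALT, std_err).cdf(x_c)      # retain H0 although it is false
    return alpha, beta

for n in (4, 9):
    alpha, beta = error_rates(n)
    print(f"n={n}: Xc={x_c:.4f}  alpha={alpha:.4f}  beta={beta:.4f}  power={1 - beta:.4f}")
```

Keeping Xc fixed while raising n from 4 to 9 shrinks both error probabilities, which is the effect described above.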
Example problems
y Example problem XIII:
y The nitrate level (mg/l) in a sample of river water was
measured four times, with the following results:
0.404 0.400 0.398 0.379
Can the last value be rejected as an outlier?
Solution
y From the main menu select Stat > Basic Statistics > Display
Descriptive Statistics.
y Variables: Determination column.
y From the main menu select Stat > Basic Statistics > 1-Sample T
(test and confidence interval).
y Perform hypothesis test: enter the hypothesized value of the mean.
THANK YOU
