by using MINITAB program
By Mohamed Salama
Digitally signed by Mohamed Salama
DN: cn=Mohamed Salama, o, ou,
email=msalama78@gmail.com, c=EG
Date: 2009.12.28 03:45:02 +03'00'
Introduction
Graph in MINITAB:
y A pictorial gallery from which to choose a graph type.
y Flexibility in customizing graphs, from subsetting of
data to specifying titles and footnotes.
y Ability to change most graph elements, such as fonts,
symbols, lines, placement of tick marks and data
display, after the graph is created.
y Ability to automatically update graphs.
Introduction
Enhancements to Specific Graphs:
y Multiple levels of categorical variables.
y Contour plots use color ramps and label contour lines.
y Use summarized data in making a bar chart.
y Fit regression lines and distributions to selected
graphs.
y You can use Empirical CDF (cumulative distribution
function) graphs to evaluate the fit of a distribution to
your data or to compare different sample distributions.
Introduction
Data, Limits and Details:
y A worksheet can contain up to 4000 columns, 1000
constants and up to 10,000,000 rows, depending on
how much memory your computer has.
y Three stored constants have default values (you can
change them if you wish):
y K998 = ∗ (missing), K999 = 2.71828 (e) and K1000 =
3.14159 (pi).
Introduction
Report Pad:
In the Report Pad you can:
y Store MINITAB results and graphs in a single
document.
y Add comments and headings.
y Rearrange your output.
y Change font sizes.
y Print entire output from an analysis.
y Create Web‐ready reports.
Creating a data set using MINITAB
y To create a data set you must know the variables and
records.
y The variables are stored in columns.
y The individual records go in rows.
y Variable names are inserted at the top of the column, in the
row with no numerical designation.
Creating a patterned data column
y Used when there is more than one level for the variable.
y Make a sequence for each level and its replications.
y Know the first value, the last value and the increment
units.
y Know the levels for labeling, the replicates of each level
and the replicates of the whole sequence.
y You can also do all of this by manual data entry.
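The ingredients listed above (first value, last value, increment, replicates of each value, replicates of the whole sequence) can be sketched in Python as follows. This is only an illustration of the idea, not MINITAB's own implementation; the function name `patterned_column` and its parameter names are hypothetical.

```python
def patterned_column(first, last, step=1, each=1, whole=1):
    """Build a patterned column.

    Values run from `first` to `last` in increments of `step`; every value
    is listed `each` times and the whole sequence is listed `whole` times.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    # Number of distinct values; the small epsilon guards against float rounding.
    count = int((last - first) / step + 1e-9) + 1
    base = [first + i * step for i in range(count)]
    one_pass = [value for value in base for _ in range(each)]
    return one_pass * whole


# Three levels, each listed twice, whole sequence listed twice.
print(patterned_column(1, 3, step=1, each=2, whole=2))
```

With `each=2` and `whole=2` the three levels 1, 2, 3 give a column of twelve entries, which is the kind of level column needed before entering replicate measurements.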
Data manipulation and creation of
new variables
Data Coding:
y Variable coding is done by creating a new, named variable
whose categories correspond to the
levels.
y A numerical variable can be coded into a categorical
variable (Numeric to Text).
Data manipulation and creation of
new variables
Transformation of variables:
y Arithmetic operations such as addition, multiplication,
division, exponentiation and log-transformation.
y Logarithmic transformation: sometimes used in
statistical analysis for normalizing data or for stabilizing
variances.
y Square root transformation.
y Others: many transformations are available under
the Functions box (antilog, arccosine, cosine, etc.).
Graphical display of data
y Histograms.
y Pie Chart.
y Scatter Plots.
All these kinds of graphics are often used for data
visualization.
Some graphs may help in assessing the shape of the
distribution of data, whereas other types may help in
summarizing the data at hand or in describing
relationships between variables of interest.
Producing a report output
y There are many different ways to produce a report
using a MINITAB project.
y Use Microsoft Word, by copying and pasting the data which
will be reported.
y Use MINITAB Word.
Conclusion
Minitab Statistical Software:
y Easy to use.
y State-of-the-art graphs and graph editing.
y Regression Analysis.
y Statistical Process Control.
y Measurement Systems Analysis.
y Reliability/Survival Analysis.
Conclusion
Minitab Statistical Software (continued):
y Multivariate Analysis.
y Nonparametrics.
y Simulations and Distributions.
y Data and File Management.
y General statistics.
y Analysis of variance.
Conclusion
Minitab Statistical Software (continued):
y Quality Tools.
y Design of experiments.
y Power and sample size.
y Time Series and Forecasting.
y Tables.
y Macros and Customizability.
Statistics Introduction
Statistics is the science of:
y Collecting, describing and interpreting data.
To make:
y Predictions and decisions.
It includes:
y Describing the problem, gathering data, summarizing
data, analyzing data and communicating meaningful
conclusions.
Statistics Introduction
y Lab chemists are concerned with the chemical analysis
processes that quantify analytes in different matrices.
y All these processes are subject to:
y Systematic variation (e.g. instrument effects, matrix
effects).
y Random variation (e.g. measurement errors).
y Statistics is a tool to help us understand the effects of
random variation.
Statistics Introduction
y Probability:
y Usually assumes knowledge of the population.
y Distributions are known.
y The theory behind statistics.
y Principle of inferential statistics:
y Only have a sample.
y Use this to infer details about the population.
Statistics Introduction
Definitions according to ISO 5725-1:1994
y Accuracy.
y Trueness.
y Bias.
y Laboratory bias (total bias).
y Bias of measurement method.
y Laboratory component of bias.
y Precision.
y Repeatability (conditions & limit).
y Reproducibility (conditions & limit).
Statistics Introduction
[Diagram: Accuracy comprises Trueness (expressed as Bias) and Precision. Precision covers Repeatability (conditions r: within-lab variation) and Reproducibility (conditions R: within-lab variation + between-lab variation).]
Statistics Introduction
Accuracy:
y The closeness of agreement between a test result and
the accepted reference value.
Trueness:
y The closeness of agreement between the average value
obtained from a large series of test results and an
accepted reference value.
Bias:
y The difference between the expectation of the test
results and an accepted reference value (systematic error).
Statistics Introduction
Laboratory Bias (Total Bias):
y The difference between the expectation of the test
results from a particular lab and an accepted reference
value.
Bias of the measurement method:
y The difference between the expectation of test results
obtained from all laboratories using that method and
an accepted reference value.
Laboratory component of bias:
y The difference between the lab bias and the bias of the
measurement method.
Statistics Introduction
Precision:
y The closeness of agreement between independent test
results obtained under stipulated conditions (random error).
Statistics Introduction
Repeatability:
y Precision under repeatability conditions.
y Conditions where independent test results are
obtained with the same method on identical test
items in the same lab by the same operator using the
same equipment within short intervals of time.
Reproducibility:
y Precision under reproducibility conditions.
y Conditions where test results are obtained with the
same method on identical test items in different labs
with different operators using different equipment.
Example problems
y Example problem I:
y A reference material known to contain 1.00% by weight
of a particular component is studied by four analysts
(A–D), each analyst performing 5 replicate
measurements. The results are as follows:
A 1.03 1.05 1.03 1.07 1.07
B 0.98 0.88 1.09 1.06 0.94
C 1.06 1.11 1.13 1.04 1.26
D 0.98 1.03 1.02 0.99 1.03
Comment on the random and systematic errors in the
results A–D.
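As a cross-check outside MINITAB, the same comparison can be sketched in plain Python: the mean of each analyst minus the reference value estimates the bias (systematic error), and the standard deviation reflects the precision (random error). The variable names below are illustrative only.

```python
from statistics import mean, stdev

REFERENCE = 1.00  # certified content, % by weight

results = {
    "A": [1.03, 1.05, 1.03, 1.07, 1.07],
    "B": [0.98, 0.88, 1.09, 1.06, 0.94],
    "C": [1.06, 1.11, 1.13, 1.04, 1.26],
    "D": [0.98, 1.03, 1.02, 0.99, 1.03],
}

summary = {}
for analyst, values in results.items():
    m = mean(values)
    summary[analyst] = {
        "mean": m,
        "bias": m - REFERENCE,  # systematic error estimate
        "sd": stdev(values),    # random error estimate (n - 1 denominator)
    }

for analyst, row in summary.items():
    print(f"{analyst}: mean={row['mean']:.3f}  bias={row['bias']:+.3f}  sd={row['sd']:.3f}")
```

A small bias with a small standard deviation indicates results that are both true and precise; a large bias with a small standard deviation points to a systematic error, and a large standard deviation points to poor precision.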
Solution
y Enter the data into MINITAB.
Bias:
y From the main menu select Stat > Quality Tools > Gage
Study > Gage Linearity and Bias Study.
y In the window that appears choose:
y Part Numbers: from variables A, B, C or D.
y Reference Value: from the reference column.
y Measurement Data: the same choice as for Part Numbers.
y Options: to select the method of estimating the repeatability
standard deviation from:
y Sample Range.
y Sample Standard Deviation.
Solution
Precision:
y From the main menu select Stat > Basic Statistics >
Display Descriptive Statistics.
y In the window that appears choose:
y Variables: all variables A, B, C and D.
y Statistics: select the statistics to be calculated:
y Mean.
y Standard Deviation (precision).
y Minimum.
y Maximum.
y Range.
y Graphs: select the graphs that describe the data.
Exercises
y Exercise I:
y A standard sample of pooled human blood serum
contains 42.0 g of albumin per litre.
y Five laboratories (A–E) each do six determinations of the
albumin concentration (on the same day), with the
following results (g/l):
A 42.5 41.6 42.1 41.9 41.1 42.2
B 39.8 43.6 42.1 40.1 43.9 41.9
C 43.5 42.8 43.8 43.1 42.7 43.3
D 35.0 43.0 37.1 40.5 36.8 42.2
E 42.2 41.6 42.0 41.8 42.6 39.0
y Comment on the random and systematic errors in the
results A–E.
Fundamentals of Statistics
y Measures of Central Tendency:
y Average (M):
y Median (Md):
which is simply the middle value of the sample when the
measurements are arranged in numerical order (if n is
even, then the median is the average of the two middle
values of the ordered sample).
y Mode:
which is defined as the most frequent value in a
frequency distribution.
Fundamentals of Statistics
y Measures of Dispersion:
y Range (R):
y Standard Deviation (S):
y Coefficient of Variation (CV):
Fundamentals of Statistics
y Measures of Dispersion:
y Variance (S²):
y Pooled Standard Deviation (Sp):
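The dispersion measures named above can be sketched with the Python standard library. The definitions used here are the usual textbook ones and are an assumption, since the slide formulas did not survive extraction: sample variance with an n − 1 denominator, CV as 100·S/M, and a pooled standard deviation that weights each sample variance by its degrees of freedom. The data are the results of analysts A and D from Example problem I.

```python
from math import sqrt
from statistics import mean, stdev, variance


def data_range(values):
    """Range R: largest value minus smallest value."""
    return max(values) - min(values)


def coefficient_of_variation(values):
    """CV (relative standard deviation) in percent: 100 * S / M."""
    return 100 * stdev(values) / mean(values)


def pooled_sd(*samples):
    """Pooled standard deviation Sp over two or more samples."""
    numerator = sum((len(s) - 1) * variance(s) for s in samples)
    denominator = sum(len(s) - 1 for s in samples)
    return sqrt(numerator / denominator)


a = [1.03, 1.05, 1.03, 1.07, 1.07]
d = [0.98, 1.03, 1.02, 0.99, 1.03]

print("R  =", round(data_range(a), 3))
print("S  =", round(stdev(a), 4))
print("S2 =", round(variance(a), 6))
print("CV =", round(coefficient_of_variation(a), 2), "%")
print("Sp =", round(pooled_sd(a, d), 4))
```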
Example problems
y Example problem II:
y Ten measurements of the ratio of two peak areas in an LC
experiment gave the following values:
Calculate the average, standard deviation and relative
standard deviation.
Solution
y From the main menu select Stat > Basic Statistics >
Display Descriptive Statistics.
y In the window that appears choose:
y Variables: ratio column.
y Statistics: select the statistics to be calculated:
y Mean.
y Standard Deviation.
y Variance.
y Coefficient of variation.
y N total.
y Graphs: select the graphs that describe the data.
Exercises
y Exercise II:
y To investigate the reproducibility of a method for the
determination of selenium in food, nine measurements
were made on a single batch of brown rice, with the
results listed below:
No. 1 2 3 4 5 6 7 8 9
Se (ug/g) 0.07 0.07 0.08 0.07 0.07 0.08 0.08 0.09 0.08
Calculate the average, median, mode, range, SD and RSD.
Errors in Classical Analysis
y Distribution of Errors:
y Continuous random variable: takes values from an interval
of numbers, with no gaps.
y Continuous probability distribution: the distribution of a
continuous random variable.
y Most important continuous probability distribution: the
normal distribution.
Normal Distribution:
y Bell shaped.
y Mean, median and mode are equal.
y The random variable has an infinite range.
[Figure: bell-shaped curve of f(X) against X, with the mean, median and mode coinciding at μ.]
Normal Distribution
$$ f(X) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(X-\mu)^2}{2\sigma^2}} $$
y Changing μ shifts the curve along the axis.
y Increasing σ increases the spread and flattens the curve.
[Figures: normal curves illustrating the effect of changing μ and σ.]
y 68% chance of falling between μ − σ and μ + σ.
y 95% chance of falling between μ − 2σ and μ + 2σ.
y 99.7% chance of falling between μ − 3σ and μ + 3σ.
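The 68%, 95% and 99.7% figures can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `prob_within` is hypothetical.

```python
from statistics import NormalDist


def prob_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal variable falls within mu +/- k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


coverage = {k: prob_within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} sigma: {100 * p:.1f}%")
```

The result does not depend on the particular μ and σ chosen, which is why the rule can be stated once for every normal distribution.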
Normal Distribution
Probability is the area under the curve: P(c ≤ X ≤ d) is the area under f(X) between c and d.
[Figure: normal curve f(X) against X with the area between c and d shaded.]
Can the analyst draw any conclusions from this limited
amount of data?
Solution
Part I:
y From the main menu select Stat > Basic Statistics >
Display Descriptive Statistics.
y In the window that appears choose:
y Variables: Determination column.
y Statistics: select the statistics to be calculated:
y Mean.
y Standard Deviation.
y Variance.
y Coefficient of variation.
y N total.
y Graphs: select the graphs that describe the data.
Example problems
y Example problem III:
Part 2:
y In addition to the six determinations we have just
considered, the analytical chemist has many
determinations of the Dimethoate content of his
standard solution by his old method; this particular
solution has been used for QC purposes over a period of
weeks.
Q 1: Estimate the percentage of determinations which would be
greater than 65.0.
Q 2: Estimate which value is likely to be exceeded by the
highest 10% of determinations.
Example problems
y Example problem III:
Part 2:
y The sixty most recent determinations are:
61.0 65.4 60.0 59.2 57.0 62.5 57.7 56.2 62.9 62.5
56.5 60.2 58.2 56.5 64.7 54.5 60.5 59.5 61.6 60.8
58.7 54.4 62.2 59.0 60.3 60.8 59.5 60.0 61.8 63.8
64.5 66.3 61.1 59.7 57.4 61.2 60.9 58.2 63.0 59.5
56.0 59.4 60.2 62.9 60.5 60.8 61.5 58.5 58.9 60.5
61.2 57.8 63.4 58.9 61.5 62.3 59.8 61.7 64.0 62.7
Solution
Part II:
y From the main menu select Stat > Tables > Tally
Individual Variables.
y In the window that appears choose:
y Variables: R. Dete. column.
y Display: select the statistics to be calculated:
y Counts.
y Percents.
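Besides tallying the observed values, Q 1 and Q 2 can be estimated by fitting a normal distribution to the sixty determinations. The sketch below is one possible approach, not necessarily the one intended by the slides: it assumes the data are approximately normal and uses the sample mean and standard deviation as the parameters.

```python
from statistics import NormalDist, mean, stdev

determinations = [
    61.0, 65.4, 60.0, 59.2, 57.0, 62.5, 57.7, 56.2, 62.9, 62.5,
    56.5, 60.2, 58.2, 56.5, 64.7, 54.5, 60.5, 59.5, 61.6, 60.8,
    58.7, 54.4, 62.2, 59.0, 60.3, 60.8, 59.5, 60.0, 61.8, 63.8,
    64.5, 66.3, 61.1, 59.7, 57.4, 61.2, 60.9, 58.2, 63.0, 59.5,
    56.0, 59.4, 60.2, 62.9, 60.5, 60.8, 61.5, 58.5, 58.9, 60.5,
    61.2, 57.8, 63.4, 58.9, 61.5, 62.3, 59.8, 61.7, 64.0, 62.7,
]

m = mean(determinations)
s = stdev(determinations)
fitted = NormalDist(m, s)  # assumption: the determinations are roughly normal

# Q 1: percentage of determinations expected to exceed 65.0
pct_above_65 = 100 * (1 - fitted.cdf(65.0))
# Q 2: value exceeded by the highest 10% of determinations (90th percentile)
value_top10 = fitted.inv_cdf(0.90)
# Direct tally of the observed data, for comparison with the fitted estimate
observed_above = sum(1 for x in determinations if x > 65.0)

print(f"mean = {m:.2f}, sd = {s:.2f}")
print(f"estimated % above 65.0: {pct_above_65:.1f}")
print(f"value exceeded by top 10%: {value_top10:.1f}")
print(f"observed above 65.0: {observed_above} of {len(determinations)}")
```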
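Normal Distribution
$$ Z = \frac{X - \mu}{\sigma} = \frac{6.2 - 5}{10} = 0.12 $$
[Figure: a normal distribution with μ = 5 and σ = 10, on which X = 6.2 is marked, beside the standardized normal distribution with μ_Z = 0 and σ_Z = 1, on which Z = 0.12 is marked.]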
Confidence Intervals
y The problem:
y How large are the error bounds when we use data from a
sample to estimate a parameter of the underlying
population?
y Compute confidence intervals for µ:
when σ² is known.
when σ² is unknown.
Suppose an estimate for the mean (X̄) is given, and we
want to describe the precision of the estimate.
We do this by giving a range of likely values for the
parameter. Such a range is called a Confidence Interval.
Elements of CI Estimation
y A probability that the population parameter falls
somewhere within the interval.
y Confidence interval.
y Sample statistic.
$$ \text{Error} = \bar{X} - \mu $$
$$ Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{\text{Error}}{\sigma_{\bar{X}}} $$
$$ \text{Error} = Z \sigma_{\bar{X}} $$
$$ \mu = \bar{X} \pm Z \sigma_{\bar{X}} $$
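Confidence Intervals
$$ \bar{X} \pm Z \sigma_{\bar{X}} = \bar{X} \pm Z \frac{\sigma}{\sqrt{n}} $$
[Figure: sampling distribution of X̄ with standard error σ_X̄.]
y 90% of samples fall between μ − 1.645σ_X̄ and μ + 1.645σ_X̄.
y 95% of samples fall between μ − 1.96σ_X̄ and μ + 1.96σ_X̄.
y 99% of samples fall between μ − 2.58σ_X̄ and μ + 2.58σ_X̄.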
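The multipliers 1.645, 1.96 and 2.58 are the two-sided critical values of the standard normal distribution. A minimal sketch with the Python standard library shows where they come from and how the interval X̄ ± Zσ/√n is built; the function names and the numbers in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical value Z for a given confidence level (e.g. 0.95)."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    half_width = z_critical(confidence) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {z_critical(level):.3f}")

# Hypothetical numbers, for illustration only: mean 50, sigma 10, n = 25.
low, high = z_interval(50.0, 10.0, 25, confidence=0.95)
print(f"95% CI: {low:.2f} to {high:.2f}")
```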
Factors affecting interval width
y Data variation
y measured by σ.
y Sample size
y σ_X̄ = σ / √n.
y Level of confidence
y (1 − α).
Intervals extend from X̄ − Zσ_X̄ to X̄ + Zσ_X̄.
Confidence Intervals
Concluding Remark:
The smaller we choose α, the more confident
we are that the interval contains the
parameter µ. But at the same time the
confidence interval gets wider and is
therefore less precise.
CI (σ known, which is hardly ever true)
y Assumptions:
y Population standard deviation is known.
y Population is normally distributed.
y If not normal, use large samples.
y Confidence interval estimate:
CI (σ unknown)
y Assumptions:
y Population standard deviation is unknown.
y Sample size must be large enough for the central limit
theorem, or the population must be normally distributed.
y Use Student’s t distribution.
y Confidence interval estimate:
Example problems
y Example problem IV:
y The DDT content of fish samples in ppb was
determined using GC-ECD.
y The following values were obtained:
What are the 95% and 99% confidence limits for the DDT
concentration?
Solution
Standardized normal distribution:
y From the main menu select Stat > Basic Statistics > 1-
sample Z (test and confidence interval).
y In the window that appears choose:
y Samples in columns: data column.
y Standard deviation: enter the previously calculated value of
the SD.
y Options: to choose the confidence level.
y Graphs: select the graphs that describe the data.
Solution
T distribution:
y From the main menu select Stat > Basic Statistics > 1-
sample T (test and confidence interval).
y In the window that appears choose:
y Samples in columns: data column.
y Options: to choose the confidence level.
y Graphs: select the graphs that describe the data.
Example problems
y Example problem V:
y The absorbance scale of a spectrometer is tested at a
particular wavelength with a standard solution which
has an absorbance given as 0.47.
y Ten measurements of the absorbance with the
spectrometer give an average absorbance of 0.461 and a
standard deviation of 0.003.
Find the 95% confidence interval for the mean absorbance
as measured by the spectrometer and hence decide
whether a systematic error is present.
Solution
y From the main menu select Stat > Basic Statistics > 1-
sample Z (test and confidence interval).
y In the window that appears choose:
y Samples in columns: data column.
y Standard deviation: enter the previously calculated value of
the SD.
y Options: to choose the confidence level.
y Graphs: select the graphs that describe the data.
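Because Example problem V gives only summary statistics, the interval can also be computed by hand. The sketch below evaluates it twice: with the normal (Z) multiplier, matching the 1-sample Z route above, and with the Student's t multiplier for 9 degrees of freedom. The t value 2.262 is taken from standard tables and is hard-coded here as an assumption, since the Python standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist

xbar, s, n = 0.461, 0.003, 10   # summary statistics from the example
certified = 0.47                # absorbance of the standard solution

Z_CRIT_95 = NormalDist().inv_cdf(0.975)
T_CRIT_95_DF9 = 2.262           # tabulated two-sided 95% value, 9 d.f. (assumption)


def interval(critical_value):
    """Confidence interval xbar +/- critical_value * s / sqrt(n)."""
    half_width = critical_value * s / sqrt(n)
    return xbar - half_width, xbar + half_width


z_low, z_high = interval(Z_CRIT_95)
t_low, t_high = interval(T_CRIT_95_DF9)

# A systematic error is indicated if the certified value lies outside the interval.
systematic_error = not (t_low <= certified <= t_high)

print(f"Z interval: {z_low:.4f} to {z_high:.4f}")
print(f"t interval: {t_low:.4f} to {t_high:.4f}")
print("systematic error indicated:", systematic_error)
```

With only ten measurements the t interval is slightly wider than the Z interval, but both exclude 0.47.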
Exercises
y Exercise III:
y A 0.1M solution of acid was used to titrate 10 ml of a 0.1M
solution of alkali, and the following volumes of acid were
recorded:
Calculate the 95% confidence limits of the mean and use
them to decide if there is any evidence of systematic
error.
Student’s t Test
y When solving probability problems for the sample
mean, one of the steps was to convert the sample mean
values to z-scores using the following formula:
$$ z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} $$
where $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma / \sqrt{n}$.
y What happens if we do not know the population
standard deviation σ? If we substitute the population
standard deviation σ with the sample standard
deviation s, can we still use the standard normal table?
Answer: No
Student’s t Test
y This question was addressed in 1908, when W.S. Gosset
found that if we replace σ with the sample standard
deviation s the distribution becomes a t-distribution.
If:
$$ T = \frac{\bar{x} - \mu}{s / \sqrt{n}} $$
then T has a t-distribution with n−1 degrees of
freedom. The t-distribution is similar to the z-curve in that
it is bell shaped, but the shape of the t-distribution
changes with the degrees of freedom. We will use the
t-tables to get the critical t-values at different levels of
α and degrees of freedom.
Student’s t Test
y One sample t-test:
y When using the t-test, take care: is it
one-tailed or two-tailed?
Student’s t Test
y Independent sample t-test (equal variances):
y Degrees of freedom = (n1 + n2) − 2
Student’s t Test
y Independent sample t-test (unequal variances):
y In such a complicated case the degrees of freedom are
calculated from:
Student’s t Test
y Paired sample t-test:
y The sign of the difference is very important.
Example problems
y Example problem VI:
y In a method for the determination of mercury by cold-vapor
atomic absorption, the following values were obtained
for a standard reference material containing 38.9%
mercury:
38.9% 37.4% 37.1%
Is there any evidence of systematic error?
Solution
y From the main menu select Stat > Basic Statistics >
Graphical Summary.
y In the window that appears choose:
y Variables: data column.
y Confidence level: to choose the confidence level.
y From the main menu select Stat > Basic Statistics > 1-
sample T (test and confidence interval).
y In the window that appears choose:
y Samples in columns: data column.
y Options: to choose the confidence level.
y Perform hypothesis test: enter the reference value as the
hypothesized mean.
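The one-sample t statistic behind this test can be reproduced by hand for Example problem VI. In the sketch below the critical value 4.303 (two-sided, 95%, 2 degrees of freedom) is taken from standard t-tables and hard-coded as an assumption; variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

values = [38.9, 37.4, 37.1]   # % mercury found
reference = 38.9              # certified % mercury

n = len(values)
t_stat = (mean(values) - reference) / (stdev(values) / sqrt(n))
df = n - 1

T_CRIT_95_DF2 = 4.303         # tabulated two-sided 95% value, 2 d.f. (assumption)
significant = abs(t_stat) > T_CRIT_95_DF2

print(f"t = {t_stat:.3f} with {df} degrees of freedom")
print("evidence of systematic error:", significant)
```

Here |t| is well below the critical value, so with only three determinations the data give no significant evidence of systematic error at the 95% level.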
Example problems
y Example problem VII:
y In a comparison of LC and GC methods for the
determination of Pirimicarb in vegetables the following
results (ug/g) were obtained:
Mean SD
HPLC method 28.00 0.30
GC‐NPD method 26.25 0.23
For each method 10 determinations were made.
Assuming that the two samples have standard deviations
which are not significantly different, can you conclude
that the two methods give results having means which
differ significantly?
Solution
y From the main menu select Stat > Basic Statistics > 2-
sample T (test and confidence interval).
y In the window that appears choose:
y Summarized Data:
y First:
y Sample size, Mean and Standard Deviation.
y Second:
y Sample size, Mean and Standard Deviation.
y Assume equal variances.
y Options: to choose the confidence level.
y Graphs: select the graphs that describe the data.
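For Example problem VII the equal-variance two-sample t statistic can be computed directly from the summarized data. This is a sketch of the standard pooled formula, with (n1 + n2) − 2 degrees of freedom; the critical value 2.101 (two-sided, 95%, 18 degrees of freedom) comes from standard t-tables and is hard-coded as an assumption.

```python
from math import sqrt

# Summarized data: (n, mean, SD) for each method
n1, mean1, sd1 = 10, 28.00, 0.30   # HPLC method
n2, mean2, sd2 = 10, 26.25, 0.23   # GC-NPD method

df = n1 + n2 - 2
pooled_variance = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
pooled_sd = sqrt(pooled_variance)

t_stat = (mean1 - mean2) / (pooled_sd * sqrt(1 / n1 + 1 / n2))

T_CRIT_95_DF18 = 2.101             # tabulated two-sided 95% value, 18 d.f. (assumption)
significant = abs(t_stat) > T_CRIT_95_DF18

print(f"pooled SD = {pooled_sd:.4f}")
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
print("means differ significantly:", significant)
```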
Example problems
y Example problem VIII:
y The pesticide levels (ppm) in four thoroughly
homogenized apple samples are each determined once by
(a) a chromatographic method and (b) an
immunoassay method. The results are:
Apple Pesticide Level
Chrom. Immuno.
1 7.1 7.6
2 6.1 6.8
3 5.0 4.8
4 6.0 5.7
Is there any evidence that the two analytical methods give
significantly different results?
Solution
y From the main menu select Stat > Basic Statistics > Paired
T (test and confidence interval).
y In the window that appears choose:
y Samples in Columns:
y First Sample:
y The column holding the first set of paired values.
y Second Sample:
y The column holding the second set of paired values.
y Options: to choose the confidence level.
y Graphs: select the graphs that describe the data.
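The paired t-test works on the differences within each pair, keeping their signs. A short sketch for the apple data of Example problem VIII follows; the critical value 3.182 (two-sided, 95%, 3 degrees of freedom) is taken from standard t-tables and hard-coded as an assumption.

```python
from math import sqrt
from statistics import mean, stdev

chrom = [7.1, 6.1, 5.0, 6.0]    # chromatographic method, ppm
immuno = [7.6, 6.8, 4.8, 5.7]   # immunoassay method, ppm

# Signed differences within each apple sample
differences = [c - i for c, i in zip(chrom, immuno)]

n = len(differences)
t_stat = mean(differences) / (stdev(differences) / sqrt(n))
df = n - 1

T_CRIT_95_DF3 = 3.182           # tabulated two-sided 95% value, 3 d.f. (assumption)
significant = abs(t_stat) > T_CRIT_95_DF3

print("differences:", [round(d, 1) for d in differences])
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
print("methods differ significantly:", significant)
```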
Example problems
y Example problem IX:
y It is suspected that an acid-base titrimetric method has
a significant indicator error and thus tends to give
results with a positive systematic error (i.e. positive bias).
y To test this, an exactly 0.1M solution of acid is used to
titrate 25.0 ml of an exactly 0.1M solution of alkali, with
the following results (ml):
Test for positive bias in these results.
Solution
Using 1-Sample Z:
y From the main menu select Stat > Basic Statistics > 1-
sample Z (test and confidence interval).
y In the window that appears choose:
y Samples in columns: data column.
y Standard deviation: enter the previously calculated value of
the SD.
y Options: to choose the confidence level and a one-tailed
or two-tailed test.
y Graphs: select the graphs that describe the data.
Solution
Using 1-Sample T:
y From the main menu select Stat > Basic Statistics > 1-
sample T (test and confidence interval).
y In the window that appears choose:
y Samples in columns: data column.
y Options: to choose the confidence level and a one-tailed
or two-tailed test.
y Perform hypothesis test: enter the reference value as the
hypothesized mean.
Solution
Using Gage Linearity and Bias Study:
y From the main menu select Stat > Quality Tools > Gage
Study > Gage Linearity and Bias Study.
y In the window that appears choose:
y Part Numbers: from variables A, B, C or D.
y Reference Value: from the reference column.
y Measurement Data: the same choice as for Part Numbers.
y Options: to select the method of estimating the repeatability
standard deviation from:
y Sample Range.
y Sample Standard Deviation.
F-Test
y The F-test is used to compare the standard deviations of
two samples, that is, to test whether
the populations from which they come have equal
variances.
Example problems
y Example problem X:
y A proposed method for the determination of TCDD in
water was compared with the standard method. The
following results were obtained for a river water sample:
Mean (ug/l) SD (ug/l)
Standard method 72.0 3.31
Proposed method 72.0 1.51
For each method 8 determinations were made. Is the
precision of the proposed method significantly greater
than that of the standard method?
Solution
y From the main menu select Stat > Basic Statistics > 2-
Variances.
y In the window that appears choose:
y Summarized Data:
y First:
y Sample size and Variance.
y Second:
y Sample size and Variance.
y Storage: the results can be calculated and stored in the
MINITAB worksheet.
y Options: to choose the confidence level.
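For Example problem X the F statistic is simply the ratio of the two sample variances, with the larger variance in the numerator. The sketch below uses a one-tailed test, because the question asks whether the proposed method is more precise; the critical value 3.787 (F at P = 0.05 with 7 and 7 degrees of freedom) is taken from standard F-tables and hard-coded as an assumption.

```python
# Summarized data: (n, SD) for each method
n_standard, sd_standard = 8, 3.31
n_proposed, sd_proposed = 8, 1.51

# Variance ratio with the larger variance in the numerator
f_stat = sd_standard**2 / sd_proposed**2
df_num = n_standard - 1
df_den = n_proposed - 1

F_CRIT_05_7_7 = 3.787   # tabulated one-tailed value, P = 0.05, (7, 7) d.f. (assumption)
significant = f_stat > F_CRIT_05_7_7

print(f"F = {f_stat:.3f} with ({df_num}, {df_den}) degrees of freedom")
print("proposed method significantly more precise:", significant)
```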
Hypothesis Testing
y A hypothesis is a claim (assumption) about the
population parameter.
y Examples of parameters are the
population mean or
proportion.
y The parameter must be
identified before analysis.
[Figure: a speaker saying "I claim the mean GPA of this class is μ = 3.5!"]
Hypothesis Testing
y Assume the population mean age is 50 (H0: μ = 50).
y Identify the population.
y Take a sample (X̄ = 20).
y Is X̄ = 20 likely if μ = 50? No, not likely!
y REJECT the null hypothesis.
Hypothesis Testing
Sampling Distribution of X̄
It is unlikely that we would get a sample mean of this value (X̄ = 20) if in fact μ = 50 were the population mean. Therefore, we reject the null hypothesis that μ = 50.
[Figure: sampling distribution of X̄ if H0 is true, centred at μ = 50, with X̄ = 20 far in the lower tail.]
Hypothesis Testing
H0: Innocent
Jury Trial
Verdict     The Truth: Innocent    The Truth: Guilty
Innocent    Correct                Error
Guilty      Error                  Correct
Hypothesis Test
Decision            The Truth: H0 True    The Truth: H0 False
Do Not Reject H0    1 − α                 Type II Error (β)
Reject H0           Type I Error (α)      Power (1 − β)
Hypothesis Testing
If you reduce the probability of one
error, the other one increases, provided that
everything else is unchanged.
[Figure: trade-off between α and β.]
Hypothesis Testing
y True value of the population parameter:
y β increases when the difference between the hypothesized
parameter and its true value decreases.
y Significance level:
y β increases when α decreases.
y Population standard deviation:
y β increases when σ increases.
y Sample size:
y β increases when n decreases.
Exercises
y Exercise IV:
y A new operator is given a sample containing a known
concentration of 50 ug/kg of profenofos and instructed to
make eight determinations.
y He obtained the following results:
49.4 49.8 50.8 49.3 51.3 50.0 50.8 51.8
a) Is there evidence that the operator is biased?
b) Calculate a 95% confidence interval and hence state
the maximum possible bias for the operator.
c) What sample size is needed to estimate the operator bias
to within ±0.3 ug/kg?
Analysis of Variance (ANOVA)
y The statistical tests described previously are used in
the comparison of two sets of data, or to compare a
single sample of measurements with a standard or
reference value.
y Frequently, however, it is necessary to compare three or more sets
of data, and in that case we can make use of a very
powerful statistical method with a great range of
applications: Analysis of Variance (ANOVA).
Analysis of Variance (ANOVA)
y If there is only one source of variation apart from the
measurement error, a one-way ANOVA calculation is
appropriate.
y If there are two sources of variation we use a two-way
ANOVA calculation.
y And so on.
Example problems
y Example problem XI:
y A sample of fruit is analyzed for its pesticide content by
a liquid chromatographic procedure, but four different
extraction procedures A–D are used, the concentration
in each case being measured three times.
y The results are shown in the table:
Results A B C D
1 10.5 9.9 9.9 9.2
2 11.5 10.8 9.1 8.5
3 10.7 10.8 8.9 9.0
y Is there any evidence that the four different sample
preparation methods yield different results?
Solution
y From the main menu select Stat > ANOVA > One-Way
Analysis of Variance (unstacked).
y In the window that appears choose:
y Responses (in separate columns):
y All variables A, B, C and D.
y Confidence level.
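The one-way ANOVA for Example problem XI can be sketched by hand: the total variation is split into a between-procedure sum of squares and a within-procedure sum of squares, and F is the ratio of their mean squares. The critical value 4.066 (F at P = 0.05 with 3 and 8 degrees of freedom) is taken from standard F-tables and hard-coded as an assumption.

```python
from statistics import mean

groups = {
    "A": [10.5, 11.5, 10.7],
    "B": [9.9, 10.8, 10.8],
    "C": [9.9, 9.1, 8.9],
    "D": [9.2, 8.5, 9.0],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = mean(all_values)

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)

F_CRIT_05_3_8 = 4.066   # tabulated value, P = 0.05, (3, 8) d.f. (assumption)
significant = f_stat > F_CRIT_05_3_8

print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}")
print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) degrees of freedom")
print("procedures differ significantly:", significant)
```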
24 27 11 9
Is there any evidence that the workers differ in their
reliability?
Solution:
Ho: each worker has the same number of breakages.
Ha: the workers have different numbers of breakages.
Solution
y From the main menu select Stat > Tables > Chi-Square
Goodness-of-Fit Test (one variable).
y In the window that appears choose:
y Responses (in separate columns):
y All variables A, B, C and D.
y Confidence level.
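The chi-square goodness-of-fit statistic for this problem can be sketched by hand. The sketch assumes that the four surviving numbers (24, 27, 11, 9) are the breakage counts of four workers, and that under Ho every worker is expected to have the same count. The critical value 7.815 (chi-square at P = 0.05 with 3 degrees of freedom) is taken from standard tables and hard-coded as an assumption.

```python
observed = [24, 27, 11, 9]   # breakages per worker (interpretation assumed)

# Under Ho every worker has the same expected number of breakages.
expected = sum(observed) / len(observed)

chi_square = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1

CHI2_CRIT_05_DF3 = 7.815     # tabulated value, P = 0.05, 3 d.f. (assumption)
significant = chi_square > CHI2_CRIT_05_DF3

print(f"expected per worker = {expected:.2f}")
print(f"chi-square = {chi_square:.2f} with {df} degrees of freedom")
print("workers differ in reliability:", significant)
```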
y This, however, is not the only possible type of error: it is also
possible to retain a null hypothesis even when it is false.
This is called a type 2 error. In order to calculate the
probability of a type 2 error it is necessary to postulate an
alternative to the null hypothesis, known as the alternative
hypothesis.
Conclusion from Significance Test
y Consider the situation where a certain chemical
product is meant to contain 3% of phosphorus by
weight.
y It is suspected that this proportion has increased.
y To test for such an increase the composition is analyzed by a
standard method with a known standard deviation of
0.036%.
y Suppose 4 measurements are made and a significance
test is performed at the level P = 0.05. A one-tailed test is
required, as we are interested only in an increase.
Conclusion from Significance Test
Ho: µ = 3.0%
y The solid line shows the
sampling distribution of the
mean if the null hypothesis is true.
This sampling distribution has
mean 3.0% and standard
deviation 0.036/√4 %. If the
sample mean lies above the
indicated critical value Xc the
null hypothesis is rejected;
thus the shaded region with
area 0.05 represents the
probability of a type 1 error.
Conclusion from Significance Test
y The only way in which both
errors can be reduced is by
increasing the sample size. The
effect of increasing n to 9, for
example, is illustrated in the
figure: the resultant
decrease in the standard error
of the mean produces a
decrease in both types of error
for a given value of Xc.
Conclusion from Significance Test
y The probability that a false null hypothesis is rejected
is known as the power of the test. That is, the power of
a test is (1 − the probability of a type 2 error).
y In the studied example the power is a function of the
mean specified in the alternative hypothesis and
depends on the sample size, the significance level of
the test and whether the test is one- or two-tailed.
y If two or more tests are available to test the same
hypothesis, it may be useful to compare the powers of
the tests to decide which is more appropriate.
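The phosphorus example can be turned into numbers with a short sketch. The null mean (3.0%), standard deviation (0.036%), sample sizes (4 and 9) and significance level (0.05, one-tailed) come from the text; the alternative mean of 3.05% is a hypothetical value chosen only for illustration, since the text does not state one.

```python
from math import sqrt
from statistics import NormalDist

MU_0 = 3.0       # mean under the null hypothesis, %
MU_1 = 3.05      # mean under the alternative hypothesis, % (hypothetical value)
SIGMA = 0.036    # known standard deviation of the method, %


def error_rates(n, x_crit):
    """Return (alpha, beta, power) for sample size n and critical value x_crit."""
    standard_error = SIGMA / sqrt(n)
    alpha = 1 - NormalDist(MU_0, standard_error).cdf(x_crit)  # type 1 error
    beta = NormalDist(MU_1, standard_error).cdf(x_crit)       # type 2 error
    return alpha, beta, 1 - beta


# Critical value Xc for a one-tailed test at P = 0.05 with n = 4
x_crit = NormalDist(MU_0, SIGMA / sqrt(4)).inv_cdf(0.95)

alpha4, beta4, power4 = error_rates(4, x_crit)
alpha9, beta9, power9 = error_rates(9, x_crit)

print(f"Xc = {x_crit:.4f}")
print(f"n = 4: alpha = {alpha4:.3f}, beta = {beta4:.3f}, power = {power4:.3f}")
print(f"n = 9: alpha = {alpha9:.3f}, beta = {beta9:.3f}, power = {power9:.3f}")
```

Keeping Xc fixed and raising n from 4 to 9 shrinks the standard error of the mean, so both error probabilities fall and the power rises.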
Example problems
y Example problem XIII:
y The nitrate level (mg/l) in a sample of river water was
measured four times, with the following results:
Can the last value be rejected as an outlier?
Solution
y From the main menu select Stat > Basic Statistics > Display
Descriptive Statistics.
y In the window that appears choose:
y Variables: Determination column.
y Statistics: select the statistics to be calculated.
y Graphs: select the graphs that describe the data.
y From the main menu select Stat > Basic Statistics > 1-sample T
(test and confidence interval).
y In the window that appears choose:
y Samples in columns: data column.
y Options: to choose the confidence level.
y Perform hypothesis test: enter the reference value as the
hypothesized mean.
THANK YOU