AGENDA
Day 1 Agenda
Correlation Analysis
Day 2 Agenda
Process Capability
Multi-Vari Analysis
Confidence Intervals
Control Charts
Hypothesis Testing
Contingency Tables
Day 4 Agenda
Design of Experiments
Grading
Class Preparation 30%
Team Classroom Exercises 30%
Team Presentations 40%
10-minute daily presentation (Days 2 and 3) on application of the previous day's work
20-minute final practicum application (last day)
Copy on floppy as well as hard copy
PowerPoint preferred
Rotate Presenters
Q&A from the class
INTRODUCTION TO SIX SIGMA APPLICATIONS
Learning Objectives
A Philosophy
Customer Critical To Quality (CTQ) Criteria
Breakthrough Improvements
Fact-driven, Measurement-based, Statistically Analyzed
Prioritization
Controlling the Input & Process Variations Yields a Predictable Product
A Quality Level
6σ = 3.4 Defects per Million Opportunities
A Structured Problem-Solving Approach
Phased Project: Measure, Analyze, Improve, Control
A Program
Dedicated, Trained BlackBelts
Prioritized Projects
Teams - Process Participants & Owners
POSITIONING SIX SIGMA
THE FRUIT OF SIX SIGMA
Sweet Fruit
Design for Manufacturability
Process Entitlement
Bulk of Fruit
Process Characterization and Optimization
Ground Fruit
Logic and Intuition
UNLOCKING THE HIDDEN FACTORY
(Diagram: the VALUE STREAM flows to the CUSTOMER; waste due to incapable processes is diverted into the hidden factory.)
The necessity of training farm hands for first class farms in the fatherly handling of farm livestock is foremost in the minds of farm owners. Since the forefathers of the farm owners trained the farm hands for first class farms in the fatherly handling of farm livestock, the farm owners feel they should carry on with the family tradition of training farm hands of first class farms in the fatherly handling of farm livestock because they believe it is the basis of good fundamental farm management.
How many f's can you identify in 1 minute of inspection? 36 total are available.
SIX SIGMA COMPARISON
Breakthrough Strategy: Characterization (Phases 1-2) and Optimization (Phases 3-4)
Phase 1: Measurement
Objective: Define the problem and verify the primary and secondary measurement systems.
Phase 2: Analysis
Objective: Identify the few factors which are directly influencing the problem.
Phase 3: Improvement
Objective: Determine values for the few contributing factors which resolve the problem.
Phase 4: Control
Objective: Determine long term control measures which will ensure that the contributing factors remain controlled.
Measurements are critical...
(Diagram: distribution between LSL and USL, centered on the target T.)
HIRING NEEDS
BEADS ARE OUR BUSINESS
PRODUCTION SUPERVISOR
4 PRODUCTION WORKERS
2 INSPECTORS
1 INSPECTION SUPERVISOR
1 TALLY KEEPER
STANDING ORDERS
Your defect quota is no more than 5 off color beads allowed per paddle.
WHITE BEAD MANUFACTURING PROCESS PROCEDURES
The operator will take the bead paddle in the right hand.
Insert the bead paddle at a 45 degree angle into the bead bowl.
Agitate the bead paddle gently in the bead bowl until all spaces are filled.
Gently withdraw the bead paddle from the bowl at a 45 degree angle and allow the free beads to run off.
Without touching the beads, show the paddle to inspector #1 and wait until the off color beads are tallied.
Move to inspector #2 and wait until the off color beads are tallied.
Inspector #1 and #2 show their tallies to the inspection supervisor. If they agree, the inspection supervisor announces
the count and the tally keeper will record the result. If they do not agree, the inspection supervisor will direct the
inspectors to recount the paddle.
When the count is complete, the operator will return all the beads to the bowl and hand the paddle to the next operator.
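The paddle procedure above amounts to repeated random sampling from a mixed bowl, which is easy to simulate. The 20% off-color mix and the 50-hole paddle below are illustrative assumptions (the slides do not state them); the point is that the off-color count per paddle is driven by the process, not the operator.

```python
import random

random.seed(7)  # reproducible demo

# One "paddle draw": each hole fills at random from a bowl that is
# p_off_color off-color beads. Mix and paddle size are assumptions.
def draw_paddle(p_off_color=0.20, holes=50):
    return sum(random.random() < p_off_color for _ in range(holes))

counts = [draw_paddle() for _ in range(20)]  # 20 operator turns
```

However carefully each operator follows the 45 degree angle steps, the per-paddle count hovers around holes × p, so a defect quota of 5 is unreachable on average.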
INCENTIVE PROGRAM
An understanding of Probability Distributions is necessary to:
•Understand the concept and use of statistical tools.
•Understand the significance of random variation in everyday measures.
•Understand the impact of significance on the successful resolution of a project.
IMPROVEMENT ROADMAP
Uses of Probability Distributions
Project Uses
Phase 1: Measurement (Characterization): Establish baseline data characteristics.
Phase 4: Control: Use the concept of shift & drift to establish project expectations.
KEYS TO SUCCESS
(Charts: Frequency and Cumulative Frequency / Cumulative Percent of "Head Count" results from 10,000 people doing a coin toss 200 times.)
Cumulative count is simply the total frequency count accumulated as you move from left to right across the distribution.
For example, we can see that the occurrence of a "Head Count" of less than 74 or greater than 124 out of 200 tosses is so rare that a single occurrence was not registered out of 10,000 tries.
On the other hand, we can see that the chance of getting a count near (or at) 100 is very high.
COIN TOSS PROBABILITY DISTRIBUTION
PROCESS CENTERED ON EXPECTED VALUE
% of population = probability of occurrence
(Chart: number of heads vs. frequency, with CUM % OF POPULATION: .003, .135, 2.275, 15.87, 50.0, 84.1, 97.7, 99.86, 99.997)
WHAT DOES IT MEAN?
Common Occurrence
Rare Event
Since the sample does not contain all the possible values,
there is some uncertainty about the population. Hence any
statistics, such as mean and standard deviation, are just
estimates of the true population parameters.
Descriptive Statistics
Inferential Statistics
(Chart: the coin toss distribution with a shifted sample mean, σ marked.) In fact, there is a probability greater than 99.999% that it has changed.
NUMBER OF HEADS: 58 65 72 79 86 93 100 107 114 121 128 135 142
CUM % OF POPULATION: .003 .135 2.275 15.87 50.0 84.1 97.7 99.86 99.997
USES OF PROBABILITY DISTRIBUTIONS
Primarily these distributions are used to test for significant differences in data sets. To be classified as significant, the actual measured value must exceed a critical value. The critical value is a tabular value determined by the probability distribution and the risk of error. This risk of error is called the α risk and indicates the probability of this value occurring naturally. So, an α risk of .05 (5%) means that this critical value will be exceeded by a random occurrence less than 5% of the time.
(Diagram: distribution with critical values marking each rare-occurrence tail.)
DISPERSION
How wide a population is spread.
DISTRIBUTION FUNCTION
The mathematical formula that
best describes the data (we will
cover this in detail in the next
module).
COIN TOSS CENTRAL TENDENCY
(Chart: number of occurrences vs. head count, 70 to 130.)
What are some of the ways that we can easily indicate the centering characteristic of the population?
Data set (n = 12): -5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4
Mean: x̄ = ΣXi/n = -2/12 = -.17
WHAT IS THE MEDIAN?
If we rank order (descending or ascending) the data set for this distribution we could represent central tendency by the order of the data points. If we find the value half way (50%) through the data points, we have another way of representing central tendency. This is called the median value.
ORDERED DATA SET: -5, -3, -1, -1, 0, 0 | 0, 0, 0, 1, 3, 4 (50% of data points on each side)
Median Value = 0
WHAT IS THE MODE?
If we rank order (descending or ascending) the data set for this distribution we find several ways we can represent central tendency. We find that a single value occurs more often than any other. Since we know that there is a higher chance of this occurrence in the middle of the distribution, we can use this feature as an indicator of central tendency. This is called the mode.
ORDERED DATA SET: -5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4
Mode = 0
MEASURES OF CENTRAL TENDENCY, SUMMARY
ORDERED DATA SET (n = 12): -5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4
MEAN (X̄) (otherwise known as the average):
X̄ = ΣXi/n = -2/12 = -.17
MEDIAN (50th percentile data point):
n/2 = 6. Here the median value falls between two zero values (the 6th and 7th ordered points) and therefore is zero. If the values were say 2 and 3 instead, the median would be 2.5.
Median = 0
MODE (most common value in the data set):
The mode in this case is 0 with 5 occurrences within this data.
Mode = 0
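The three measures above can be checked directly with Python's standard library, using the 12-point data set from the slides:

```python
from statistics import mean, median, mode

data = [-5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4]  # ordered data set, n = 12

m = round(mean(data), 2)  # -2/12 rounds to -0.17
med = median(data)        # midway between the 6th and 7th values, both 0
mo = mode(data)           # 0 occurs five times
```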
SO WHAT’S THE REAL DIFFERENCE?
MEAN
The mean is the most
consistently accurate measure of
central tendency, but is more
difficult to calculate than the
other measures.
(Chart: number of occurrences vs. head count, 70 to 130.)
What are some of the ways that we can easily indicate the dispersion (spread) characteristic of the population?
Three measures have historically been used: the range, the standard deviation and the variance.
WHAT IS THE RANGE?
The range is a very common metric which is easily determined from any ordered sample. To calculate the range simply subtract the minimum value in the sample from the maximum value.
ORDERED DATA SET: -5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4 (Min = -5, Max = 4)
Range = xMAX - xMIN = 4 - (-5) = 9
WHAT IS THE VARIANCE/STANDARD DEVIATION?
The variance (s2) is a very robust metric which requires a fair amount of work to
determine. The standard deviation(s) is the square root of the variance and is the
most commonly used measure of dispersion for larger sample sizes.
Xi - X ( Xi - X ) 2
DATA SET
Xi -2 -5
X = = = -.17 -3
-5-(-.17)=-4.83 (-4.83)2=23.32
n 12 -3-(-.17)=-2.83 (-2.83)2=8.01
-1 -1-(-.17)=-.83 (-.83)2=.69
( - X)
2 -1
X 61.67 0
-1-(-.17)=-.83 (-.83)2=.69
= = = 5.6
2 i
s 0-(-.17)=.17 (.17)2=.03
n -1 12 - 1 0
0 0-(-.17)=.17 (.17)2=.03
0 0-(-.17)=.17 (.17)2=.03
0 0-(-.17)=.17 (.17)2=.03
1 0-(-.17)=.17 (.17)2=.03
3
1-(-.17)=1.17 (1.17)2=1.37
-6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 4
3-(-.17)=3.17 (3.17)2=10.05
4-(-.17)=4.17 (4.17)2=17.39
61.67
MEASURES OF DISPERSION
ORDERED DATA SET: -5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4 (Min = -5, Max = 4)
RANGE (R) (the maximum data value minus the minimum):
R = Xmax - Xmin = 4 - (-5) = 9
VARIANCE (s²) (squared deviations around the center point):
X̄ = ΣXi/n = -2/12 = -.17
s² = Σ(Xi - X̄)²/(n - 1) = 61.67/(12 - 1) = 5.6
STANDARD DEVIATION (s) (the square root of the variance, the deviation around the center point):
s = √s² = √5.6 = 2.37
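All three dispersion measures for the slide's data set can be reproduced in a few lines:

```python
data = [-5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4]
n = len(data)

xbar = sum(data) / n                         # -2/12, about -0.17
ss = sum((x - xbar) ** 2 for x in data)      # sum of squared deviations, 61.67
var = ss / (n - 1)                           # sample variance, 5.6
std = var ** 0.5                             # standard deviation, 2.37
rng = max(data) - min(data)                  # range, 4 - (-5) = 9
```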
SAMPLE MEAN AND VARIANCE EXAMPLE
μ̂ = X̄ = ΣXi/N
ŝ² = s² = Σ(Xi - X̄)²/(n - 1)
Data (Xi), i = 1 to 10: 10, 15, 12, 14, 10, 9, 11, 12, 12, 10
(Exercise: fill in ΣXi, X̄, and s² for this data set.)
SO WHAT'S THE REAL DIFFERENCE?
VARIANCE/STANDARD DEVIATION
The standard deviation is the most consistently accurate measure of dispersion for a single population. The variance has the added benefit of being additive over multiple populations. Both are difficult and time consuming to calculate.
RANGE
The range is very easy to determine. That's the good news…. The bad news is that it is very susceptible to bias.
SO WHAT’S THE BOTTOM LINE?
VARIANCE/ STANDARD
DEVIATION
Best used when you have
enough samples (>10).
RANGE
Good for small samples (10 or
less).
SO WHAT IS THIS SHIFT & DRIFT STUFF...
(Chart: long term distribution spanning LSL to USL, built up from smaller short term distributions that shift and drift over time; sources of variation drive the movement.)
Common Uses
Phase 1: Measurement (Characterization): Baselining Processes
Phase 2: Analysis / Phase 3: Improvement (Breakthrough Strategy)
Data points vary, but as the data accumulates, it forms a distribution which occurs naturally.
(Charts: the Original Population forms a Continuous Distribution; Subgroup Averages form a Normal Distribution; Subgroup Variances (s²) form a χ² Distribution.)
THE LANGUAGE OF MATH
Symbol | Name | Statistic
μ | Mean | x̄
σ² | Variance | s²
σ | Standard Deviation | s
Cp | Process Capability | Cp
P | Binomial Mean | p
THREE PROBABILITY DISTRIBUTIONS
tCALC = (X̄ - μ)/(s/√n)    Significant when tCALC > tCRIT
FCALC = s1²/s2²    Significant when FCALC > FCRIT
χ²(α, df) = Σ(fa - fe)²/fe    Significant when χ²CALC > χ²CRIT (fa = actual frequency, fe = expected frequency)
Note that in each case, a limit has been established to determine what is random chance versus significant difference. This point is called the critical value. If the calculated value exceeds this critical value, there is very low probability (P<.05) that this is due to random chance.
Z TRANSFORM
±1σ: 68.26% of the population
±2σ: 95.46% of the population
±3σ: 99.73% of the population
The Focus of Six Sigma…..
Characterize the Population
HOW DO POPULATIONS INTERACT? ADDING TWO POPULATIONS
Means Add: population means interact in a simple intuitive manner: μ1 + μ2 = μnew
Variations Add: population dispersions interact in an additive manner: σ1² + σ2² = σnew²
HOW DO POPULATIONS INTERACT? SUBTRACTING TWO POPULATIONS
Means Subtract: population means interact in a simple intuitive manner: μ1 - μ2 = μnew
Variations Still Add: population dispersions interact in an additive manner: σ1² + σ2² = σnew²
TRANSACTIONAL EXAMPLE
The new distribution looks like this with a mean of $7000 and a standard deviation of $9434. This distribution represents the occurrences of shipments exceeding orders. To answer the original question (shipments > orders) we look for $0 on this new distribution. Any occurrence to the right of this point will represent shipments > orders. So, we need to calculate the percent of the curve that exists to the right of $0.
TRANSACTIONAL EXAMPLE, CONTINUED
X̄(shipments - orders) = X̄shipments - X̄orders = $60,000 - $53,000 = $7,000
s(shipments - orders) = √(s²shipments + s²orders) = √((5000)² + (8000)²) = $9434
Z = (0 - X̄)/s = ($0 - $7000)/$9434 = -.74σ
Look up .74σ in the normal table and you will find .77. Therefore, the answer to the original question is that 77% of the time, shipments will exceed orders.
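The arithmetic on this slide can be reproduced with a short script, with math.erf standing in for the normal table lookup:

```python
from math import erf, sqrt

mu = 60000 - 53000                 # mean of (shipments - orders) = $7,000
sigma = sqrt(5000**2 + 8000**2)    # standard deviations do not add; variances do
z = (0 - mu) / sigma               # about -0.74

# P(shipments > orders) = area to the right of $0 = 1 - Phi(z)
p_exceed = 1 - 0.5 * (1 + erf(z / sqrt(2)))   # about 0.77
```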
Correlation Analysis is
necessary to:
•show a relationship between
two variables. This also sets the
stage for potential cause and
effect.
IMPROVEMENT ROADMAP
Uses of Correlation Analysis
Common Uses
Phase 1: Measurement (Characterization)
Phase 2: Analysis: Determine and quantify the relationship between factors (x) and output characteristics (Y).
Phase 3: Improvement (Breakthrough Strategy / Optimization)
Phase 4: Control
KEYS TO SUCCESS
(Scatter diagram: output or y variable (dependent) vs. input or x variable (independent).)
Correlation: Y = f(x)
As the input variable changes, there is an influence or bias on the output variable.
WHAT IS CORRELATION?
A measurable relationship between two variable data
characteristics.
Not necessarily Cause & Effect (Y=f(x))
The square of “r” is the percent of the response (Y) which is related
to the input (x).
TYPES OF CORRELATION
(Scatter diagrams of Positive and Negative correlation.)
CALCULATING "r"
Coefficient of Linear Correlation
•Calculate the sample covariance: sxy = Σ(xi - x̄)(yi - ȳ)/(n - 1)
•r = sxy/(sx sy)
A quick graphical estimate is also available: bound the scatter with an ellipse, measure its width (W) and length (L), and estimate r ≈ ±(1 - W/L), using + for a positive slope and - for a negative slope. Here W = 6.7 and L = 12.6, so r ≈ -(1 - 6.7/12.6) = -.47.
HOW DO I KNOW WHEN I HAVE CORRELATION?
Ordered Pairs | rCRIT
5 | .88
6 | .81
7 | .75
8 | .71
9 | .67
10 | .63
15 | .51
20 | .44
25 | .40
30 | .36
50 | .28
80 | .22
100 | .20
•The answer should strike a familiar chord at this point… We have confidence (95%) that we have correlation when |rCALC| > rCRIT.
•Since sample size is a key determinant of rCRIT we need to use a table to determine the correct rCRIT given the number of ordered pairs which comprise the complete data set.
•So, in the preceding example we had 60 ordered pairs of data and we computed an rCALC of -.47. Using the table at the left we determine that the rCRIT value for 60 is .26.
•Comparing |rCALC| > rCRIT we get .47 > .26. Therefore the calculated value exceeds the minimum critical value required for significance.
•Conclusion: We are 95% confident that the observed correlation is significant.
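The rCALC computation and the critical-value comparison can be sketched as below. The six (x, y) pairs are made-up illustration data, not the 60-pair example from the slide; only the test logic carries over.

```python
def pearson_r(xs, ys):
    """Coefficient of linear correlation from raw (x, y) pairs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: 6 ordered pairs
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 4, 5, 7]

r = pearson_r(xs, ys)          # about 0.88
r_crit = 0.81                  # table value for 6 ordered pairs
significant = abs(r) > r_crit  # True: 95% confident the correlation is real
```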
Learning Objectives
Common Uses
Phase 1: Measurement (Characterization): The Central Limit Theorem underlies all statistical techniques which rely on normality as a fundamental assumption.
Phase 2: Analysis (Breakthrough Strategy)
Phase 3: Improvement (Optimization)
Phase 4: Control
KEYS TO SUCCESS
Why do we Care?
What this means is that no matter what kind of distribution we sample, if the sample size is big enough, the distribution for the mean is approximately normal. This is the key link that allows us to use much of the inferential statistics we have been working with so far. This is the reason that only a few probability distributions (Z, t and χ²) have such broad application.
(Charts: sampling distributions of the mean for n = 2, n = 5 and n = 10, each progressively closer to normal.)
ANOTHER PRACTICAL ASPECT
s(x̄) = s/√n
This formula is for the standard error of the mean: the variation of sample averages shrinks with the square root of the sample size.
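A quick simulation shows both ideas at once: sample means drawn from a decidedly non-normal (uniform) population come out approximately normal and centered correctly, and their spread matches s/√n. The uniform population and n = 30 are arbitrary choices for the demo.

```python
import random

random.seed(1)

def sample_mean(n):
    """Mean of n draws from a uniform(0, 1) population."""
    return sum(random.uniform(0, 1) for _ in range(n)) / n

means = [sample_mean(30) for _ in range(10_000)]
grand = sum(means) / len(means)                 # near the population mean, 0.5
spread = (sum((m - grand) ** 2 for m in means)
          / (len(means) - 1)) ** 0.5
# theory: standard error = sigma/sqrt(n) = (1/sqrt(12))/sqrt(30), about 0.0527
```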
Common Uses
Phase 1: Measurement (Characterization): Baselining a process primary metric (Y) prior to starting a project.
(Diagram: distributions with Out of Spec area on either side of the spec limits, spread measured in s.)
HOW IS PROCESS CAPABILITY CALCULATED?
(Diagram: distribution between the Lower Spec Limit (LSL) and Upper Spec Limit (USL).)
Take the distance between the population mean and the nearest spec limit, MIN(μ - LSL, USL - μ). This distance divided by 3s is Cpk:
Cpk = MIN(μ - LSL, USL - μ) / 3s
Note:
LSL = Lower Spec Limit
USL = Upper Spec Limit
PROCESS CAPABILITY
EXAMPLE
We want to calculate the process capability for our inventory. The
historical average monthly inventory is $250,000 with a standard
deviation of $20,000. Our inventory target is $200,000 maximum.
Calculation Values:
Upper Spec value = $200,000 maximum
No Lower Spec
μ = historical average = $250,000
s = $20,000
Calculation:
Cpk = MIN(μ - LSL, USL - μ)/3s = ($200,000 - $250,000)/(3 × $20,000) = -.83
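The inventory example reduces to a couple of lines; with no lower spec, only the upper distance enters the MIN():

```python
mu, s = 250_000, 20_000
usl = 200_000                 # the $200,000 maximum acts as the upper spec limit

cpk = (usl - mu) / (3 * s)    # no LSL, so only the upper side applies; about -0.83
```

A negative Cpk simply says the process average is already outside the spec limit.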
We find that it bears a striking resemblance to the equation for Z, which is Z CALC = (x - μ0)/s, with the distance MIN(μ - LSL, USL - μ) taking the place of (x - μ0). Making this substitution, we get:
Cpk = (1/3) × MIN(μ - LSL, USL - μ)/s = Z MIN/3
We can now use a table similar to the one below to transform either Z or the associated PPM to an equivalent Cpk value. So, if we have a process which has a short term PPM = 135,666 we find that the equivalent Z = 1.1 and Cpk = 0.4 from the table.
Z | PPM (short term) | Cpk | PPM (long term)
0.3 | 382,089 | 0.1 | 884,930
0.4 | 344,578 | 0.1 | 864,334
0.5 | 308,538 | 0.2 | 841,345
0.6 | 274,253 | 0.2 | 815,940
0.7 | 241,964 | 0.2 | 788,145
0.8 | 211,855 | 0.3 | 758,036
0.9 | 184,060 | 0.3 | 725,747
1.0 | 158,655 | 0.3 | 691,462
1.1 | 135,666 | 0.4 | 655,422
1.2 | 115,070 | 0.4 | 617,911
1.3 | 96,801 | 0.4 | 579,260
1.4 | 80,757 | 0.5 | 539,828
1.5 | 66,807 | 0.5 | 500,000
1.6 | 54,799 | 0.5 | 460,172
1.7 | 44,565 | 0.6 | 420,740
1.8 | 35,930 | 0.6 | 382,089
1.9 | 28,716 | 0.6 | 344,578
2.0 | 22,750 | 0.7 | 308,538
2.1 | 17,864 | 0.7 | 274,253
2.2 | 13,903 | 0.7 | 241,964
2.3 | 10,724 | 0.8 | 211,855
2.4 | 8,198 | 0.8 | 184,060
2.5 | 6,210 | 0.8 | 158,655
2.6 | 4,661 | 0.9 | 135,666
2.7 | 3,467 | 0.9 | 115,070
2.8 | 2,555 | 0.9 | 96,801
2.9 | 1,866 | 1.0 | 80,757
3.0 | 1,350 | 1.0 | 66,807
3.1 | 968 | 1.0 | 54,799
3.2 | 687 | 1.1 | 44,565
3.3 | 483 | 1.1 | 35,930
3.4 | 337 | 1.1 | 28,716
3.5 | 233 | 1.2 | 22,750
3.6 | 159 | 1.2 | 17,864
3.7 | 108 | 1.2 | 13,903
3.8 | 72.4 | 1.3 | 10,724
3.9 | 48.1 | 1.3 | 8,198
4.0 | 31.7 | 1.3 | 6,210
(The long term PPM column is the PPM at Z - 1.5, i.e. after the standard 1.5σ shift.)
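The table's conversions can be reproduced with the normal CDF; the long-term column follows the standard 1.5σ shift described under shift & drift (an interpretation on my part, since the extracted table carries no column labels):

```python
from math import erf, sqrt

def ppm_from_z(z):
    """One-sided PPM falling beyond z standard deviations (short term)."""
    return (1 - 0.5 * (1 + erf(z / sqrt(2)))) * 1_000_000

def cpk_from_z(z):
    """Cpk is simply the one-sided Z distance divided by 3."""
    return z / 3

ppm_st = ppm_from_z(2.0)        # about 22,750, matching the Z = 2.0 row
ppm_lt = ppm_from_z(2.0 - 1.5)  # about 308,538 after the 1.5-sigma shift
cpk = cpk_from_z(2.0)           # about 0.67
```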
Learning Objectives
Common Uses
Phase 1:
Measurement
Characterization
Optimization
Phase 4:
Control
KEYS TO SUCCESS
(Fishbone: typical sources of variation - old machinery, tooling, humidity, supplier, hardness, operator error, temperature, technique, fatigue, wrong spec., tool wear, lubrication, handling damage, pressure, heat treat.)
(Diagram: multi-vari sampling plan - Samples 1, 2 and 3 taken within Step 1 and Step 3, repeated at Time 1 and Time 2.)
Common Uses
Phase 1: Measurement (Characterization): Sample size considerations are used in any situation where a sample is being used to infer a population characteristic.
Phase 2: Analysis (Breakthrough Strategy)
Phase 3: Improvement (Optimization)
Phase 4: Control
KEYS TO SUCCESS
Reality: Not different (Ho) | Different (H1)
Test Decision, Not different (Ho): Correct Conclusion | Type II Error (β risk, Consumer Risk)
Test Decision, Different (H1): Type I Error (α risk, Producer Risk) | Correct Conclusion
The decision point is set by the α risk.
WHY DO WE CARE IF WE HAVE THE TRUE VALUE?
How confident do you want to be that you have made the right decision?
A person does not feel well and checks into a hospital for tests.
Reality: Not different (Ho) | Different (H1)
Test Decision, Not different (Ho): Correct Conclusion | Type II Error (β risk, Consumer Risk)
Test Decision, Different (H1): Type I Error (α risk, Producer Risk) | Correct Conclusion
Error Impact
Ho: Patient is not sick
Type I Error = Treating a patient who is not sick
H1: Patient is sick
Type II Error = Not treating a sick patient
HOW ABOUT ANOTHER EXAMPLE?
A change is made to the sales force to save costs. Did it adversely impact
the order receipt rate?
Reality: Not different (Ho) | Different (H1)
Test Decision, Not different (Ho): Correct Conclusion | Type II Error (β risk, Consumer Risk)
Test Decision, Different (H1): Type I Error (α risk, Producer Risk) | Correct Conclusion
Percent Defective: p̂ - Zα/2 √(p̂(1 - p̂)/n) ≤ p ≤ p̂ + Zα/2 √(p̂(1 - p̂)/n)
These individual formulas are not critical at this point, but notice that the only
opportunity for decreasing the error band (confidence interval) without decreasing the
confidence factor, is to increase the sample size.
SAMPLE SIZE EQUATIONS
Mean: n = (Zα/2 s / Δ)²    (allowable error Δ = |μ - x̄|)
Standard Deviation: n = (χα/2 / (s/σ))² + 1    (allowable error expressed as the ratio s/σ)
Percent Defective: n = p̂(1 - p̂)(Zα/2 / E)²    (allowable error = E)
SAMPLE SIZE EXAMPLE
We want to estimate the true average weight for a part within 2 pounds.
Historically, the part weight has had a standard deviation of 10 pounds.
We would like to be 95% confident in the results.
Calculation Values:
Average tells you to use the mean formula
Significance: a = 5% (95% confident)
Za/2 = Z.025 = 1.96
s=10 pounds
Δ = |μ - x̄| = error allowed = 2 pounds
Calculation:
n = (Zα/2 s / Δ)² = (1.96 × 10 / 2)² = 97
Answer: n = 97 Samples
SAMPLE SIZE EXAMPLE
We want to estimate the true percent defective for a part within 1%.
Historically, the part percent defective has been 10%. We would like to
be 95% confident in the results.
Calculation Values:
Percent defective tells you to use the percent defect formula
Significance: a = 5% (95% confident)
Za/2 = Z.025 = 1.96
p = 10% = .1
E = 1% = .01
Calculation:
n = p̂(1 - p̂)(Zα/2 / E)² = .1(1 - .1)(1.96/.01)² = 3458
Answer: n = 3458 Samples
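Both worked examples follow directly from the formulas; a short script reproduces them (math.ceil rounds the sample size up, since you cannot take a fractional sample):

```python
from math import ceil

z = 1.96  # Z for alpha/2 = .025, i.e. 95% confidence

# Mean: estimate average part weight within 2 lb, historical s = 10 lb
n_mean = ceil((z * 10 / 2) ** 2)           # 97 samples

# Percent defective: estimate p = 10% within E = 1%
p, e = 0.10, 0.01
n_pct = ceil(p * (1 - p) * (z / e) ** 2)   # 3458 samples
```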
Learning Objectives
Common Uses
Phase 1: Measurement (Characterization): Used in any situation where data is being evaluated for significance.
Phase 2: Analysis (Breakthrough Strategy)
Phase 3: Improvement (Optimization)
Phase 4: Control
KEYS TO SUCCESS
(Chart: metric plotted over TIME with confidence interval bands.)
Percent Defective: p̂ - Zα/2 √(p̂(1 - p̂)/n) ≤ p ≤ p̂ + Zα/2 √(p̂(1 - p̂)/n)
These individual formulas enable us to calculate the confidence interval for many of the most common metrics.
CONFIDENCE INTERVAL
EXAMPLE
Over the past 6 months, we have received 10,000 parts from a vendor with
an average defect rate of 14,000 dpm. The most recent batch of parts
proved to have 23,000 dpm. Should we be concerned? We would like to
be 95% confident in the results.
Calculation Values:
Average defect rate of 14,000 ppm = 14,000/1,000,000 = .014
Significance: a = 5% (95% confident)
Za/2 = Z.025 = 1.96
n=10,000
Comparison defect rate of 23,000 ppm = .023
Calculation:
p̂ - Zα/2 √(p̂(1 - p̂)/n) ≤ p ≤ p̂ + Zα/2 √(p̂(1 - p̂)/n)
.014 - 1.96 √(.014(1 - .014)/10,000) ≤ p ≤ .014 + 1.96 √(.014(1 - .014)/10,000)
.014 - .0023 ≤ p ≤ .014 + .0023
.012 ≤ p ≤ .016
Answer: Yes, .023 is significantly outside of the expected 95% confidence interval of
.012 to .016.
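A sketch of the vendor-defect calculation:

```python
from math import sqrt

p_hat, n, z = 0.014, 10_000, 1.96

half = z * sqrt(p_hat * (1 - p_hat) / n)   # half-width, about .0023
lo, hi = p_hat - half, p_hat + half        # about .012 to .016

# The new batch's .023 falls outside the interval, so yes, be concerned.
concerned = not (lo <= 0.023 <= hi)
```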
CONFIDENCE INTERVAL
EXERCISE
We are tracking the gas mileage of our late model Ford and find that historically, we have averaged 28 MPG. After a tune up at Billy Bob's auto repair we find that we only got 24 MPG average with a standard deviation of 3 MPG in the next 16 fill-ups. Should we be concerned? We would like to be 95% confident in the results.
Common Uses
Phase 1: Measurement / Phase 2: Analysis (Characterization): Control charts can be effectively used to determine "special cause" situations in the Measurement and Analysis phases.
Phase 3: Improvement (Breakthrough Strategy / Optimization)
Phase 4: Control
KEYS TO SUCCESS
(Chart: run chart with one point value beyond the control limit, a special cause occurrence.)
What is a Control Chart?
A control chart is simply a run chart with confidence intervals calculated and drawn in. These "statistical control limits" form the trip wires which enable us to determine when a process characteristic is operating under the influence of a "special cause". The +/- 3s limits correspond to a 99.73% confidence interval.
So how do I construct a control chart?
p chart: UCLp = P̄ + 3√(P̄(1 - P̄)/n), LCLp = P̄ - 3√(P̄(1 - P̄)/n)
np chart: UCLnp = nP̄ + 3√(nP̄(1 - P̄)), LCLnp = nP̄ - 3√(nP̄(1 - P̄))
Note: p charts have an individually calculated control limit for each point plotted.
u chart: UCLu = Ū + 3√(Ū/n), LCLu = Ū - 3√(Ū/n)
c chart: UCLc = C̄ + 3√C̄, LCLc = C̄ - 3√C̄
Note: u charts have an individually calculated control limit for each point plotted.
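The p chart limits above can be wrapped in a small helper. The P̄ = 5% and n = 100 inputs are illustrative values, not from the slides; note the lower limit is floored at zero since a fraction defective cannot be negative.

```python
from math import sqrt

def p_chart_limits(p_bar, n):
    """UCL/LCL for a p chart at the usual 3-sigma spread."""
    half = 3 * sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - half), p_bar + half

lcl, ucl = p_chart_limits(0.05, 100)   # lcl floors at 0, ucl about 0.115
```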
Hypothesis testing is
necessary to:
•determine when there is a
significant difference between
two sample populations.
•determine whether there is a
significant difference between a
sample population and a target
value.
IMPROVEMENT ROADMAP
Uses of Hypothesis Testing
Common Uses
Phase 1:
Measurement
Characterization
(Diagram: distribution with the critical value marked; the calculated statistic value is compared against this point.)
We determine a critical value from a probability table for the statistic. This
value is compared with the calculated value we get from our data. If the
calculated value exceeds the critical value, the probability of this occurrence
happening due to random variation is less than our test a.
SO WHAT IS THIS "NULL" HYPOTHESIS?
"Mathematicians are like Frenchmen: whatever you say to them they translate into their own language, and forthwith it is something entirely different." - Goethe
Population Population
Average Variance
Many test statistics use α and others use α/2, and often it is confusing to tell which one to use. The answer is straightforward when you consider what the test is trying to accomplish.
If there are two bounds (upper and lower), the α probability must be split between each bound. This results in a test statistic of α/2.
If there is only one direction which is under consideration, the error (probability) is concentrated in that direction. This results in a test statistic of α.
One-tailed examples (test statistic α): Ho: μ1 > μ2; Ho: σ1 < σ2. All of the α probability sits beyond a single critical value; the remaining 1-α is the common-occurrence region.
Two-tailed examples (test statistic α/2): Ho: μ1 = μ2; Ho: σ1 = σ2. An α/2 probability sits beyond each of two critical values; the remaining 1-α is the common-occurrence region.
n1 = n2 ≤ 20: 2 Sample Tau
tau CALC = 2|X̄1 - X̄2|/(R1 + R2)
n1 + n2 ≤ 30: 2 Sample t (DF: n1 + n2 - 2)
t = (X̄1 - X̄2)/√[(1/n1 + 1/n2)((n1 - 1)s1² + (n2 - 1)s2²)/(n1 + n2 - 2)]
n1 + n2 > 30: 2 Sample Z
Z CALC = (X̄1 - X̄2)/√(s1²/n1 + s2²/n2)
Hypothesis Testing Example (2 Sample Tau)
Ho: μ1 = μ2
Receipts 1: 3067, 2730, 2840, 2913, 2789
Receipts 2: 3200, 2777, 2623, 3044, 2834
Statistic Summary: n1 = n2 = 5
Significance: α/2 = .025 (2 tail)
tau crit = .613 (from the table for α = .025 & n = 5)
Calculation:
R1 = 337, R2 = 577
X̄1 = 2868, X̄2 = 2896
tau CALC = 2|2868 - 2896|/(337 + 577) = .06
Test: Ho: tau CALC < tau CRIT; .06 < .613 = true? (yes, therefore we will fail to reject the null hypothesis).
Conclusion: Fail to reject the null hypothesis (i.e. the data does not support the conclusion that there is a significant difference).
Hypothesis Testing Example (2 Sample t)
Several changes were made to the sales organization. The weekly number of orders were tracked both before and after the changes. Determine if the samples have equal means with 95% confidence.
Ho: μ1 = μ2
Receipts 1: 3067, 2730, 2840, 2913, 2789
Receipts 2: 3200, 2777, 2623, 3044, 2834
Statistic Summary: n1 = n2 = 5; DF = n1 + n2 - 2 = 8
Significance: α/2 = .025 (2 tail)
t crit = 2.306 (from the table for α = .025 and 8 DF)
Calculation:
s1 = 130, s2 = 227
X̄1 = 2868, X̄2 = 2896
t = (X̄1 - X̄2)/√[(1/n1 + 1/n2)((n1 - 1)s1² + (n2 - 1)s2²)/(n1 + n2 - 2)]
t CALC = |2868 - 2896|/(.63 × 185) = .24
Test: Ho: t CALC < t CRIT; .24 < 2.306 = true? (yes, therefore we will fail to reject the null hypothesis).
Conclusion: Fail to reject the null hypothesis (i.e. the data does not support the conclusion that there is a significant difference).
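The 2 Sample t example can be checked end to end in a few lines; the pooled-variance form below matches the formula on the slide:

```python
from math import sqrt

r1 = [3067, 2730, 2840, 2913, 2789]
r2 = [3200, 2777, 2623, 3044, 2834]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

n1, n2 = len(r1), len(r2)
# pooled variance, weighted by each sample's degrees of freedom
sp2 = ((n1 - 1) * var(r1) + (n2 - 1) * var(r2)) / (n1 + n2 - 2)
t = abs(mean(r1) - mean(r2)) / sqrt((1 / n1 + 1 / n2) * sp2)  # about .24

t_crit = 2.306          # alpha/2 = .025, 8 DF
reject = t > t_crit     # False: fail to reject Ho
```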
Sample Average vs Target (μ0)
Coming up with the calculated statistic...
(For means, choose the statistic by sample size as before: n ≤ 20, n ≤ 30, n > 30.)
Percent defective vs. target p0: Z CALC = (p1 - p0)/√(p0(1 - p0)/n)
Two percent defectives: Z CALC = (p1 - p2)/√(p̄(1 - p̄)(1/n1 + 1/n2)), where p̄ = (n1p1 + n2p2)/(n1 + n2)
ANOVA
Learning Objectives
Common Uses
Phase 1: Measurement (Characterization)
Phase 2: Analysis: Hypothesis Testing
Phase 3: Improvement (Breakthrough Strategy / Optimization): Design of Experiments (DOE)
Phase 4: Control
KEYS TO SUCCESS
Use validated data & your process experts to identify key variables.
(Chart: Liters Per Hr By Formulation; group means are indicated by lines. 3 Groups = 3 Treatments, with the overall mean and the treatment offsets ti marked.)
Let's say we run a study where we have three groups which we are evaluating for a significant difference in the means. Each one of these groups is called a "treatment" and represents one unique set of experimental conditions. Within each treatment, we have seven values which are called repetitions.
Analysis Of Variance (ANOVA)
Linear Model
(Chart: Liters Per Hr By Formulation, as above.)
The first step in being able to perform this analysis is to compute the "Sum of the Squares" to determine the variation between treatments and within treatments.
Sum of Squares (SS):
Σ(i=1..a) Σ(j=1..n) (yij - ȳ)² = n Σ(i=1..a) (ȳi - ȳ)² + Σ(i=1..a) Σ(j=1..n) (yij - ȳi)²
(Total SS = SS between treatments + SS within treatments)
Since we will be comparing two sources of variation, we will use the F test to
determine whether there is a significant difference. To use the F test, we
need to convert the “Sum of the Squares” to “Mean Squares” by dividing by
the Degrees of Freedom (DF).
The Degrees of Freedom is different for each of our sources of variation.
DFbetween = # of treatments - 1
DFwithin = (# of treatments)(# of repetitions of each treatment - 1)
DFtotal = (# of treatments)(# of repetitions of each treatment) - 1
Note that DFtotal = DFbetween + DFwithin
Mean Square = SS / Degrees of Freedom
(Diagram: F distribution with the critical value marked; the calculated statistic value is compared against it.)
We determine a critical value from a probability table. This value is compared with the calculated value. If the calculated value exceeds the critical value, we will reject the null hypothesis (concluding that a significant difference exists between 2 or more of the treatment means).
Class Example -- Evaluate 3 Fuel Formulations
Is there a difference?
Here we have an example of three different fuel formulations that are expected to
show a significant difference in average fuel consumption. Is there a significant
difference? Let’s use ANOVA to test our hypothesis……..
Class Example -- Evaluate 3 Fuel Formulations
Is there a difference?
Step 1: Calculate the Mean Squares Between value
Treatment 1: 19, 18, 21, 16, 18, 20, 14 (average 18.0)
Treatment 2: 20, 15, 21, 19, 17, 22, 19 (average 19.0)
Treatment 3: 23, 20, 25, 22, 18, 24, 22 (average 22.0)
The first step is to come up with the Mean Squares Between. To accomplish this we:
•find the average of each treatment and the overall average
•find the difference between the two values for each treatment and square it
•sum the squares and multiply by the number of replicates in each treatment
•divide this resulting value by the degrees of freedom
Class Example -- Evaluate 3 Fuel Formulations
Is there a difference?
Step 2: Calculate the Mean Squares Within value
The next step is to come up with the Mean Squares Within. To accomplish this we:
•square the difference between each individual within a treatment and the average of that treatment
•sum the squares for each treatment
•divide this resulting value by the degrees of freedom
Treatment 1 (average 18.0): data 19, 18, 21, 16, 18, 20, 14; differences 1, 0, 3, -2, 0, 2, -4; squared differences 1, 0, 9, 4, 0, 4, 16
Treatment 2 (average 19.0): data 20, 15, 21, 19, 17, 22, 19; differences 1, -4, 2, 0, -2, 3, 0; squared differences 1, 16, 4, 0, 4, 9, 0
Treatment 3 (average 22.0): data 23, 20, 25, 22, 18, 24, 22; differences 1, -2, 3, 0, -4, 2, 0; squared differences 1, 4, 9, 0, 16, 4, 0
Sum of Squares Within (SSw) = 102.0
Degrees of Freedom Within (DFw = (# of treatments)(samples - 1)) = 18
Mean Squares Within (MSw = SSw/DFw) = 5.7
Remaining Steps
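The remaining steps form the F ratio from the two mean squares and compare it to a critical value. A hedged sketch in plain Python, assuming α = 0.05 (the 3.55 critical value is F(0.05; 2, 18) from a standard F table; variable names are mine):

```python
# Values computed in Steps 1 and 2 of the fuel-formulation example.
ms_between = 60.666667 / 2   # SSb / DFb
ms_within = 102.0 / 18       # SSw / DFw (the 5.7 on the slide, rounded)

# The F statistic is the ratio of the two mean squares.
f_calc = ms_between / ms_within

# Critical value F(0.05; 2, 18) = 3.55, looked up in an F table.
f_crit = 3.55
reject_null = f_calc > f_crit
print(round(f_calc, 2), reject_null)
```

Since 5.35 exceeds 3.55, we reject the null hypothesis: at least one fuel formulation differs significantly from the others.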
Common Uses
Phase 1:
Measurement
Characterization
Conduct “ad hoc” training for your team prior to using the tool.
Gather data, use the tool to test, and then move on.
Keep it simple.
A contingency table is just another way of hypothesis testing. Just like the
hypothesis testing we have learned so far, we will obtain a “critical value” from
a table (χ² in this case) and use it as a tripwire for significance. We then use
the sample data to calculate a χ²CALC value. Comparing this calculated value to
our critical value tells us whether the data groups exhibit no significant
difference (null hypothesis, Ho) or whether one or more significant differences
exist (alternate hypothesis, H1).
Ho: P1 = P2 = P3 …
H1: One or more population (P) is significantly different
(Figure: χ² distribution showing χ²CRIT as the boundary between the Ho region and the H1 region, against which χ²CALC is compared.)
So how do you build a Contingency Table?
•Define the hypothesis you want to test. In this example we have 3 vendors
from which we have historically purchased parts. We want to eliminate one
and would like to see if there is a significant difference in delivery performance.
This is stated mathematically as Ho: Pa=Pb=Pc. We want to be 95% confident
in this result.
•We then construct a table which looks like this example. In this table we have
order performance in rows and vendors in columns.

                 Vendor A   Vendor B   Vendor C
Orders On Time
Orders Late

•We ensure that we include both good and bad situations so that the sum of
each column is the total opportunities, which have been grouped into good and
bad. We then gather enough data to ensure that each cell will have a
count of at least 5.
•Add up the totals for each column and row.

                 Vendor A   Vendor B   Vendor C   Total
Orders On Time      25         58         12        95
Orders Late          7          9          5        21
Total               32         67         17       116
Wait, what if I don’t have at least 5 in each cell?
Collapse the table
•Suppose we were placing side bets in a number of bars and wondered whether
any nonrandom factors were at play. We gather data and construct the
following table:
Bar #1 Bar #2 Bar #3
Won Money 5 7 2
Lost Money 7 9 4
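The “count of at least 5” rule applies to the expected counts, which can be computed directly from the marginals. A small sketch in plain Python showing why the bar data above needs collapsing (variable names are mine):

```python
observed = [[5, 7, 2],    # Won Money  (Bar 1, Bar 2, Bar 3)
            [7, 9, 4]]    # Lost Money

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# If any expected count is below 5, the table should be collapsed.
too_small = any(e < 5 for row in expected for e in row)
print(too_small)
```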
•Calculate the percentage of the total contained in each row by dividing the row
total by the total for all the categories. For example, the Orders on Time row
has a total of 95. The overall total for all the categories is 116. The percentage
for this row will be row/total or 95/116 = .82.
Vendor A Vendor B Vendor C Total Portion
Orders On Time 25 58 12 95 0.82
Orders Late 7 9 5 21 0.18
Total 32 67 17 116 1.00
•The row percentage times the column total gives us the expected occurrence
for each cell based on the percentages. For example, for the Orders On Time
row, .82 x 32 = 26 for the first cell and .82 x 67 = 55 for the second cell.

Actual Occurrences
                 Vendor A   Vendor B   Vendor C   Total   Portion
Orders On Time      25         58         12        95      0.82
Orders Late          7          9          5        21      0.18
Total               32         67         17       116      1.00

Expected Occurrences
                 Vendor A   Vendor B   Vendor C
Orders On Time      26         55
Orders Late
So how do you build a Contingency Table?
•Now we need to calculate the χ² value for the data. This is done using the
formula (a-e)²/e (where a is the actual count and e is the expected count) for
each cell. So, the χ² value for the first cell would be (25-26)²/26 = .04.
Filling in the remaining χ² values we get:

Actual Occurrences
                 Vendor A   Vendor B   Vendor C   Total   Portion
Orders On Time      25         58         12        95      0.82
Orders Late          7          9          5        21      0.18
Column Total        32         67         17       116      1.00

Expected Occurrences
                 Vendor A   Vendor B   Vendor C
Orders On Time      26         55         14
Orders Late          6         12          3

Calculations (χ² = (a-e)²/e)
                 Vendor A      Vendor B      Vendor C
Orders On Time   (25-26)²/26   (58-55)²/55   (12-14)²/14
Orders Late      (7-6)²/6      (9-12)²/12    (5-3)²/3

Calculated χ² Values
                 Vendor A   Vendor B   Vendor C
Orders On Time     0.04       0.18       0.27
Orders Late        0.25       0.81       1.20
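The cell-by-cell arithmetic above can be verified in plain Python; the decision compares the summed χ²CALC against χ²CRIT = 5.99 (α = 0.05, DF = (rows − 1)(columns − 1) = 2, from a standard χ² table). A sketch, with variable names of my own choosing:

```python
# Vendor delivery data from the example.
observed = [[25, 58, 12],   # Orders On Time (Vendor A, B, C)
            [ 7,  9,  5]]   # Orders Late

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count per cell, then chi-square = sum of (a - e)^2 / e.
chi2_calc = 0.0
for i, r in enumerate(row_totals):
    for j, c in enumerate(col_totals):
        e = r * c / grand_total
        a = observed[i][j]
        chi2_calc += (a - e) ** 2 / e

# Critical value from a chi-square table: alpha = 0.05, DF = 2.
chi2_crit = 5.99
print(round(chi2_calc, 2), chi2_calc > chi2_crit)
```

With these data χ²CALC is about 2.76, below the 5.99 critical value, so at 95% confidence the vendors show no significant difference in delivery performance.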
Now what do I do?
Construct a contingency table of the data and interpret the results for each data set.

Actual Occurrences
              Machine 1   Machine 2   Machine 3
Good Parts       100         350         900
Scrap             15          18          23

Actual Occurrences
              Shift 1   Shift 2   Shift 3
Good Parts      500       400       450
Scrap            20        19        17
Learning Objectives
FUNDAMENTALS
Learning Objectives
Design of Experiments is particularly useful to:
•evaluate interactions between 2 or more KPIVs and their impact on one or more KPOVs.
•optimize values for KPIVs to determine the optimum output from a process.
IMPROVEMENT ROADMAP
Uses of Design of Experiments

Characterization
Phase 1: Measurement
Phase 2: Analysis

Breakthrough Strategy

Optimization
Phase 3: Improvement
Phase 4: Control

•Verify the relationship between KPIVs and KPOVs by manipulating the KPIV and observing the corresponding KPOV change.
•Determine the best KPIV settings to optimize the KPOV output.
KEYS TO SUCCESS
Y = f(x)
For the purposes of this training we will teach only full factorial (2^k) designs.
This will enable you to get a basic understanding of application and use of the
tool. In addition, the vast majority of problems commonly encountered in
improvement projects can be addressed with this design. If you have any
question on whether the design is adequate, consult a statistical expert.
The Yates Algorithm
Determining the number of Treatments
Determine the 2 levels for each factor. Ensure that the levels
are as widely spread apart as the process and circumstances
allow.
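For a full factorial design the number of treatments is simply 2^k for k factors. A minimal sketch that generates the treatment combinations in standard (Yates) order, coded -1/+1 for the low and high levels (the helper name is mine):

```python
from itertools import product

def full_factorial(k):
    """Return the 2^k treatment combinations in standard (Yates) order:
    factor A alternates fastest, coded -1 (low) / +1 (high)."""
    runs = []
    for combo in product([-1, 1], repeat=k):
        # product() varies its last element fastest; reverse each tuple
        # so that the first factor alternates fastest instead.
        runs.append(tuple(reversed(combo)))
    return runs

# Three factors (A, B, C) -> 2^3 = 8 treatments.
design = full_factorial(3)
print(len(design))
```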
Ranked Degree of Impact
A (6.75)
AB (5.25)
BC (3.25)
B (2.25)
AC (1.75)
ABC (1.25)
C (-0.25)

What tool do you think would be good to use in this situation?
Confidence Interval for DOE results
Confidence Interval = Effect +/- Error

Ranked Degree of Impact
A (6.75)
AB (5.25)
BC (3.25)
B (2.25)
AC (1.75)
ABC (1.25)
C (-0.25)

Some of these factors do not seem to have much impact. We can use them to
estimate our error. We can be relatively safe using the ABC and the C factors
since they offer the greatest chance of being insignificant.
Confidence Interval for DOE results

Error = t(α/2, DF) × √((ABC² + C²)/DF)
DF = # of groups used to estimate the error (here ABC and C, so DF = 2)

(Figure: ranked degree of impact with the confidence interval applied to each effect.)
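Putting the last two slides together: pool the ABC and C effects to estimate the error term, then flag the effects whose intervals exclude zero. A hedged sketch using the effect values above; the 4.303 is t(0.025, DF = 2) from a standard t table, and the variable names are mine:

```python
import math

# Ranked effects from the example.
effects = {"A": 6.75, "AB": 5.25, "BC": 3.25, "B": 2.25,
           "AC": 1.75, "ABC": 1.25, "C": -0.25}

# Pool the two smallest effects (ABC and C) to estimate the error term:
# error = sqrt((ABC^2 + C^2) / DF), with DF = number of pooled effects.
pooled = ["ABC", "C"]
df = len(pooled)
error = math.sqrt(sum(effects[name] ** 2 for name in pooled) / df)

# t value for alpha/2 = 0.025 with DF = 2, from a t table.
t = 4.303
half_width = t * error

# An effect is significant if its magnitude exceeds the interval half-width.
significant = [name for name, e in effects.items()
               if name not in pooled and abs(e) > half_width]
print(round(half_width, 2), significant)
```

Only A and AB clear the roughly ±3.88 interval, which matches the intuition above that the smaller effects are insignificant.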
6s IMPROVEMENT PHASE
Vital Few Variables
GUTTER!
Establish Operating Tolerances
(Figure: bowling scores for combinations of Ball Type and Lane Condition,
no wristband — oily lane, soft ball: 7, 8; hard ball: 158, 141; soft ball on
the other lane condition: 154, 159.)
The Ball Type depends on the Lane Condition....
This is called an Interaction!
Where do we go from here?
With Wristband
and