STATISTICAL METHODS FOR ENGINEERS
Chapter Zero
Agenda
Day 1: Ch 0: Welcome (8:00); Ch 2: Measurement Systems Analysis
Day 2: Ch 3: Distribution Analysis; Ch 4: Process Capability and Tolerance Intervals
Day 3: Ch 5: Regression and GLM; Ch 5: Regression and GLM continued
Day 4: Ch 6: Logistic Regression; Online Evaluations
Each day: 8:00 to 5:00, lunch on your own 12:00 to 1:00, breaks as needed
Logistics
Starting Time: 8:00
Ending Time: Not later than 5:00
Lunch 12:00-1:00
Breaks every 90-120 minutes
Power Outlets
Rest Room Location
Food and drink locations (snacks, cafeteria, etc.)
Icebreaker (5 Minutes)
In my journey through the world of statistics
One thing that has worked well for me is
Expectations
Tools, tools, tools
Tools may be familiar, but the intent is to present the tools with a focus
on statistical thinking and decision-making.
Benefits
A deep mathematical dive can actually help you better see the surface.
Expectations
Experience Chart (rate each topic: None / A Little / Comfortable / Proficient / I Could Teach It)
Equivalence Testing
Tolerance Intervals
ANOVA Signal Interpretation
Measurement Systems Analysis
Distribution Analysis
Process Capability
General Linear Models
Your Expectations
Time: 10 Minutes
9 | MDT Confidential
Chapter 1:
ANOVA and Equivalence Testing
Topics
Quality Trainer Review
ANOVA
Assumptions
Using Minitab Assistant vs Stat Menu
Calculation Deep Dive
Sample Size
ANOVA Signals
Equivalence Testing
ANOVA: ASSUMPTIONS
One-way ANOVA:
Testing for the significance of one factor
The null hypothesis:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
That is, the population (response) means are equal at all k levels of the factor; equivalently, the factor is NOT significant.
Select a model
Plan sample size using relevant data or guesses
(Optional) Simulate the data and try the analysis
Collect real data
Fit the model (perform ANOVA and get p value)
Examine the residuals
Transform the response or update the model, if necessary
State conclusion
ANOVA Calculations
See www.khanacademy.org
ANOVA 1 Calculating SST (7:39)
ANOVA 2 Calculating SSW and SSB (13:20)
ANOVA 3 Hypothesis Test and F Statistic (10:14)
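The sums of squares covered in those videos can be computed directly. A minimal Python sketch, assuming the data arrive as one list per group; the three small groups in the usage line are illustrative numbers, not course data. It returns the sums of squares and the F statistic only; the p-value would come from the F distribution with k - 1 and N - k degrees of freedom (for example `scipy.stats.f.sf`, if SciPy is available).

```python
def one_way_anova(groups):
    """Return (SST, SSW, SSB, F) for a one-way ANOVA.

    groups: a list of lists, one list of responses per factor level.
    """
    values = [y for g in groups for y in g]
    n_total, k = len(values), len(groups)
    grand_mean = sum(values) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Total: every observation around the grand mean
    sst = sum((y - grand_mean) ** 2 for y in values)
    # Within (error): every observation around its own group mean
    ssw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    # Between (factor): group means around the grand mean, weighted by group size
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

    # F = MS(between) / MS(within), with k - 1 and N - k degrees of freedom
    f_stat = (ssb / (k - 1)) / (ssw / (n_total - k))
    return sst, ssw, ssb, f_stat


sst, ssw, ssb, f_stat = one_way_anova([[3, 2, 1], [5, 3, 4], [5, 6, 7]])
print(sst, ssw, ssb, f_stat)  # 30.0 6.0 24.0 12.0
```

SST = SSW + SSB holds for any data set, which makes a handy check on hand calculations.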
Consider a PQ Dataset
Three runs of n=10 units were produced and tensile tested
See Ch1DataFile.mtw
Columns TipTensile1, TipTensile2, TipTensile3
Minitab Options
Could use Stat > ANOVA
Data arrangement
Stacked (one column for X, one column for Y)
Unstacked (Y values in separate columns for each X)
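Converting unstacked columns (one column per run) into the stacked layout amounts to pairing each value with its column name. A minimal Python sketch; the dictionary layout and the values are hypothetical, not taken from Ch1DataFile.mtw.

```python
def stack_columns(columns):
    """Convert unstacked data {column name: [Y values]} into stacked X and Y lists."""
    x, y = [], []
    for name, values in columns.items():   # dicts preserve insertion order
        x.extend([name] * len(values))     # factor level (X) repeated for each value
        y.extend(values)                   # responses (Y)
    return x, y


unstacked = {"TipTensile1": [5.1, 5.3], "TipTensile2": [5.0, 5.2]}  # hypothetical values
x, y = stack_columns(unstacked)
print(list(zip(x, y)))
```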
Comparisons Output
What to do?
Test the equal-variance assumption using Stat > ANOVA > Test for Equal Variances
If the test indicates unequal variances, consider transforming the response variable
Verify whether the outlier is a data entry error
Add the factor into the model
What to do?
Prevent by randomizing
A time effect may be present
Consider a time series procedure
Common Transformations
Transformation
Log(y)
1/y
$\sin^{-1}\sqrt{y}$
Minitab Screenshots
[Box-Cox plot (using 95.0% confidence): λ estimate 0.03, lower CL -0.30, upper CL 0.38, rounded value 0.00 (a log transformation)]
http://www.minitab.com/support/documentation/Answers/Assistant%20White%20Papers/OneWayANOVA_MtbAsstMenuWhitePaper.pdf
Report Card
Diagnostic Report
Power Report
Summary Report
ANOVA - Exercise
Use Ch1DataFile.mtw
Test for differences between the group means using both the Stat menu ANOVA and the Minitab Assistant ANOVA for these 3-lot PQ studies:
For TubeTensile1, TubeTensile2, TubeTensile3
For Diameter1, Diameter2, Diameter3
Alloy-Contact Resistance.MPJ
Applied Statistics and Probability for Engineers, 4th Edition, Douglas C. Montgomery and George C. Runger
Use for multiple regression (more than one X)
Condition is the Block
Force in Grams
Condition   Stylet 1   Stylet 2   Stylet 3
1           18.1       14.5       14.0
2           20.0       16.1       16.3
3           30.2       27.5       26.8
4           42.5       39.4       38.7
Mean        27.70      24.38      23.95
Stylet.MTW
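Because each condition is a block, the stylet comparison is a randomized complete block analysis: the condition-to-condition spread is removed before the stylets are compared. A minimal Python sketch of that two-way (no interaction) ANOVA arithmetic using the force values in the table; the function name is hypothetical, and the sketch only shows where the sums of squares come from rather than replacing the Minitab output.

```python
def rcbd_anova(table):
    """Randomized complete block ANOVA without interaction.

    table[i][j] is the response for block i (a Condition) and treatment j (a Stylet).
    """
    b, t = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (b * t)
    block_means = [sum(row) / t for row in table]
    treat_means = [sum(row[j] for row in table) / b for j in range(t)]

    ss_total = sum((y - grand) ** 2 for row in table for y in row)
    ss_block = t * sum((m - grand) ** 2 for m in block_means)
    ss_treat = b * sum((m - grand) ** 2 for m in treat_means)
    # Residual after removing both the block and the treatment effects
    ss_error = sum((table[i][j] - block_means[i] - treat_means[j] + grand) ** 2
                   for i in range(b) for j in range(t))

    df_treat, df_error = t - 1, (b - 1) * (t - 1)
    f_treat = (ss_treat / df_treat) / (ss_error / df_error)
    return {"ss_total": ss_total, "ss_block": ss_block, "ss_treat": ss_treat,
            "ss_error": ss_error, "df_treat": df_treat, "df_error": df_error,
            "f_treat": f_treat, "treat_means": treat_means}


stylet = [
    [18.1, 14.5, 14.0],  # Condition 1
    [20.0, 16.1, 16.3],  # Condition 2
    [30.2, 27.5, 26.8],  # Condition 3
    [42.5, 39.4, 38.7],  # Condition 4
]
result = rcbd_anova(stylet)
print(round(result["f_treat"], 1), result["df_treat"], result["df_error"])
```

For these data the stylet F statistic is well over 100 on 2 and 6 degrees of freedom. If the blocks were ignored, the condition-to-condition spread would land in the error term, and the same data analyzed as a one-way ANOVA give a stylet F of roughly 0.13.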
ANOVA Signal in PQ
There was a realization that a significant p-value in the comparison of lot means does not necessarily mean the PQ fails
Analysis was sometimes included to assess the power of the ANOVA and the practical significance of the difference in the means
Eventually, the Corporate Policy on Manufacturing Process Validation added the ANOVA Failure Flow Chart
2008 Version of Corporate Guideline for Manufacturing Process Validation
2012 Version of CRDM ANOVA Signal Flow Chart
Con
Can be very prescriptive
Standards for Ppk are quite high: 95% confidence bound on Ppk > 1.33
Disincentive for larger sample size
Current approaches
Corporate Guideline phased out
CV procedure still has essentially the same ANOVA Signal Flowchart
CRDM originally had a more prescriptive version
CRDM currently has a simplified version
It would also work to include a discussion of the sample size of the ANOVA and the practical significance of the difference
Discussion: what do other businesses do?
Next steps
Total sample size is 90, so use confidence bound
Lower 95% confidence bound on Ppk is 0.92
Must make 3 more runs
TubeTensile4, TubeTensile5, TubeTensile6
These must pass tolerance interval analysis (as the first three runs did)
All six runs pass tolerance interval analysis
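The slides do not show how the lower 95% confidence bound on Ppk was computed. One widely used normal approximation (commonly attributed to Bissell) is sketched below in Python; treat it as an assumption about the method, and note that the Ppk point estimate in the usage line is hypothetical, not the TubeTensile result.

```python
from statistics import NormalDist


def ppk_lower_bound(ppk, n, confidence=0.95):
    """Approximate one-sided lower confidence bound on Ppk.

    Normal approximation: Ppk - z * sqrt(1 / (9n) + Ppk^2 / (2(n - 1))).
    """
    z = NormalDist().inv_cdf(confidence)
    standard_error = (1 / (9 * n) + ppk ** 2 / (2 * (n - 1))) ** 0.5
    return ppk - z * standard_error


# Hypothetical point estimate of 1.0 from n = 90 observations
print(round(ppk_lower_bound(1.0, 90), 3))  # 0.864
```

The bound approaches the point estimate as n grows, which is why a fixed criterion such as "95% confidence bound on Ppk > 1.33" is much harder to meet with small studies.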
Conclusion
EQUIVALENCE TESTING
Example of Approach 1
Method
Parameter: Mean
Distribution: Normal
Standard deviation: 3 (estimate)
Confidence level: 95%
Confidence interval: Two-sided
Results
Margin of Error: 2
Sample Size: 12
Example Output
Two-sample T for New vs Old
        N    Mean   StDev  SE Mean
New    12  30.927   0.858     0.25
Old    12   29.19    1.52     0.44
P-Value = 0.003   DF = 17
Conclusions:
The processes are statistically different (p=0.003), which
is a statement about non-equality.
Despite being statistically unequal, the processes are still equivalent.
The 95% confidence interval for the difference in means, (0.671, 2.798), lies entirely within the equivalence margin [-3, 3].
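The Approach 1 decision can be reproduced from the summary statistics in the output. A minimal Python sketch, assuming the unequal-variance (Welch) form of the two-sample t interval, which is consistent with DF = 17 (a pooled test would have 22). Because the printed means and standard deviations are rounded, the interval comes out near (0.674, 2.800) rather than exactly (0.671, 2.798). The t critical value is passed in rather than computed to keep the sketch free of third-party libraries; the function names are hypothetical.

```python
def welch_df(sd1, n1, sd2, n2):
    """Welch-Satterthwaite degrees of freedom for two samples with unequal variances."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


def welch_interval(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Two-sided confidence interval for mean1 - mean2 (unequal variances).

    t_crit is the t critical value for the chosen confidence level and the
    Welch degrees of freedom (from a t table or scipy.stats.t.ppf).
    """
    se = (sd1 ** 2 / n1 + sd2 ** 2 / n2) ** 0.5
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se


def within_margin(interval, margin):
    """Approach 1 decision: equivalent if the whole interval lies in [-margin, margin]."""
    low, high = interval
    return -margin <= low and high <= margin


df = welch_df(0.858, 12, 1.52, 12)  # about 17.4; Minitab reports DF = 17
ci = welch_interval(30.927, 0.858, 12, 29.19, 1.52, 12, t_crit=2.110)  # t(0.975, 17)
print(round(df, 1), tuple(round(v, 3) for v in ci), within_margin(ci, 3))
```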
Approach 1: Summary
The confidence interval approach is the gold standard for clinical trials and other high-scrutiny experiments requiring FDA approval.
It is mathematically equivalent to a p-value-driven approach called TOST (Two One-Sided T-tests): a 100(1 - 2α)% confidence interval lying inside the equivalence margin is the same as passing TOST at level α.
The confidence interval approach is easier to understand than the original form of TOST.
Post-hoc Problems
Rigorous application of Approach 1 requires that the equivalence margin (the difference of interest) be established before collecting data.
What should we do when data have already been collected without defining the difference of interest or planning the sample size?
Approach 2 Method
After collecting the data, use the observed means and standard deviation to create a power curve with the Power and Sample Size platform in Minitab.
Display and interpret the power curve in your data analysis report.
You may honestly believe that your experiment was sufficiently powered (>80%) to detect meaningful differences, but the post-hoc nature of the analysis makes your argument weaker.
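Minitab's power calculations for t-tests are based on the t distribution. The shape of a power curve can be previewed with a simpler normal approximation, sketched below in Python; it overstates power somewhat for small samples, and the common SD of 2 and the 5 units per process are hypothetical planning values.

```python
from statistics import NormalDist


def approx_power(diff, sd, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test of means.

    Assumes a common standard deviation sd in both groups.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(diff) / (sd * (2 / n_per_group) ** 0.5)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical planning values: common SD of 2, 5 units per process
for diff in [0.0, 1.0, 2.0, 3.0, 4.0]:
    print(diff, round(approx_power(diff, sd=2.0, n_per_group=5), 2))
```

At a difference of zero the "power" equals alpha, and the curve rises toward 1 as the true difference grows; reading off the difference at which it crosses 0.8 is the post-hoc argument described above.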
Example
Consider again our old and new processes, which have distributions of N(30, 2²) and N(31, 1²), respectively.
Suppose we had not followed Approach 1 and instead simply collected 5 data points from each process.
We found a statistical difference when we collected 12 data points, but the p-value rises above 0.05 when collecting only 5:
Two-sample T for New_5 vs Old_5
         N    Mean   StDev  SE Mean
New_5    5  30.744   0.933     0.42
Old_5    5   29.42    3.02      1.4
P-Value = 0.403   DF = 4
Assumptions
Using Minitab Assistant vs Stat Menu
Calculation Deep Dive
Sample Size
ANOVA Signals
Equivalence Testing
Chapter 2:
Measurement Systems Analysis
Topics
Quality Trainer Review
Topics with Variables Data
Gage R&R Sample Size
Probability of Misclassification (Variables Data)
Helpful Hints
so linearity and stability should be plotted, while bias, repeatability, and reproducibility are just single numbers
Gage Stability
MINITAB
Snap Gauge.mtw
Stat > Control Charts > Variables Charts for Subgroups > Xbar-R
Measurement system is stable over time as evidenced by:
[Xbar-R chart by Day, subgroups at 11:00 and 5:00 from 8-Sep through 12-Sep. Xbar chart: center line 0.2497, UCL = 0.253458, LCL = 0.245942. R chart: center line 0.00367, UCL = 0.00946, LCL = 0; R chart in control.]
CVG Test Method Validation
PROBABILITY OF MISCLASSIFICATION
Misclassification
Two Misclassification Probabilities
Probability of Misclassifying Bad Unit as Good
Probability of Misclassifying Good Unit as Bad
[Figure: part distribution relative to the LSL and USL, marking the probability of misclassifying a good unit as bad and the probability of misclassifying a bad unit as good]
Part mean = 30, Part Std Dev = 10, Part Upper Spec = 40
2) Calc/Random Data/Normal (simulate gage variability)
6) Stat/Table/Crosstabs to cross-tabulate 4) and 5).
The simulation sample size was 10000. A larger sample size would be better.
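The same simulation can be mirrored outside Minitab with a larger sample. A minimal Python sketch, assuming Normal part-to-part and gage variation, the inputs used here (part mean 30, part SD 10, USL 40, gage SD 2.6), and an upper spec limit only; the function name, the 200,000 draws, and the seed are choices made for this illustration. With these inputs the bad-called-good estimate should land near the 0.021 reported in the Minitab output for this example.

```python
import random


def simulate_misclassification(part_mean, part_sd, gage_sd, usl, n=200_000, seed=1):
    """Monte Carlo estimate of misclassification probabilities for an upper spec.

    Returns (P(truly bad and measured good), P(truly good and measured bad)).
    """
    rng = random.Random(seed)
    bad_called_good = good_called_bad = 0
    for _ in range(n):
        true_value = rng.gauss(part_mean, part_sd)       # part-to-part variation
        measured = true_value + rng.gauss(0.0, gage_sd)  # add gage variability
        truly_bad = true_value > usl
        measured_bad = measured > usl
        if truly_bad and not measured_bad:
            bad_called_good += 1
        elif measured_bad and not truly_bad:
            good_called_bad += 1
    return bad_called_good / n, good_called_bad / n


p_bad_good, p_good_bad = simulate_misclassification(30, 10, 2.6, 40)
print(f"P(bad called good) = {p_bad_good:.4f}, P(good called bad) = {p_good_bad:.4f}")
```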
MINITAB Misclassification
MINITAB Misclassification
Two problems:
1) Probabilities are shown with only three decimals (i.e., 0.000)
2) Can't enter historical values for: (a) process mean, (b) part std. dev., (c) gage std. dev.
(Note: (2) can now be done with CSR Work Aid 13)
MINITAB Misclassification
MINITAB Misclassification
Enlarging the label on the sample mean chart, we see the mean is 30.
MINITAB Misclassification
Examining the output, we see the USL (40), the part sigma (10), and the gage sigma (2.6).
The probability of a truly bad part being called good is 0.021.
POM Tool
Exercise
Run POM analysis
Using Minitab simulation
Using Work Aid 13 and Minitab GRR
Using POM Tool
HELPFUL HINTS
Standard Gage R&R methods assume that other factors that affect
measurements have been studied and controlled in the development
of the test method.
Peel test
Tensile test
Develop a non-destructive measurement
Pro: Ideal solution
Con: Often difficult or impossible
Focus on Reproducibility
Analysis
The nested analysis does not include a term for the Part * Operator interaction.
Note that the Minitab Assistant doesn't offer the nested analysis.
TestingSupplierCoils.mtw
Focus on Reproducibility
With destructive measurements, the
Repeatability Standard Deviation always includes
the part-to-part or subsample-to-subsample
variation. In general, repeatability standard
deviation cannot be accurately estimated.
If one population of parts is randomly assigned to
multiple operators, then the Reproducibility
Standard Deviation is not affected by part-to-part
variation.
Reproducibility standard deviation can be
estimated accurately even for destructive tests.
Reproducibility
Stop
Trying to force (Repeatability + Part) Standard
Deviation to be small enough to meet a requirement.
Trying to obtain or create identical parts.
Start
Estimate Reproducibility standard deviation and ensure
that it is small enough. This standard deviation
depends only on the differences between operator
means.
Compare operator standard deviations. Identify cases
where operators show substantially different variation
across equivalent sets of parts.
Example Calculations
Data based on actual TMV studies
But altered to disguise
Detection Time A, Detection Time P
Detection Time A
Calculate Results
$\%\,\text{Tolerance (Reproducibility)} = 100 \times \frac{6 \times 0.123}{2 \times (30 - 11.740)} = 100 \times \frac{0.738}{36.52} = 2.02\%$
Std Dev Ratio = 0.986 / 0.546 = 1.81
Result: Pass
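The same arithmetic in Python, as a minimal sketch: following the worked calculation, 30 is treated as the single specification limit and 11.740 as the mean, and the distance to the limit is doubled so it is comparable to a two-sided tolerance. The helper name is hypothetical.

```python
def pct_tolerance_one_sided(sd, spec_limit, mean, sigma_multiple=6.0):
    """Percent of a one-sided tolerance used by a sigma_multiple * sd spread.

    The distance from the mean to the single spec limit is doubled so that it
    plays the role of (USL - LSL) in the usual two-sided % Tolerance.
    """
    return 100 * (sigma_multiple * sd) / (2 * abs(spec_limit - mean))


print(round(pct_tolerance_one_sided(0.123, 30, 11.740), 2))  # 2.02
print(round(0.986 / 0.546, 2))                               # 1.81 (Std Dev Ratio)
```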
Detection Time P
Exercises
Open Destructive Exercises.mtw
For Bond Strength results:
Assume the specification is a minimum of 5 lb
Analysis
Summary Report
[Attribute Agreement Analysis for Results, Summary Report. The appraisals of the test items correctly matched the standard 96.7% of the time. Misclassification rates panel: 3.3%, 6.7%, 0.0%, 6.7%. % Accuracy by Appraiser: 100.0%, 100.0%, and 90.0% for the three appraisers.]
Attribute c=0 result, showing that no bad parts were misclassified as good
Overall, 96.7% of presentations were classified correctly
Accuracy Report
[Attribute Agreement Analysis for Results, Accuracy Report. All graphs show 95% confidence intervals for accuracy rates; intervals that do not overlap are likely to be different. Panels: % by Appraiser, % by Standard (Good, Bad), and % by Trial, for Appraisers 1 to 3.]
Illustrates the 95% / 90% result
Kappa
Minitab:
This general rule of thumb may not apply for most Medtronic
applications. Any disagreement on rejectable units would be of
concern.
Kappa calculations
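Minitab's attribute agreement analysis reports Fleiss' kappa statistics. The overall statistic can be sketched in Python as below; the counts layout (items by categories, every item rated the same number of times) and the ratings are hypothetical, and Minitab's per-category kappas and standard errors are not reproduced.

```python
def fleiss_kappa(counts):
    """Overall Fleiss' kappa.

    counts[i][j] = number of ratings placing item i in category j; every item
    must have the same total number of ratings.
    """
    n_items = len(counts)
    n_ratings = sum(counts[0])
    n_categories = len(counts[0])
    # Observed agreement for each item: share of rating pairs that agree
    p_items = [(sum(c * c for c in row) - n_ratings) / (n_ratings * (n_ratings - 1))
               for row in counts]
    p_observed = sum(p_items) / n_items
    # Chance agreement from the overall proportion of ratings in each category
    p_categories = [sum(row[j] for row in counts) / (n_items * n_ratings)
                    for j in range(n_categories)]
    p_expected = sum(p * p for p in p_categories)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical: 4 items, each rated 3 times, counts given as [Good, Bad]
ratings = [[3, 0], [0, 3], [3, 0], [2, 1]]
print(round(fleiss_kappa(ratings), 3))  # 0.625
```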
Kappa results
BACKUP SLIDES
Parts: samples are parts that can be subdivided into homogeneous sub-samples (locations).
Stage 1: 1 operator measures sub-samples (2-5) from parts (5-10).
Stage 2: 3 operators each measure the same location per part (5-10), 1 sub-sample per part.
[Diagram: Stage 1 shows parts 1 to 10, each divided into sub-sample locations 1 to 5; Stage 2 shows operators 1 to 3, each measuring parts 1 to 10]
MINITAB
Stage 1 (one operator, sub-samples within parts)
Source          Var Comp.   % of Total   StDev
Part            0.088       15.50        0.296
Error           0.479       84.50        0.692
Total           0.567                    0.753
Stage 2 (three operators)
Source          Var Comp.   % of Total   StDev
Operator        0.053       11.08        0.231
Error (part / repeat)   0.428   88.92    0.654
Total           0.481                    0.694
$\sigma^2_{repeat} = \sigma^2_{part/repeat} - \sigma^2_{part} = 0.428 - 0.088 = 0.340$
$\sigma^2_{R\&R} = \sigma^2_{repeat} + \sigma^2_{operator} = 0.340 + 0.053 = 0.393$
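Variance components like those in the tables are typically obtained from ANOVA mean squares. A minimal Python sketch of the method-of-moments estimates for a balanced one-way random-effects layout (for example, operators each measuring the same number of parts), followed by the slide's arithmetic; the function name and toy data are hypothetical, and Minitab's nested analysis may differ in details such as how negative estimates are handled.

```python
def one_way_variance_components(groups):
    """Method-of-moments variance components for a balanced one-way random model.

    groups: equal-length lists, e.g. one list of results per operator.
    Returns (between-group variance, within-group variance).
    """
    k, n = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (k * n)
    means = [sum(g) / n for g in groups]
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (k * (n - 1))
    # E[MS_between] = sigma2_within + n * sigma2_between; clip negative estimates to 0
    return max((ms_between - ms_within) / n, 0.0), ms_within


between, within = one_way_variance_components([[1, 2, 3], [4, 5, 6]])  # toy data

# Slide arithmetic: remove part variation, then add the operator component
sigma2_repeat = 0.428 - 0.088
sigma2_rr = sigma2_repeat + 0.053
print(round(sigma2_repeat, 3), round(sigma2_rr, 3))  # 0.34 0.393
```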
Distribution Analysis
Objectives
Explain why distributional analysis is statistically
complicated (and sometimes emotionally frustrating!)
Emphasize the importance of engineering theory and
historical precedent.
Encourage the use of multiple graphical methods in
addition to numerical tests.
Review common causes of Non-Normality.
Discuss Transformations and how they compare to
fitting non-Normal distributions.
Distribution Analysis
Motivation and Philosophy
Statistical Tool | Distributional Sensitivity | Effect of Poor Distributional Fit
Capability Analysis | High | Incorrect PPM/Ppk
Tolerance Intervals | High | Incorrect bounds
Variables Lot Acceptance Sampling | High | Altered rejection and acceptance rates
Individuals Chart for SPC | High | Incorrect control limits
GLM/Regression/ANOVA | Med | Approximate p-value
Xbar chart for SPC | Med/Low | Approximate p-value
Two-sample t-test | Low | Approximate p-value
Nonparametric methods | Low | Approximate p-value
[Histogram of Time (Frequency vs. Time) and Normal Probability Plot of Time: Mean 12.31, StDev 9.656, N 100, AD 5.738, P-Value < 0.005]
N=500 Examples
1. Scientific/Engineering Knowledge
2. Historical distribution analysis
3. Distribution analysis
Why is distribution analysis last?
Distribution Analysis
Weibull
Exponential (special case of Weibull)
Lognormal
Normal
Weibull
A flexible model which can assume many different shapes, depending on the choice of parameters
Two parameters: a scale parameter and a shape parameter
Arises from weakest-link failures, or situations when the underlying process focuses on the minimum or maximum value of independent, positive random variables.
Models stress-strength failures
Exponential
Lognormal
Normal
Some Relationships
[Diagram relating distributions to typical applications. Labels: Normal; Lognormal; default; measurement error; dimensions; lead time; time to fatigue-related failure; time to stress/strength-related failure; infant mortality; wearout]
Distribution Analysis
Statistical Overview
Distribution Analysis
Graphical Methods
Probability Plot
A probability plot is a 2-dimensional plot with specialized (often
logarithmic) axes, to facilitate comparison between observed
data and a hypothesized distribution.
More specifically, a probability plot is a comparison between the
observed and theoretical quantiles (i.e. percentiles) for a
hypothesized distribution.
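A Normal probability plot can be built from the ordered data and a set of plotting positions. A minimal Python sketch, assuming Benard's median-rank positions (i - 0.3)/(n + 0.4), which is one common choice (your software's default may differ); the correlation of the plotted points summarizes how straight the plot is, an idea closely related to the Ryan-Joiner test discussed later in this chapter. The data values are hypothetical.

```python
from statistics import NormalDist


def normal_plot_points(data):
    """(theoretical Normal quantile, ordered observation) pairs for a probability plot.

    Plotting positions use Benard's median-rank approximation (i - 0.3) / (n + 0.4).
    """
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.3) / (n + 0.4)), y)
            for i, y in enumerate(ordered, start=1)]


def plot_correlation(points):
    """Pearson correlation of the plotted points; near 1 means close to a straight line."""
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


points = normal_plot_points([12.1, 9.8, 11.4, 10.2, 10.9, 13.0, 8.7])  # hypothetical data
print(round(plot_correlation(points), 3))
```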
Histograms in Minitab
The graph menu offers a histogram platform, but the graphical
summary platform offers more information with fewer clicks.
Histograms
More intuitive than probability plots, since the x-y axes are not
transformed.
Not informative with small sample sizes (<30)
Can theoretically be misleading if the bin width is calculated
inappropriately, but in practice the histogram is a useful tool for
moderate-to-large sample sizes
[Example panels: apparent right skew; approximately bell-shaped]
Time Plots
Fitting a single distribution to your data implies that the
underlying process is stable.
Without a stable process, distributional fit is irrelevant.
Time plots and control charts help evaluate the stability of your
process.
[I chart of Initial Capability Data, 225 observations: X̄ = 19.93, UCL = 26.30, LCL = 13.55; many points flagged as beyond the control limits]
[Normal probability plot of Initial Capability Data (combined): Mean 19.93, StDev 9.679, N 225, AD 13.617, P-Value < 0.005. Combined data is not normal.]
[Normal probability plot of Initial Capability Data by week. Each week is normal.]
Week   Mean    StDev   N     AD      P
1      9.871   2.155   100   0.476   0.233
2      20.39   2.203   25    0.280   0.616
3      29.87   2.011   100   0.236   0.785
Distribution Analysis
Numerical Methods
For all numerical methods:
A large (≥ 0.05) p-value implies there is no statistically significant evidence against the hypothesized distribution (it does not prove that the distribution is correct).
A small (< 0.05) p-value implies there is statistically significant lack-of-fit.
Anderson-Darling
Default approach in Minitab.
May be used to assess the fit of Normal and non-Normal distributions.
Gives unreliable results when data are
discretized/grouped, which is fairly common
when measurement system resolution is poor.
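The statistic itself is straightforward to compute. A minimal Python sketch of the standard Anderson-Darling A² formula for a Normal fit, with the mean and SD estimated from the data; it returns only the statistic (turning A² into a p-value needs published tables or approximations, and Minitab may apply its own adjustments), and the two data sets are constructed for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(data):
    """Anderson-Darling A^2 statistic for a Normal fit with estimated mean and SD.

    A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) * [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]
    where x_(i) are the ordered data and F is the fitted Normal CDF.
    """
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    cdf = [fitted.cdf(v) for v in x]
    total = sum((2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1 - cdf[n - i]))
                for i in range(1, n + 1))
    return -n - total / n


std_normal = NormalDist()
normal_like = [10 + 1.5 * std_normal.inv_cdf((i - 0.5) / 100) for i in range(1, 101)]
skewed = [math.exp(std_normal.inv_cdf((i - 0.5) / 100)) for i in range(1, 101)]
print(round(anderson_darling_normal(normal_like), 3), round(anderson_darling_normal(skewed), 3))
```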
Anderson-Darling in Minitab
For assessing Normality:
Anderson-Darling in Minitab
For any/all distributions:
Anderson-Darling Results
Normal(10,1.5)
Normal(10,1.5)--Rounded
Ryan-Joiner
Useful for discretized, rounded, or clumpy data
Will not declare significant lack-of-fit simply due to poor
measurement resolution
A minimum of 5 groups is recommended to obtain a meaningful p-value. Fewer groups may yield an overly optimistic (high) p-value.
[Comparison panels: Anderson-Darling vs. Ryan-Joiner]
Ryan-Joiner in Minitab
Truncation
The Normal distribution may be used to model tail
behavior if it provides a conservative estimate of
those tails.
This situation arises when data are truncated, which
is quantitatively captured as negative kurtosis.
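Negative kurtosis is easy to check numerically. A minimal Python sketch of the moment-based sample skewness and excess kurtosis (0 for a Normal distribution); Minitab reports bias-adjusted versions, so its numbers differ slightly for small samples, and this sketch is not the validated SK test spreadsheet mentioned below. The evenly spread example values are constructed for illustration.

```python
def skewness_excess_kurtosis(data):
    """Moment estimates g1 (skewness) and g2 (excess kurtosis; 0 for a Normal).

    Negative g2 means shorter tails than the Normal, as with truncated data.
    """
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


# Evenly spread values mimic data with both tails cut off (uniform-like)
g1, g2 = skewness_excess_kurtosis([i / 100 for i in range(101)])
print(round(g1, 3), round(g2, 3))
```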
Truncation
In principle, truncated data may be evaluated
graphically or through a Skewness-Kurtosis (SK) test.
The SK test checks whether the tails of the Normal
distribution are longer or shorter than the tails of your
data.
MECC has created and validated an Excel
spreadsheet (R134997) which executes the SK test.
In practice, consult your local procedures to ensure
your analysis of truncated data is compliant.
Resolving Non-Normality
Cause | Possible remedies
Data shift | Sublot; Skewness/kurtosis test; Attribute sampling
Multiple data sources | Sublot; Skewness/kurtosis test; Attribute sampling
Outliers | Attribute sampling; Outlier removal (may remove outliers only if they constitute typos or data collection errors)
Censored/truncated data (tails lost) | Skewness/kurtosis test; Conservative fitting; Attribute sampling
Distribution not normal | Non-normal analysis; Transformation; Attribute sampling
Poor measurement resolution | Ryan-Joiner; Skewness/kurtosis test
Too much data | Graphical evidence; Random subsampling
Random chance | Historical assessment
LoanApplicationTime.MTW
[Probability Plot of Time, Lognormal - 95% CI: Loc 2.269, Scale 0.6845, N 100, AD 0.432, P-Value 0.299]
Check if LogNormal provides a good fit
[Process capability (Lognormal) results. Overall Capability: Z.Bench 1.06, Z.LSL *, Z.USL 0.47, Ppk 0.16. Exp. Overall Performance: PPM < LSL *, PPM > USL 144242, PPM Total 144242. Observed Performance: PPM < LSL *, PPM > USL 160000, PPM Total 160000.]
Distribution Analysis
Transformations
Two Options
When a dataset is non-Normal, it is acceptable either to
Mathematically transform the data to achieve Normality
Fit a non-Normal distribution
Transformation Advice
If a transformation is chosen, it should be as
simple as possible, and it should ideally have a
physical interpretation.
A log transformation is particularly desirable,
since it
Is monotonic
Is straightforward to interpret (it turns multiplicative
effects into additive effects)
Is equivalent to fitting the LogNormal distribution (the log of a LogNormal variable is Normal)
Is common in the literature
Transformation Advice
The Johnson transformation is a last resort, as it
Rarely has any scientific/engineering meaning
Involves a complicated mathematical structure
Is not universally considered an acceptable
transformation
Any Box-Cox transformation with a lambda value in [-2, 2] is typically acceptable, although the chosen lambda should ideally have a physical meaning.
Transformation Advice
There is no transformation which will eliminate outliers!
By definition, an outlier is so far away from the rest of the data
values that it is unlikely to belong to the same distribution.
An attribute approach is typically needed when outliers are
present.
Investigate the outlier and determine if there were any typos or
other unusual circumstances which would warrant deletion.
Outliers should NOT be deleted unless there is a strong
argument as to why the outlier is not representative of the
process.
An apparent outlier could possibly be a typical datapoint from
a highly skewed distribution, like LogNormal or LEV.
Use engineering thinking as well as statistical thinking to decide
the best course of action for outlier mitigation.
Box-Cox Transformations
(when there is no theoretical distribution)
Assumptions for Y
Y > 0; Y is skewed (right or left)
Y is unimodal (single peak)
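Box-Cox finds the power transformation that makes Y approximately normal:
$Y^{(\lambda)} = \frac{Y^{\lambda} - 1}{\lambda}$ for $\lambda \neq 0$
$Y^{(\lambda)} = \log_e(Y)$ for $\lambda = 0$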
Use Box-Cox when there is no theoretical distribution
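The λ search can be sketched as a profile likelihood over a grid. A minimal Python sketch, assuming positive data and a grid from -2 to 2 in steps of 0.01; Minitab uses its own estimation routine and also reports a confidence interval and a rounded value, which this sketch does not. The example data are constructed so that their logarithms are exactly Normal-shaped, so the best λ should be at or very near 0 (the log transform).

```python
import math
from statistics import NormalDist


def box_cox(y, lam):
    """Box-Cox transform of a positive value y."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


def box_cox_loglik(data, lam):
    """Profile log-likelihood of lambda for Normal transformed data (up to a constant)."""
    n = len(data)
    z = [box_cox(y, lam) for y in data]
    z_mean = sum(z) / n
    variance = sum((v - z_mean) ** 2 for v in z) / n
    # Normal log-likelihood at the fitted mean/variance plus the Jacobian of the transform
    return -n / 2 * math.log(variance) + (lam - 1) * sum(math.log(y) for y in data)


def best_lambda(data):
    """Grid search for lambda over [-2, 2] in steps of 0.01."""
    grid = [k / 100 for k in range(-200, 201)]
    return max(grid, key=lambda lam: box_cox_loglik(data, lam))


# Constructed data whose logarithms are exactly Normal-shaped
lognormal_like = [math.exp(NormalDist(1, 0.5).inv_cdf((i - 0.5) / 50)) for i in range(1, 51)]
print(best_lambda(lognormal_like))
```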
Problem Statement: Time (in days) to resolve errors in case report forms for a pre-market clinical evaluation is too long, causing delays in the product release
Project Goal: Decrease error resolution time. Expectation is 7 days.
Project Strategy: Path Y = Resolution Time
Task: Determine capability for Y = Resolution Time
Fails:
3 second rule
Fat pencil test
p-value
[Lognormal probability plot of Resolution Time: Loc 1.760, Scale 1.303, N 200, AD 3.623, P-Value < 0.005]
Not LogNormal!
[Box-Cox plot of Resolution Time: λ estimate 0.26, lower CL 0.15, upper CL 0.38, rounded value 0.26]
Box-Cox transformation of Y.
[Process capability of Resolution Time using Box-Cox transformed data. Process Data: LSL *, Target *, USL 7, Sample Mean 10.2928, Sample N 200, StDev (Within) 9.25009, StDev (Overall) 9.5492; transformed values: USL* 1.65972, Sample Mean* 1.66391, StDev (Within)* 0.503383, StDev (Overall)* 0.485671. Potential (Within) Capability: Z.Bench -0.01, Z.LSL *, Z.USL -0.01, Cpk -0.00, CCpk -0.00. Overall Capability: Z.Bench -0.01, Z.LSL *, Z.USL -0.01, Ppk -0.00, Cpm *. Observed Performance: PPM < LSL *, PPM > USL 520000.00, PPM Total 520000.00.]
A Desirable Problem
If your data could be handled either through a transformation or
a non-Normal distribution, either path is acceptable.
All else being equal, a recommended prioritization is as follows:
1.
2.
3.
4.
Distribution Analysis
Flowchart
Distribution Analysis
Objectives Recap
Explain why distributional analysis is statistically
complicated (and sometimes emotionally
frustrating!)
Emphasize the importance of engineering theory
and historical precedent.
Encourage the use of multiple graphical methods
in addition to numerical tests.
Review common causes of Non-Normality
Discuss Transformations and how they compare
to fitting non-Normal distributions
Challenge Problem
MECC Supplier Dataset: mecc_supplier.mtw
Business goal is to qualify the supplier as having
high capability, and possibly to create a variables
or attribute acceptance sampling plan.
LSL: 0.058
USL: 0.064
Analyze the data and offer your opinion of what
distribution is best for the situation at hand.
What questions would you ask the Supplier
Quality Engineer to help refine your decision?
Objectives
QT Review
Process Capability
Introduction
Process Capability for Normal Data
Capability Indices
Process Capability for Non-Normal Data
Summary
A5 Process Capability
Measuring Process Capability
[Figure 1: Normal density with Mean = 100, Std Dev = 5, and the customer requirement at 130 (Z = 6.0). Defect rate: 1 part per billion. NOTE: 2 parts per billion for two-sided specs.]
[Figure 2: Normal density with Mean = 107.5, Std Dev = 5, and the customer requirement at 130 (Z = 4.5). Defect rate: 3.4 parts per million.]
SIGMA SCALE
Short-Term Process Sigma | Long-Term z | P(Z>z) | DPMO | % Conforming
6.0 | 4.5 | 0.0000034 | 3.4 | 99.99966
5.5 | 4.0 | 0.0000317 | 32 | 99.99683
5.0 | 3.5 | 0.0002327 | 233 | 99.9767
4.5 | 3.0 | 0.0013500 | 1,350 | 99.865
4.0 | 2.5 | 0.0062097 | 6,210 | 99.379
3.5 | 2.0 | 0.0227501 | 22,750 | 97.72
3.0 | 1.5 | 0.0668072 | 66,807 | 93.32
2.5 | 1.0 | 0.1586553 | 158,655 | 84.13
2.0 | 0.5 | 0.3085375 | 308,538 | 69.15
1.5 | 0.0 | 0.5000000 | 500,000 | 50.00
1.0 | -0.5 | 0.6914625 | 691,462 | 30.85
0.5 | -1.0 | 0.8413447 | 841,345 | 15.87
0.0 | -1.5 | 0.9331928 | 933,193 | 6.68
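The P(Z>z) and DPMO columns are upper-tail Normal probabilities. A minimal Python sketch, assuming the table's convention that the long-term z equals the short-term sigma level minus a 1.5 sigma shift; the function names are hypothetical.

```python
from statistics import NormalDist


def dpmo(z):
    """Defects per million beyond a one-sided limit z standard deviations from the mean."""
    return 1e6 * NormalDist().cdf(-z)   # upper tail P(Z > z), computed as P(Z < -z)


def long_term_z(short_term_sigma, shift=1.5):
    """Sigma-scale convention used in the table: subtract a 1.5 SD long-term shift."""
    return short_term_sigma - shift


for sigma_level in [6.0, 4.5, 3.0]:
    z = long_term_z(sigma_level)
    print(sigma_level, z, round(dpmo(z), 1), round(100 - dpmo(z) / 1e4, 5))
```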
Cp, Pp
Cpk, Ppk: include the mean to account for centering
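The index definitions can be written out directly. A minimal Python sketch, assuming Normal data and two-sided specifications; the inputs in the usage line are the summary values from the Montgomery & Runger capability example later in this chapter (mean 33.32, within SD 2.60524, overall SD 3.29946, LSL 25, USL 45), and the results agree with the Minitab report to two decimals.

```python
def capability_indices(mean, sd_within, sd_overall, lsl, usl):
    """Cp/Cpk use the within (short-term) SD; Pp/Ppk use the overall SD."""
    tolerance = usl - lsl
    nearest = min(usl - mean, mean - lsl)   # distance to the nearer spec limit
    return {
        "Cp": tolerance / (6 * sd_within),
        "Cpk": nearest / (3 * sd_within),
        "Pp": tolerance / (6 * sd_overall),
        "Ppk": nearest / (3 * sd_overall),
    }


indices = capability_indices(33.32, 2.60524, 3.29946, 25, 45)
print({k: round(v, 2) for k, v in indices.items()})
# {'Cp': 1.28, 'Cpk': 1.06, 'Pp': 1.01, 'Ppk': 0.84}
```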
Process fallout and the process capability ratio (PCR).
NOTE: Average (R/d2) = (R-bar/d2)
Data Source: 20 subgroup samples of 5 parts taken from a component manufacturing process. Data are coded x 0.0001 in. + 0.50 in.
Applied Statistics & Probability for Engineers, 6th Edition (Montgomery & Runger, Wiley 2013)
LSL = 25
USL = 45
Target = 35
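The R-bar/d2 estimate in the NOTE can be sketched in Python. The d2 values below are the standard control-chart constants for subgroup sizes 2 to 6; the helper name and the two small subgroups are hypothetical. For this data set, an average range of 6.06 with subgroups of 5 gives 6.06/2.326 ≈ 2.605, consistent with the StDev (Within) of 2.60524 in the Minitab report.

```python
# Standard d2 constants for subgroup sizes 2 to 6
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534}


def sigma_within_from_ranges(subgroups):
    """Estimate the within-subgroup SD as the average range divided by d2."""
    size = len(subgroups[0])
    ranges = [max(s) - min(s) for s in subgroups]
    r_bar = sum(ranges) / len(ranges)
    return r_bar / D2[size]


example = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 7]]          # hypothetical subgroups of 5
print(round(sigma_within_from_ranges(example), 3))    # ranges 4 and 6, so R-bar = 5
print(round(6.06 / D2[5], 3))                          # 2.605
```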
[Xbar-S chart, 20 samples: Xbar chart X̄ = 33.32, UCL = 36.82, LCL = 29.82; S chart S̄ = 2.449, UCL = 5.116, LCL = 0]
[Process capability report: LSL 25, Target 35, USL 45, Sample Mean 33.32, Sample N 100, StDev (Within) 2.60524, StDev (Overall) 3.29946. Observed Performance: PPM < LSL 0.00, PPM > USL 0.00, PPM Total 0.00. Overall Capability: Pp 1.01, PPL 0.84, PPU 1.18, Ppk 0.84, Cpm 0.90.]
[Normal probability plot of the data: Mean 33.32, StDev 3.299, N 100, AD 0.620, P-Value 0.104]
[Process capability sixpack (LSL 25, Target 35, USL 45): Xbar chart X̄ = 33.32, UCL = 36.82, LCL = 29.82; R chart R̄ = 6.06, UCL = 12.81, LCL = 0; last 20 subgroups; capability histogram; normal probability plot AD 0.620, P 0.104. Within: StDev 2.605, Cp 1.28, Cpk 1.06, PPM 706.32. Overall: StDev 3.299, Pp 1.01, Ppk 0.84, Cpm 0.90, PPM 6040.85.]
[Between/Within capability report: LSL 25, Target 35, USL 45, Sample Mean 33.32, Sample N 100, StDev (Between) 2.37001, StDev (Within) 2.60524, StDev (B/W) 3.52197, StDev (Overall) 3.29946. B/W Capability: Cp 0.95, CPL 0.79, CPU 1.11, Cpk 0.79. Overall Capability: Pp 1.01, PPL 0.84, PPU 1.18, Ppk 0.84, Cpm 0.90. Observed Performance: PPM < LSL 0.00, PPM > USL 0.00, PPM Total 0.00.]
[I-MR chart, 20 observations: I chart X̄ = 99.10, UCL = 105.98, LCL = 92.21; MR chart MR̄ = 2.589, UCL = 8.461, LCL = 0]
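The I-MR limits come from the average moving range. A minimal Python sketch, assuming moving ranges of span 2 and the usual constants d2 = 1.128 and D4 = 3.267; the helper name and the four-point example are hypothetical. Plugging in the summary values for the Concentration data (X̄ = 99.095, MR̄ = 2.589) gives limits of about 92.21 and 105.98, in line with the Minitab output.

```python
D2_SPAN2, D4_SPAN2 = 1.128, 3.267  # control chart constants for moving ranges of span 2


def imr_limits(values):
    """Individuals (I) and moving range (MR) chart limits."""
    x_bar = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    sigma_within = mr_bar / D2_SPAN2
    return {
        "I_CL": x_bar,
        "I_UCL": x_bar + 3 * sigma_within,
        "I_LCL": x_bar - 3 * sigma_within,
        "MR_CL": mr_bar,
        "MR_UCL": D4_SPAN2 * mr_bar,   # the MR chart LCL is 0 for span 2
        "sigma_within": sigma_within,
    }


limits = imr_limits([10, 12, 11, 13])  # hypothetical data
print({k: round(v, 3) for k, v in limits.items()})

# Summary values from the Concentration example
print(round(99.095 + 3 * 2.589 / D2_SPAN2, 2), round(99.095 - 3 * 2.589 / D2_SPAN2, 2))
```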
[Process capability of Concentration: LSL 95, Target 100, USL 105, Sample Mean 99.095, Sample N 20, StDev (Within) 2.29563, StDev (Overall) 1.97603. Observed Performance: PPM < LSL 50000.00, PPM > USL 0.00, PPM Total 50000.00. Overall Capability: Pp 0.84, PPL 0.69, PPU 1.00, Ppk 0.69, Cpm 0.76.]
[Normal probability plot of Concentration: Mean 99.10, StDev 1.976, N 20, AD 0.398, P-Value 0.333]
Capability Sixpack
I Chart: Xbar = 99.10, UCL = 105.98, LCL = 92.21
Moving Range Chart: MR-bar = 2.589, UCL = 8.461, LCL = 0
Last 20 Observations; Capability Histogram; Normal Probability Plot (AD: 0.398, P: 0.333)
Specifications: LSL 95, Target 100, USL 105
Capability Plot:
Within: StDev 2.296, Cp 0.73, Cpk 0.59, PPM 42277.92
Overall: StDev 1.976, Pp 0.84, Ppk 0.69, Cpm 0.76, PPM 20519.84
Cp vs. Pp and Cpk vs. Ppk: a disparity between Cp and Pp (or between Cpk and Ppk) indicates a stability issue.
Pp and Ppk: overall performance.
FOUR POSSIBILITIES
(Donald J. Wheeler)
Is the process stable? Control charts (LCL, UCL).
Is the process capable of meeting requirements? Process capability indices (Cp, Cpk, Pp, Ppk); requires LSL, USL.
Stable and capable: Ideal State (Monitor)
Not stable but capable: Brink of Chaos (Remove Special Causes)
Stable but not capable: Threshold State (Alter System)
Not stable and not capable: State of Chaos
Process Capability of A (time series plot of A vs. index 1 to 100)
Process Data: LSL 90, Target *, USL 110, Sample Mean 99.8045, Sample N 100, StDev (Within) 1.48539, StDev (Overall) 1.45512
Within capability: CPL 2.20, CPU 2.29, Cpk 2.20
Overall Capability: Pp 2.29, PPL 2.25, PPU 2.34, Ppk 2.25, Cpm *
Observed Performance, Exp. Within Performance, Exp. Overall Performance: all PPM values are 0.00
Process Capability of B (time series plot of B vs. index 1 to 100)
Process Data: LSL 90, Target *, USL 110, Sample Mean 106.814, Sample N 100, StDev (Within) 1.36089, StDev (Overall) 1.4326
Within capability: CPL 4.12, CPU 0.78, Cpk 0.78
Overall Capability: Pp 2.33, PPL 3.91, PPU 0.74, Ppk 0.74, Cpm *
Observed Performance: PPM < LSL 0.00, PPM > USL 20000.00, PPM Total 20000.00
Exp. Within Performance: PPM < LSL 0.00, PPM > USL 9616.16, PPM Total 9616.16
Exp. Overall Performance: PPM > USL 13080.51, PPM Total 13080.51
Process Capability of C (time series plot of C vs. index 1 to 100)
Process Data: LSL 90, Target *, USL 110, Sample Mean 100.309, Sample N 100, StDev (Within) 4.73733, StDev (Overall) 4.66247
Within capability: CPL 0.73, CPU 0.68, Cpk 0.68
Overall Capability: Pp 0.71, PPL 0.74, PPU 0.69, Ppk 0.69, Cpm *
Observed Performance: PPM < LSL 20000.00, PPM > USL 30000.00, PPM Total 50000.00
Exp. Within Performance: PPM < LSL 14772.82, PPM > USL 20394.82, PPM Total 35167.63
Exp. Overall Performance: PPM > USL 18831.51, PPM Total 32347.18
Process Capability of D (time series plot of D vs. index 1 to 100)
Process Data: LSL 90, Target *, USL 110, Sample Mean 100.106, Sample N 100, StDev (Within) 1.23073, StDev (Overall) 3.78355
Within capability: CPL 2.74, CPU 2.68, Cpk 2.68
Overall Capability: Pp 0.88, PPL 0.89, PPU 0.87, Ppk 0.87, Cpm *
Observed Performance: PPM < LSL 0.00, PPM > USL 0.00, PPM Total 0.00
Exp. Within Performance: PPM < LSL 0.00, PPM > USL 0.00, PPM Total 0.00
Exp. Overall Performance: PPM > USL 4459.79, PPM Total 8242.00
DATA CONDITIONS: How was it collected? At a single point in time, or over multiple time points? Were all sources of variation acting during the data collection timeframe?
ANALYSIS STATISTICS: Control charts: is variation stable over time?
INFERENCE PREDICTION: Future performance
Attribute Data
WHAT IS Z
2.15 Capability
What is Z? It tells how capable Y is relative to the specs.
Z    DPMO
6    3.4
5    233
4    6,210
3    66,807
2    308,537
1    691,462
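The Z/DPMO table follows the Six Sigma convention of a 1.5-sigma long-term shift. The sketch below assumes that convention, so DPMO is the normal tail area beyond Z.st - 1.5. Both function names are hypothetical.

```python
from statistics import NormalDist

SHIFT = 1.5  # conventional 1.5-sigma long-term shift used in the Z.st tables


def dpmo_from_zst(z_st):
    """Long-term defects per million opportunities for a short-term Z."""
    # Upper-tail area beyond (Z.st - 1.5), written as a lower-tail CDF for accuracy
    return 1e6 * NormalDist().cdf(SHIFT - z_st)


def zst_from_fraction(p_defective):
    """Short-term Z for a long-term fraction defective."""
    return NormalDist().inv_cdf(1 - p_defective) + SHIFT


for z in range(6, 0, -1):
    print(z, round(dpmo_from_zst(z), 1))
```

The sketch reproduces the table to within rounding; for example, Z = 2 gives about 308,538. It also converts the baseline 44/50 = 88% defective to Z.st ≈ 0.33 and the 80% project goal to Z.st ≈ 0.658, which matches the values in the attribute capability tables.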
Attribute Data
ROADMAP FOR CAPABILITY
2.15 Capability
Capability Roadmap: What type of data do you have?
Attribute data
Variables data: MINITAB: Stat > Quality Tools > Capability Analysis > Normal; Z.st: Z.Bench, Potential (Within)
Attribute Data
ATTRIBUTE PROCESS CAPABILITY
Opps per unit is the number of opportunities per unit to have this particular defect.
A unit may have more than one opportunity to have a specific defect.
(It is conservative to assume only one opportunity for the defect per unit.)
The Six Sigma Project Guide is used to carry out the capability calculations.
The icon for this Guide looks like this:
Note: Z.ST stands for Z short-term, a common measure in Six Sigma.
Attribute Data
2.15 Capability
Initial Capability: Defects 44, Opps per Unit 1, Units 50; Z.ST 0.33, Z.ST 95% Upper 0.80, Z.ST 95% Lower -0.19
Final Capability: NA
Initial Capability: based on n = 50, we are 95% confident that Z.ST < 0.80 and Z.ST > -0.19.
Attribute Data
2.15 Capability
Project Goal (% Defective): 80.000%
Initial Capability: Defects 44, Opps per Unit 1, Units 50, Total Opps 50, DPMO 880000; Z.ST 0.33, Z.ST 95% Upper 0.80, Z.ST 95% Lower -0.19; Project Goal Z.ST 0.658
Final Capability: Defects 80, Opps per Unit 1, Units 100, Total Opps 100, DPMO 800000; Z.ST 0.66, Z.ST 95% Upper 0.95, Z.ST 95% Lower 0.36; Project Goal Z.ST 0.658
Final Capability:
# Units is arbitrary (since we don't have any final data yet).
# Defects = # Units * Project Goal % = 100 * 80% = 80 (assumes the goal is met).
Attribute Data
Attribute capability can be expressed as:
-a proportion defective with a confidence interval
-a Z with a confidence interval
Chart: Initial Capability vs. Final Capability, shown as % Defective (0% to 80%, with the Project Goal % Defective marked) and as Z.ST (0 to 5).
Attribute Data
For baseline capability: 44/50 defective units (one opportunity per unit)
Inputs:
Minitab: Stat > Quality Tools > Capability Analysis > Binomial
Attribute Data
For baseline capability: 44/50 defective units (one opportunity per unit)
Outputs:
Binomial Process Capability Analysis of Defectives
P Chart: P-bar = 0.88, UCL = 1, LCL = 0.7421
Binomial Plot (Expected Defectives vs. Observed Defectives); Cumulative % Defective; Histogram of % Defective
Summary Stats (95.0% confidence):
% Defective: 88.00 (Lower CI 75.69, Upper CI 95.47); Target: 0.00
PPM Def: 880000 (Lower CI 756899, Upper CI 954665)
Process Z: -1.1750 (Lower CI -1.6919, Upper CI -0.6964)
Note: Add 1.5 to the Minitab Z outputs to get Z.st and the CI for Z.st for the baseline 44/50.
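The baseline capability and its confidence interval can also be computed without Minitab. This sketch assumes that each opportunity is an independent binomial trial and uses the exact (Clopper-Pearson) interval. We believe this is the kind of exact interval behind the Minitab Binomial output above, but treat that as an assumption. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_increasing(f, lo=0.0, hi=1.0, iters=100):
    """Root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, conf=0.95):
    """Exact (Clopper-Pearson) two-sided interval for a binomial proportion."""
    a = (1 - conf) / 2
    # Lower limit: P(X >= x | p) = a, and P(X >= x) increases with p
    lower = 0.0 if x == 0 else _bisect_increasing(lambda p: 1 - binom_cdf(x - 1, n, p) - a)
    # Upper limit: P(X <= x | p) = a, and P(X <= x) decreases with p
    upper = 1.0 if x == n else _bisect_increasing(lambda p: a - binom_cdf(x, n, p))
    return lower, upper


def attribute_capability(defects, units, opps_per_unit=1, conf=0.95):
    """Proportion defective and Z.st (1.5 shift added), each with a confidence interval.

    Assumes 0 < defects < total opportunities, so every Z is finite.
    """
    total = units * opps_per_unit
    p = defects / total
    p_lo, p_hi = clopper_pearson(defects, total, conf)

    def z_st(q):
        return NormalDist().inv_cdf(1 - q) + 1.5

    # A higher defect rate means a lower Z, so the interval endpoints swap
    return {"p": p, "p_CI": (p_lo, p_hi), "Z.st": z_st(p), "Z.st_CI": (z_st(p_hi), z_st(p_lo))}


print(attribute_capability(44, 50))
```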
Attribute Data
2.15 Capability
Submitted Reports: 200
Defects: 61
Opportunities per Report: 1
Process diagram: OP 10, with outcomes PRB, Scrap, Good, and Rework
I Chart of Daily FPY (%) by Day: Xbar = 96.55, LCL = 90.70
NOTE: A statistically stable process is in control, displaying a consistent pattern of variation over time. The variation exhibited by a stable process is considered to be due to chance or common causes that are inherent to the design of the system (product and process). Therefore, a stable process is operating to its full potential by design. If we desire better performance (increase mean FPY or reduce variation), then a change to the system is required. What types of changes may be effective? Who is responsible for executing changes to the system?
Example A
I-MR Chart of Daily FPY (%) by Day
I Chart: Xbar = 96.51, LCL = 90.58
Moving Range Chart: MR-bar = 2.231, UCL = 7.288, LCL = 0
I Chart (Long-Term): S = 2.013
MR Chart (Short-Term): S = 2.231 / 1.128 = 1.978
Stability Index = 2.013 / 1.978 = 1.02
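The stability index is the ratio of the long-term StDev from the I chart to the short-term StDev estimated from the moving ranges. The sketch below is minimal and assumes span-2 moving ranges (d2 = 1.128). Function names are hypothetical.

```python
import statistics


def stability_index(x):
    """Long-term StDev (I chart) divided by short-term StDev (MR chart)."""
    s_long = statistics.stdev(x)  # overall sample StDev of the individual values
    moving_ranges = [abs(b - a) for a, b in zip(x, x[1:])]
    s_short = statistics.mean(moving_ranges) / 1.128  # MR-bar / d2 for span 2
    return s_long / s_short


def stability_index_from_summary(s_long, mr_bar, d2=1.128):
    """Same ratio when only the I-chart StDev and MR-bar are reported."""
    return s_long / (mr_bar / d2)


# Steady alternation vs. a drifting series with identical moving ranges
print(stability_index([1, 2, 1, 2, 1, 2]), stability_index([1, 2, 3, 4, 5, 6]))
```

The summary form reproduces the three examples: 1.02 (A), 2.81 (B), and 5.68 (C). The two toy series have identical moving ranges. The drifting series still has a much larger index, because drift inflates the overall StDev but not the moving ranges.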
Example B
I Chart of Daily FPY (%) by Day: Xbar = 97.0, LCL = 82.6
I Chart (Long-Term): S = 13.45
MR Chart (Short-Term): S = 5.4 / 1.128 = 4.79
Stability Index = 13.45 / 4.79 = 2.81
Example C
I Chart of Daily FPY (%) by Day: Xbar = 99.32, LCL = 97.40
I Chart (Long-Term): S = 3.627
MR Chart (Short-Term): S = 0.72 / 1.128 = 0.638
Stability Index = 3.627 / 0.638 = 5.68
AVERAGE FPY vs. STABILITY INDEX (Improvement Strategy)
Plot of Average FPY (96 to 100) against Stability Index (1 to 4), with Example A marked.
Regions: Capable But Periodically Unstable; Stable but Chronically Less Capable.
Change System (Projects): Monthly, Quarterly: Ops Mgmt, Engr
Non-normal Data
Dataset: DISTSKEW.MTW
Variable: Pos Skew (column B)
Objective: Determine Cpk with specs LSL = 5, USL = 50.
Pathway: Stat/Basic Statistics/Graphical Summary
Inputs: select variable Pos Skew to analyze.
Is this data normally distributed?
Pathway: Graph/Probability Plot (test for normality, default option)
Inputs: select variable Pos Skew to analyze.
Two-plot layout: right-click the folder icon on the toolbar, to the left of the "i" toolbar symbol. Hold down the Ctrl key and left-click the two graph names, then right-click the graph names to open the Layout Tool and click Finish.
Layout tool results:
Non-normal Data
CPK FOR NON-NORMAL DISTRIBUTION
Dataset: DISTSKEW.MTW
Variable: Pos Skew (column B)
Box-Cox transformation: Pathway: Stat/Control Charts/Box-Cox
Inputs: all observations in one column; select variable Pos Skew; subgroup size 1
Johnson transformation: Pathway: Stat/Quality Tools/Johnson Transformation
Inputs: select variable Pos Skew to analyze.
Merged layout (Box-Cox result: λ = 0.0):
Non-normal Data
BOX-COX TRANSFORMATION
Box-Cox table of transformations:
λ = 1: No transformation
λ = 1/2: Square root
λ = 0: Log
λ = -1/2: Reciprocal square root
λ = -1: Reciprocal
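The Box-Cox transform and an optimal lambda can be computed as follows. This sketch uses the classical maximum-likelihood criterion over a grid of lambda values. We assume Minitab's "optimal lambda" is equivalent, apart from Minitab's own rounding of lambda, which is not modeled here. Function names are hypothetical.

```python
import math


def boxcox(y, lam):
    """Box-Cox transform of positive data y for a given lambda."""
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v**lam - 1) / lam for v in y]


def boxcox_loglik(y, lam):
    """Profile log-likelihood of lambda (normal model for the transformed data)."""
    n = len(y)
    z = boxcox(y, lam)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n
    # The (lam - 1) * sum(log y) term is the Jacobian of the transformation
    return -n / 2 * math.log(var) + (lam - 1) * sum(math.log(v) for v in y)


def best_lambda(y, lo=-2.0, hi=2.0, step=0.01):
    """Grid search for the lambda that maximizes the profile log-likelihood."""
    count = int(round((hi - lo) / step))
    grid = [round(lo + i * step, 10) for i in range(count + 1)]
    return max(grid, key=lambda lam: boxcox_loglik(y, lam))


skewed = [math.exp(v) for v in (-2, -1, 0, 1, 2)]  # hypothetical lognormal-like data
print(best_lambda(skewed))
```

To get Cpk on the transformed scale, apply the same lambda to LSL and USL as well as to the data. Then use the normal-theory formulas on the transformed values.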
Non-normal Data
CPK WITH TRANSFORMED DATA
What is the Cpk for the DistSkew data set?
Pathway: Stat/Quality tools/Capability Analysis/Normal
Inputs: select variable Pos Skew, subgroup size 1, LSL = 5, USL = 50,
AND click the Box-Cox button and select Use optimal lambda.
Non-normal Data
CAPABILITY WITH TRANSFORMED DATA
Capability Sixpack
Pathway: Stat/Quality tools/Capability Sixpack/Normal
Inputs: select variable Pos Skew, subgroup size 1, LSL = 5, USL = 50,
AND click the Box-Cox button and select Use optimal lambda.
Non-normal Data
CAPABILITY WITH RAW DATA
What is the capability for the DistSkew data set?
Pathway: Stat/Quality tools/Capability Analysis/Nonnormal
Inputs: select variable Pos Skew, subgroup size 1, LSL = 5, USL = 50,
AND select the Distribution radio button and choose Lognormal from the pull-down list.
Output:
Non-normal Data
Capability Normal Branch with Box-Cox vs. Lognormal
1) The observed PPM stays the same whether you fit the lognormal using the normal or the nonnormal capability branch. In fact, the observed PPM stays the same no matter what distribution you fit to the data.
2) The expected overall PPM stays the same when you fit the lognormal using either the normal or the nonnormal capability branch.
3) The Ppk values can be very different between the capability normal branch (with the Box-Cox transform) and the capability nonnormal branch (using the lognormal fit), because the nonnormal branch uses the ISO definition of Ppk.
4) The capability nonnormal branch reports no Cpk, only Ppk, and it gives no confidence interval for Ppk either.
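Points 2 and 3 can be illustrated for lognormal data with log-scale mean mu and log-scale StDev sigma. The sketch assumes that both branches end up with the same mu and sigma estimates. It also assumes that the nonnormal branch uses the ISO percentile definition (the median and the 0.135th/99.865th percentiles). The parameter values in the example are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def lognormal_ppk_comparison(mu, sigma, lsl, usl):
    """Ppk two ways for lognormal data, plus the expected overall PPM.

    mu and sigma are the mean and StDev of ln(data), assumed already estimated.
    """
    log_lsl, log_usl = math.log(lsl), math.log(usl)

    # Normal branch with Box-Cox lambda = 0: work on the log scale
    ppk_boxcox = min(mu - log_lsl, log_usl - mu) / (3 * sigma)

    # ISO percentile method on the original scale: median and the
    # 0.135th / 99.865th percentiles of the fitted lognormal
    z = STD_NORMAL.inv_cdf(0.99865)
    median = math.exp(mu)
    p_low, p_high = math.exp(mu - z * sigma), math.exp(mu + z * sigma)
    ppk_iso = min((median - lsl) / (median - p_low), (usl - median) / (p_high - median))

    # Expected overall out-of-spec rate is the same whichever Ppk is reported
    ppm = 1e6 * (STD_NORMAL.cdf((log_lsl - mu) / sigma) + STD_NORMAL.cdf((mu - log_usl) / sigma))
    return ppk_boxcox, ppk_iso, ppm


print(lognormal_ppk_comparison(2.0, 0.5, lsl=5, usl=50))
```

With mu = 2.0, sigma = 0.5, LSL = 5, and USL = 50, the Box-Cox (lambda = 0) Ppk rounds to 0.26 and the ISO percentile Ppk rounds to 0.42. Both share a single expected PPM, about 217,000 here.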
Minitab Assistant
Report Card
Stability: The process mean and variation are stable. No points are out of control.
Number of Subgroups: You only have 20 subgroups. For a capability analysis, it is generally recommended that you collect at least 25 subgroups over a long enough period of time to capture the different sources of process variation.
Normality: Your data passed the normality test. As long as you have enough data, the capability estimates should be reasonably accurate.
Amount of Data: The total number of observations is 100 or more. The capability estimates should be reasonably precise.
Summary Report
Customer Requirements: Upper Spec 0.6, Target *, Lower Spec 0.5
Process Characterization: Total N 100, Subgroup size 5, Mean 0.54646, StDev (overall) 0.019341, StDev (within) 0.018548
Capability Statistics:
Actual (overall): Pp 0.86, Ppk 0.80, Z.Bench 2.29, % Out of spec (observed) 2.00, % Out of spec (expected) 1.10, PPM (DPMO) (observed) 20000, PPM (DPMO) (expected) 10969
Potential (within): Cp 0.90, Cpk 0.83, Z.Bench 2.41, % Out of spec (expected) 0.81, PPM (DPMO) (expected) 8072
Xbar-R Chart: Confirm that the process is stable.
Capability Histogram: Are the data inside the limits?
Normality Plot: The points should be close to the line. Normality Test (Anderson-Darling): P-value 0.794, Pass
Conclusions: The defect rate is 1.10%, which estimates the percentage of parts from the process that are outside the spec limits.
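Z.Bench combines both tails into one out-of-spec probability and converts it back to a one-sided standard normal Z. The sketch below is minimal and assumes normal data. The function name is hypothetical.

```python
from statistics import NormalDist


def z_bench(mean, sd, lsl=None, usl=None):
    """Z.Bench and expected PPM out of spec, assuming a normal distribution.

    Z.Bench converts the total out-of-spec probability (both tails combined)
    back into a single one-sided standard normal Z.
    """
    dist = NormalDist(mean, sd)
    p_out = 0.0
    if lsl is not None:
        p_out += dist.cdf(lsl)
    if usl is not None:
        p_out += 1 - dist.cdf(usl)
    return NormalDist().inv_cdf(1 - p_out), 1e6 * p_out


overall = z_bench(0.54646, 0.019341, lsl=0.5, usl=0.6)
within = z_bench(0.54646, 0.018548, lsl=0.5, usl=0.6)
print(overall, within)
```

Using the Process Characterization values above, the sketch returns approximately Z.Bench 2.29 and 10,969 PPM with StDev (overall). With StDev (within) it returns approximately 2.41 and 8,072 PPM.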
Use TILES.MTW
Choose Minitab Assistant > Capability Analysis.
The Assistant detects non-normality and offers the option of a transformation (Box-Cox).
Report Card
Stability: The process mean and variation are stable. No points are out of control.
Number of Subgroups: You only have 10 subgroups. For a capability analysis, it is generally recommended that you collect at least 25 subgroups over a long enough period of time to capture the different sources of process variation.
Normality: The transformed data passed the normality test. As long as you have enough data, the capability estimates should be reasonably accurate.
Amount of Data: The total number of observations is 100 or more. The capability estimates should be reasonably precise.
Summary Report (Transformed Data)
Xbar-S Chart: Confirm that the process is stable.
Customer Requirements: Upper Spec (Target *, Lower Spec *)
Process Characterization: Total N 100, Subgroup size 10, Mean 2.9231, Standard deviation 1.7860
Capability Statistics:
Actual (overall): Pp *, Ppk 0.75, Z.Bench 2.24, % Out of spec (observed) 2.00, % Out of spec (expected) 1.26, PPM (DPMO) (observed) 20000, PPM (DPMO) (expected) 12569
Potential (within): Cp *, Cpk 0.76, Z.Bench 2.28, % Out of spec (expected) 1.12, PPM (DPMO) (expected) 11249
Capability Histogram: Are the data below the limit?
Normality Test (Anderson-Darling): Original data: Fail, P-value 0.010; Transformed data: Pass, P-value 0.574
Conclusions: The defect rate is 1.26%, which estimates the percentage of parts from the process that are outside the spec limits.
Process capability output with confidence bounds (histogram with Within and Overall normal curves)
Process Data: LSL 0.5, Target *, USL 0.6, Sample Mean 0.54646, Sample N 100, StDev (Within) 0.0185477, StDev (Overall) 0.0193414
Overall Capability: Pp 0.86 (Lower CL 0.76), PPL 0.80, PPU 0.92, Ppk 0.80 (Lower CL 0.69), Cpm * (Lower CL *)
Observed Performance: PPM < LSL 10000.00, PPM > USL 10000.00, PPM Total 20000.00
95% lower confidence bound on Cpk, by observed Cpk (columns, 0.5 to 4.0 in steps of 0.1) and sample size n (rows)
Observed Cpk: 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4.0
n = 10: 0.24 0.31 0.38 0.44 0.51 0.58 0.64 0.70 0.77 0.83 0.89 0.96 1.02 1.08 1.14 1.21 1.27 1.33 1.39 1.45 1.52 1.58 1.64 1.70 1.76 1.82 1.89 1.95 2.01 2.07 2.13 2.19 2.25 2.32 2.38 2.44
n = 20: 0.32 0.40 0.48 0.55 0.63 0.71 0.78 0.86 0.93 1.01 1.08 1.16 1.23 1.30 1.38 1.45 1.53 1.60 1.67 1.75 1.82 1.90 1.97 2.04 2.12 2.19 2.26 2.34 2.41 2.48 2.56 2.63 2.71 2.78 2.85 2.93
n = 30: 0.35 0.44 0.52 0.60 0.68 0.76 0.84 0.92 1.00 1.08 1.16 1.24 1.32 1.40 1.48 1.56 1.64 1.71 1.79 1.87 1.95 2.03 2.11 2.19 2.27 2.34 2.42 2.50 2.58 2.66 2.74 2.82 2.89 2.97 3.05 3.13
n = 100: 0.42 0.51 0.60 0.69 0.78 0.87 0.96 1.05 1.14 1.23 1.32 1.41 1.49 1.58 1.67 1.76 1.85 1.94 2.03 2.11 2.20 2.29 2.38 2.47 2.56 2.65 2.73 2.82 2.91 3.00 3.09 3.18 3.26 3.35 3.44 3.53
n = 150: 0.43 0.53 0.62 0.71 0.80 0.89 0.99 1.08 1.17 1.26 1.35 1.44 1.53 1.62 1.71 1.80 1.89 1.99 2.08 2.17 2.26 2.35 2.44 2.53 2.62 2.71 2.80 2.89 2.98 3.07 3.16 3.25 3.34 3.44 3.53 3.62
n = 200: 0.44 0.54 0.63 0.72 0.82 0.91 1.00 1.09 1.19 1.28 1.37 1.46 1.55 1.65 1.74 1.83 1.92 2.01 2.11 2.20 2.29 2.38 2.47 2.57 2.66 2.75 2.84 2.93 3.03 3.12 3.21 3.30 3.39 3.48 3.58 3.67
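The table appears to follow Bissell's normal approximation for a one-sided 95% lower bound on Cpk. We spot-checked several cells against it, for example 0.24 for Cpk 0.5 at n = 10 and 2.44 for Cpk 4.0 at n = 10, but did not verify every entry, so treat the formula as an assumption. The same expression reproduces the Ppk lower bound of 0.69 for Ppk 0.80 at n = 100 in the capability output above.

```python
import math
from statistics import NormalDist


def cpk_lower_bound(cpk_hat, n, conf=0.95):
    """Approximate one-sided lower confidence bound for Cpk (Bissell's formula)."""
    z = NormalDist().inv_cdf(conf)
    return cpk_hat - z * math.sqrt(1 / (9 * n) + cpk_hat**2 / (2 * (n - 1)))


observed = [round(0.5 + 0.1 * i, 1) for i in range(36)]  # 0.5, 0.6, ..., 4.0
for n in (10, 20, 30, 100, 150, 200):
    row = " ".join(f"{cpk_lower_bound(c, n):.2f}" for c in observed)
    print(f"n = {n}: {row}")
```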
Histograms of simulated Cpk estimates (N = 10,000 simulations per panel) for sample sizes Cpk 5, Cpk 10, Cpk 20, Cpk 30, Cpk 50, and Cpk 100, each with a fitted normal curve.
Graphical summary of Cpk 5: Anderson-Darling A-Squared 436.22, P-Value < 0.005; Mean 1.1036, StDev 0.5573, Variance 0.3106, Skewness 2.7486, Kurtosis 13.4533, N 10000; Minimum 0.3191, 1st Quartile 0.7542, Median 0.9676, 3rd Quartile 1.2816, Maximum 7.8123.
The spread of the estimates shrinks as the sample size grows, from StDev 0.5573 for Cpk 5 down to 0.07376 for the largest sample size.
Assumptions: Normal, μ = 10, σ = 1, LSL = 7, USL = 13, True Cpk = 1.0
Boxplots of Cpk estimates for sample sizes 5, 10, 15, 20, 25, 30, 50, and 100 (Cpk 5 to Cpk 100; vertical axis 0 to 7)
Assumptions: Normal, μ = 10, σ = 1, LSL = 7, USL = 13, True Cpk = 1.0
Boxplots of Cpk lower bounds for sample sizes 5, 10, 15, 20, 25, and 30 (Cpk LB 5 to Cpk LB 30; vertical axis 0.0 to 3.0)
Miss rate of the Cpk lower bound vs. True Population Mean (7.0 to 10.0; Nominal = 10, USL = 13), with the 5.00% target miss rate marked (vertical axis 0% to 6%)
Simulation Results
The formula is conservative when the process mean is on target (better than 95% coverage of the true Cpk value).
As the process mean deviates from target, the formula provides approximately the stated 95% coverage, regardless of sample size.
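The simulation results can be explored with a small Monte Carlo sketch that uses the slide's assumptions (normal, mu = 10, sigma = 1, LSL = 7, USL = 13). It assumes the lower bound is Bissell's formula. The number of simulations, the seed, and the function names are arbitrary choices, so the results will differ slightly from the slides' 10,000-run histograms.

```python
import math
import random
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.95)


def _mean_sd(x):
    """Sample mean and StDev (n - 1 divisor) using plain float arithmetic."""
    n = len(x)
    m = sum(x) / n
    return m, math.sqrt(sum((v - m) ** 2 for v in x) / (n - 1))


def simulate(n, mu, sigma=1.0, lsl=7.0, usl=13.0, sims=2000, seed=1):
    """Coverage of the 95% Cpk lower bound, plus mean/StDev of the Cpk estimates."""
    rng = random.Random(seed)
    true_cpk = min(mu - lsl, usl - mu) / (3 * sigma)
    covered, estimates = 0, []
    for _ in range(sims):
        m, s = _mean_sd([rng.gauss(mu, sigma) for _ in range(n)])
        c = min(m - lsl, usl - m) / (3 * s)
        estimates.append(c)
        lower = c - Z95 * math.sqrt(1 / (9 * n) + c**2 / (2 * (n - 1)))
        covered += lower <= true_cpk
    est_mean, est_sd = _mean_sd(estimates)
    return covered / sims, est_mean, est_sd


for n in (5, 30, 100):
    print(n, simulate(n, mu=10.0))
```

Moving mu away from 10 (for example, mu = 8.5, where the true Cpk is 0.5) lets you compare off-target coverage with the on-target case.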
Sample size with 0 failures allowed, by reliability (rows) and confidence level (%, columns)
Reliability     50%     60%     70%     75%     80%     85%     90%     95%   97.5%     99%   99.5%   99.9%
0.999999     693147  916291 1203973 1386294 1609438 1897120 2302584 2995731 3688878 4605168 5298315 6907752
0.99999       69315   91629  120397  138629  160943  189712  230258  299572  368887  460515  529830  690773
0.9999         6932    9163   12040   13863   16094   18971   23025   29956   36887   46050   52981   69075
0.999           693     916    1204    1386    1609    1897    2302    2995    3688    4603    5296    6905
0.998           347     458     602     693     804     948    1151    1497    1843    2301    2647    3451
0.997           231     305     401     462     536     632     767     998    1228    1533    1764    2300
0.996           173     229     301     346     402     474     575     748     921    1149    1322    1724
0.995           139     183     241     277     322     379     460     598     736     919    1058    1379
0.994           116     153     201     231     268     316     383     498     613     766     881    1148
0.993            99     131     172     198     230     271     328     427     526     656     755     984
0.992            87     115     150     173     201     237     287     373     460     574     660     861
0.991            77     102     134     154     179     210     255     332     409     510     587     765
0.99             69      92     120     138     161     189     230     299     368     459     528     688
0.98             35      46      60      69      80      94     114     149     183     228     263     342
0.97             23      31      40      46      53      63      76      99     122     152     174     227
0.96             17      23      30      34      40      47      57      74      91     113     130     170
0.95             14      18      24      28      32      37      45      59      72      90     104     135
0.94             12      15      20      23      27      31      38      49      60      75      86     112
0.93             10      13      17      20      23      27      32      42      51      64      74      96
0.92              9      11      15      17      20      23      28      36      45      56      64      83
0.91              8      10      13      15      18      21      25      32      40      49      57      74
0.9               7       9      12      14      16      19      22      29      36      44      51      66
0.8               4       5       6       7       8       9      11      14      17      21      24      31
0.7               2       3       4       4       5       6       7       9      11      13      15      20
0.6               2       2       3       3       4       4       5       6       8      10      11      14
0.5               1       2       2       3       3       3       4       5       6       7       8      10
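The table entries are consistent with the zero-failure (success-run) relation: n passing units with no failures demonstrate reliability R at confidence C when R^n <= 1 - C. We spot-checked several cells (59 for 95%/0.95, 22 for 90%/0.90, 299 for 95%/0.99) but not every entry. Function names are hypothetical.

```python
import math


def zero_failure_sample_size(confidence, reliability):
    """Smallest n with 0 failures that demonstrates `reliability` at `confidence`.

    Solves reliability**n <= 1 - confidence, i.e. n >= ln(1 - C) / ln(R).
    """
    return math.ceil(math.log(1 - confidence) / math.log(reliability))


def demonstrated_reliability(n, confidence):
    """Lower confidence bound on reliability after n passes and 0 failures."""
    return (1 - confidence) ** (1 / n)


for r in (0.999, 0.997, 0.99, 0.95, 0.90, 0.80):
    print(r, zero_failure_sample_size(0.90, r), zero_failure_sample_size(0.95, r))
```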
0 Failures Allowed: sample size
Reliability   Confidence 90   Confidence 95
0.999         2302            2995
0.997         767             998
0.99          230             299
0.95          45              59
0.90          22              29
0.80          11              14
Reliability values tabulated for the same rows (headed Confidence 90 and 95; the last two columns are unlabeled on the slide):
0.999: 0.99970, 0.99977, 0.99998, 0.99998
0.997: 0.99910, 0.99931, 0.99993, 0.99995
0.99: 0.9970, 0.9977, 0.9997, 0.9998
0.95: 0.9847, 0.9883, 0.9989, 0.9991
0.90: 0.9690, 0.9764, 0.9977, 0.9982
0.80: 0.9389, 0.9517, 0.9950, 0.9963
BEFORE: Characterization Studies: inject sources of variation to stress the system; experimentation (DOE); simulation modeling; measure design margin; optimization.
DURING: Qualification (Process) Studies: n delivers conf%/rel%; limited conditions; representative sample?
Stability: all sources of variation will be acting over the long term in the future; need to detect significant changes.
Summary Quiz
True or False
___________ You can ignore plotting the data and just compute Ppk.
___________ The Total Exp. Overall PPM is the same for lognormal data in Minitab for both of these approaches: 1) the Normal capability branch (with lambda = 0) and 2) the Nonnormal capability branch, selecting Lognormal.
___________ The smaller the sample used to compute Ppk, the better. It is less work to collect the data.
Chapter 4B:
Tolerance Intervals
Topics
Tolerance Intervals
Calculations
Sample Size
Tolerance Interval Plot
Statistics: N 30, Mean 41.063, StDev 10.127
Normal: Lower 18.583
Nonparametric: Lower 7.560
Normality Test: AD 0.772, P-Value 0.040
(Panels: the data on a 0 to 70 scale with the normal and nonparametric lower bounds, and a normal probability plot.)
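A normal one-sided lower bound has the form mean - k * StDev. From the slide's N 30, Mean 41.063, StDev 10.127, and Normal Lower 18.583, the implied k is about 2.22. The sketch computes the exact factor from the noncentral t distribution with numerical integration, so it needs only the standard library; it assumes n is at least 4. With SciPy available, scipy.stats.nct.ppf(conf, n - 1, z_p * sqrt(n)) / sqrt(n) should give the same k. Function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def _chi2_pdf(v, df):
    """Chi-square density (df >= 3 assumed, so the density is finite at 0)."""
    if v <= 0:
        return 0.0
    log_pdf = (df / 2 - 1) * math.log(v) - v / 2 - (df / 2) * math.log(2) - math.lgamma(df / 2)
    return math.exp(log_pdf)


def _nct_cdf(t, df, delta, steps=1000):
    """P(T <= t) for a noncentral t variable T = (Z + delta) / sqrt(V / df).

    Integrates P(Z <= t*sqrt(v/df) - delta) against the chi-square density of V
    with Simpson's rule (steps must be even).
    """
    upper = df + 12 * math.sqrt(2 * df) + 20
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        v = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * _chi2_pdf(v, df) * _STD.cdf(t * math.sqrt(v / df) - delta)
    return total * h / 3


def k_one_sided(n, coverage=0.95, conf=0.95):
    """Exact one-sided normal tolerance factor: bound = mean -/+ k * StDev."""
    df, delta = n - 1, _STD.inv_cdf(coverage) * math.sqrt(n)
    lo, hi = 0.0, 50.0 * math.sqrt(n)
    for _ in range(60):
        mid = (lo + hi) / 2
        if _nct_cdf(mid, df, delta) < conf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2 / math.sqrt(n)


k30 = k_one_sided(30)
print(round(k30, 3), round(41.063 - k30 * 10.127, 3))
```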
Exercise
Compute 95/95 lower tolerance bounds for
TubeTensile2, TubeTensile3
Exercises
Choose a sample size for
Normal distribution tolerance interval
One-sided specification: Min 3 lbf
Planning data: TubeTensile3
Two-sided
Two-sided 95% / 95%: calculate two-sided confidence intervals for the 2.5th and 97.5th percentiles.
The lower bound is the lower 95% bound on the 2.5th percentile.
The upper bound is the upper 95% bound on the 97.5th percentile.
Interval is 2.03 to 61.51
LeRoy Mattson
Jeremy Strief
Objectives
Understand how GLM is a generalization of
ANOVA and regression
Understand three primary concepts within GLM
models
Fixed vs. Random effects
Nesting vs. Crossing
Covariate (Continuous) vs. Factor (Attribute)
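Choosing a tool by data type:
Y variables (continuous data), X variables: Regression, Multiple Regression, GLM
Y variables (continuous data), X attribute (discrete data): t-test (1 X, 2 levels), One-way ANOVA, GLM
Y attribute (discrete data), X variables: Logistic Regression
Y attribute (discrete data), X attribute: Chi Square, Logistic Regression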
GLM: Concepts
GLM: Variable Y One Attribute X
GLM: Variable Y Two Attribute Xs
GLM: Variable Y Mixture of Attribute & Variable Xs
GLM Introduction
GLM stands for General Linear Model
A flexible, unified approach to regression and
ANOVA.
Needed when building a Y=f(X) transfer function but the input variables don't match a standard regression or ANOVA approach:
Regression assumes continuous Xs
ANOVA treats Xs as attributes, and it often requires a
balanced experimental design in Minitab
What if your dataset does not fit into the ANOVA or
Regression mold?
Motivating Example
Pin Pulls.mtw
Motivating Example
Response variable (Y): Pull Strength
Predictor Variables (Xs):
Hole diameter: 17.5, 18.5, or 19.5
Fillet Style: one-sided or two-sided
Solder size: small or large
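Counts of runs by hole diameter (17.5, 18.5, 19.5, All); the row labels are not recoverable from the slide:
First table: 4, 0, 4 (All 8); 3, 0, 2 (All 5); All: 7, 0, 6 (All 13)
Second table: 9, 4, 16 (All 29); 4, 0, 12 (All 16); All: 13, 4, 28 (All 45)
Note the empty (zero-count) cells.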
GLM: Concepts
GLM: Variable Y One Attribute X
GLM: Variable Y Two Attribute Xs
GLM: Variable Y Mixture of Attribute & Variable Xs
Topics to be covered
GLM: Variable Y One Attribute X
One-way ANOVA (review)
GLM approach
Random effect vs. Fixed effect model
Minitab Output
Multiple comparison
Comparison of ANOVA and GLM (most row labels are not recoverable from the slide); for example, Fits covariates: ANOVA no, GLM yes.
Fixed Effects:
Designs
Suppliers
Material types
Controllable process settings (e.g., laser power, position, etc.)
Random Effects:
Lots
Operators
Subsampling from a finite population of levels
Noise variables (uncontrollable aspects of a process)
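MINITAB
Loom.mtw
Random Effect Model: $\mathrm{Var}(y) = \sigma_a^2 + \sigma^2$
Estimate $\sigma_a^2$ and $\sigma^2$; the expected mean square for the random factor is $\sigma^2 + n\sigma_a^2$.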
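For a balanced one-way random-effects model, the two variance components follow from the ANOVA mean squares and the expected mean square above. The sketch below is minimal and assumes a balanced design. Loom.mtw is not reproduced here, so the example data are hypothetical.

```python
def one_way_variance_components(groups):
    """Estimate sigma_a^2 (between levels) and sigma^2 (within) from a balanced
    one-way random-effects layout: one list of observations per factor level."""
    k, n = len(groups), len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced design")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (k * (n - 1))
    # E(MS_between) = sigma^2 + n * sigma_a^2 and E(MS_within) = sigma^2
    sigma_a2 = max((ms_between - ms_within) / n, 0.0)
    return sigma_a2, ms_within


print(one_way_variance_components([[1, 2, 3], [4, 5, 6]]))
```

Truncating a negative estimate of sigma_a^2 at zero is a common convention; we assume it here, and it is not necessarily what Minitab reports.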
Loom.mtw
GLM: Concepts
GLM: Variable Y One Attribute X
GLM: Variable Y Two Attribute Xs
GLM: Variable Y Mixture of Attribute & Variable Xs
Topics to be covered
GLM: Variable Y Two Attribute Xs
Two-way ANOVA
GLM approach
Crossed vs. Nested design
MINITAB
Monday begins at 21:00 on Sunday, etc.
ANOVA approach (Minitab dialog): enter Y as the response and the Xs as factors; the interaction between the Xs is included by default.
ANOVA Output
Model: $y_{ijk} = \mu + a_i + b_j + (ab)_{ij} + e_{ijk}$
Main effects plots for Shift and Day (Mon, Tue, Wed, Thu, Fri).
Since the interaction is significant, these plots do not tell the whole story!
Interaction plots: interaction = lines NOT parallel.
In one panel each line is a different day; in the other, each line is a different shift.
Terms: Shift, Day, Day*Shift; for Day, 1473.8/39776.0 ≈ 0.037.
Days Overdue.mtw
Approach:
Exercise Debrief
Solution:
What are the key Xs?
What is the relationship between the key Xs and Y?
What is the impact of the key Xs on Y?
Days Overdue.mtw
Residual plots: normal probability plot (verify the normality assumption; want the points to fit the line), histogram, standardized residuals vs. fitted values, and standardized residuals vs. observation order (verify the independence assumption; want no patterns).
Micrometer.mtw
Nesting
Factor B is nested in factor A if the levels of B
have different meanings for each level of A.
Stated differently, factor B is nested in factor A if
there is a completely different set of levels of B
for every level of A.
Minitab notation: B(A) means B is nested within
A.
Nesting Example
Example: An experiment is run with three suppliers,
each of which produces three batches of material.
There clearly are three levels of supplier, but how
many levels of batch are there?
Batch 1 from supplier 1 has nothing to do with batch 1
from supplier 2. Batch level 1 has no consistent
meaning across suppliers. So Batch is nested in
supplier.
Instead of labeling the batch levels as 1-3, it would be
appropriate to label them 1-9.
Crossing
Factor B is crossed with Factor A if the levels of B have the same meaning for each level of A.
This is the standard factorial structure of a DOE.
Example: An experiment is run with three suppliers, each of which utilizes two types of material: 100% gold or 100% nickel.
Gold and nickel have the same meaning and the same interpretation, regardless of supplier.
Supplier is therefore crossed with material.
Purity.mtw
A = Fixed or Random?, B = Fixed or Random?
Is Supplier a key X?
σ batch = ?   σ = 1.62
Source        DF  SS       MS       F     P
supplier       2  15.056   7.52778  2.85  0.077
batch          3  25.639   8.54630  3.24  0.040
Interaction    6  44.278   7.37963  2.80  0.033
Error         24  63.333   2.63889
Total         35  148.306
S = 1.624   R-Sq = 57.30%   R-Sq(adj) = 37.72%
GLM Exercise:
MINITAB (Purity.mtw)
Is supplier a key X?
Classify each factor: F/R (F = Fixed, R = Random) and C/N (C = Crossed, N = Nested)
Examples of terms in the model:
Factors A and B crossed: A, B, A*B
Factor B nested in A, with C crossed with A and B: A, B(A), C, A*C, B*C (error term $e_{l(ijk)}$)
Exercise
MINITAB: Time.MTW
Diagram labels: L2; O5, O6, O7, O8; F1, F2, F3
Time: 20 minutes
GLM: Concepts
GLM: Variable Y One Attribute X
GLM: Variable Y Two Attribute Xs
GLM: Variable Y Mixture of Attribute & Variable Xs
Topics to be covered
GLM: Variable Y Mixture of Attribute and Variable Xs
GLM with Covariates
Strategic GLM
Attribute (Factor) X vs. Variables (Covariate) X
Variables (covariate) X example: Coffee Taste vs. Lead Time (Days), with the actual data and a quadratic curve Y = F(X).
Attribute (factor) X example: Y by Supplier; there is no line or curve Y = F(X), only the actual data at each level.
Example:
MINITAB
Full GLM (the source names of the model terms are not recoverable from the slide)
Source   DF  Seq SS    Adj SS    Adj MS    F      P
term      3   3784.8    5991.3    1997.1   14.91  0.000
term      1      8.3      61.3      61.3    0.46  0.503
term      1     55.7      40.2      40.2    0.30  0.587
term      2    684.3     572.2     286.1    2.14  0.133
term      1    336.8       1.4       1.4    0.01  0.919
term      1  10108.2   10108.2   10108.2   75.45  0.000
Error    37   4957.1    4957.1     134.0
Total    46  19935.2
Reduce Terms (Edit Last Dialog)
Source     DF  Seq SS    Adj SS    Adj MS    F      P
Rub Band    3   3784.8    5988.0    1996.0   15.63  0.000
Ball        2    691.8     681.2     340.6    2.67  0.082
PB Angle    1  10348.8   10348.8   10348.8   81.01  0.000
Error      40   5109.8    5109.8     127.7
Total      46  19935.2
PB Angle as an attribute (factor):
Source     DF  Seq SS    Adj SS    Adj MS   F      P
Rub Band    3   3784.8    5212.1   1737.4   19.13  0.000
Ball        2    691.8     900.0    450.0    4.95  0.012
PB Angle    4  12097.9   12097.9   3024.5   33.30  0.000
Error      37   3360.7    3360.7     90.8
Total      46  19935.2
PB Angle as a variable vs. as an attribute:
Variable: DF 1, Adj SS 10348.8, F 81.01, p 0.000, Error DF 40
Attribute: DF 4, Adj SS 12097.9, F 33.30, p 0.000, Error DF 37
Coefficients (PB Angle as an attribute; level labels are not recoverable from the slide)
Term        Coef     SE Coef  T       P
Constant    99.896   1.531    65.24   0.000
Rub Band   -14.661   2.725    -5.38   0.000
            19.518   2.942     6.63   0.000
             3.354   2.597     1.29   0.204
Ball         7.570   2.664     2.84   0.007
            -5.102   2.031    -2.51   0.016
PB Angle   -32.106   3.035   -10.58   0.000
            -4.533   2.920    -1.55   0.129
             7.497   3.129     2.40   0.022
             9.455   3.669     2.58   0.014
Prediction example: 99.896 - 14.661 - 5.102 + 7.497 = 87.63
Interactions
Source          DF  Seq SS    Adj SS   Adj MS   F      P
Rub Band         3   3784.8   4423.8   1474.6   10.19  0.000
Ball             2    691.8    502.4    251.2    1.74  0.192
PB Angle         1  10348.8   8766.5   8766.5   60.55  0.000
Rub Band*Ball    6    187.5    187.5     31.2    0.22  0.969
Error           34   4922.3   4922.3    144.8
Total           46  19935.2
Interactions
p-values for interactions:
Rub Band*Ball: 0.969
Rub Band*PB Angle: 0.566
Ball*PB Angle: 0.211
Curvature only applies to variables Xs!
Plot: Mean of Taste vs. X (1.0, 30.5, 60.0), with corner and center points (Point Type), comparing a linear model with a curvature model.
Residual plot: Standardized Residual vs. PB Angle (130 to 170).
(PB Angle)²
Source              DF  Seq SS    Adj SS   Adj MS   F      P
Rub Band             3   3784.8   6036.3   2012.1   20.93  0.000
Ball                 2    691.8   1060.9    530.5    5.52  0.008
PB Angle             1  10348.8   1685.1   1685.1   17.53  0.000
PB Angle*PB Angle    1   1360.4   1360.4   1360.4   14.15  0.001
Error               39   3749.3   3749.3     96.1
Total               46  19935.2
Residual plots (normal probability plot, histogram, standardized residuals vs. fitted values, standardized residuals vs. observation order): brush over the plots to find which point is causing trouble!
What do we conclude?
Exercise
MINITAB
Zinc plating.mtw
Exercise: Questions
1) One-way ANOVA: X = vendor
Are there significant differences among vendors?
2) GLM: X1 = vendor, X2 = Bracket Thickness
How does this change the conclusion?
3) Bonus Questions:
If you were to do this testing again, what would you do differently?
Use a graphical tool to support your rationale (suggestion: try Interaction Plot under ANOVA).
Exercise:
Fit a GLM to create a model for pull strength
Can Hole diameter be reasonably treated as a
covariate? (Engineering theory suggests that it can.)
Determine if variables are fixed vs. random, crossed
vs. nested
Which Xs are statistically significant?
Logistic Regression
I still feel like I'm regressing
LeRoy Mattson
Objectives
Understand how logistic regression creates a predictive model for an attribute Y
Fit logistic regression models in Minitab
Logistic Regression
X \ Y table: Lung Cancer, No Lung Cancer, Total, for Smoker and Non-smoker
Proportion with lung cancer: Smoker 2/3 = 0.67; Non-smoker 1/8 = 0.125
Ratio: 0.67 / 0.125 = 5.33
In this module
1) What tool(s) for Hypothesis Test?
2) What tool(s) for Graphical Analysis?
Approach:
Work individually.
Y = Cancer or Cancer-Free
X = Exposure (smoking/nonsmoking)
Smoking2.MTW
Fitted line plot: hit or miss (0/1) vs. target speed (cms/sec), 200 to 500; S = 0.397278, R-Sq = 41.8%, R-Sq(adj) = 39.3%.
Residual plots: normal probability plot, histogram, standardized residual vs. fitted value, standardized residual vs. observation order.
[Figure: Logit(p) vs. X. The proportion (p) at each X value follows an S-shaped curve in X (b1 < 0); plotting logit(p) instead turns the S-shape into a straight line.]
$\text{Logit}(p) = \ln\left[\frac{p}{1-p}\right]$
Logistic function: $\ln\left[\frac{p(X)}{1-p(X)}\right] = b_0 + b_1 X$
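The logit and the logistic function are inverses of each other: the logistic function turns the linear predictor $b_0 + b_1 X$ into an S-shaped probability, and the logit turns that probability back into a straight line in X. The sketch below uses hypothetical coefficients chosen only so that b1 < 0.

```python
import math


def logit(p: float) -> float:
    """Log-odds ln(p / (1 - p)); defined only for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(z: float) -> float:
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, chosen only so that b1 < 0.
b0, b1 = 6.0, -0.015

for x in (200, 300, 400, 500):
    p = inv_logit(b0 + b1 * x)   # p follows an S-shaped curve in x
    print(x, round(p, 3), round(logit(p), 3))  # logit(p) = b0 + b1*x, a straight line
```

Equal steps in x give equal steps in logit(p) (here -1.5 per 100 units) but unequal steps in p.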
Origins: Verhulst (mathematician) named the logistic function (1838-1847: 3 papers).
Pearl and Reed (1920, Johns Hopkins, Biometry and Vital Statistics) rediscovered the logistic function to model population growth in the US.
Declare Attribute Xs
Identifying Key Xs
Link Function: Logit

Response Information
Variable: hit or miss; Count: 13 (Event), 12; Total: 25

Predictor                 Coef         SE Coef     Z      P      Odds Ratio   95% CI Lower   Upper
Constant                  5.56028      2.04130     2.72   0.006
target speed (cms/sec)    -0.0156619   0.0055920   -2.80  0.005  0.98         0.97           1.00

Log-Likelihood = -11.411
Test that all slopes are zero: G = 11.796, DF = 1, P-Value = 0.001
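For a 50-unit increase in target speed, the odds of hitting the target are multiplied by $e^{50 \times (-0.0156619)} \approx 0.46$ (i.e., about a 54% reduction in the odds). Compute this from the unrounded coefficient; raising the rounded odds ratio gives $0.98^{50} \approx 0.36$.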
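The sketch below plugs the fitted coefficients from the output above into the logistic function. It treats the event as a hit, following the slide's interpretation; the Value coding of "hit or miss" is not shown in this excerpt, so that is an assumption. It also shows that the 50-unit odds multiplier is the same at every starting speed, while the change in probability is not.

```python
import math

# Coefficients copied from the Minitab output above (hit or miss vs. target speed).
B0 = 5.56028      # Constant
B1 = -0.0156619   # target speed (cms/sec)


def p_event(speed: float) -> float:
    """Predicted probability of the event (assumed: a hit) at a given target speed."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * speed)))


def odds(p: float) -> float:
    return p / (1.0 - p)


or_50 = math.exp(50 * B1)      # odds multiplier for +50 cms/sec, about 0.46
rounded_or_50 = 0.98 ** 50     # same calculation with the rounded odds ratio, about 0.36
print(f"exp(50*B1) = {or_50:.3f}, 0.98**50 = {rounded_or_50:.3f}")

# The odds ratio is constant across the speed range; the change in probability is not.
for speed in (250, 350, 450):
    p0, p1 = p_event(speed), p_event(speed + 50)
    print(speed, round(p0, 3), round(p1, 3), round(odds(p1) / odds(p0), 3))
```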
[Figure: Y-Data (0 to 1) vs. target speed (cms/sec).]
Cancer Remission.MTW
(from Lee*)
Task:
Approach:
Time:
10 minutes
Exercise Debrief
Solution:
1.
2.
3.
4.
Logistic Regression
EsophagealCancer.MTW
[Figure: Fitted line plot of cancer% vs. age (S = 8.45436, R-Sq = 9.2%, R-Sq(adj) = 0.0%); plot of alc% vs. age.]
Rows: Alcohol   Columns: age (count, column %)
Alcohol   25             …   65             75             All
0         6 (18.18%)     …   27 (81.82%)    18 (66.67%)    93 (51.67%)
1         27 (81.82%)    …   6 (18.18%)     9 (33.33%)     87 (48.33%)
All       33 (100.00%)   …   33 (100.00%)   27 (100.00%)   180 (100.00%)
Alcohol is Attribute X
MINITAB
EsophagealCancer.MTW
Predictor     Coef        SE Coef     Z      P      Odds Ratio   95% CI Lower   Upper
Constant      -2.72159    0.753215    -3.61  0.000
Alcohol
  1           0.733554    0.406715    1.80   0.071  2.08         0.94           4.62
Age Group     0.0187340   0.0119209   1.57   0.116  1.02         1.00           1.04

Log-Likelihood = -87.915
Test that all slopes are zero: G = 4.314, DF = 2, P-Value = 0.116
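The odds ratios and confidence limits in the table above can be reproduced from Coef and SE Coef alone: a Wald interval, exp(Coef ± 1.96 × SE Coef), gives the printed values. The overall G test compares G to a chi-square with DF = 2, whose upper tail has the closed form exp(-G/2). A minimal sketch, with the numbers copied from the output:

```python
import math

Z_975 = 1.959964  # standard normal 97.5th percentile


def odds_ratio_ci(coef: float, se: float, z: float = Z_975):
    """Odds ratio and Wald 95% CI: exp(coef), exp(coef - z*se), exp(coef + z*se)."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


def chi2_df2_pvalue(g: float) -> float:
    """Upper-tail P for a chi-square statistic with exactly 2 DF: exp(-g/2)."""
    return math.exp(-g / 2.0)


for name, coef, se in [("Alcohol 1", 0.733554, 0.406715),
                       ("Age Group", 0.0187340, 0.0119209)]:
    or_, lo, hi = odds_ratio_ci(coef, se)
    print(f"{name}: OR = {or_:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")

print(f"G = 4.314, DF = 2: P = {chi2_df2_pvalue(4.314):.3f}")
```

Because the Alcohol interval (0.94, 4.62) contains 1, its Wald test is not significant at the 0.05 level, consistent with P = 0.071.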
Alcohol is Key X
[Figure: EPRO1 vs. Age Group.]
Ingots.MTW
Task:
Approach:
Time:
10 minutes
Summary Quiz
True or False
___________
___________
___________
Statistical Resources
Avoiding wheel re-invention
LeRoy Mattson
Objectives
Ensure you are aware of statistical resources
both internal and external to Medtronic:
Medtronic Statistical Resources Web Site
External Web Sites
http://mitintra.corp.medtronic.com/corporate-statistics/
For links to validation plans and reports, click the Search button on the web site:
For Medstat plans/reports, enter "Medstat validation"
For Minitab plans/reports, enter "Minitab validation"
For Crystal Ball validation plans/reports, enter "Crystal Ball validation"
Note: These links point to PDF documents stored in Documentum.
Get Trained
Tools/Resources: Software
Tools/Resources: MHOS
Medtronic Handbook of Statistics (Rev. G): PDF format only
Tools/Resources: Other
Software
Miscellaneous
Get Connected
Industrial Statistics Questions?
1) Contact your division's Industrial Statistics Council member