7.1 Introduction
Measurement System Analysis (MSA) is the first step of the Measure phase along the
DMAIC pathway to improvement. You will be basing the success of your improvement
project on key performance indicators that are tied to your measurement system. Consequently,
before you begin tracking metrics you will need to complete an MSA to validate the measurement
system. A comprehensive MSA typically consists of six parts: Instrument Detection Limit,
Method Detection Limit, Accuracy, Linearity, Gage R&R, and Long-Term Stability. If you want
to expand measurement capacity or qualify another instrument, you must expand the MSA to
include Metrology Correlation and Matching.
A poor measurement system can make data meaningless and process improvement
impossible. Large measurement error will prevent assessment of process stability and capability,
confound Root Cause Analysis and hamper continuous improvement activities in manufacturing
operations. Measurement error has a direct impact on assessing the stability and capability of a
process. Poor metrology can make a stable process appear unstable and make a capable process
appear incapable. Measurement System Analysis quantifies the effect of measurement error on
the total variation of a unit operation. The sources of this variation may be visualized as in
Figure 7.1 and the elements of a measurement system as in Figure 7.2.
[Figure 7.1: Observed Process Variation decomposes into Actual Process Variation and Measurement Variation]
Operators are often skeptical of measurement systems, especially those that provide them
with false feedback causing them to “over-steer” their process. This skepticism is well founded
since many measurement systems are not capable of accurately or precisely measuring the
process. Accuracy refers to the average of individual measurements compared with the known,
true value. Precision refers to the grouping of the individual measurements - the tighter the
grouping, the higher the precision. The bull’s eye targets of Figure 7.3 best illustrate the
difference between accuracy and precision.
Figure 7.3 Accuracy vs Precision – The Center of the Target is the Objective
Accuracy is influenced by resolution, bias, linearity and stability whereas precision is
influenced by repeatability and reproducibility of the measurement system. Repeatability is the
variation which occurs when the same operator repeatedly measures the same sample on the
same instrument under the same conditions. Reproducibility is the variation which occurs
between two or more instruments or operators measuring the same sample with the same
measurement method in a stable environment. The total variance in a quality characteristic of a
process is described by Eqn 7.1 and Eqn 7.2 while the percent contribution of the measurement
system to the total variance may be calculated from Eqn 7.3.
σ²total = σ²process + σ²measurement     Eqn 7.1

σ²measurement = σ²repeatability + σ²reproducibility     Eqn 7.2

% Contribution = (σ²repeatability + σ²reproducibility) / σ²total × 100     Eqn 7.3
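As a numeric sketch of this decomposition, the three equations can be applied directly; the variance values below are hypothetical, chosen only for illustration:

```python
# Hypothetical variance components (concentration units squared); these are
# illustrative values, not Case Study data.
var_repeatability = 0.004
var_reproducibility = 0.002
var_process = 1.060

var_measurement = var_repeatability + var_reproducibility  # Eqn 7.2
var_total = var_process + var_measurement                  # Eqn 7.1
pct_contribution = var_measurement / var_total * 100       # Eqn 7.3

print(f"% Contribution of measurement system = {pct_contribution:.2f}%")
```

With these inputs the measurement system contributes well under 1% of the total variance, which is the kind of result a healthy measurement system should produce.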
We want to measure true variation in product quality, not variation in the measurement
system, so it is desirable to minimize σ²measurement. We will review the steps in a
typical measurement system analysis by way of example, first for the case of variables data and
then for the case of attribute data.
7.2 Instrument Detection Limit (IDL)
Today’s measurement devices are an order of magnitude more complex than the “gages”
for which the Automotive Industry Action Group (AIAG) first developed Gage Repeatability and
Reproducibility (Gage R&R) studies. Typically, they are electromechanical devices with internal
microprocessors and inherent signal-to-noise ratios. The Instrument Detection Limit (IDL)
should be calculated from the baseline noise of the instrument. Let us examine the case where a
gas chromatograph (GC) is being used to measure the concentration of some analyte of interest.
Refer to Figure 7.4.
The chromatogram has a baseline with peaks at different column retention times for
hydrogen, argon, oxygen, nitrogen, methane and carbon monoxide. Let’s say we wanted to
calculate the IDL for nitrogen at retention time 5.2 min. We would purge and evacuate the
column to make sure it is clean then successively inject seven blanks of the carrier gas (helium).
The baseline noise peak at retention time 5.2 min is integrated for each of the blank injections
and converted to concentration units of Nitrogen. The standard deviation of these concentrations
is multiplied by the Student’s t statistic for n-1 degrees of freedom at a 99% confidence interval
(3.143) to calculate the IDL. This is the EPA protocol as defined in 40 CFR Part 136:
Guidelines Establishing Test Procedures for the Analysis of Pollutants, Appendix B. Refer to
Figure 7.5 below for the calculation summary.
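The IDL arithmetic can be sketched as follows; the seven blank readings are hypothetical placeholders for your instrument's baseline data:

```python
from statistics import stdev

# Hypothetical nitrogen-equivalent readings (ppm) from seven blank injections
# of the helium carrier gas, integrated at the 5.2 min retention time.
# Replace with your instrument's baseline data.
blanks_ppm = [0.12, 0.18, 0.09, 0.15, 0.11, 0.16, 0.13]

T_99_6DF = 3.143  # Student's t, n - 1 = 6 degrees of freedom, 99% confidence

idl = T_99_6DF * stdev(blanks_ppm)
print(f"IDL = {idl:.3f} ppm N2")
```

The IDL is simply the Student's t statistic multiplied by the standard deviation of the blank readings, per the EPA protocol cited above.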
7.3 Method Detection Limit (MDL)
The Method Detection Limit (MDL) is determined in the same manner as the IDL, except
that seven replicates of a low-concentration spiked sample are carried through the entire
analytical method, and the MDL is calculated as the Student's t statistic (3.143) multiplied by
the standard deviation of the seven results. The MDL divided by the mean of the seven trials
should be within 10-100%. If this is not the case, repeat the MDL analysis with a starting
sample concentration closer to the calculated MDL.
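The MDL calculation and the 10-100% check can be sketched as follows; the replicate values are hypothetical:

```python
from statistics import mean, stdev

T_99_6DF = 3.143  # Student's t, 6 degrees of freedom, 99% confidence

def mdl_check(replicates):
    """MDL = t * s of seven spiked replicates carried through the method.
    Returns the MDL and whether MDL / mean falls within 10-100%."""
    mdl = T_99_6DF * stdev(replicates)
    ratio = mdl / mean(replicates)
    return mdl, 0.10 <= ratio <= 1.00

# Hypothetical replicate results (ppm) for a low-level spiked sample
mdl, ok = mdl_check([0.52, 0.48, 0.55, 0.50, 0.47, 0.53, 0.49])
print(f"MDL = {mdl:.4f} ppm, spike level acceptable: {ok}")
```

If `ok` comes back False, the spike concentration was too far from the calculated MDL and the study should be repeated at a lower level.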
7.4 Gage Repeatability and Reproducibility (Gage R&R)
A properly conducted measurement system analysis (MSA) can yield a treasure trove of
information about your measurement system. Repeatability, reproducibility, resolution, bias, and
precision to tolerance ratio are all deliverables of the MSA and can be used to identify areas for
improvement in your measurement system. It is important to conduct the MSA in the current
state since this is your present feedback mechanism for your process. Resist the temptation to
dust off the Standard Operating Procedure and brief the operators on the correct way to measure
the parts. Resist the temptation to replace the NIST-traceable8 standard, which looks like it has
been kicked around the metrology laboratory a few times.
To prepare for an MSA you must collect samples from the process that span the
specification range of the measurement in question. Include out-of-spec high samples and out-
of-spec low samples. Avoid creating samples artificially in the laboratory. There may be
complicating factors in the commercial process which influence your measurement system.
Include all Operators in the MSA who routinely measure the product. The number of samples
8 National Institute of Standards and Technology
times the number of Operators should be greater than or equal to fifteen, with three trials for each
sample. If this is not practical, increase the number of trials as per Figure 7.7.
Code the samples such that the coding gives no indication to the expected measurement
value – this is called blind sample coding. Have each sample measured by an outside laboratory.
These measurements will serve as your reference values. Ask each Operator to measure each
sample three times in random sequence. Ensure that the Operators do not “compare notes”. We
will utilize Minitab to analyze the measurement system described in Case Study III.
Minnesota Polymer is a firm believer in process ownership. The same operator charges the raw materials, runs
the manufacturing process, collects the quality control sample, presses the sample disk, and then runs the silica
analysis on the XRF-EDS instrument. The operator uses the silica concentration results to adjust the silica
charge on the succeeding batch. POMBLK-15 is typically run over a five-day period in the three-shift, 24/7
operation.
Penny has collected five powder samples from POMBLK-15 process retains which span the silica specification
range and included two out-of-specification samples pulled from quarantine lots. She has asked each of the three
shift operators to randomly analyze three samples from each powder bag for silica content according to her sampling
plan. Penny has sent a portion of each sample powder to the Company’s R&D Headquarters in Hong Kong for
silica analysis. These results will serve as reference values for each sample. The following table summarizes the
silica concentration measurements and Figure 7.8 captures the screen shots of the MSA steps for Case Study III.
                       Operator 1                 Operator 2                 Operator 3
Bag #  Reference1      Trial 1  Trial 2  Trial 3  Trial 1  Trial 2  Trial 3  Trial 1  Trial 2  Trial 3
1      17.3            18.2     17.9     18.2     18.1     18.0     18.0     17.8     17.8     18.2
2      14.0            14.4     14.9     14.8     14.8     14.6     14.8     14.4     14.4     14.5
3      13.3            14.0     13.9     13.8     13.9     14.2     14.0     13.8     13.7     13.8
4      16.7            17.2     17.2     17.4     17.4     17.3     17.5     17.4     17.5     17.5
5      12.0            12.9     12.8     12.5     12.5     12.9     12.8     12.9     12.5     12.6
1 As reported by Hong Kong R&D Center
Open a new worksheet. Click on Stat → Quality Tools → Gage Study → Create Gage R&R Study
Worksheet on the top menu.
Enter the Number of Operators, the Number of Replicates and the Number of Parts in the dialogue box.
Click OK.
Name the adjoining column Silica Conc and transcribe the random sample measurement data to the
relevant cells in the worksheet.
Click on Stat → Quality Tools → Gage Study → Gage R&R Study (Crossed) on the top menu.
Select C2 Parts for Part numbers, C3 Operators for Operators and C4 Silica Conc for Measurement data in
the dialogue box. Click the radio button for ANOVA under Method of Analysis. Click Options.
Six (6) standard deviations will account for 99.73% of the Measurement System variation. Enter Lower
Spec Limit and Upper Spec Limit in the dialogue box. Click OK. Click OK.
Gage R&R (ANOVA) Report for Silica Conc
[Six-panel report: Components of Variation (Gage R&R, Repeat, Reprod, Part-to-Part); R Chart by Operators (UCL = 0.6693, Rbar = 0.26, LCL = 0); Xbar Chart by Operators (UCL = 15.593, grand mean = 15.327, LCL = 15.061); Silica Conc by Parts; Silica Conc by Operators; Parts * Operators Interaction]
A new graph is created in the Minitab project file with the Gage R&R analysis results.
Return to the session window by clicking on Window → Session on the top menu to view the ANOVA
analytical results.
Let us more closely examine the graphical output of the Gage R&R (ANOVA) Report for
Silica Conc. Figure 7.9 shows the components of variation. A good measurement system will
have the lion’s share of variation coming from the product, not the measurement system.
Consequently, we would like the bars for repeatability and reproducibility to be small relative to
part-to-part variation. Figure 7.10 captures the range SPC chart by Operators. The range chart
should be in control. If it is not, a repeatability problem is present. By contrast, the X-bar SPC
chart of Figure 7.11 should be out of control. This seems counterintuitive, but it is a healthy
indication that the variability present is due to part-to-part differences rather than Operator-to-
Operator differences. Figure 7.12 is an individual value plot of silica concentration by sample
number. The circles with a cross indicate the mean of the sample data and the solid circles are
individual data points. We want a tight grouping around the mean for each sample and we want
significant variation between the means of different samples. If we do not have variation
between samples the MSA has been poorly designed and we essentially have five samples of the
same thing. This will preclude analysis of the measurement system. Figure 7.13 is a boxplot of
silica concentration by Operator. As in Figure 7.12 the circles with a cross indicate the mean
concentration for all samples by Operator. The shaded boxes represent the interquartile range
(Q3-Q1) for each Operator. The interquartile range (IQR) is the preferred measure of spread for
data sets which are not normally distributed. The solid line within the IQR is the median silica
concentration of all samples by Operator. If Operators are performing the same, we would
expect similar means, medians and IQRs. Figure 7.14 is an individual value plot used to check
for Operator-Sample interactions. The lines for each Operator should be reasonably parallel to
each other. Crossing lines indicate the presence of Operator-Sample interactions. This can
happen when Operators are struggling with samples at or near the MDL or if the instrument
signal to noise ratio varies as a function of concentration.
Figure 7.11 MSA X-bar Chart by Operators
Figure 7.14 MSA Sample by Operator Interaction
Let us now focus on the analytical output of the session window as captured in Figure
7.8. Lovers of Gage R&Rs will typically look for four (4) metrics as defined below and expect
these metrics to be within the acceptable or excellent ranges specified by Gage R&R Metric
Rules of Thumb as shown in Figure 7.15.
% Contribution = σ²measurement / σ²total × 100     Eqn 7.4

% Study Variation = σmeasurement / σtotal × 100     Eqn 7.5

P/T = 6σmeasurement / (USL − LSL) × 100 (two-sided specification)     Eqn 7.6

P/T = 3σmeasurement / TOL × 100 (one-sided specification)     Eqn 7.7

Number of Distinct Categories = trunc(1.41 × σtotal / σmeasurement)     Eqn 7.8

where σ²total = Total Variance
σ²measurement = Variance due to Measurement System
σtotal = Total Standard Deviation
σmeasurement = Standard Deviation due to Measurement System
P/T = Precision to Tolerance Ratio
USL = Upper Spec Limit
LSL = Lower Spec Limit
TOL = Process Mean − LSL for LSL only
TOL = USL − Process Mean for USL only
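These four metrics can be sketched in code; the variance components and spec limits below are illustrative, not the Case Study III session-window values:

```python
from math import sqrt, trunc

def gage_rr_metrics(var_measurement, var_total, usl, lsl):
    """The four rule-of-thumb Gage R&R metrics, assuming a two-sided
    specification so that the tolerance is USL - LSL."""
    sd_meas, sd_total = sqrt(var_measurement), sqrt(var_total)
    pct_contribution = var_measurement / var_total * 100  # Eqn 7.4
    pct_study_var = sd_meas / sd_total * 100              # Eqn 7.5
    p_to_t = 6 * sd_meas / (usl - lsl) * 100              # Eqn 7.6
    ndc = trunc(1.41 * sd_total / sd_meas)                # Eqn 7.8
    return pct_contribution, pct_study_var, p_to_t, ndc

# Illustrative inputs: measurement variance 0.01, total variance 1.0,
# spec limits 12-18 (hypothetical values)
metrics = gage_rr_metrics(0.01, 1.0, usl=18.0, lsl=12.0)
```

Note that % Study Variation is the square root of % Contribution expressed as a ratio of standard deviations, which is why a 0.55% contribution corresponds to a 7.39% study variation.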
The highlighted output of the Minitab session window indicates a % Contribution of the
measurement system of 0.55%. This is in the excellent region. % Study Variation is 7.39%
which is also in the excellent region. Precision to Tolerance ratio is 25.37%. This is in the
acceptable region. Number of distinct categories is 19, well within the excellent region. Overall,
this is a good measurement system. Now, let us proceed to check for linearity and bias by
adding the reference concentrations as measured by the Hong Kong R&D Center for each of the
samples to the worksheet. Figure 7.16 captures the screen shots necessary for this process.
Figure 7.16 Gage Linearity and Bias Study Steps – Variable Data
Return to the active worksheet by clicking on Window → Worksheet 1 *** on the top menu. Name the
adjoining column Reference Conc and enter the reference sample concentration values corresponding to
each sample (Part) number.
Click on Stat → Quality Tools → Gage Study → Gage Linearity and Bias Study on the top menu.
Select C2 Parts for Part numbers, C5 Reference Conc for Reference values and C4 Silica Conc for
Measurement data in the dialogue box. Click OK.
Gage Linearity
Predictor  Coef      SE Coef   P
Constant   0.5443    0.1826    0.005
Slope      0.00835   0.01234   0.502
S = 0.167550   R-Sq = 1.1%

Gage Bias
Reference  Bias      P
Average    0.666667  0.000
12         0.711111  0.000
13.3       0.600000  0.000
14         0.622222  0.000
16.7       0.677778  0.000
17.3       0.722222  0.000

[Plot of Bias vs Reference Value (12-17) with data, regression line, 95% CI, and average bias]
A new graph is created in the Minitab project file with the Gage Linearity and Bias Study results.
We can see there is a bias between the Hong Kong measurement system and Minnesota
Polymer’s measurement system. The bias is relatively constant over the silica concentration
range of interest as indicated by the regression line. The Minnesota Polymer measurement
system is reading approximately 0.67 wt % Silica higher than Hong Kong. This is not saying
that the Hong Kong instrument is right and the Minnesota Polymer instrument is wrong. It is
merely saying that there is a difference between the two instruments which must be investigated.
This difference could have process capability implications if it is validated. Minnesota Polymer
may be operating in the top half of the allowable spec range. The logical next step is for the
Hong Kong R&D center to conduct an MSA of similar design, ideally with the same sample set
utilized by Minnesota Polymer.
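As a cross-check of the session-window bias figures, the Gage Bias table can be reproduced directly from the Case Study III data, as sketched below with Python's statistics module:

```python
from statistics import mean

# Case Study III silica data: all nine readings (3 Operators x 3 trials)
# for each bag, keyed by the Hong Kong reference concentration.
trials = {
    17.3: [18.2, 17.9, 18.2, 18.1, 18.0, 18.0, 17.8, 17.8, 18.2],
    14.0: [14.4, 14.9, 14.8, 14.8, 14.6, 14.8, 14.4, 14.4, 14.5],
    13.3: [14.0, 13.9, 13.8, 13.9, 14.2, 14.0, 13.8, 13.7, 13.8],
    16.7: [17.2, 17.2, 17.4, 17.4, 17.3, 17.5, 17.4, 17.5, 17.5],
    12.0: [12.9, 12.8, 12.5, 12.5, 12.9, 12.8, 12.9, 12.5, 12.6],
}

# Bias per reference = mean of the nine readings minus the reference value
bias_by_ref = {ref: mean(vals) - ref for ref, vals in trials.items()}
average_bias = mean(bias_by_ref.values())
print(f"Average bias = {average_bias:.2f} wt % silica")
```

The average bias of roughly 0.67 wt % matches the Gage Bias table, confirming that Minnesota Polymer reads consistently higher than the Hong Kong reference across the concentration range.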
In our next case we will analyze the measurement system used to rate customer
satisfaction as described in Case Study IV below.
1 Intended outcome of script from Customer Satisfaction Team
Figure 7.17 Measurement System Analysis Steps – Attribute Data
Open a new worksheet. Click on Stat → Quality Tools → Create Attribute Agreement Analysis Worksheet
on the top menu.
Enter the Number of samples, the Number of appraisers and the Number of replicates in the dialogue box.
Click OK.
The worksheet is modified to include a randomized run order of the scripts (samples).
Name the adjoining columns Response and Reference. Transcribe the satisfaction level rating and the
reference value of the script to the appropriate cells.
Click on Stat → Quality Tools → Attribute Agreement Analysis on the top menu.
Select C4 Response for Attribute column, C2 Samples for Samples and C3 Appraisers for Appraisers in the
dialogue box. Select C5 Reference for Known standard/attribute. Click OK.
Assessment Agreement
[Two-panel graph: Within Appraisers and Appraiser vs Standard percent agreement (60-90% scale) with 95% CIs for Appraisers 1-3]
A new graph is created in the Minitab project file with the Attribute Assessment Agreement results.
Display the analytical MSA Attribute Agreement Results by clicking on Window → Session on the top
menu.
The attribute MSA results allow us to determine the percentage overall agreement, the
percentage agreement within appraisers (repeatability), the percentage agreement between
appraisers (reproducibility), the percentage agreement with reference values (accuracy) and the
Kappa Value (index used to determine how much better the measurement system is than random
chance).
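Where the Kappa Value comes from can be sketched with Cohen's formula; the ratings below are hypothetical, not the Case Study IV data:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Kappa = (Po - Pe) / (1 - Pe): the observed agreement Po corrected
    for the agreement Pe expected by random chance alone."""
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical satisfaction ratings of ten calls by two appraisers
a = ["sat", "sat", "sat", "sat", "neu", "neu", "neu", "neu", "dis", "dis"]
b = ["sat", "sat", "sat", "sat", "neu", "neu", "neu", "sat", "dis", "dis"]
kappa = cohens_kappa(a, b)
```

A kappa of 1.0 means perfect agreement; 0 means agreement no better than chance, which is why raw percent agreement alone can overstate the quality of an attribute measurement system.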
From the graphical results we can see that the Customer Service Agents were in
agreement with each other 90% of the time and were in agreement with the expected (standard)
result 90% of the time. From the analytical results we can see that the agreement between
appraisers was 80% and the overall agreement vs the standard values was 80%. The Kappa
Value for all appraisers vs the standard values was 0.90, indicative of excellent agreement
between the appraised values and reference values. Figure 7.18 provides benchmark
interpretations for Kappa Values.
Another way of looking at this case is that out of sixty expected outcomes there were
only three miscalls on rating customer satisfaction by the Customer Service Agents included in
this study. Mr. Lee can have confidence in the feedback of the Virtual Cable customer
satisfaction measurement system and proceed to identify and remedy the underlying root causes
of customer dissatisfaction.
Improvements to the measurement system should be focused on the root cause(s) of high
measurement system variation. If repeatability is poor, consider a more detailed repeatability
study using one part and one operator over an extended period of time. Ask the operator to
measure this one sample twice per day for one month. Is the afternoon measurement always
greater or always less than the morning measurement? Perhaps the instrument is not
adequately cooled. Are the measurements trending up or down during the month? This is an
indication of instrument drift. Is there a gold standard for the instrument? This is one part that
is representative of production parts, kept in a climate-controlled room, handled only with gloves
and carried around on a red velvet pillow. Any instrument must have a gold standard. Even the
kilogram had one for more than a century: the International Prototype Kilogram, a platinum-iridium
cylinder held under glass at the Bureau International des Poids et Mesures in Sèvres, France. If the gold standard measures differently
during the month the measurement error is not due to the gold standard, it is due to the
measurement system. Consider if the instrument and/or samples are affected by temperature,
humidity, vibration, dust, etc. Set up experiments to validate these effects with data to support
your conclusions. If you are lobbying for the instrument to be relocated to a climate-controlled
clean room you better have the data to justify this move.
If reproducibility is poor, read the Standard Operating Procedure (SOP) in detail. Is the
procedure crystal clear without ambiguity which would lead operators to conduct the procedure
differently? Does the procedure specify instrument calibration before each use? Does the
procedure indicate what to do if the instrument fails the calibration routine? Observe the
operators conducting the procedure. Are they adhering to the procedure? Consider utilizing the
operator with the lowest variation as a mentor/coach for the other operators. Ensure that the SOP
is comprehensive and visual. Functional procedures should be dominated by pictures, diagrams,
sketches, flow charts, etc., which clearly demonstrate the order of operations and call out the
critical points of the procedure. Avoid lengthy text SOPs devoid of graphics. They do not
facilitate memory triangulation – the use of multiple senses to recall learning. Refresher training
should be conducted annually on SOPs, with supervisor audit of the Operator performing the
measurement SOP.
Now that you have performed analyses to establish the Instrument Detection Limit,
Method Detection Limit, Accuracy, Linearity, and Gage R&R metrics of your measurement
system and proven that you have a healthy measurement system, you will need to monitor the
measurement system to ensure that it remains healthy. Stability is typically monitored through
daily measurement of a standard on the instrument in question. If a standard is not available, one
of the samples from the Gage R&R can be utilized as a “Golden Sample”. Each day, after the
instrument is calibrated, the standard is measured on the instrument. An Individuals Moving
Range (IMR) SPC chart is generated as we have covered in Chapter 6. If the standard is in
control then the measurement system is deemed to be in control and this provides the
justification to utilize the instrument to perform commercial analyses on process samples
throughout the day. If the standard is not in control the instrument is deemed to be
nonconforming and a Root Cause Analysis must be initiated to identify the source(s) of the
discrepancy. Once the discrepancy has been identified and corrected, the standard is re-run on
the instrument and the IMR chart refreshed to prove that the instrument is in control. Figure 7.19
shows daily stability measurements from Case Study III, silica concentration measurement of
Golden Sample disk number two.
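The control limits of the I-MR chart can be reproduced from the grand mean and mean moving range using the standard constants from Chapter 6 (2.66 and 3.267); the two input values below are read from Figure 7.19:

```python
XBARBAR = 14.67  # grand mean of daily golden-sample readings (Figure 7.19)
MRBAR = 0.1621   # mean moving range (Figure 7.19)

# Individuals chart limits: Xbarbar +/- 2.66 * MRbar (2.66 = 3/d2, d2 = 1.128)
ucl_x = XBARBAR + 2.66 * MRBAR
lcl_x = XBARBAR - 2.66 * MRBAR

# Moving-range chart limits: UCL = 3.267 * MRbar (D4 = 3.267), LCL = 0
ucl_mr = 3.267 * MRBAR

print(f"I chart: UCL = {ucl_x:.3f}, LCL = {lcl_x:.3f}; MR UCL = {ucl_mr:.4f}")
```

These limits agree with the chart values of 15.101, 14.239, and 0.5295, so a daily golden-sample reading outside them signals a nonconforming instrument.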
I-MR Chart of Golden Sample 2 Silica Conc
[Individuals chart: X = 14.67, UCL = 15.101, LCL = 14.239; Moving Range chart: MR = 0.1621, UCL = 0.5295, LCL = 0; Days 1-28]
Select a minimum of sixteen samples to be measured on both metrology tools. Samples should
be selected such that they span the measurement range of interest (for example, the spec range).
Avoid clustering samples around a certain measurement value. If necessary, manufacture
samples to cover the spec range. It is acceptable to include out-of-spec high and low samples.
In order for two measurement systems to be correlated, the R-squared of the least squares
regression line of the current instrument vs the proposed instrument must be 75% or higher. If
matching is desired, there are two additional requirements: the 95% confidence interval of the
slope of the orthogonal regression line must include 1.0, and a paired t-Test must pass (i.e., the
95% confidence interval of the mean difference must include zero). This ensures that the bias
between the two instruments is not significant.
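The correlation and paired t-Test checks can be sketched as follows; the orthogonal-regression slope check is left to Minitab, the t critical value 2.131 assumes 16 samples (15 degrees of freedom), and the sample data are hypothetical:

```python
from statistics import mean, stdev

def correlation_matching_check(x, y, t_crit):
    """Check the correlation requirement (R-sq >= 75%) and the paired
    t-Test matching requirement (95% CI of mean difference covers zero).
    t_crit is the two-sided 95% Student's t for n - 1 degrees of freedom."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r_sq = sxy ** 2 / (sxx * syy)                 # least squares R-squared
    diffs = [b - a for a, b in zip(x, y)]
    half = t_crit * stdev(diffs) / n ** 0.5
    ci = (mean(diffs) - half, mean(diffs) + half)  # 95% CI of mean difference
    return r_sq >= 0.75, ci[0] <= 0.0 <= ci[1], r_sq, ci

# Hypothetical paired results from two instruments over 16 shared samples
xrf1 = [13.0 + 0.25 * i for i in range(16)]
xrf2 = [v + (0.1 if i % 2 == 0 else -0.1) for i, v in enumerate(xrf1)]
correlated, matched, r_sq, ci = correlation_matching_check(xrf1, xrf2, t_crit=2.131)
```

Both checks must pass (together with the orthogonal-regression slope test) before the second instrument can be used interchangeably with the first.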
Let us revisit Penelope Banks at Minnesota Polymer to better understand metrology
correlation and matching protocol. Penny has requisitioned a redundant XRF-EDS to serve as a
critical back-up to the existing XRF-EDS instrument and to provide analysis capacity expansion
for the future. She has been submitting samples for analysis to both instruments for the last
sixteen weeks and has collected the following results. Please refer to Figure 7.20 for correlation
and matching analysis steps.
Figure 7.20 Metrology Correlation and Matching Steps
Open a new worksheet. Copy and paste the measurement data from the two instruments into the
worksheet.
Click on Graph → Scatterplot on the top menu. Select With Regression in the dialogue box. Click OK.
Select the reference instrument XRF-EDS1 for the X variables and XRF-EDS2 for the Y variables. Click
OK.
Scatterplot of XRF-EDS2 vs XRF-EDS1
[Scatterplot with least squares regression line; both axes span 13-17]
Hover your cursor over the least squares regression line. The R-sq = 98.1%. Correlation is good.
Return to the worksheet. Click on Stat → Regression → Orthogonal Regression on the top menu.
Select XRF-EDS2 for the Response (Y) and the reference instrument XRF-EDS1 for the Predictor (X)
variable. Click Options.
Select 95 for the Confidence level. Click OK, then click OK one more time.
[Orthogonal regression plot of XRF-EDS2 vs XRF-EDS1; both axes span 13-17]
Click on Window → Session on the top menu. The session window indicates that the 95% Confidence
Interval of the slope includes 1.0. The two instruments are linear in accuracy.
Return to the worksheet. Click on Stat → Basic Statistics → Paired t on the top menu.
Select XRF-EDS1 for Sample 1 and XRF-EDS2 for Sample 2 in the dialogue box. Click Options.
Select 95.0 for Confidence level. Select 0.0 for Hypothesized difference. Select Difference ≠ hypothesized
difference for Alternative hypothesis in the dialogue box. Click OK. Then click OK one more time.
The session window indicates that the 95% confidence interval for the mean difference includes zero. The
P-Value for the paired t-Test is above the significance level of 0.05. Therefore, we cannot reject the null
hypothesis: there is no significant bias between the two instruments.
Penelope has proven that XRF-EDS2 is correlated and matched to XRF-EDS1. She may
now use XRF-EDS2 for commercial shipment releases including Certificates of Analysis to her
customers.