Six Sigmaaa

Measure Phase
Six Sigma Statistics
Six Sigma Statistics

Welcome to Measure Process Discovery Six Sigma Statistics
Basic Statistics Descriptive Statistics Normal Distribution Assessing Normality Special Cause / Common Cause Graphing Techniques
Measurement System Analysis Process Capability Wrap Up & Action Items

2
Purpose of Basic Statistics
The purpose of Basic Statistics is to:

Provide a numerical summary of the data being analyzed.
Data (n)
Factual information organized for analysis. Numerical or other information represented in a form suitable for processing by computer Values from scientific experiments.
Provide the basis for making inferences about the future. Provide the foundation for assessing process capability. Provide a common language to be used throughout an organization to describe processes.
Relax.it wont be that bad!

3
Statistical Notation Cheat Sheet
Summation The Standard Deviation of sample data The Standard Deviation of population data The variance of sample data The variance of population data
An individual value, an observation A particular (1st) individual value For each, all, individual values The Mean, average of sample data The grand Mean, grand average
The range of data The Mean of population data The average range of data Multi-purpose notation, i.e. # of subgroups, # of classes The absolute value of some term Sample size Greater than, less than Greater than or equal to, less than or equal to Population size A proportion of sample data A proportion of population data
Parameters vs. Statistics Population: Frame: Sample:

All the items that have the property of interest under study. An identifiable subset of the population. A significantly smaller subset of the population used to make an inference.
Population
Sample Sample Sample
Population Parameters:
Arithmetic descriptions of a population , , P, 2, N
Sample Statistics:

5
Arithmetic descriptions of a sample X-bar , s, p, s2, n
Types of Data Attribute Data (Qualitative)

Is always binary, there are only two possible values (0, 1)
Yes, No Go, No go Pass/Fail
Variable Data (Quantitative)

Discrete (Count) Data Can be categorized in a classification and is based on counts.
Number of defects Number of defective units Number of customer returns
Continuous Data Can be measured on a continuum, it has decimal subdivisions that are
meaningful Time, Pressure, Conveyor Speed, Material feed rate Money Pressure Conveyor Speed Material feed rate
Discrete Variables
Discrete Variable
Possible values for the variable
The number of defective needles in boxes of 100 diabetic syringes The number of individuals in groups of 30 with a Type A personality The number of surveys returned out of 300 mailed in a customer satisfaction study. The number of employees in 100 having finished high school or obtained a GED The number of times you need to flip a coin before a head appears for the first time
0,1,2, , 100
0,1,2, , 30
0,1,2, 300
0,1,2, 100
1,2,3, (note, there is no upper limit because you might need to flip forever before the first head appears.
Continuous Variables
Continuous Variable
Possible Values for the Variable
The length of prison time served for individuals convicted of first degree murder
All the real numbers between a and b, where a is the smallest amount of time served and b is the largest. All the real numbers between a and $30,000, where a is the smallest household income in the population All real numbers between 200 and b, where b is the largest glucose reading in all such individuals
The household income for households with incomes less than or equal to $30,000
The blood glucose reading for those individuals having glucose readings equal to or greater than 200
Definitions of Scaled Data Understanding the nature of data and how to represent it can affect the types of statistical tests possible. Nominal Scale data consists of names, labels, or categories. Cannot be arranged in an ordering scheme. No arithmetic operations are performed for nominal data. Ordinal Scale data is arranged in some order, but differences between data values either cannot be determined or are meaningless. Interval Scale data can be arranged in some order and for which differences in data values are meaningful. The data can be arranged in an ordering scheme and differences can be interpreted. Ratio Scale data that can be ranked and for which all arithmetic operations including division can be performed. (division by zero is of course excluded) Ratio level data has an absolute zero and a value of zero indicates a complete absence of the characteristic of interest.
9
Nominal Scale
Qualitative Variable
Possible nominal level data values for the variable
Blood Types
A, B, AB, O
State of Residence
Alabama, , Wyoming
Country of Birth
United States, China, other
Time to weigh in!

10
Ordinal Scale
Qualitative Variable
Possible Ordinal level data values

Subcompact, compact, intermediate, full size, luxury
Automobile Sizes
Product rating
Poor, good, excellent
Baseball team classification
Class A, Class AA, Class AAA, Major League
11
Interval Scale
Interval Variable
Possible Scores
IQ scores of students in BlackBelt Training
100 (the difference between scores is measurable and has meaning but a difference of 20 points between 100 and 120 does not indicate that one student is 1.2 times more intelligent )
12
Ratio Scale
Ratio Variable
Possible Scores
Grams of fat consumed per adult in the United States
0 (If person A consumes 25 grams of fat and person B consumes 50 grams, we can say that person B consumes twice as much fat as person A. If a person C consumes zero grams of fat per day, we can say there is a complete absence of fat consumed on that day. Note that a ratio is interpretable and an absolute zero exists.)
13
Converting Attribute Data to Continuous Data
Continuous Data is always more desirable In many cases Attribute Data can be converted to Continuous Which is more useful? 15 scratches or Total scratch length of 9.25 22 foreign materials or 2.5 fm/square inch 200 defects or 25 defects/hour
14
Descriptive Statistics
Measures of Location (central tendency)

Mean Median Mode
Measures of Variation (dispersion)

Range Interquartile Range Standard deviation Variance
15
Open the MINITAB Project Measure Data Sets.mpj and select the worksheet basicstatistics.mtw
16
Measures of Location
Mean is:
Commonly referred to as the average. The arithmetic balance point of a distribution of data.
Stat>Basic Statistics>Display Descriptive Statistics>Graphs >Histogram of data, with normal curve
Histogram (with Normal Curve) of Data

80 70 60
Frequency
Mean StDev N 5.000 0.01007 200
Sample
Population
50 40
Descriptive Statistics: Data

30 20 10 0 4.97 4.98 4.99 Data 5.00 5.01 5.02
Variable N N* Mean SE Mean StDev Minimum Q1 Median Q3 Data 200 0 4.9999 0.000712 0.0101 4.9700 4.9900 5.0000 5.0100 Variable Maximum Data 5.0200
17
Median is:
The mid-point, or 50th percentile, of a distribution of data. Arrange the data from low to high, or high to low.
It is the single middle value in the ordered list if there is an odd number of observations It is the average of the two middle values in the ordered list if there are an even number of observations

80 70 60
Frequency
Mean StDev N 5.000 0.01007 200
50
Descriptive Statistics: Data

40 30 20 10 0 4.97 4.98 4.99 Data 5.00 5.01 5.02
Variable N N* Mean SE Mean StDev Minimum Q1 Median Q3 Data 200 0 4.9999 0.000712 0.0101 4.9700 4.9900 5.0000 5.0100 Variable Maximum Data 5.0200
18
Trimmed Mean is a:
Compromise between the Mean and Median.
The Trimmed Mean is calculated by eliminating a specified percentage of the smallest and largest observations from the data set and then calculating the average of the remaining observations Useful for data with potential extreme values.
Stat>Basic Statistics>Display Descriptive Statistics>Statistics> Trimmed Mean
Descriptive Statistics: Data Variable N N* Mean SE Mean TrMean StDev Minimum Q1 Median Data 200 0 4.9999 0.000712 4.9999 0.0101 4.9700 4.9900 5.0000 Variable Q3 Maximum Data 5.0100 5.0200
19
Mode is:
The most frequently occurring value in a distribution of data.
Mode = 5
80 70 60
Frequency
Mean StDev N 5.000 0.01007 200
50 40 30 20 10 0 4.97 4.98 4.99 Data 5.00 5.01 5.02
20
Measures of Variation
Range is the:
Difference between the largest observation and the smallest observation in the data set.
A small range would indicate a small amount of variability and a large range a large amount of variability.
Descriptive Statistics: Data Variable N N* Mean SE Mean StDev Minimum Q1 Median Q3 Data 200 0 4.9999 0.000712 0.0101 4.9700 4.9900 5.0000 5.0100 Variable Maximum Data 5.0200
Interquartile Range is the:

Difference between the 75th percentile and the 25th percentile.
Use Range or Interquartile Range when the data distribution is Skewed.

21
Standard Deviation is:

Equivalent of the average deviation of values from the Mean for a distribution of data. A unit of measure for distances from the Mean. Use when data are symmetrical.
Sample
Population
Descriptive Statistics: Data Variable N N* Mean SE Mean StDev Minimum Q1 Median Q3 Data 200 0 4.9999 0.000712 0.0101 4.9700 4.9900 5.0000 5.0100 Variable Maximum Data 5.0200
Cannot calculate population Standard Deviation because this is sample data.

22
Variance is the:
Average squared deviation of each individual data point from the Mean.
Sample
Population
23
Normal Distribution
The Normal Distribution is the most recognized distribution in statistics.
What are the characteristics of a Normal Distribution?

Only random error is present Process free of assignable cause Process free of drifts and shifts
So what is present when the data is Non-normal?
24
The Normal Curve
The normal curve is a smooth, symmetrical, bell-shaped curve, generated by the density function.
It is the most useful continuous probability model as many naturally occurring measurements such as heights, weights, etc. are approximately Normally Distributed.
25
Normal Distribution Each combination of Mean and Standard Deviation generates a unique Normal curve:
Standard Normal Distribution Has a = 0, and = 1 Data from any Normal Distribution can be made to fit the standard Normal by converting raw scores to standard scores. Z-scores measure how many Standard Deviations from the mean a particular data-value lies.
26
Normal Distribution
The area under the curve between any 2 points represents the proportion of the distribution between those points. The area between the Mean and any other point depends upon the Standard Deviation.
Convert any raw score to a Z-score using the formula:
Refer to a set of Standard Normal Tables to find the proportion between and x.
27
The Empirical Rule
The Empirical Rule
-6
-5
-4
-3
-2
-1
+1
+2
+3
+4
+5
+6
68.27 % of the data will fall within +/- 1 standard deviation 95.45 % of the data will fall within +/- 2 standard deviations 99.73 % of the data will fall within +/- 3 standard deviations 99.9937 % of the data will fall within +/- 4 standard deviations 99.999943 % of the data will fall within +/- 5 standard deviations 99.9999998 % of the data will fall within +/- 6 standard deviations
28
The Empirical Rule (cont.)
No matter what the shape of your distribution is, as you travel 3 Standard Deviations from the Mean, the probability of occurrence beyond that point begins to converge to a very low number.
29
Why Assess Normality? While many processes in nature behave according to the normal distribution, many processes in business, particularly in the areas of service and transactions, do not There are many types of distributions:
There are many statistical tools that assume Normal Distribution properties in their calculations. So understanding just how Normal the data are will impact how we look at the data.
30
Tools for Assessing Normality
The shape of any Normal curve can be calculated based on the Normal Probability density function. Tests for Normality basically compare the shape of the calculated curve to the actual distribution of your data points. For the purposes of this training, we will focus on 2 ways in MINITAB to assess Normality:
The Anderson-Darling test Normal probability test
Watch that curve!

31
Goodness-of-Fit
The Anderson-Darling test uses an empirical density function.
Departure of the actual data from the expected Normal Distribution. The Anderson-Darling Goodness-of-Fit test assesses the magnitude of these departures using an Observed minus Expected formula.
100
Expected for Normal Distribution Actual Data
20%
80 C u m u l a 60 t i v e P e 40 r c e n t 20
20%
0 3.0 3.5 4.0 4.5 5.0 5.5
Raw Data Scale
32
The Normal Probability Plot

Probability Plot of Amount
Normal
99.9 99 95 90 80 70 60 50 40 30 20 10 5 1 0.1 Mean StDev N AD P-Value 84.69 7.913 70 0.265 0.684
Percent
60
70
80 Amount
90
100
110
The Anderson-Darling test is a good litmus test for normality: if the P-value is more than .05, your data are normal enough for most purposes.
33
The Anderson-Darling test also appears in this output. Again, if the P-value is greater than .05, assume the data are Normal.
The reasoning behind the decision to assume Normality based on the Pvalue will be covered in the Analyze Phase. For now, just accept this as a general guideline.
34
Anderson-Darling Caveat Use the Anderson Darling column to generate these graphs.
Probability Plot of Anderson Darling
Normal
99.9 99 95 90 80 70 60 50 40 30 20 10 5 1
Mean
Summary for Anderson Darling

A nderson-Darling Normality Test A -S quared P-V alue 0.18 0.921 50.031 4.951 24.511 -0.061788 -0.180064 500 35.727 46.800 50.006 53.218 62.823 50.466 50.500 5.278
Mean StDev N AD P-Value
50.03 4.951 500 0.177 0.921
Mean StD ev V ariance Skew ness Kurtosis N Minimum 1st Q uartile Median 3rd Q uartile Maximum 49.596 49.663 9 5 % Confidence Inter vals 4.662
Percent
36
40
44
48
52
56
60
95% C onfidence Interv al for Mean 95% C onfidence Interv al for Median 95% C onfidence Interv al for S tD ev
0.1
35
40
45 50 55 Anderson Darling
60
65
Median 49.50 49.75 50.00 50.25 50.50
In this case, both the Histogram and the Normality Plot look very normal. However, because the sample size is so large, the Anderson-Darling test is very sensitive and any slight deviation from Normal will cause the P-value to be very low. Again, the topic of sensitivity will be covered in greater detail in the Analyze Phase. For now, just assume that if N > 100 and the data look Normal, then they probably are.
35
If the Data Are Not Normal, Dont Panic!
Normal Data are not common in the transactional world. There are lots of meaningful statistical tools you can use to analyze your data (more on that later). It just means you may have to think about your data in a slightly different way.
Dont touch that button!

36
Normality Exercise
Exercise objective: To demonstrate how to test for Normality. 1. Generate Normal Probability Plots and the graphical summary using the Descriptive Statistics.MTW file. 2. Use only the columns Dist A and Dist D. 3. Answer the following quiz questions based on your analysis of this data set.
37
Isolating Special Causes from Common Causes
Special Cause: Variation is caused by known factors that result in a non-random distribution of output. Also referred to as Assignable Cause. Common Cause: Variation caused by unknown factors resulting in a steady but random distribution of output around the average of the data. It is the variation left over after Special Cause variation has been removed and typically (not always) follows a Normal Distribution. If we know that the basic structure of the data should follow a Normal Distribution, but plots from our data shows otherwise; we know the data contain Special Causes.
Special Causes = Opportunity

38
Introduction to Graphing
The purpose of Graphing is to:

Identify potential relationships between variables. Identify risk in meeting the critical needs of the Customer, Business and People. Provide insight into the nature of the Xs which may or may not control Y. Show the results of passive data collection.
In this section we will cover

1. 2. 3. 4. 5. Box Plots Scatter Plots Dot Plots Time Series Plots Histograms
39
Data Sources
Data sources are suggested by many of the tools that have been covered so far:
Process Map X-Y Matrix Fishbone Diagrams FMEA
Examples are:
1. Time Shift Day of the week Week of the month Season of the year Location/position Facility Region Office
40
3.
Operator Training Experience Skill Adherence to procedures Any other sources?
2.
4.
Graphical Concepts
The characteristics of a good graph include: Variety of data Selection of

Variables Graph Range
Information to interpret relationships Explore quantitative relationships
41
The Histogram
A Histogram displays data that have been summarized into intervals. It can be used to assess the symmetry or Skewness of the data.
Histogram of Histogram
40
30
Frequency
20
10
0 98 99 100 101 Histogram 102 103
To construct a Histogram, the horizontal axis is divided into equal intervals and a vertical bar is drawn at each interval to represent its frequency (the number of values that fall within the interval).
42
Histogram Caveat
All the Histograms below were generated using random samples of the data from the worksheet Graphing Data.mtw.
Histogram of H1_20, H2_20, H3_20, H4_20
98 H1_20 4 3 2 4 3 2 1 H3_20 0 8 6 4 2 0 98 99 100 101 102 H4_20 99 100 H2_20 101 102
Frequency
1 0 8 6 4 2 0
Be careful not to determine Normality simply from a Histogram plot, if the sample size is low the data may not look very Normal.
43
Variation on a Histogram
Using the worksheet Graphing Data.mtw create a simple Histogram for the data column called granular.
Histogram of Granular
25
20
Frequency
15
10
0 44 46 48 50 Granular 52 54 56
44
Dot Plot
The Dot Plot can be a useful alternative to the Histogram especially if you want to see individual values or you want to brush the data.
Dotplot of Granular
44
46
48
50 Granular
52
54
56
45

Six Sigmaaa

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Six Sigmaaa

Uploaded by

Copyright:

Available Formats

Measure Phase

Six Sigma Statistics

Six Sigma Statistics

Measurement System Analysis Process Capability Wrap Up & Action Items

Purpose of Basic Statistics

The purpose of Basic Statistics is to:

Relax.it wont be that bad!

Statistical Notation Cheat Sheet

Parameters vs. Statistics Population: Frame: Sample:

Sample Sample Sample

Arithmetic descriptions of a sample X-bar , s, p, s2, n

Types of Data Attribute Data (Qualitative)

Variable Data (Quantitative)

Possible values for the variable

Possible Values for the Variable

Possible nominal level data values for the variable

United States, China, other

Time to weigh in!

Possible Ordinal level data values

Poor, good, excellent

Baseball team classification

Class A, Class AA, Class AAA, Major League

IQ scores of students in BlackBelt Training

Grams of fat consumed per adult in the United States

Converting Attribute Data to Continuous Data

Measures of Location (central tendency)

Measures of Variation (dispersion)

Histogram (with Normal Curve) of Data

Descriptive Statistics: Data

Histogram (with Normal Curve) of Data

Descriptive Statistics: Data

50 40 30 20 10 0 4.97 4.98 4.99 Data 5.00 5.01 5.02

Interquartile Range is the:

Use Range or Interquartile Range when the data distribution is Skewed.

Standard Deviation is:

Cannot calculate population Standard Deviation because this is sample data.

The Normal Distribution is the most recognized distribution in statistics.

What are the characteristics of a Normal Distribution?

So what is present when the data is Non-normal?

The Normal Curve

Convert any raw score to a Z-score using the formula:

The Empirical Rule

The Empirical Rule

The Empirical Rule (cont.)

Tools for Assessing Normality

Watch that curve!

The Anderson-Darling test uses an empirical density function.

Expected for Normal Distribution Actual Data

Raw Data Scale

The Normal Probability Plot

Summary for Anderson Darling

Mean StDev N AD P-Value

50.03 4.951 500 0.177 0.921

Median 49.50 49.75 50.00 50.25 50.50

If the Data Are Not Normal, Dont Panic!

Dont touch that button!

Isolating Special Causes from Common Causes

Special Causes = Opportunity

The purpose of Graphing is to:

In this section we will cover

Operator Training Experience Skill Adherence to procedures Any other sources?

The characteristics of a good graph include: Variety of data Selection of

Information to interpret relationships Explore quantitative relationships