Basic Statistics

GE Company Proprietary June, 1998


Field Training/Basic Statistics 2
Where are We in D-M-A-I-C?

[Figure: D-M-A-I-C roadmap. Phases: Define, Measure, Analyze, Improve, Control. Steps shown: Focusing the Problem, Define Need (CTQ Identification), Define Process, Process Variation, Data Collection, Baselining, Basic Statistics, Monitor Output, Customer (Dashboards), Intro to Data Analysis, Pilot Improvements.]
Basic Statistics: Topics We'll Discuss
- Types of data
- Normal distribution and normal probabilities
- Measures of the center of the data
    Mean
    Median
- Measures of the spread of data
    Range
    Variance
    Standard deviation
- Process stability and process capability
Calculating Sigma
It is necessary to define:
- What is a unit?
- What is a defect?
- What is the number of opportunities for error per unit?
- Calculate $DPU = \dfrac{\text{Total number of defects}}{\text{Total number of units}}$
- Calculate $DPMO = \dfrac{\text{Total number of defects}}{\text{Total number of units} \times \text{Number of opportunities per unit}} \times 1{,}000{,}000$
Note: Some processes may have multiple opportunities per unit.
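A minimal Python sketch of the two formulas above. The function names and the order-form figures in the usage lines are hypothetical, chosen only to illustrate the arithmetic.

```python
def dpu(total_defects, total_units):
    """Defects per unit: total number of defects / total number of units."""
    return total_defects / total_units


def dpmo(total_defects, total_units, opportunities_per_unit):
    """Defects per million opportunities.

    Total opportunities = units x opportunities per unit; the defect rate
    per opportunity is then scaled up to one million opportunities.
    """
    total_opportunities = total_units * opportunities_per_unit
    return total_defects / total_opportunities * 1_000_000


if __name__ == "__main__":
    # Hypothetical figures, for illustration only: 500 order forms, each with
    # 10 fields that could be filled out wrong, and 25 defects found in total.
    print(dpu(25, 500))       # 0.05 defects per unit
    print(dpmo(25, 500, 10))  # 5000.0 defects per million opportunities
```

Counting opportunities per unit is what lets processes with very different unit complexity be compared on the same DPMO scale.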
Use of Data in the DMAIC Cycle
- Define: Collect facts surrounding a problem or opportunity
- Measure: Develop a baseline for process parameters
- Analyze: Establish relationships between process inputs and outputs
- Improve: Quantify the impact of a new process improvement
- Control: Monitor the process; ensure the process gain is sustained
Types of Data
There are several different types of data.
[Cartoon: "Hmm... What kind of data is this?"; label: "Six Sigma Data".]
Understanding what type of data we have ensures that:
- We collect a statistically appropriate amount of data before we draw any conclusions
- We choose the appropriate analytical tools to measure and control processes
Types of Data
- Attribute data (qualitative, words)
    Categories (strongly agree, agree, etc.)
    Yes/no (order form filled out accurately or not)
    On time/not on time
    Pass/fail; good/bad (accurate billing/overcharged)
- Variable data (quantitative, numbers)
    Discrete (count) data
        Cannot be meaningfully subdivided into more precise increments
        Requires a much larger sample size than continuous data
        Ex: # of days > 30 A/R aging; # of times a customer hangs up before receiving a response
    Continuous data
        Decimal subdivisions are meaningful
        Ex: time to answer the telephone (exact # of seconds per call)
        A sample size of 30 is usually adequate
Continuous
1 4.9787 21 4.9759 41 4.9769 61 4.9756 81 4.9746
2 4.9760 22 4.9761 42 4.9762 62 4.9759 82 4.9758
3 4.9762 23 4.9746 43 4.9747 63 4.9752 83 4.9786
4 4.9772 24 4.9764 44 4.9776 64 4.9756 84 4.9759
5 4.9767 25 4.9758 45 4.9738 65 4.9749 85 4.9756
6 4.9756 26 4.9761 46 4.9767 66 4.9770 86 4.9754
7 4.9745 27 4.9779 47 4.9756 67 4.9747 87 4.9751
8 4.9761 28 4.9777 48 4.9758 68 4.9758 88 4.9757
9 4.9764 29 4.9770 49 4.9757 69 4.9765 89 4.9752
10 4.9751 30 4.9764 50 4.9769 70 4.9759 90 4.9744
11 4.9750 31 4.9752 51 4.9754 71 4.9766 91 4.9755
12 4.9768 32 4.9762 52 4.9772 72 4.9763 92 4.9764
13 4.9761 33 4.9767 53 4.9757 73 4.9771 93 4.9768
14 4.9751 34 4.9766 54 4.9778 74 4.9761 94 4.9760
15 4.9757 35 4.9767 55 4.9746 75 4.9762 95 4.9742
16 4.9751 36 4.9757 56 4.9774 76 4.9768 96 4.9772
17 4.9767 37 4.9763 57 4.9759 77 4.9767 97 4.9768
18 4.9766 38 4.9778 58 4.9757 78 4.9780 98 4.9754
19 4.9757 39 4.9746 59 4.9767 79 4.9761 99 4.9764
20 4.9764 40 4.9756 60 4.9776 80 4.9763 100 4.9767
Salary Increases of 100 Employees
Discrete: Salary Increases of 100 Random Employees
1 1 21 -1 41 1 61 -1 81 -1
2 0 22 1 42 1 62 -1 82 -1
3 1 23 -1 43 -1 63 -1 83 1
4 1 24 1 44 1 64 -1 84 -1
5 1 25 -1 45 -1 65 -1 85 -1
6 -1 26 1 46 1 66 1 86 -1
7 -1 27 1 47 -1 67 -1 87 -1
8 1 28 1 48 -1 68 -1 88 -1
9 1 29 1 49 -1 69 1 89 -1
10 -1 30 1 50 1 70 -1 90 -1
11 -1 31 -1 51 -1 71 1 91 -1
12 1 32 1 52 1 72 1 92 1
13 1 33 1 53 -1 73 1 93 1
14 -1 34 1 54 1 74 1 94 0
15 -1 35 1 55 -1 75 1 95 -1
16 -1 36 -1 56 1 76 1 96 1
17 1 37 1 57 -1 77 1 97 1
18 1 38 1 58 -1 78 1 98 -1
19 -1 39 -1 59 1 79 1 99 1
20 1 40 -1 60 1 80 1 100 1
Continuous vs. Discrete
Goals
After completing Basic Statistics, you should be able to:
- Use statistical terminology
- Graph data
- Calculate the mean, median, and standard deviation
- Describe and interpret data
- Use the data to make better and quicker decisions
- Recognize and apply the normal distribution

What Is Statistics?
Collecting data, graphing data, and using that information to make decisions.
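Class Exercise
Flip a coin 20 times and record the sequential results (H or T).

Flip:     1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20   # of Heads
Results:

# of Heads | Tally | Frequency | Relative Frequency | Cumulative Frequency | Cumulative Relative Frequency
0 | | | | |
1 | | | | |
2 | | | | |
3 | | | | |
4 | | | | |
5 | | | | |
6 | | | | |
7 | | | | |
8 | | | | |
9 | | | | |
10 | | | | |
11 | | | | |
12 | | | | |
13 | | | | |
14 | | | | |
15 | | | | |
16 | | | | |
17 | | | | |
18 | | | | |
19 | | | | |
20 | | | | |

n = ____   x̄ = ____   median = ____   s = ____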
Definitions
n: Sample size, number of observations

Frequency: Number of entries in a cell

Relative Frequency: Cell frequency / Sample size

Cumulative Frequency: Number in that cell plus previous cells
    The cumulative frequency in the last cell must be n (the sample size).

Cumulative Relative Frequency: Cumulative frequency / Total sample size
    The cumulative relative frequency in the last cell must be 1 (100%).
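The coin-chart columns defined above can be computed mechanically. This is a sketch: `frequency_table` is a hypothetical helper, and the list of head counts is made-up illustration data, not class results.

```python
from collections import Counter


def frequency_table(observations, cells):
    """Build the coin-chart columns for each cell (possible value).

    Returns a list of (cell, frequency, relative frequency,
    cumulative frequency, cumulative relative frequency) tuples.
    """
    n = len(observations)               # sample size
    counts = Counter(observations)      # frequency of each observed value
    rows = []
    cumulative = 0
    for cell in cells:
        freq = counts.get(cell, 0)      # number of entries in this cell
        cumulative += freq              # this cell plus all previous cells
        rows.append((cell, freq, freq / n, cumulative, cumulative / n))
    return rows


if __name__ == "__main__":
    # Hypothetical class results: number of heads in 20 flips, one per person.
    heads = [10, 9, 11, 10, 12, 8, 10, 11, 9, 13]
    for row in frequency_table(heads, range(21)):
        print("%2d  %d  %.2f  %2d  %.2f" % row)
```

Because the cells run over every possible outcome (0 to 20 heads), the last row's cumulative frequency equals n and its cumulative relative frequency equals 1, which is a quick check on a hand-filled chart.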


Observations Vary
- Differences are expected
- Variation is due to:
    People
    Process
    Material
    Measuring instrument (gauge repeatability and reproducibility)
    Seasonality
    Others . . . . _________
                   _________
                   _________
There is inherent variability in even a very good product. It can be detected if the measurement system is sensitive enough to detect that variation.

Variation often makes our tasks difficult. Statistics deals with variation:
- Descriptive statistics (describing variation with graphs and summary values)
- Inference (decision making and predicting in the presence of uncertainty)
- Design of experiments (methods of collecting data to improve the quality of decisions)
This course concentrates on descriptive statistics.
Profound Statement: Although variation makes our life difficult, through the magic and wonder of statistics, we can describe the variation, reduce the variation, and improve the quality of our decisions.

What is the Value of Graphing Data?
1. __________________________________

2. __________________________________

3. __________________________________

4. __________________________________
Information You Can Gather from Graphing a Set of Data
- A visual picture of data
- Shape of observations (bell-shaped?)
- Spread
- Most frequent values
- Highest and lowest values
Populations and Samples
- Population: all items of interest
- Sample: a subset of data from the population
- Random sample: every item in the population has an equal chance of being in the sample
- We might be interested in the population of all range endcaps used in 1996, but we base our decision on a sample of n = 30 range endcaps.

Populations and Samples
We measure a sample in order to study the population.
- We use different symbols for population and sample values.

Quantity | Population | Sample
Average | $\mu$ (mu) | $\bar{x}$ (x-bar) or $\hat{\mu}$ (mu hat)
Standard deviation (spread of the data) | $\sigma$ (sigma) | $s$ or $\hat{\sigma}$ (sigma hat)
Variance | $\sigma^2$ | $s^2$ or $\hat{\sigma}^2$

Mean (Average) Formula
Sample: mean = $\bar{X}$ (X-bar)

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$

- The mean is influenced by extreme values
- The mean is the most commonly used measure of the center of the distribution
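The mean formula translates directly into code. A minimal sketch (the `mean` helper is illustrative; Python's `statistics.mean` does the same job), applied to the salary figures used in the median example:

```python
def mean(values):
    """Sample mean: (X1 + X2 + ... + Xn) / n."""
    xs = list(values)
    if not xs:
        raise ValueError("the mean needs at least one observation")
    return sum(xs) / len(xs)


if __name__ == "__main__":
    salaries = [60_000, 80_000, 100_000, 120_000, 1_640_000]
    # The single extreme value pulls the mean well above the middle value.
    print(mean(salaries))  # 400000.0
```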
X̄ Calculation
Use the data from the coin chart.
X1 =
X2 =
X3 =
X4 =
X5 =
X6 =
X7 =
X8 =
X9 =
X10 =
X11 =
X12 =
X13 =
X14 =
X15 =
X16 =
X17 =
X18 =
X19 =
X20 =
n = ____   $\sum X_i =$ ____   $\bar{X} = \frac{\sum X_i}{n} =$ ____
Median
- The middle value
- Not influenced by extreme values
- Applicable to income and housing prices, because these data contain extreme values
Example: $60,000 $80,000 $100,000 $120,000 $1,640,000
Median = $100,000   Average = $400,000
Calculating the Median
Place the values in order and select the middle value.

Odd number of observations
The median is the $\left(\frac{n+1}{2}\right)$th ordered value.
Example: Given the numbers 60,000 80,000 100,000 120,000 1,640,000
$n = 5$, so $\frac{n+1}{2} = \frac{5+1}{2} = \frac{6}{2} = 3$: the median is the 3rd ordered value.
The 3rd ordered value is 100,000.

Even number of observations
The median is the average of the two middle values: the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th ordered values.
Example: Given the numbers 60,000 80,000 100,000 120,000 160,000 1,640,000
$n = 6$, so $\frac{n}{2} = \frac{6}{2} = 3$ and $\frac{n}{2} + 1 = 4$: the 3rd and 4th ordered values.
$\text{median} = \frac{100{,}000 + 120{,}000}{2} = 110{,}000$
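The odd/even median rule can be sketched as below. The 1-based ordered positions from the slide become 0-based list indexes in Python, hence the `- 1`. For the six-value example, the two middle values 100,000 and 120,000 average to 110,000.

```python
def median(values):
    """Median using the ordered-value rule: odd n vs. even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median needs at least one observation")
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th ordered value; subtract 1 for a 0-based index.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average the (n / 2)th and (n / 2 + 1)th ordered values.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


if __name__ == "__main__":
    print(median([60_000, 80_000, 100_000, 120_000, 1_640_000]))           # 100000
    print(median([60_000, 80_000, 100_000, 120_000, 160_000, 1_640_000]))  # 110000.0
```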
Median Calculation
- Coin data, n = _____________
- List the values in order
- Even or odd number of data?
- If odd, calculate which ordered observation is the median: $\frac{n+1}{2}$
- If even, calculate which ordered observations are used for the median: $\frac{n}{2}$ and $\frac{n}{2} + 1$
- Find the middle number(s)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Thought Provoker
Consider the incomes of all GEA employees:
- Would the mean and the median be the same?
    Which would be greater? Why?
    Which measure would be appropriate to use?
- Would any measure of the center tell you all you want to know about the incomes of GEA employees?
Range
Range = Largest value minus smallest value
Range is a measure of spread, but it uses only two observations.
Coin data:
Largest =
Smallest =
Range =
Standard Deviation and Variance
- The standard deviation is a measure of the spread of the data
    Population standard deviation = σ (sigma)
    Sample standard deviation = s
- The variance is the square of the standard deviation
    Population variance = σ²
    Sample variance = s²
The Standard Deviation Formula

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

- Variance and standard deviation use all the observations to determine the spread
- The range and standard deviation are both sensitive to extreme values
- The standard deviation is most useful when the distribution is normal
- We divide by (n - 1) so that s² is an unbiased estimate of the population variance σ²
- Dividing by n tends to give a low estimate
- The square of a negative number is positive: for example, $(-5)^2 = 25$
- If the spread of the numbers is large, then s will be large
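A sketch of the spread measures from the last few slides: range, sample variance, and sample standard deviation with the n - 1 divisor. The data list is hypothetical; `statistics.stdev` (n - 1) and `statistics.pstdev` (divides by n) are standard-library functions used here only as cross-checks.

```python
import math
import statistics


def spread(values):
    """Return (range, sample variance, sample standard deviation)."""
    xs = list(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(xs) / n
    # Squared deviations are never negative, so deviations on either side
    # of the mean both add to the spread.
    sum_sq = sum((x - x_bar) ** 2 for x in xs)
    variance = sum_sq / (n - 1)  # divide by n - 1, not n
    return max(xs) - min(xs), variance, math.sqrt(variance)


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
    r, var, s = spread(data)
    print(r, var, s)
    # Same s as the library's n - 1 version; dividing by n gives a smaller value.
    print(statistics.stdev(data), statistics.pstdev(data))
```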

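Calculating the Standard Deviation
$\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n} =$ ____
$s = \hat{\sigma} = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} =$ ____

i | X_i | (X_i - X̄) | (X_i - X̄)²
1 | | |
2 | | |
3 | | |
4 | | |
5 | | |
6 | | |
7 | | |
8 | | |
9 | | |
10 | | |
11 | | |
12 | | |
13 | | |
14 | | |
15 | | |
16 | | |
17 | | |
18 | | |
19 | | |
20 | | |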

Results of a computer experiment
[Figure: histogram of the number of heads; x-axis ticks at 4, 9, 14, 19; y-axis: frequency, 0 to 150.]
1. Flip a coin 20 times.
2. Record the number of heads.
3. Repeat steps 1 and 2 500 times, to represent 500 people.
4. Graph the results.
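The computer experiment can be reproduced with a short simulation. This is an illustrative sketch: the seed and the one-star-per-five-people text histogram are arbitrary choices, so the exact counts will differ from the slide's graph, but the bell shape should be similar.

```python
import random
from collections import Counter


def simulate(people=500, flips=20, seed=1):
    """Number of heads in `flips` fair-coin flips, for each of `people` people."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # rng.random() < 0.5 is True (1) for heads; summing the booleans counts heads.
    return [sum(rng.random() < 0.5 for _ in range(flips)) for _ in range(people)]


if __name__ == "__main__":
    counts = Counter(simulate())
    for heads in range(21):
        # One '*' per 5 people gives a rough text histogram.
        print(f"{heads:2d} {'*' * (counts[heads] // 5)}")
```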
The Normal Distribution

- Many data sets follow a normal distribution (Gaussian, bell-shaped curve).
- Many data sets do not follow a normal distribution, especially when the specification is one-sided and the data are bounded by zero (example: average # of delivery days).
- The Central Limit Theorem states that the sum or average of many independent values tends to follow a normal distribution, even when the individual values do not. In the coin-flipping example, we are adding the results of 20 individual trials.
- Always graph the data to see if a normal distribution is reasonable.
- The assumption of a normal distribution is critical when using the normal tables (Z tables).


Forming the Normal Curve
[Figure: forming the normal curve; x-axis: units of measure.]
Normal Curve
[Figure: normal curve with the x-axis marked from -6σ to +6σ. Area within ±1σ: 68.3%; within ±2σ: 95.4%; within ±3σ: 99.7%; within ±6σ: 99.9999998%.]
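The areas on the normal curve come from the standard normal distribution: the fraction within ±k standard deviations of the mean is erf(k/√2). A small sketch using only the standard library; computed this way, the ±6σ area is about 99.9999998%.

```python
import math


def within_k_sigma(k):
    """Fraction of a normal distribution lying within mean +/- k standard deviations."""
    # For a standard normal Z, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))


if __name__ == "__main__":
    for k in range(1, 7):
        print(f"+/-{k} sigma: {within_k_sigma(k):.10%}")
```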
Different Distributions
[Figure: three frequency histograms, each titled "Comparison of Distributions": a negatively skewed distribution, a symmetrical distribution, and a positively skewed distribution, with the tail of each skewed distribution labeled.]
Sketch in the means and medians on each distribution.
Used With Permission, AlliedSignal 1995 - Dr. Steve Zinkgraf
The Normal Distribution
- Property 1: A normal distribution can be described completely by knowing only the:
    mean, and
    standard deviation
[Figure: three normal distributions (Distribution One, Distribution Two, Distribution Three) sharing the same mean.]
What is the difference among these three normal distributions?
Note: Means for all three distributions are equal.
Used With Permission, AlliedSignal 1995 - Dr. Steve Zinkgraf
The Normal Distribution
- The normal distribution is a distribution of data that has certain consistent properties
- These properties are very useful in understanding the characteristics of the underlying process from which the data were obtained
- Many natural phenomena and man-made processes are normally distributed, or can be approximated as normally distributed
Used With Permission, AlliedSignal 1995 - Dr. Steve Zinkgraf
The Standard Deviation
[Figure: normal curve plotted against the lower specification limit (LSL), target specification (T), and upper specification limit (USL), showing the mean of the distribution (μ or x̄) and the standard deviation of the distribution (σ or s). One σ is marked from the mean to the point of inflection, 3σ fit between T and the USL, and a region is labeled p(d).]
The distance between the point of inflection and the mean equals one standard deviation. If three such deviations can be fit between the target value and the specification limit, we would say the process has three sigma capability.
Note:
s = the std. deviation of a sample
σ = the std. deviation of a population
Used With Permission, 6 Sigma Academy Inc. 1995
Process Capability: Six Sigma
[Figure: normal curve showing the mean, 1σ, and the upper specification limit (USL), with 6σ fitting between the mean and the USL.]
If six standard deviations can be fit between the mean and the specification limit, we would say the process has six sigma capability.
Six Sigma Example for On-Time Delivery
- Mean (μ) = 8 AM on the requested date
- USL = 4 PM on the requested date
- LSL = 8 AM on the day prior to the requested date
- 6σ = 8 hours (from the mean to the USL)
- 1σ = 1.33 hours
- 3σ = 4 hours
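The capability arithmetic in this example is a single ratio: the distance from the mean to a spec limit divided by σ. A minimal sketch, assuming times are expressed in hours from 8 AM on the requested date (an arbitrary origin chosen for illustration); `sigma_capability` is a hypothetical helper name.

```python
def sigma_capability(mean, spec_limit, std_dev):
    """Number of standard deviations that fit between the mean and a spec limit."""
    return abs(spec_limit - mean) / std_dev


if __name__ == "__main__":
    # Times in hours, measured from 8 AM on the requested date (assumed origin).
    mean_h = 0.0       # 8 AM on the requested date
    usl_h = 8.0        # 4 PM on the requested date
    lsl_h = -24.0      # 8 AM on the day prior to the requested date
    sigma_h = 8.0 / 6  # about 1.33 hours, as in the example
    print(sigma_capability(mean_h, usl_h, sigma_h))  # about 6 (USL side)
    print(sigma_capability(mean_h, lsl_h, sigma_h))  # about 18 (LSL side)
```

The USL side is the binding one here: only 6σ fit between the mean and the USL, while far more fit on the LSL side.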
Process Capability
[Figure: X-bar chart for Process B; x-axis: sample number (0 to 25); y-axis: sample mean; center line X̄ = 70.98, UCL = 77.27, LCL = 64.70.]
Measures of process capability show us:
- The results (output) of our process over time
- When something has changed in our process
- When our process may be statistically out of control

All Processes Have Variation
[Figure: a measurement plotted against time.]
- All repetitive activities of a process have a certain amount of fluctuation
- Input, process, and output measures will fluctuate
- This fluctuation is called variation
Sources of Variation
[Figure: Machines, Materials, Methods, Measurement, Mother Nature, and People feeding into the PROCESS.]


Variation
- All variation is caused
- There are two major classifications of causes
    Common cause: normal, day-to-day, predictable variation in a process
    Special cause: unusual circumstances generating unpredictable variation
Variation is the voice of the process: learn to listen and understand it.
Common and Special Causes
Common Causes
- Common to all occasions and places
- Degree of presence varies
- Each cause contributes a small effect to the variation in results
- Variation due to common causes will almost always give results that are in statistical control
Special Causes
- Temporary or local; specific
- May come and go sporadically
- Evidence of a lack of statistical control is a signal that a special cause is likely to have occurred

Variation
[Figure: X-bar charts (sample mean vs. sample number, 0 to 25), slide label "Special Causes". Process A: center line X̄ = 70.91, UCL = 77.20, LCL = 64.62. Process B: center line X̄ = 70.98, UCL = 77.27, LCL = 64.70.]
- While every process displays variation, some processes display controlled variation, while others display uncontrolled variation (Walter Shewhart).
- Controlled variation is characterized by a stable and consistent pattern of variation over time. It is associated with common causes.
- Uncontrolled variation is characterized by variation that changes over time. It is associated with special causes.
- Process A shows controlled variation.
- Process B shows uncontrolled variation.
Used With Permission, AlliedSignal 1995 - Dr. Steve Zinkgraf
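The most basic out-of-control signal on an X-bar chart is a sample mean beyond the UCL or LCL. The sketch below checks only that one rule (other run rules exist and are not shown); the limits are the Process B values from the chart, while the sample means are invented for illustration.

```python
def points_outside_limits(sample_means, lcl, ucl):
    """Return (sample number, mean) for each point beyond the control limits.

    A point beyond a control limit signals that a special cause may be
    present. Sample numbers start at 1, as on the chart's x-axis.
    """
    return [(i, x) for i, x in enumerate(sample_means, start=1) if x < lcl or x > ucl]


if __name__ == "__main__":
    LCL, UCL = 64.70, 77.27                              # from the Process B chart
    means = [70.2, 71.5, 69.8, 78.4, 70.9, 63.1, 72.0]   # hypothetical sample means
    print(points_outside_limits(means, LCL, UCL))        # [(4, 78.4), (6, 63.1)]
```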
Can We Tolerate Variability?
- There will always be variability present in any process
- We can tolerate variability if:
    The total variability of the output is relatively small compared to the process specifications and the process is on target
    The process is stable over time
[Figure: two cost curves plotted against LSL, nominal (Nom), and USL, contrasting the traditional "goal post mentality", in which any output between LSL and USL is acceptable, with the new view.]
Used With Permission, AlliedSignal 1995 - Dr. Steve Zinkgraf
Process Stability
- Determine if the process is stable
    If the process is not stable, identify and remove causes of instability
- Determine the location of the process mean. Is it on target?
    If not, identify the variables that affect the mean and determine optimal settings to achieve the target value
- Estimate the magnitude of the total variability. Is it acceptable with respect to the customer requirements (spec limits)?
    If not, identify the sources of the variability and eliminate or reduce their influence on the process
Used With Permission, AlliedSignal 1995 - Dr. Steve Zinkgraf

Centering and Spread
From a statistical point of view, there are only two problems . . .
- It has too much spread
- It needs centering
[Figure: two dot plots of x's illustrating the two problems.]
Let's take a look at both . . .
Visualizing the Process Dynamics: Is the Process Stable?
[Figure: process distributions at Time 1, Time 2, Time 3, and Time 4 plotted against the target (T), LSL, and USL. One spread is labeled the inherent capability of the process, also called short-term capability; the spread across all four times is labeled the sustained capability of the process, also called long-term capability.]
General Assumptions: Over time, a typical process will shift and drift by approximately 1.5σ.
Used With Permission, 6 Sigma Academy Inc. 1995
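The 1.5σ shift assumption is what links a short-term sigma level to the long-term defect rate commonly quoted in Six Sigma programs (for example, about 3.4 DPMO at six sigma). A sketch of that conventional conversion, assuming the mean drifts 1.5σ toward one spec limit and the far tail is ignored; the function names are illustrative, and `statistics.NormalDist` requires Python 3.8+.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def long_term_dpmo(sigma_level, shift=1.5):
    """DPMO beyond one spec limit after the mean drifts `shift` sigma toward it.

    Assumes the nearest spec limit is `sigma_level` standard deviations from
    target; the tail beyond the far spec limit is ignored.
    """
    return (1 - STD_NORMAL.cdf(sigma_level - shift)) * 1_000_000


def sigma_level_from_dpmo(dpmo, shift=1.5):
    """Inverse of long_term_dpmo: the short-term sigma level for a given DPMO."""
    return STD_NORMAL.inv_cdf(1 - dpmo / 1_000_000) + shift


if __name__ == "__main__":
    for level in (3, 4, 5, 6):
        print(level, round(long_term_dpmo(level), 1))
```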
Is the Variability Acceptable to the Customer?
[Figure: two process distributions against LSL and USL. Poor process capability: very high probability of defects beyond both limits. Excellent process capability: very low probability of defects beyond both limits.]
Note: Specification limits (LSL and USL) must be defined using customer input!
Used With Permission, 6 Sigma Academy Inc. 1995
[Figure: normal curve with a spec limit; the area beyond the spec limit is labeled "Probability of a Defect".]
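The probability of a defect is the area of the normal curve beyond the spec limit(s). A sketch using `statistics.NormalDist` (Python 3.8+), assuming the output really is normally distributed; `p_defect` is a hypothetical helper.

```python
from statistics import NormalDist


def p_defect(mean, std_dev, lsl=None, usl=None):
    """Probability that a normally distributed output falls outside the spec limits.

    Pass only the limits that exist; a one-sided spec uses just one of them.
    """
    dist = NormalDist(mean, std_dev)
    p = 0.0
    if lsl is not None:
        p += dist.cdf(lsl)       # area below the lower spec limit
    if usl is not None:
        p += 1 - dist.cdf(usl)   # area above the upper spec limit
    return p


if __name__ == "__main__":
    # A spec limit 3 standard deviations above the mean (one-sided).
    print(p_defect(0, 1, usl=3))
```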
The Normal Curve and Capability
[Figure: two distributions against LSL and USL. Poor design capability: high probability of defects beyond both limits. Second distribution: low probability of defects beyond both limits.]

Understanding the Differences
3σ Capability: Historical Standard
4σ Capability: Current Standard
6σ Capability: New Standard

Sigma | Area | Spelling | Money | Time | Distance
3σ | A floor space of a small hardware store | 1.5 misspelled words per page in a book | $2.7 million indebtedness per $1 billion in assets | 3 1/2 months per century | Coast-to-coast trip
4σ | A floor space of a typical living room | 1 misspelled word per 30 pages in a book | $63,000 indebtedness per $1 billion in assets | 2 1/2 days per century | 45 minutes of freeway driving
5σ | Size of the bottom of your telephone | 1 misspelled word in a set of encyclopedias | $570 indebtedness per $1 billion in assets | 30 minutes per century | A trip to the local gas station
6σ | Size of a typical diamond | 1 misspelled word in all of the books contained in a small library | $2 indebtedness per $1 billion in assets | 6 seconds per century | 4 steps in any direction
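The Money column of this table appears to follow the two-tailed area beyond ±kσ for a centered process (no 1.5σ shift), scaled to $1 billion. The sketch below computes that area so the column can be checked; this reading of the table is an assumption. It gives roughly $2.7 million, $63,000, $570, and $2 for 3σ through 6σ.

```python
import math


def outside_fraction(k):
    """Fraction of a centered normal output beyond +/- k sigma (both tails)."""
    # erfc keeps precision for the tiny tail areas at 5 and 6 sigma.
    return math.erfc(k / math.sqrt(2))


if __name__ == "__main__":
    for k in (3, 4, 5, 6):
        dollars = outside_fraction(k) * 1_000_000_000
        print(f"{k} sigma: ${dollars:,.0f} per $1 billion")
```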
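Group Project
Create a histogram of the height of class attendees.
Calculate the mean and standard deviation.
- Calculate X̄ (mean): $\bar{X} = \dfrac{\sum X_i}{n} =$ _______________
- Calculate s (std. deviation): $s = \hat{\sigma} = \sqrt{\dfrac{\sum (X_i - \bar{X})^2}{n - 1}} =$ _______________

i | X_i | i | X_i | i | X_i
1 | | 21 | | 41 |
2 | | 22 | | 42 |
3 | | 23 | | 43 |
4 | | 24 | | 44 |
5 | | 25 | | 45 |
6 | | 26 | | 46 |
7 | | 27 | | 47 |
8 | | 28 | | 48 |
9 | | 29 | | 49 |
10 | | 30 | | 50 |
11 | | 31 | | 51 |
12 | | 32 | | 52 |
13 | | 33 | | 53 |
14 | | 34 | | 54 |
15 | | 35 | | 55 |
16 | | 36 | | 56 |
17 | | 37 | | 57 |
18 | | 38 | | 58 |
19 | | 39 | | 59 |
20 | | 40 | | 60 |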
[Histogram grid for the group project; x-axis: height; y-axis: frequency.]
