
IBM-09: Six Sigma Tools and Techniques
Measure Phase
Introduction to Basic Statistics

A. Ramesh, PhD
Department of Management Studies
Indian Institute of Technology Roorkee
ramesh.anbanandam@gmail.com

Measure Phase Basic Statistics


What is Data?
Data refer to facts usually collected as the result of experience, observation or measurement
Data consist of numbers, words, or images
Data are the lowest level of abstraction from which information and knowledge are derived

DATA → Information → Knowledge

"In God we trust; all others must bring data." (a famous quote, often attributed to W. Edwards Deming)



WHAT DO THESE WORDS & NUMBERS MEAN TO YOU?

ROOF, WIRES, TREE, TREE, FLOWERS, SKY, GLASS, LAMP POLE, GRASS, TILES, WINDOWS

2.87, 2.85, 2.88, 2.85, 2.86, 2.85,
2.81, 2.82, 2.83, 2.85, 2.84, 2.84,
2.85, 2.86, 2.85, 2.84, 2.85, 2.85,
2.87, 2.81, 2.85, 2.82, 2.83, 2.85,
2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
2.84, 2.84, 2.85, 2.85, 2.83, 2.82,
2.86, 2.83, 2.85, 2.86, 2.85, 2.84,
2.84, 2.87, 2.85, 2.86, 2.85, 2.84,
2.90, 2.88

PROBABLY NOTHING MUCH!



How about the following pictures?

Statistics helps in summarizing and understanding the data
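As a minimal sketch of that idea, the 50 measurements listed earlier can be condensed into a handful of summary measures with Python's standard `statistics` module. The variable names (`measurements`, `summary`) are our own, not the slides'.

```python
import statistics

# The 50 measurements from the "words & numbers" slide
measurements = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85,
    2.81, 2.82, 2.83, 2.85, 2.84, 2.84,
    2.85, 2.86, 2.85, 2.84, 2.85, 2.85,
    2.87, 2.81, 2.85, 2.82, 2.83, 2.85,
    2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82,
    2.86, 2.83, 2.85, 2.86, 2.85, 2.84,
    2.84, 2.87, 2.85, 2.86, 2.85, 2.84,
    2.90, 2.88,
]

summary = {
    "n": len(measurements),
    "mean": statistics.mean(measurements),
    "median": statistics.median(measurements),
    "mode": statistics.mode(measurements),
    "stdev": statistics.stdev(measurements),  # sample standard deviation (divisor n - 1)
    "min": min(measurements),
    "max": max(measurements),
}

for name, value in summary.items():
    # Floats get four decimals; the integer count is printed as-is
    print(f"{name:>6}: {value:.4f}" if isinstance(value, float) else f"{name:>6}: {value}")
```

Seven numbers (a mean of about 2.849, a median and mode of 2.85, a sample standard deviation of about 0.018, and a spread from 2.81 to 2.90) say far more at a glance than the raw list.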

Can Statistics Be Trusted?
There are three kinds of lies:
lies, damned lies, and statistics.
Mark Twain

It is easy to lie with statistics. But it is easier to lie without them.
Frederick Mosteller

Figures won't lie, but liars will figure.
Charles Grosvenor

Statistics...
Plays an important role in many facets of human endeavour
Occurs remarkably frequently in our everyday lives
Is often incorrectly thought of as just a collection of data, graphs and diagrams

What is Statistics?
Science of gathering, analyzing, interpreting, and presenting data
Branch of mathematics
Facts and figures
Statistics is the scientific method that enables us to make decisions as responsibly as possible.

Statistics: the science of variability?
Virtually everything varies
Variation occurs among individuals
Variation occurs within any one individual as time passes



Concept of Variation:
Variation is natural and cannot be avoided
The customer experiences the variation, not the average
The lower the variation, the better the customer experience
[Figure: what the customer wants, what the customer experiences (variation), and what we measure (average)]



Sources of Variation:
Any change to the above factors directly impacts process performance and causes variation



Variation Example:
A man wants to reach his workplace exactly by 7:00 a.m.
One day he reaches a little early (6:55).
Another day he reaches a little late (7:05).

Can we identify the cause of this variation?

No! We may not be able to.
The arrival time may be affected by factors which:
affect the time he takes to travel,
he cannot control, and
vary randomly.
E.g., the normal traffic he encounters in the normal course of travel.

This variation is called
Inherent Variation or
Common Cause Variation or
White Noise



Variation Example:
TODAY HE IS VERY EARLY! (6:00)

Can we find out the cause of this?

YES. We can!
It may be because of some specific circumstances which do not occur in the normal scheme of actions.
E.g.,
His watch was running fast
He got a lift
He had a client call
He had some important work to be finished before 7:30

These variations are called
Special Cause Variation or
Black Noise



Reacting to common cause vs. special cause:



Sampling:
Sampling is the process of collecting only a portion of the data that is available or could be available, and drawing conclusions about the total population (statistical inference).

[Figure: a sample of n = 3 days drawn from a population of N = 567 days]
What is the average resolution time?
From the sample, we infer that the average resolution time (x̄) is 1.2 days*
*Within a certain confidence band or margin of error
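The sampling idea can be sketched in Python. The population of 567 resolution times below is synthetic (randomly generated for illustration, not data from the slides), and the seed and the uniform 0.5 to 2.0 day range are arbitrary assumptions.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: N = 567 daily resolution times in days (synthetic data)
N = 567
population = [round(random.uniform(0.5, 2.0), 2) for _ in range(N)]

# Simple random sample without replacement, n = 3 as on the slide
n = 3
sample = random.sample(population, n)

# The sample mean (a statistic) is used to estimate the population mean (a parameter)
x_bar = statistics.mean(sample)
mu = statistics.mean(population)  # normally unknown; computable here only because the data are synthetic

print(f"sample mean x_bar = {x_bar:.2f} days (n = {n})")
print(f"population mean mu = {mu:.2f} days (N = {N})")
```

With only three observations the estimate can land well away from the population mean; increasing `n` narrows the margin of error the slide mentions.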

Population Versus Sample
Population: the whole
a collection of persons, objects, or items under study
the entire group of individuals in a statistical study that we want information about
Census: gathering data from the entire population
Sample: a portion of the whole
a subset of the population
a part of the population from which we actually collect information, used to draw conclusions about the whole (statistical inference)

Statistics can be split into two broad categories:
1. Descriptive statistics
2. Statistical inference

Descriptive Statistics
Collect data
ex. Survey

Present data
ex. Tables and graphs

Characterize data
ex. Sample mean $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Descriptive statistics...
Encompasses the following:
Graphical or pictorial display
Condensation of large masses of data into a form such as tables
Preparation of summary measures to give a concise description of complex information (e.g. an average figure)
Exhibition of patterns that may be found in sets of information

Inferential Statistics
Estimation
ex. Estimate the population mean weight using the sample mean weight

Hypothesis testing
ex. Test the claim that the population mean weight is 120 pounds

Drawing conclusions and/or making decisions concerning a population based on sample results.
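To make the two activities concrete, the sketch below estimates a population mean weight from a sample and computes the one-sample t statistic $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$ for the claim that the population mean weight is 120 pounds. The weights are invented for illustration, and the t statistic is a standard test statistic the slides do not spell out; turning it into a decision would additionally need a t table or a library routine such as SciPy's `scipy.stats.ttest_1samp`.

```python
import math
import statistics

# Hypothetical sample of weights in pounds (illustrative data only)
weights = [118.0, 122.5, 125.0, 119.5, 121.0, 123.5, 117.0, 124.5]
mu_0 = 120.0  # claimed population mean weight

n = len(weights)
x_bar = statistics.mean(weights)  # estimation: point estimate of the population mean
s = statistics.stdev(weights)     # sample standard deviation
standard_error = s / math.sqrt(n)

# Hypothesis testing: how many standard errors x_bar lies from the claimed mean
t_stat = (x_bar - mu_0) / standard_error

print(f"estimate of mean weight: {x_bar:.2f} lb")
print(f"t = {t_stat:.2f} with {n - 1} degrees of freedom")
```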

Inferential Statistics...
Especially relates to:
Determining whether characteristics of a situation are unusual or if they have happened by chance
Estimating values of numerical quantities and determining the reliability of those estimates
Using past occurrences to attempt to predict the future

Process of Inferential Statistics
[Figure: select a random sample from the population; calculate the sample statistic x̄ to estimate the population parameter μ]

Population vs. Sample
Population: measures used to describe the population are called parameters
Sample: measures computed from sample data are called statistics

Parameter vs. Statistic
Parameter: descriptive measure of the population
Usually represented by Greek letters
Statistic: descriptive measure of a sample
Usually represented by Roman letters

Symbols for Population Parameters
μ denotes population mean
σ² denotes population variance
σ denotes population standard deviation

Symbols for Sample Statistics
x̄ denotes sample mean
S² denotes sample variance
S denotes sample standard deviation
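The practical difference between the population symbols (σ², σ) and the sample symbols (S², S) shows up in the divisor: a population variance divides by N, while the usual sample variance divides by n - 1. Python's `statistics` module provides both; the small data set here is arbitrary.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # arbitrary illustrative values

# Treat data as an entire population: divide by N
sigma_sq = statistics.pvariance(data)  # population variance, sigma^2
sigma = statistics.pstdev(data)        # population standard deviation, sigma

# Treat data as a sample: divide by n - 1
s_sq = statistics.variance(data)       # sample variance, S^2
s = statistics.stdev(data)             # sample standard deviation, S

print(f"sigma^2 = {sigma_sq}, sigma = {sigma}")
print(f"S^2 = {s_sq:.4f}, S = {s:.4f}")
```

The squared deviations from the mean of 5 sum to 32, so σ² = 32/8 = 4 while S² = 32/7 ≈ 4.57; the n - 1 divisor makes the sample figure slightly larger.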

Types of Variables
Categorical (qualitative) variables have values
that can only be placed into categories, such as
yes and no.
Numerical (quantitative) variables have values
that represent quantities.

Types of Variables
Data
  Categorical
    Examples: Marital Status, Political Party, Eye Color (defined categories)
  Numerical
    Discrete
      Examples: Number of Children, Defects per hour (counted items)
    Continuous
      Examples: Weight, Voltage (measured characteristics)

Levels of Data Measurement
Nominal (lowest level of measurement)
Ordinal
Interval
Ratio (highest level of measurement)

Levels of Measurement
A nominal scale classifies data into distinct categories in which no ranking is implied.
Categorical Variables          Categories
Personal Computer Ownership    Yes / No
Type of Stocks Owned           Growth / Value / Other
Internet Provider              Microsoft Network / AOL

Levels of Measurement
An ordinal scale classifies data into distinct categories in which ranking is implied.
Categorical Variable              Ordered Categories
Student class designation         Freshman, Sophomore, Junior, Senior
Product satisfaction              Satisfied, Neutral, Unsatisfied
Faculty rank                      Professor, Associate Professor, Assistant Professor, Instructor
Standard & Poor's bond ratings    AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student Grades                    A, B, C, D, F

Levels of Measurement
An interval scale is an ordered scale in which the
difference between measurements is a meaningful
quantity but the measurements do not have a true
zero point.
A ratio scale is an ordered scale in which the
difference between the measurements is a
meaningful quantity and the measurements have a
true zero point.
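A quick numerical check makes the "true zero" distinction concrete. Temperature in degrees Celsius or Fahrenheit is a common textbook example of an interval scale (our example, not the slides'): differences convert consistently between the two units, but ratios do not, because 0 °C and 0 °F are arbitrary points. A ratio-scale quantity such as weight keeps its ratios under a change of units.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def kg_to_lb(kg: float) -> float:
    """Convert kilograms to pounds (1 kg is about 2.20462 lb)."""
    return kg * 2.20462


# Interval scale: "20 degrees is twice 10 degrees" depends on the unit chosen
ratio_c = 20 / 10                  # 2.0 in Celsius
ratio_f = c_to_f(20) / c_to_f(10)  # 68 / 50 = 1.36 in Fahrenheit

# ...but differences are meaningful: a 10 C gap is always an 18 F gap
diff_f = c_to_f(20) - c_to_f(10)

# Ratio scale: "20 kg is twice 10 kg" holds in any unit
ratio_kg = 20 / 10
ratio_lb = kg_to_lb(20) / kg_to_lb(10)

print(ratio_c, ratio_f, diff_f)
print(ratio_kg, round(ratio_lb, 10))
```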

Interval and Ratio Scales

Usage Potential of Various Levels of Data
Ratio
Interval
Ordinal
Nominal

Data Level, Operations, and Statistical Methods
Data Level   Meaningful Operations                               Statistical Methods
Nominal      Classifying and counting                            Nonparametric
Ordinal      All of the above plus ranking                       Nonparametric
Interval     All of the above plus addition, subtraction         Parametric
Ratio        All of the above plus multiplication and division   Parametric

Data preparation rules
Data presented must be:
factual
relevant

Before presentation always check:
the source of the data
that the data has been accurately transcribed
that the figures are relevant to the problem

Methods of visual presentation of data
Table
         1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East     20.4      27.4      90        20.4
West     30.6      38.6      34.6      31.6
North    45.9      46.9      45        43.9

Methods of visual presentation of data
Graphs
[Figure: column chart of the East, West and North values for the 1st to 4th Qtr; vertical axis 0 to 90]

Methods of visual presentation of data
Pie chart
[Figure: pie chart with slices for the 1st, 2nd, 3rd and 4th Qtr]

Methods of visual presentation of data
Multiple bar chart
[Figure: horizontal bar chart of East, West and North for the 1st to 4th Qtr; axis 0 to 100]

Methods of visual presentation of data
Simple pictogram
[Figure: pictogram of East, West and North for the 1st to 4th Qtr; axis 0 to 100]

Frequency distributions
Frequency tables
Observation Table
Class Interval   Frequency   Cumulative Frequency
< 20             13          13
< 40             18          31
< 60             25          56
< 80             15          71
< 100            9           80
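The cumulative frequency column is a running total of the frequency column. A minimal Python sketch using the standard-library `itertools.accumulate`:

```python
from itertools import accumulate

classes = ["< 20", "< 40", "< 60", "< 80", "< 100"]
frequencies = [13, 18, 25, 15, 9]

# Running total: each class adds its frequency to the total of all classes below it
cumulative = list(accumulate(frequencies))

for label, f, cf in zip(classes, frequencies, cumulative):
    print(f"{label:>6} {f:>4} {cf:>4}")
```

The last cumulative value always equals the total number of observations (80 here), which is a handy check on the table.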

Frequency diagrams
[Figure: frequency and cumulative frequency charts for the class intervals < 20, < 40, < 60, < 80 and < 100; Frequency axis 0 to 30, Cumulative Frequency axis 0 to 90]

Ungrouped Versus Grouped Data
Ungrouped data
have not been summarized in any way
are also called raw data
Grouped data
have been organized into a frequency distribution

Example of Ungrouped Data
Ages of a Sample of Managers from XYZ
42  26  32  34  57  30  58  37  50  30
53  40  30  47  49  50  40  32  31  40
52  28  23  35  25  30  36  32  26  50
55  30  58  64  52  49  33  43  46  32
61  31  30  40  60  74  37  29  43  54

Frequency Distribution of Ages
Class Interval   Frequency
20-under 30      6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70      3
70-under 80      1
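Grouping the raw ages into these classes can be sketched in Python. Each age is assigned to its class by integer division, assuming a first class starting at 20 and a class width of 10, which reproduces the frequencies in the table.

```python
from collections import Counter

# Ages of the sample of 50 managers (ungrouped data from the slide)
ages = [
    42, 26, 32, 34, 57, 30, 58, 37, 50, 30,
    53, 40, 30, 47, 49, 50, 40, 32, 31, 40,
    52, 28, 23, 35, 25, 30, 36, 32, 26, 50,
    55, 30, 58, 64, 52, 49, 33, 43, 46, 32,
    61, 31, 30, 40, 60, 74, 37, 29, 43, 54,
]

start, width, n_classes = 20, 10, 6

# Class index of each age: 0 for 20-under 30, 1 for 30-under 40, and so on
counts = Counter((age - start) // width for age in ages)

frequency = [counts.get(i, 0) for i in range(n_classes)]
for i, f in enumerate(frequency):
    lower = start + i * width
    print(f"{lower}-under {lower + width}: {f}")
```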

Data Range

Range = Largest - Smallest = 74 - 23 = 51 (Smallest = 23, Largest = 74)

Number of Classes and Class Width
The number of classes should be between 5 and 15.
Fewer than 5 classes cause excessive summarization.
More than 15 classes leave too much detail.
Class Width
Divide the range by the number of classes for an approximate class width
Round up to a convenient number
$\text{Approximate Class Width} = \frac{51}{6} = 8.5$
Class Width = 10
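The range and class-width arithmetic can be sketched as below. Rounding "up to a convenient number" is a judgment call; here we assume it means rounding up to the next multiple of 5 (the helper `round_up_to_convenient` and its `step` parameter are our own), which turns 8.5 into 10 as on the slide.

```python
import math

smallest, largest = 23, 74  # smallest and largest manager ages in the data
n_classes = 6

data_range = largest - smallest        # Range = Largest - Smallest
approx_width = data_range / n_classes  # approximate class width


def round_up_to_convenient(value: float, step: int = 5) -> int:
    """Round up to the next multiple of `step` (our assumed meaning of 'convenient')."""
    return math.ceil(value / step) * step


class_width = round_up_to_convenient(approx_width)
print(data_range, approx_width, class_width)
```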

Class Midpoint
$\text{Class Midpoint} = \frac{\text{beginning class endpoint} + \text{ending class endpoint}}{2} = \frac{30 + 40}{2} = 35$
$\text{Class Midpoint} = \text{class beginning point} + \frac{1}{2}(\text{class width}) = 30 + \frac{1}{2}(10) = 35$
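Both midpoint formulas give the same result for every class. A short check in Python over the six age classes, assuming a class width of 10 and a first class starting at 20:

```python
start, width, n_classes = 20, 10, 6

midpoints = []
for i in range(n_classes):
    lower = start + i * width
    upper = lower + width
    midpoint_avg = (lower + upper) / 2  # (beginning endpoint + ending endpoint) / 2
    midpoint_half = lower + width / 2   # beginning point + half the class width
    assert midpoint_avg == midpoint_half
    midpoints.append(midpoint_avg)
    print(f"{lower}-under {upper}: midpoint {midpoint_avg:g}")
```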

Relative Frequency
Class Interval   Frequency   Relative Frequency
20-under 30      6           .12  (= 6/50)
30-under 40      18          .36  (= 18/50)
40-under 50      11          .22
50-under 60      11          .22
60-under 70      3           .06
70-under 80      1           .02
Total            50          1.00

Cumulative Frequency
Class Interval   Frequency   Cumulative Frequency
20-under 30      6           6
30-under 40      18          24  (= 18 + 6)
40-under 50      11          35  (= 11 + 24)
50-under 60      11          46
60-under 70      3           49
70-under 80      1           50
Total            50

Class Midpoints, Relative Frequencies, and Cumulative Frequencies
Class Interval   Frequency   Midpoint   Relative Frequency   Cumulative Frequency
20-under 30      6           25         .12                  6
30-under 40      18          35         .36                  24
40-under 50      11          45         .22                  35
50-under 60      11          55         .22                  46
60-under 70      3           65         .06                  49
70-under 80      1           75         .02                  50
Total            50                     1.00

Cumulative Relative Frequencies
Class Interval   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
20-under 30      6           .12                  6                      .12
30-under 40      18          .36                  24                     .48
40-under 50      11          .22                  35                     .70
50-under 60      11          .22                  46                     .92
60-under 70      3           .06                  49                     .98
70-under 80      1           .02                  50                     1.00
Total            50          1.00
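The relative, cumulative and cumulative relative frequency columns all follow from the frequency column. A minimal sketch in Python:

```python
from itertools import accumulate

classes = ["20-under 30", "30-under 40", "40-under 50",
           "50-under 60", "60-under 70", "70-under 80"]
frequency = [6, 18, 11, 11, 3, 1]
total = sum(frequency)

relative = [f / total for f in frequency]              # share of all observations
cumulative = list(accumulate(frequency))               # running total of counts
cumulative_relative = [c / total for c in cumulative]  # running share

for label, f, rf, cf, crf in zip(classes, frequency, relative, cumulative, cumulative_relative):
    print(f"{label:<12} {f:>3} {rf:>5.2f} {cf:>3} {crf:>5.2f}")
```

Dividing each cumulative count by the total (rather than adding up rounded relative frequencies) keeps the last cumulative relative frequency at exactly 1.00.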

Common Statistical Graphs

Histogram -- vertical bar chart of frequencies


Frequency Polygon -- line graph of frequencies
Ogive -- line graph of cumulative frequencies
Pie Chart -- proportional representation for
categories of a whole
Stem and Leaf Plot
Pareto Chart
Scatter Plot


Histogram
[Figure: histogram of the ages by class interval (20-under 30: 6, 30-under 40: 18, 40-under 50: 11, 50-under 60: 11, 60-under 70: 3, 70-under 80: 1); horizontal axis Years (10 to 80), vertical axis Frequency (0 to 20)]
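Without plotting software, a histogram can be approximated in plain text with one bar per class. A minimal sketch; the `text_histogram` helper, the `*` symbol and the one-symbol-per-observation scale are our own choices.

```python
classes = ["20-under 30", "30-under 40", "40-under 50",
           "50-under 60", "60-under 70", "70-under 80"]
frequency = [6, 18, 11, 11, 3, 1]


def text_histogram(labels, counts, symbol="*"):
    """Return one line per class: label, a bar of `symbol`s, and the count."""
    width = max(len(label) for label in labels)
    return [f"{label:<{width}} | {symbol * count} {count}"
            for label, count in zip(labels, counts)]


for line in text_histogram(classes, frequency):
    print(line)
```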

Histogram Construction
[Figure: construction of the histogram from the age class intervals and frequencies; horizontal axis Years (10 to 80), vertical axis Frequency (0 to 20)]

Frequency Polygon
[Figure: frequency polygon of the age class frequencies; horizontal axis Years (10 to 80), vertical axis Frequency (0 to 20)]

Ogive
Class Interval   Cumulative Frequency
20-under 30      6
30-under 40      24
40-under 50      35
50-under 60      46
60-under 70      49
70-under 80      50
[Figure: ogive of the cumulative frequencies; horizontal axis Years (10 to 80), vertical axis Frequency (0 to 60)]

Relative Frequency Ogive
Class Interval   Cumulative Relative Frequency
20-under 30      .12
30-under 40      .48
40-under 50      .70
50-under 60      .92
60-under 70      .98
70-under 80      1.00
[Figure: ogive of the cumulative relative frequencies; horizontal axis Years (0 to 80), vertical axis Cumulative Relative Frequency (0.00 to 1.00)]
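An ogive is conventionally drawn by plotting each cumulative frequency against the upper endpoint of its class, starting from zero at the lower endpoint of the first class. The point lists could be built like this (the plotting itself is left to any charting tool):

```python
from itertools import accumulate

start, width = 20, 10
frequency = [6, 18, 11, 11, 3, 1]  # 20-under 30 ... 70-under 80
total = sum(frequency)

cumulative = list(accumulate(frequency))
upper_endpoints = [start + (i + 1) * width for i in range(len(frequency))]

# (x, y) points: start at zero, then one point per class upper endpoint
ogive = [(start, 0)] + list(zip(upper_endpoints, cumulative))
relative_ogive = [(x, y / total) for x, y in ogive]

print(ogive)
print(relative_ogive)
```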

Complaints by Passengers
Complaint            Number    Proportion   Degrees
Stations, etc.       28,000    .40          144.0
Train Performance    14,700    .21          75.6
Equipment            10,500    .15          54.0
Personnel            9,800     .14          50.4
Schedules, etc.      7,000     .10          36.0
Total                70,000    1.00         360.0

Complaints by Passengers
[Figure: pie chart; Stations, Etc. 40%, Train Performance 21%, Equipment 15%, Personnel 14%, Schedules, Etc. 10%]

Company   Second Quarter Truck Production
A         357,411
B         354,936
C         160,997
D         34,099
E         12,747
Totals    920,190

Second Quarter Truck Production
[Figure: pie chart; A 39%, B 39%, C 17%, D 4%, E 1%]

Pie Chart Calculations for Company A
Proportion = 357,411 / 920,190 = .388
Degrees = .388 × 360 ≈ 140

Company   Second Quarter Truck Production   Proportion   Degrees
A         357,411                           .388         140
B         354,936                           .386         139
C         160,997                           .175         63
D         34,099                            .037         13
E         12,747                            .014         5
Totals    920,190                           1.000        360
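The proportion and degree columns can be reproduced with a few lines of Python: each proportion is a company's production divided by the total, and each slice angle is that proportion times 360 degrees, rounded here to three decimals and whole degrees as in the table. The labels D and E are our continuation of the A, B, C labelling on the slide.

```python
production = {
    "A": 357_411,
    "B": 354_936,
    "C": 160_997,
    "D": 34_099,
    "E": 12_747,
}
total = sum(production.values())

rows = []
for company, units in production.items():
    proportion = units / total  # share of total production
    degrees = proportion * 360  # size of the pie slice
    rows.append((company, units, round(proportion, 3), round(degrees)))

for company, units, proportion, degrees in rows:
    print(f"{company}  {units:>9,}  {proportion:.3f}  {degrees:>3}")
```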

Pareto Chart
[Figure: Pareto chart of defect causes in the order Poor Wiring, Short in Coil, Defective Plug, Other; frequency bars (axis 0 to 100) with a cumulative percentage line (axis 0% to 100%)]
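A Pareto chart sorts categories from most to least frequent and overlays the cumulative percentage. The chart's underlying counts did not survive, so the counts below are hypothetical and serve only to show the calculation behind the bars and the cumulative line.

```python
# Hypothetical defect counts (the original chart's values are not recoverable)
defects = {
    "Other": 10,
    "Defective Plug": 20,
    "Poor Wiring": 40,
    "Short in Coil": 30,
}

total = sum(defects.values())

# Bars are ordered from most to least frequent
ordered = sorted(defects.items(), key=lambda item: item[1], reverse=True)

running = 0
pareto = []
for cause, count in ordered:
    running += count
    pareto.append((cause, count, 100 * running / total))  # cumulative percent

for cause, count, cum_pct in pareto:
    print(f"{cause:<15} {count:>3} {cum_pct:6.1f}%")
```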

Scatter Plot
[Figure: scatter plot of Gasoline Sales (1000's of Gallons) against Registered Vehicles (1000's)]

Principles of Excellent Graphs
The graph should not distort the data.
The graph should not contain unnecessary adornments (sometimes referred to as chart junk).
The scale on the vertical axis should begin at zero.
All axes should be properly labeled.
The graph should contain a title.
The simplest possible graph should be used for a given set of data.

Graphical Errors: Chart Junk
[Figure, Bad Presentation: Minimum Wage, 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80, drawn with chart junk]
[Figure, Good Presentation: Minimum Wage for 1960, 1970, 1980 and 1990 as a plain chart]

Graphical Errors: Compressing the Vertical Axis
[Figure, Bad Presentation: Quarterly Sales for Q1 to Q4 with a compressed vertical axis]
[Figure, Good Presentation: the same Quarterly Sales for Q1 to Q4 with an appropriately scaled vertical axis]

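Graphical Errors: No Zero Point on the Vertical Axis
[Figure, Bad Presentation: Monthly Sales with a vertical axis running from 36 to 45 and no zero point]
[Figure, Good Presentations: the same Monthly Sales with a zero point on the vertical axis]
Graphing the first six months of sales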

Thank You