
quality control

The Histogram as a Measurement of Process Consistency

The histogram is one of the seven basic tools of quality control
used to summarize, display and analyze process data. Karl Pearson
(1857-1936) introduced it as a way of showing the probability
distribution of a continuous variable.
The derivation of the word histogram is uncertain. Sometimes it is
said to be derived from the Greek histos, meaning "anything set
upright" (as the masts of a ship, the bar of a loom, or the vertical
bars of a histogram), and gramma, i.e., "drawing, record, writing."
It is also said that Karl Pearson derived the name from
"historical diagram."
A histogram consists of tabular frequencies, shown as adjacent
rectangles erected over discrete intervals, each with an area equal
to the frequency of the observations in the interval. The height of
a rectangle is therefore equal to the frequency density of the
interval, i.e., the frequency divided by the width of the interval.
The total area of the histogram is equal to the number of data
points. A histogram may also be normalized to display relative
frequencies. It then shows the proportion of cases that fall into
each of several categories, with the total area equaling 1. The
categories are usually specified as consecutive, non-overlapping
intervals of a variable. The categories (intervals) must be
adjacent, and often are chosen to be of the same size. The
rectangles of a histogram are drawn so that they touch each other to
indicate that the original variable is continuous.
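
The relationship between frequency, cell width and frequency density can be checked with a few lines of code. The following is a minimal sketch in Python, not a method taken from this column: the function names, the sample measurements and the cell limits are all hypothetical, and the last cell is deliberately made wider than the others to show why height and frequency differ.

```python
from typing import List, Sequence, Tuple


def histogram_table(data: Sequence[float],
                    edges: Sequence[float]) -> List[Tuple[float, float, int, float]]:
    """Return (left, right, frequency, density) for each cell.

    Cells are half-open [left, right), except the last cell, which also
    includes its right edge. Density is frequency divided by cell width.
    """
    rows = []
    last = len(edges) - 2
    for i in range(len(edges) - 1):
        left, right = edges[i], edges[i + 1]
        if i == last:
            count = sum(1 for x in data if left <= x <= right)
        else:
            count = sum(1 for x in data if left <= x < right)
        rows.append((left, right, count, count / (right - left)))
    return rows


def total_area(rows, normalize: bool = False) -> float:
    """Sum of width * height over all rectangles; divided by n if normalized."""
    n = sum(count for _, _, count, _ in rows)
    area = sum((right - left) * density for left, right, _, density in rows)
    return area / n if normalize else area


# Hypothetical measurements, invented for illustration only.
data = [1.2, 1.9, 2.4, 2.6, 3.1, 3.3, 3.8, 4.5, 5.7, 7.9]
edges = [1.0, 2.0, 3.0, 4.0, 8.0]  # the last cell is four times as wide

rows = histogram_table(data, edges)
for left, right, count, density in rows:
    print(f"{left} to {right}: frequency={count}, density={density:.2f}")
print("total area:", total_area(rows))
print("normalized area:", total_area(rows, normalize=True))
```

In this sketch the last two cells each hold three points, but the wide cell is drawn only a quarter as tall, so the areas (not the heights) carry the frequencies.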
The ordinary histogram shows the number of data points per unit
interval, so that the height of each bar is equal to the number of
data points that fall into that category. The area under the curve
represents the total number of data. This histogram shows absolute
numbers, with the frequency in thousands.
In Figure 1, the histogram on the right differs from the one on the
left in that it shows the data cumulatively: each bar gives the
running total, so the final bar reaches 100% of the data. The curve
displayed is a simple density estimate.
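
A cumulative histogram can be derived from ordinary cell frequencies by taking running totals. The sketch below assumes Python and uses hypothetical cell counts; it is offered only as an illustration of the idea, not as the procedure used to draw Figure 1.

```python
from itertools import accumulate

# Hypothetical cell frequencies for four adjacent cells.
counts = [2, 2, 3, 3]

# Running totals: each entry is the number of points at or below that cell.
cumulative = list(accumulate(counts))

# The same running totals expressed as a percentage of all data.
total = cumulative[-1]
cumulative_percent = [100.0 * c / total for c in cumulative]

print(cumulative)
print(cumulative_percent)
```

The last running total always equals the number of data points, which is why the final bar of a cumulative histogram stands at 100%.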
In other words, a histogram represents a frequency distribution by
means of rectangles whose widths represent class intervals and whose
areas are proportional to the corresponding frequencies. The
intervals are placed together in order to show that the data
represented by the histogram, while falling into mutually exclusive
intervals, is also continuous. (For example, in a histogram it is
possible to have two connecting intervals of 10.5-20.5 and
20.5-33.5, but not two connecting intervals of 10.5-20.5 and
22.5-32.5. Empty intervals are represented as empty and not
skipped.)
Histograms are used to plot density of data, and often for density
estimation: estimating the probability density function of the
underlying variable. The total area of a histogram used for
probability density is always normalized to 1. When every interval
on the x-axis has a width of 1, such a histogram is identical to a
relative frequency plot.
Above are examples of ordinary and cumulative histograms of the
same data. The data shown is a random sample of 10,000 points from a
normal distribution with a mean of 0 and a standard deviation of 1.

[Figure 1 panels: Ordinary histogram (left), Cumulative histogram (right); vertical axis: Frequency; horizontal axis: rnorm (1000)]

Figure 1. Both histograms use the same data; the difference is in how the data is presented.

36 I metalfinishing I September 2012

www.metalfinishing.com

Figure 2. Bimodal Histogram.

SHAPE OR FORM OF A DISTRIBUTION
The shape of a histogram provides important information about the
data distribution. The histogram may be highly or moderately skewed
to the left or right. A symmetrical shape is also possible, although
a histogram is never perfectly symmetrical. If the histogram is
skewed to the left, or negatively skewed, the tail extends further
to the left.
The mode of a distribution is that
value which is most frequently
occurring or has the largest probability of occurrence. The sample mode
occurs at the peak of the histogram.
For many phenomena, it is quite
common for the distribution of the
response values to cluster around a
single mode (unimodal) and then
distribute themselves with lesser frequency out into the tails. The normal
distribution is the classic example of
a unimodal distribution.
The histogram shown in Figure 2 illustrates data from a bimodal
(2 peak) distribution. The histogram serves as a tool for diagnosing
problems such as bimodality. Questioning the underlying reason for
distributional non-unimodality frequently leads to greater insight
and improved deterministic modeling of the phenomenon under study.
For example, for the data presented above, the bimodal histogram is
caused by a lack of uniformity in the data.
An example of a distribution skewed to the left might be the
relative frequency of exam scores. Most of the scores are above 70
percent and only a few low scores occur. An example of a
distribution skewed to the right, or positively skewed, is a
histogram showing the relative frequency of housing values. A
relatively small number of expensive homes creates the skewness to
the right. The tail extends further to the right. In a symmetrical
distribution, the left and right tails mirror each other; an example
is the histogram of data for IQ scores.
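
The direction of skew can also be checked numerically. The column does not give a formula, so the sketch below assumes one common convention, the moment coefficient of skewness (third central moment divided by the 1.5 power of the second). It is written in Python, and the exam scores and housing values are invented for illustration.

```python
def sample_skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5.

    Negative values suggest a longer left tail, positive values a
    longer right tail, and values near zero a roughly symmetrical shape.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical exam scores: most are high, a few are low.
exam_scores = [35, 62, 74, 78, 81, 85, 88, 90, 93, 95]

# Hypothetical housing values in thousands: a few expensive homes.
housing_values = [120, 135, 140, 150, 155, 160, 175, 190, 450, 900]

# A perfectly symmetrical set of values.
symmetric = [1, 2, 3, 4, 5]

print("exam scores:", sample_skewness(exam_scores))
print("housing values:", sample_skewness(housing_values))
print("symmetric:", sample_skewness(symmetric))
```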
Histograms can be unimodal, bimodal or multi-modal, depending
on the dataset.
A truncated histogram ends
abruptly at one end, which indicates
possible sorting or inspection of
non-conforming parts. This may also
mean that part of the distribution
has been removed by screening, 100% inspection or review. Such practices are usually costly and are good
candidates for improvement efforts.
Plateau Histograms. A nearly flat or
plateau-like histogram often means
that the process is not well defined
or understood by those doing the
work or inspection. Since individuals run the process in different ways,
there are a great many different
measurements and none that stand
out. The solution is to more clearly
define the process and/or piece part
parameters.
The plateau might be called a
multimodal distribution. Several
processes with normal distributions
are combined. Because there are
many peaks close together, the top of
the distribution resembles a plateau.
[Figure 3 panels: Positive Skewed, Negative Skewed]
Figure 3. Skewed Histograms.

Figure 4. Truncated, or cliff-like, Histogram.

Figure 5. Plateau-like Histogram.

[Figure 6 labels: Platykurtic, Leptokurtic]
Figure 6. Illustration of Kurtosis.

Number of cells and width. There is no best number of cells, and
different cell sizes can reveal different
features of the data. Some theoreticians have attempted to determine
an optimal number of cells, but
these methods generally make
strong assumptions about the shape
of the distribution. Depending on
the actual data distribution and the
goals of the analysis, different cell
widths may be appropriate, so experimentation is usually needed to
determine an appropriate width.
There are, however, various useful
guidelines and rules of thumb.
Most engineers favor setting the number of cells somewhere between
11 and 17, but always an odd number. The latter point is important
so that the mid-point of the distribution is not split between two
cells. It is also a good rule, when using measurement data, to set
the cell limits one decimal place beyond the most precise data,
halfway between two adjacent possible readings. Consider what
happens where a cell is 4 to 8 and the next cell 8 to 12. A reading
of 8 could fall in either cell, hence the rule. With limits of 3.5,
7.5 and 11.5, the same reading falls in only one cell.
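
The two rules of thumb above (an odd number of cells, and cell limits placed halfway between possible readings) can be sketched in code. This is an illustrative Python sketch under stated assumptions, not a standard algorithm: the function name `cell_limits`, its parameters and the sample readings are hypothetical, and only five cells are used to keep the example short.

```python
import math


def cell_limits(data, resolution, n_cells):
    """Return n_cells + 1 cell limits that never coincide with a reading.

    resolution is the smallest step between possible readings (1 for
    whole units, 0.1 for one decimal place, and so on). Limits start
    half a step below the smallest reading, and every cell spans a
    whole number of steps, so no reading can sit on a limit.
    """
    if n_cells % 2 == 0:
        raise ValueError("use an odd number of cells")
    lo, hi = min(data), max(data)
    # Number of distinct possible readings from lo to hi inclusive.
    steps = round((hi - lo) / resolution) + 1
    # Whole number of possible readings per cell, rounded up to cover hi.
    per_cell = math.ceil(steps / n_cells)
    width = per_cell * resolution
    start = lo - resolution / 2
    return [start + i * width for i in range(n_cells + 1)]


# Hypothetical readings recorded to the nearest whole unit.
readings = [4, 5, 7, 8, 8, 9, 10, 12, 13, 15, 18]

limits = cell_limits(readings, resolution=1, n_cells=5)
print(limits)
```

With these readings the limits land on half units, so a reading of 8 belongs to exactly one cell.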
Kurtosis. In probability theory and statistics, kurtosis (derived
from the Greek word meaning bulging) is any measure of the
peakedness of the probability distribution of a real-valued random
variable. In a similar way to the concept of skewness, kurtosis is a
descriptor of the shape of a probability distribution and, just as
for skewness, there are different ways of quantifying it for a
theoretical distribution and corresponding ways of estimating it
from a sample from a population.
One common mathematical measure of kurtosis, originating with Karl
Pearson, is based on a scaled version of the fourth moment of the
data or population, but it has been argued that this measure really
measures heavy tails, and not peakedness. For this measure, higher
kurtosis means more of the variance is the result of infrequent
extreme deviations, as opposed to frequent modestly sized
deviations. It is common practice to use an adjusted version of
Pearson's kurtosis, the excess kurtosis, to provide a comparison of
the shape of a given distribution to that of the normal
distribution. Distributions with negative or positive excess
kurtosis are called platykurtic or leptokurtic distributions,
respectively. When a curve, or histogram, is compared to a normal
distribution, a platykurtic data set has a flatter peak around its
mean and thinner tails.
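
As a rough illustration of the fourth-moment measure, excess kurtosis can be computed as the fourth central moment divided by the square of the second, minus 3 (the value of that ratio for a normal distribution). The Python sketch below uses the simple population-style estimator without small-sample corrections; the function name and both data sets are hypothetical.

```python
def excess_kurtosis(data):
    """Pearson's fourth-moment kurtosis minus 3.

    Negative results indicate a platykurtic shape, positive results a
    leptokurtic shape, relative to a normal distribution.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0


# Hypothetical flat data: values spread evenly with no pronounced peak.
flat = [1, 2, 3, 4, 5]

# Hypothetical peaked data: most values at the center, two far out.
peaked = [-6, -1, 0, 0, 0, 0, 0, 1, 6]

print("flat:", excess_kurtosis(flat))
print("peaked:", excess_kurtosis(peaked))
```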
Leptokurtic is a description of the kurtosis in a distribution in
which the statistical value is positive. Leptokurtic distributions
have higher peaks around the mean compared to normal distributions.
The Japanese scientist Genichi Taguchi argued that the goal of
manufacturing should not be simply to produce product within the
specification; rather, the goal should be to produce product as
close to nominal as possible. He argued that any deviation from
nominal has a cost.
There isn't space in this column to fully explain this idea; suffice
to say that a leptokurtic distribution will produce superior
product. There is a greater difference between a part produced near
the statistical design limit in a process producing a platykurtic
distribution and one with a leptokurtic distribution.
The Taguchi Principle is the basis upon which six-sigma theory and
practice are built.

BIO
Leslie W. Flott, Ph.B., CQE, ASQ Fellow, is certified as an IDEM
Wastewater Treatment Operator and Indiana Wastewater Treatment
Operator. He received his Bachelor of Science Degree in Chemistry
from Northwestern University and his Master's Degree in materials
engineering from Notre Dame University. Most recently, Flott served
as the environmental program director and instructor at Ivy Tech
Community College. Prior to that, he was the health, environment,
and safety manager at Wayne Metal Protection Company.
