
STATISTICAL DISTRIBUTIONS

A. CHAPTER OBJECTIVES
B. INTRODUCTION AND OVERVIEW OF DISTRIBUTIONS
C. BASIC PROBABILITY
D. STATISTICAL DISTRIBUTIONS
E. TRANSFORMING NON-NORMAL DATA TO NORMAL DATA
F. SAMPLING DISTRIBUTIONS

13-1

CHAPTER OBJECTIVES

To provide an introduction to the theory of basic probability.
To provide an understanding of statistical distributions
including Normal, Lognormal, Weibull, Binomial and
Poisson.
To provide an introduction to identifying, analyzing and
transforming non-normal data.
To provide an understanding of sampling distributions
including Chi-Square distribution, t distribution and F
distribution.

13-2

INTRODUCTION AND OVERVIEW OF DISTRIBUTIONS

A general knowledge of distributions can be helpful when
choosing a good test/analysis strategy to answer
questions.
Probability of a process meeting customer requirements is
closely linked to the distribution of data collected from that
process.
Process data (population data) can usually be
characterized by a probability distribution, which is an
assignment of probabilities to all possible values that the
random variable can assume.
Total area under a probability distribution (for example,
the normal curve) must equal exactly 1.

13-3

INTRODUCTION cont.
Population of a continuous variable may be represented
as a Normal, Weibull or Lognormal distribution.
From a population, samples can be taken with the
objective of characterizing the process.
A distribution that describes the characteristic of this
sampling is called a sampling distribution (t distribution, F
distribution and Chi-Square distribution).
Information relative to the needs of our customer is
obtained by understanding percentiles of the population.
These percentiles are determined by the shape of the
population distribution.
One of the difficult jobs in dealing with probability
distributions is determining the underlying distribution;
you will be provided with the tools needed.

13-4

BASIC PROBABILITY

Data can originate as samples of measurements from production samples or
a business process.
Measurements can be the result of an experimental evaluation.
If an experiment were repeated, we probably would not get the same
response. As a result, we need to assess occurrence probabilities.
Random experiment:
conducting an experiment or making an evaluation where results will not
be the same even though conditions may be identical.
Random experiments can be associated with a flip of the coin and the roll of
the die.
Flip of a coin - 1 in 2 chance flip will result in heads (1/2=.5).
6 sided die - 1 in 6 chance that 2 will occur on single roll of die (1/6=.167).
Difference between roll of die and manufacturing situation is we normally do
not know the underlying population from which we are sampling.

13-5

BASIC PROBABILITY

Suppose two dice are rolled and a certain customer permits only those
combinations which yield a sum of 3, 4, 5, . . . , or 11.

A 2 or 12 represents nonconformance.
We have to apply some fundamental probability theory to
find the probability of not rolling a 2 or 12.

13-6

JOINT PROBABILITY
Likelihood of some event A may be given by P(A).
If some event A is independent of some other event B,
the probability of both A and B occurring is:
P(A and B) = P(A) x P(B).
Joint probability of A and B is multiplicative by nature.
Since a single die has 6 sides, the random chance
probability that any given side will be face up is 1/6=.167
because:
Only 1 side can be up at any given time.
There are a total of 6 possibilities.
Each of 6 possibilities has the same probability of
occurrence.
Probability of rolling any given die combination (e.g., two 6s)
would be .1667 x .1667 = .0278 (2.78%).
13-7

MUTUALLY EXCLUSIVE

When rolling dice, the occurrence of a 2 or 12 cannot happen
concurrently.
The 2 outcomes are mutually exclusive of each other; a 2 and 12
cannot occur at the same time.
When 2 events (A and B) are mutually exclusive, the probability of event
A or B occurring is given by the sum of their individual probabilities:
P(A or B) = P(A) + P(B).
There is one way to form a 2 and one way to form a 12. Probability of
not meeting customer requirements is .0278+.0278=.0556(5.56%).
If two events are not mutually exclusive, the joint probabilities have to
be subtracted to avoid double counting:
P(A or B) = P(A) + P(B) - P(A and B)
Probability of 3,4,5,6,7,8,9,10 or 11 is 1-.0556=.9444 (94.44%)
Expected yield expressed as P(Y)=1-[P(A)+P(B)]
We can say the likelihood of customer satisfaction is 94.44%.
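
The dice arithmetic above can be checked by enumerating every outcome. The following is a minimal sketch, assuming two fair six-sided dice; the names `outcomes` and `prob_of_sum` are illustrative and not part of the original material.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair six-sided dice (assumption).
outcomes = list(product(range(1, 7), repeat=2))

def prob_of_sum(total):
    """Probability that the two dice add up to `total`."""
    hits = sum(1 for a, b in outcomes if a + b == total)
    return Fraction(hits, len(outcomes))

p_two = prob_of_sum(2)       # only (1, 1): joint probability 1/6 x 1/6
p_twelve = prob_of_sum(12)   # only (6, 6): joint probability 1/6 x 1/6

# A 2 and a 12 are mutually exclusive, so their probabilities add.
p_nonconforming = p_two + p_twelve
# Expected yield: P(Y) = 1 - [P(A) + P(B)]
p_yield = 1 - p_nonconforming

print(f"P(2)       = {float(p_two):.4f}")
print(f"P(2 or 12) = {float(p_nonconforming):.4f}")
print(f"P(yield)   = {float(p_yield):.4f}")
```

Exact fractions are used so that the result (34 of 36 outcomes conform) is not affected by rounding .1667 x .1667.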
13-8

STATISTICAL DISTRIBUTIONS
Distributions are referred to as statistical distributions or
probability distributions.
Sum of all the probabilities making up the distribution must
equal 1, and each individual probability can only be between
and including 0 and 1.

13-9

STATISTICAL DISTRIBUTIONS
[Figure: two example probability distributions, one for a Discrete Variable (bar chart) and one for a Continuous Variable (smooth curve), with values from 0 to 20 on the x-axis and probability on the y-axis.]

Probability distributions can be represented graphically with range of
possible values of random variables plotted along the x-axis and the
probability of each value on the y-axis.
If a random variable is discrete, distribution will be a histogram
(sum of bar heights=1).
If a random variable is continuous, distribution will be a continuous
curve (area under curve=1).

13-10

NORMAL DISTRIBUTION
[Figure: histogram with a smooth curve interconnecting the center of each bar; x-axis: Units of Measure; a Specification Limit separates the Area of Yield from the Probability of a Defect.]

Given that 100% of the area under the normal curve lies between -∞ and +∞, we
may calculate that area which lies beyond the performance limit. Doing
so would reveal the random chance probability of creating a defect.

Most important distribution relative to continuous data.


Characterized by bell curve.
Applicable to situations where independent, randomly occurring values are
equally likely to occur on either side of an expected mean and extreme values
are less likely to occur than moderate values.
Can be characterized primarily by the mean and standard deviation of
distribution.
13-11

NORMAL DISTRIBUTION cont.

Examples where data follows normal distribution.


A dimension on a part is critical. The critical dimension is
measured hourly on random samples obtained from a
continuous production process. The measurements on any
given day are noted to follow a normal distribution.
One of our customers orders a product on a regular basis.
The time it takes us to fill these orders is noted to follow a
normal distribution.
Once normally distributed process data is plotted as a histogram,
the midpoints of each bar can be connected to form a smooth
normal probability curve.
By adding spec limits or targets, you can determine the probability
of defects (area under curve beyond spec limits).
Special form of normal distribution is called standard normal, or Z,
which is transformed to yield a mean=0 and standard deviation=1.
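
As an illustration of the area beyond a spec limit, the sketch below converts a spec limit to a Z value and computes the tail area of the standard normal curve. The process mean, standard deviation and upper spec limit are hypothetical numbers chosen for the example, and the function names are illustrative.

```python
import math

def standard_normal_cdf(z):
    """Area under the standard normal (Z) curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def defect_probability(mean, std_dev, upper_spec):
    """Probability that a normally distributed value exceeds the upper spec limit."""
    z = (upper_spec - mean) / std_dev   # transform to mean=0, standard deviation=1
    return 1.0 - standard_normal_cdf(z)

# Hypothetical process: mean 10.0, standard deviation 0.5, upper spec limit 11.0.
p_defect = defect_probability(mean=10.0, std_dev=0.5, upper_spec=11.0)
print(f"Z = {(11.0 - 10.0) / 0.5:.1f}, probability of a defect = {p_defect:.5f}")
```

For a lower spec limit the same idea applies with the area to the left of Z instead of the area to the right.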

13-12


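CHARACTERISTICS OF THE
NORMAL DISTRIBUTION

[Figure: normal curve with the Point of Inflection marked 1σ from the mean; 68.27% of the area lies within ±1σ, 95.45% within ±2σ and 99.73% within ±3σ.]
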
Characteristics of the normal distribution relating to its bell-shaped
curve and its symmetric nature are as follows:
As with other probability distributions, the total area under the normal curve is
equal to 1.
Data which is normally distributed may assume any value from -∞ to +∞.
The Point of inflection of the normal curve, where the curve changes from
convex to concave, corresponds to the point exactly 1 standard deviation from
the mean.
The normal curve is symmetrical and the mean equals the median equals the
mode (cell with highest frequency). Also, 50% of the area under the curve is
located to the right of the mean, while 50% lies to the left.
The normal curve extends toward -∞ and +∞ at both extremes (often called tails),
though it never touches the horizontal axis.

13-13

CHARACTERISTICS OF THE NORMAL CURVE - cont.

For a Normal Distribution with a Mean of μ and Standard Deviation of σ,
then:
68.27% of the data lies between μ-1σ and μ+1σ
95.45% of the data lies between μ-2σ and μ+2σ
99.73% of the data lies between μ-3σ and μ+3σ
99.9937% of the data lies between μ-4σ and μ+4σ
99.999943% of the data lies between μ-5σ and μ+5σ
99.9999998% of the data lies between μ-6σ and μ+6σ
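
The coverage figures for 1 to 6 standard deviations can be recomputed from the error function. This is a small sketch using only the Python standard library; `coverage_within` is an illustrative name. Values taken from printed Z tables may differ from the computed ones in the last digit because of rounding.

```python
import math

def coverage_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in range(1, 7):
    print(f"within +/-{k} sigma: {coverage_within(k) * 100:.7f}%")
```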

13-14

LOGNORMAL DISTRIBUTION
[Figure: example lognormal probability distribution curve.]

Often used in situations where time is the CTQ (critical-to-quality)
characteristic.
Models processes where the relation of the process
variable to time is progressive or non-linear.

13-15

LOGNORMAL DISTRIBUTION - cont.


Examples of situations where the lognormal distribution
would be noticed are as follows:
Failure times of structural components.
A non-repairable device experiences failure through metal
fatigue. Time of failure data from this source often follows
the lognormal distribution.
Lognormal distribution has relationship to normal distribution
since the logarithm of data from above examples would be
normally distributed.
If transformed, data can be analyzed as if they came from a
normal distribution.
Data that is lognormally distributed can usually be
transformed into normally distributed data.
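
The log transformation can be sketched as follows. The failure times are simulated with `random.lognormvariate`, and the parameters `MU` and `SIGMA` are hypothetical values for the example, not data from the course. A mean well above the median suggests a skewed distribution; after taking logarithms the two roughly agree, as expected for normal data.

```python
import math
import random
import statistics

random.seed(13)  # fixed seed so the sketch is repeatable

# Hypothetical parameters of the underlying normal distribution of log(time).
MU, SIGMA = 2.0, 0.5

# Simulated lognormally distributed failure times.
failure_times = [random.lognormvariate(MU, SIGMA) for _ in range(5000)]

# Transform: the logarithm of lognormal data is normally distributed.
log_times = [math.log(t) for t in failure_times]

print(f"raw data: mean = {statistics.mean(failure_times):.2f}, "
      f"median = {statistics.median(failure_times):.2f}")
print(f"log data: mean = {statistics.mean(log_times):.2f}, "
      f"median = {statistics.median(log_times):.2f}, "
      f"std dev = {statistics.stdev(log_times):.2f}")
```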

13-16

APPLICATION OF THE LOG SCALE

[Figure: Parts-Per-Million Goal versus Forecast Period (Months from Baseline), plotted twice: on a Linear Scale, where improvement is difficult to detect, and on a Log Scale.]


If we use the linear scale the tendency is to look at the flat
part of the curve and conclude that the improvement gains
are over.
Linear scale graph does not have enough sensitivity.

13-17

SIX SIGMA LEARNING CURVE


[Figure: PPM Performance (log scale, 1 to 10000) versus Reporting Period (Time), with a straight Goal Line.]


Log scale enables us to establish a straight goal line that
has a constant improvement rate each period.
Concept is called Six Sigma Learning Curve.

13-18

SIX SIGMA LEARNING CURVE - cont.


[Figure: PPM (log scale, 1 to 10000) versus Reporting Period (Time), with goal lines for Area Z and Area Y, each showing a 70% reduction in defects each year.]


Approach allows us to set goals that have different quality
levels.
Log scale allows you to establish yearly reduction goals.
Approach allows management to hold functions
accountable for their respective rates of improvement.

13-19
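
The reason a constant yearly reduction plots as a straight goal line on a log scale can be shown numerically. This sketch assumes a hypothetical starting level of 10000 PPM and the 70% yearly reduction from the learning curve chart; `ppm_goals` is an illustrative helper name.

```python
import math

def ppm_goals(start_ppm, yearly_reduction, years):
    """PPM goal for each year, assuming the same fractional reduction every year."""
    return [start_ppm * (1.0 - yearly_reduction) ** year for year in range(years + 1)]

# Hypothetical baseline of 10000 PPM with a 70% reduction in defects each year.
goals = ppm_goals(start_ppm=10000.0, yearly_reduction=0.70, years=5)

# On a log scale every yearly step has the same size, so the goal line is straight.
log_steps = [math.log10(goals[i + 1]) - math.log10(goals[i])
             for i in range(len(goals) - 1)]

for year, ppm in enumerate(goals):
    print(f"year {year}: {ppm:10.2f} PPM   log10 = {math.log10(ppm):.3f}")
print("step per year on the log scale:", [round(s, 3) for s in log_steps])
```

Two functions starting at different quality levels can share the same slope, which is how the approach compares their rates of improvement.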

WEIBULL DISTRIBUTION
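[Figure: bathtub curve of Failure Rate versus Time with three regions labeled Infant Mortality, Useful Life and Wearout, annotated with Quality Failures and Design Related Failures; and an example Weibull probability distribution curve.]
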

Family of distributions, many of which closely match
distributions such as the normal.
Utilized to transform non-normal data for analysis purposes.
Powerful tool used in reliability engineering (Bathtub Curve).
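
The link between the Weibull family and the bathtub curve can be sketched with the Weibull failure rate (hazard) function, h(t) = (shape/scale) * (t/scale)^(shape-1). The mapping of shape values to the three regions is standard reliability practice rather than something stated in these slides, and the shape values 0.5, 1.0 and 3.0 are hypothetical examples.

```python
def weibull_failure_rate(t, shape, scale=1.0):
    """Weibull hazard (instantaneous failure rate) at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

times = [0.5, 1.0, 2.0]

# Hypothetical shape parameters, one per region of the bathtub curve.
regions = [
    ("infant mortality (shape < 1, rate falls)", 0.5),
    ("useful life      (shape = 1, rate flat) ", 1.0),
    ("wearout          (shape > 1, rate rises)", 3.0),
]

for label, shape in regions:
    rates = [weibull_failure_rate(t, shape) for t in times]
    print(label, [round(r, 3) for r in rates])
```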
13-20

BINOMIAL DISTRIBUTION
[Figure: example binomial probability distribution (bar chart).]


Useful discrete probability distribution where there are only 2
outcomes in a random experiment (success and failure).
If probability of success is p, then probability of failure is 1-p.
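
A minimal sketch of the binomial probabilities follows. The number of trials `n` and the success probability `p` are hypothetical values, and `binomial_pmf` is an illustrative name. The check at the end confirms that the bar heights of a discrete distribution sum to 1.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: 20 units tested, each with a 10% chance of failing the test.
n, p = 20, 0.10
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

for k in range(5):
    print(f"P(exactly {k} failures) = {pmf[k]:.4f}")
print(f"sum of all probabilities = {sum(pmf):.6f}")
```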

13-21

BINOMIAL DISTRIBUTION EXAMPLES

Product either passes or fails a test; determine the number of defective
units.
Switch functions or does not function; determine the number of defective
switches.
People respond yes or no to a survey question; determine the proportion
of people who answer yes to the question.
Purchase order forms are either filled out correctly or incorrectly;
determine the number of transactional errors.
The appearance of the wallplate is acceptable or unacceptable;
determine the number of unacceptable or acceptable wallplates.
When we have a large population and a low probability of success or
failure, the Binomial can be approximated using the Poisson distribution.
Binomial distribution is a more general distribution with wider application.

13-22

POISSON DISTRIBUTION
[Figure: example Poisson probability distribution (bar chart).]


Poisson is a discrete distribution in which the:


Probability of the occurrence of an event within a very small time
period is very small.
Probability that two or more such events will occur within the same
small time interval is effectively 0.
Probability of the occurrence of the event within one time period is
independent of where that time period is.
Used when number of opportunities for a non-conformance is large and
probability of an event is low.

13-23

POISSON DISTRIBUTION - cont.


Situations where data follow a Poisson distribution.
There are a large number of dimensions on a part that are critical.
Dimensions are measured on a random sample of parts from a
large production process. The number of out-of-specification
conditions is noted on each sample. This collective number-of-failures
information from the samples can often be modeled by
using a Poisson distribution.
Distribution of telephone calls going through a switchboard
system.
Arrival of cars and trucks at a toll booth.
The number of calls arriving at the IM Help Desk.
Good approximation of binomial when sample size is large and
probability of an event is low.
Critical application to Process Yield, Process Capability.
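
The Poisson approximation to the binomial can be checked numerically. This sketch assumes a hypothetical case of 1000 opportunities with a 0.2% chance of a non-conformance at each one; the function names are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Probability of exactly k events in n independent opportunities."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return exp(-lam) * lam**k / factorial(k)

# Hypothetical case: many opportunities, low probability of an event.
n, p = 1000, 0.002
lam = n * p  # expected number of events

gaps = []
for k in range(11):
    b, q = binomial_pmf(k, n, p), poisson_pmf(k, lam)
    gaps.append(abs(b - q))
    print(f"k={k:2d}  binomial={b:.5f}  poisson={q:.5f}")

max_gap = max(gaps)
print(f"largest difference = {max_gap:.5f}")
```

Raising `p` while keeping `n * p` fixed (for example n=10, p=0.2) makes the two columns drift apart, which is why the approximation is reserved for low event probabilities.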

13-24

TRANSFORMING NON-NORMAL DATA TO NORMAL DATA

[Figure: two process capability analysis plots, each with an Upper Spec limit: one for the Original Data (X) and one for the Data Transformed by Log10(X).]


We will encounter situations where data will not fit the
normal distribution.
This makes data analysis very difficult.
Minitab will be utilized to transform non-normal data into a
distribution which is usable for data analysis.

13-25

TRANSFORMING NON-NORMAL
DATA - cont.
Data transformation should be considered a last resort.
Analyze non-normal data (via graphical display) to determine if
some physical effect has made the data non-normal
(e.g., bimodal data caused by 2 mixed sets of data).
Non-normal data can be transformed
to a Weibull distribution with Minitab.

13-26

SAMPLING DISTRIBUTIONS

Sampling distributions are derived from the population
distribution by random sampling.
Primary statistics that we sample and generate sampling
distributions for are the Mean and Variance.
Three sampling distributions will be reviewed:
Chi-Square, t and F.

13-27

CHI-SQUARE DISTRIBUTION
[Figure: Chi-Square distribution curves for df=2, df=4, df=6, df=8 and df=10.]


Applications:
Used to determine the confidence interval for the
standard deviation of a population.
Used to test whether more than two population
proportions can be considered equal.

13-28

CHI-SQUARE - cont.

Degrees of freedom (df) is the number n of independent
observations in the sample minus the number of
parameters (e.g., the Mean) being estimated from the sample.
Since we usually estimate one parameter, df=n-1.
Chi-Square distribution changes with the number of
degrees of freedom.
As df gets very large, Chi-Square distribution resembles
the normal distribution.
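
The role of df=n-1 can be illustrated by simulation. The sketch below relies on the standard result that, for samples from a normal population, (n-1)s²/σ² follows a Chi-Square distribution with n-1 degrees of freedom, whose mean equals its df. That result is not stated in these slides, and the sample size, σ and number of trials are hypothetical values.

```python
import random

random.seed(13)  # fixed seed so the sketch is repeatable

SIGMA = 2.0      # hypothetical population standard deviation
N = 5            # observations per sample
DF = N - 1       # one parameter (the mean) is estimated from each sample
TRIALS = 20000

def scaled_sample_variance(n, sigma):
    """Draw one sample of size n and return (n-1) * s^2 / sigma^2."""
    sample = [random.gauss(0.0, sigma) for _ in range(n)]
    mean = sum(sample) / n
    s_squared = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (n - 1) * s_squared / sigma**2

values = [scaled_sample_variance(N, SIGMA) for _ in range(TRIALS)]
average = sum(values) / TRIALS

print(f"df = {DF}, average of simulated statistic = {average:.3f}")
```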

13-29

t DISTRIBUTION
[Figure: t distribution curves for df=1, df=5 and df=100.]


t distribution (Student's t distribution) is used in place of
the Normal distribution when sample size is less than 30.
Normal distribution becomes less representative as a sampling
distribution as the sample size gets smaller.
Normal distribution is best applied when sample size is at least
30.

13-30

F DISTRIBUTION

Frequently used in ANOVA (Analysis of Variance).


Used to test significance of difference between the
variances of samples from a population.
Like the t distribution, the F distribution has a different
shape for each df (degrees of freedom); the F distribution has
two df values, one for the numerator and one for the denominator.

13-31
