
Analysis of Simulation Experiments

Outline

Introduction
Classification of Outputs
DIDO vs. RIRO Simulation
Analysis of One System
Terminating vs. Steady-State Simulations
Analysis of Terminating Simulations
Obtaining a Specified Precision
Analysis of Steady-State Simulations
Method of Moving Average for Removing the Initial Bias
Method of Batch Means
Multiple Measures of Performance
Analysis of Several Systems
Comparison of Two Alternative Systems
Comparison of More than Two Systems
Ranking and Selection

Introduction

The greatest disadvantage of simulation:
- We don't get exact answers
- Results are only estimates

Careful design and analysis are needed to:
- Make these estimates as valid and precise as possible
- Interpret their meanings properly

Statistical methods are used to analyze the results of simulation experiments.

What Outputs to Watch?

Need to think ahead about what you would want to get out of the simulation:
- Average and worst (longest) time in system
- Average and worst time in queue(s)
- Average hourly production
- Standard deviation of hourly production
- Proportion of time a machine is up, idle, or down
- Maximum queue length
- Average number of parts in system

Classification of Outputs

There are typically two types of dynamic processes:

Discrete-time process: There is a natural first observation, second observation, etc., but we can only observe them when they happen.
If Wi = time in system for the ith part produced (i = 1, 2, ..., N), and there are N parts produced during the simulation, then the output is the sequence W1, W2, ..., WN.

[Plot: Wi plotted against i]

Classification of Outputs

Typical discrete-time output performance measures:
- Average time in system: W̄(N) = (Σ_{i=1}^{N} Wi) / N
- Maximum time in system
- Proportion of parts that were in the system for more than 1 hour
- Delay of the ith customer in queue
- Throughput during the ith hour

Classification of Outputs

Continuous-time process: Can jump into the system at any point in (real, continuous) time and take a snapshot of something; there is no natural first or second observation.
If Q(t) = number of parts in a particular queue at time t, for t in [0, T], we run the simulation for T units of simulated time.

[Plot: Q(t) plotted against t as a step function taking the values 0, 1, 2, 3]

Classification of Outputs

Typical continuous-time output performance measures:
- Time-average length of queue: Q̄(T) = (∫₀ᵀ Q(t) dt) / T
- Server utilization (proportion of time the server is busy): u(T) = (∫₀ᵀ B(t) dt) / T, where B(t) = 1 if the server is busy at time t and B(t) = 0 otherwise.

[Plot: B(t) plotted against t, alternating between 0 and 1]
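Both measures above are integrals of piecewise-constant sample paths, so they reduce to time-weighted sums over the intervals between events. A minimal sketch (the event times and levels below are made-up illustration data, not from the text):

```python
def time_average(times, values, T):
    """Time average of a piecewise-constant process over [0, T].

    values[i] holds on the interval [times[i], times[i+1]);
    the last value holds until time T.
    """
    total = 0.0
    for i, v in enumerate(values):
        end = times[i + 1] if i + 1 < len(times) else T
        total += v * (end - times[i])   # area under the step function
    return total / T

# Q(t): 0 on [0,1), 1 on [1,3), 2 on [3,4), 0 on [4,5] -> Q̄(5) = 4/5
q_bar = time_average([0, 1, 3, 4], [0, 1, 2, 0], T=5)

# B(t): server busy on [1,4), idle elsewhere -> utilization 3/5
util = time_average([0, 1, 4], [0, 1, 0], T=5)
print(q_bar, util)   # 0.8 0.6
```

The same routine handles any of the continuous-time measures, since each is an average of some indicator or count over simulated time.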

Classification of Outputs

Other continuous-time performance measures:
- Number of parts in the system at time t
- Number of machines down at time t
- Proportion of time that there were more than n parts in the queue

DIDO vs. RIRO Simulation

DIDO (deterministic in, deterministic out):

[Diagram: deterministic inputs (cycle times, interarrival times, batch sizes) → Simulation Model → deterministic outputs (hourly production, machine utilization)]

DIDO vs. RIRO Simulation

RIRO (random in, random out):

[Diagram: random inputs (cycle times, interarrival times, batch sizes) → Simulation Model → random outputs (hourly production, machine utilization)]

Analysis of One System

Single-server queue (M/M/1), replicated 10 times.

[Plots: average delay in queue, average number in queue, and server utilization, each plotted across the 10 replications; the results vary noticeably from replication to replication]

Analysis of One System

CAUTION: Because of the autocorrelation that exists in the output of virtually all simulation models, classical statistical methods don't work directly within a simulation run.

Time in system for individual jobs: Y1, Y2, Y3, ..., Yn
μ = E(average time in system)
Sample mean: Ȳ(n) = (Σ_{i=1}^{n} Yi) / n
Ȳ(n) is an unbiased estimator for μ, but how close is this sample mean to μ?
Need to estimate Var(Ȳ(n)) to get confidence intervals on μ.

Analysis of One System

Problem: Because of positive autocorrelation between the observations (Corr(Yi, Yi+l) > 0 for lags l ≥ 1), the sample variance is no longer an unbiased estimator of the population variance (unbiasedness of variance estimators can only be achieved if Y1, Y2, Y3, ..., Yn are independent).

As a result, the usual estimator of Var[Ȳ(n)],

  S²(n)/n = Σ_{i=1}^{n} [Yi − Ȳ(n)]² / [n(n − 1)]

may be severely biased. In fact, usually E[S²(n)/n] < Var[Ȳ(n)].

Implications: Understating variances causes us to have too much faith in our point estimates and believe the results too much.

Types of Simulations with Regard to Output Analysis

Terminating: A simulation where there is a specific starting and stopping condition that is part of the model.

Steady-state: A simulation where there are no specific starting and ending conditions. Here, we are interested in the steady-state behavior of the system.

The type of analysis depends on the goal of the study.

Examples of Terminating Simulations

A retail/commercial establishment (e.g., a bank) that operates from 9 to 5 daily and starts empty and idle at the beginning of each day. The output of interest may be the average wait time of the first 50 customers in the system.

A military confrontation between a blue force and a red force. The output of interest may be the probability that the red force loses half of its strength before the blue force loses half of its strength.

Examples of Steady-State Simulations

A manufacturing company that operates 16 hours a day. The system here is a continuous process where the ending condition for one day is the initial condition for the next day. The output of interest may be the expected long-run daily production.

A communication system where service must be provided continuously.

Analysis for Terminating Simulations

Objective: Obtain a point estimate and confidence interval for some parameter μ.

Examples:
- μ = E(average time in system for n customers)
- μ = E(machine utilization)
- μ = E(work-in-process)

Reminder: Cannot use classical statistical methods within a simulation run, because observations from one run are not independently and identically distributed (i.i.d.).

Analysis for Terminating Simulations

- Make n independent replications of the model.
- Let Yi be the performance measure from the ith replication:
  Yi = average time in system, or
  Yi = work-in-process, or
  Yi = utilization of a critical facility
- Performance measures from different replications, Y1, Y2, ..., Yn, are i.i.d.
- But only one sample is obtained from each replication.
- Apply classical statistics to the Yi's, not to observations within a run.
- Select a confidence level 1 − α (0.90, 0.95, etc.).

Analysis for Terminating Simulations

Approximate 100(1 − α)% confidence interval for μ:

  Ȳ(n) = (Σ_{i=1}^{n} Yi) / n    (unbiased estimator of μ)

  S²(n) = Σ_{i=1}^{n} [Yi − Ȳ(n)]² / (n − 1)    (unbiased estimator of Var(Yi))

  Ȳ(n) ± t_{n−1,1−α/2} · S(n)/√n

This interval covers μ with approximate probability 1 − α, and

  δ(n, α) = t_{n−1,1−α/2} · S(n)/√n

is the half-width expression.

Example

Consider a single-server (M/M/1) queue. The objective is to calculate a confidence interval for the delay of customers in the queue.
- n = 10 replications of a single-server queue
- Yi = average delay in queue from the ith replication
- Yi's: 2.02, 0.73, 3.20, 6.23, 1.76, 0.47, 3.89, 5.45, 1.44, 1.23
- For a 90% confidence interval, α = 0.10
- Ȳ(10) = 2.64, S²(10) = 3.96, t_{9,0.95} = 1.833
- Approximate 90% confidence interval: 2.64 ± 1.15, or [1.49, 3.79]
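The numbers in this example can be reproduced directly from the ten replication averages. A minimal sketch in Python (the t quantile t_{9,0.95} = 1.833 is taken from the example rather than computed):

```python
from math import sqrt

def ci_terminating(y, t_quantile):
    """Point estimate and half-width for a terminating-simulation CI."""
    n = len(y)
    ybar = sum(y) / n                                   # sample mean Ȳ(n)
    s2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)    # sample variance S²(n)
    hw = t_quantile * sqrt(s2 / n)                      # half-width δ(n, α)
    return ybar, s2, hw

y = [2.02, 0.73, 3.20, 6.23, 1.76, 0.47, 3.89, 5.45, 1.44, 1.23]
ybar, s2, hw = ci_terminating(y, t_quantile=1.833)      # t_{9,0.95}
print(round(ybar, 2), round(s2, 2), round(hw, 2))       # 2.64 3.96 1.15
print([round(ybar - hw, 2), round(ybar + hw, 2)])       # [1.49, 3.79]
```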

Analysis for Terminating Simulations

Interpretation: 100(1 − α)% of the time, the confidence interval formed in this way covers μ (μ is unknown).

Wrong interpretation: "I am 90% confident that μ is between 1.49 and 3.79."

Issue 1

- This confidence-interval method assumes the Yi's are normally distributed. In real life, this is almost never true.
- Because of the central-limit theorem, as the number of replications n grows, the coverage probability approaches 1 − α.
- In general, if the Yi's are averages of something, their distribution tends not to be too asymmetric, and the confidence-interval method shown above has reasonably good coverage.

Issue 2

The confidence interval may be too wide:
- In the M/M/1 queue example, the approximate 90% C.I. was 2.64 ± 1.15, or [1.49, 3.79].
- The half-width is 1.15, which is 44% of the mean (1.15/2.64).
- That means the C.I. is 2.64 ± 44%, which is not very precise.

To decrease the half-width:
- Increase n until δ(α, n) is small enough (this is called sequential sampling).

There are two ways of defining the precision in the estimate Ȳ:
- Absolute precision
- Relative precision

Obtaining a Specified Precision

Absolute precision:
- Want to make n large enough that δ(α, n) ≤ β, where δ(α, n) is the half-width and β > 0.
- Make n0 replications of the simulation model and compute Ȳ(n0), S²(n0), and the half-width δ(α, n0).
- Assuming that the estimate of the variance, S²(n0), does not change appreciably, an approximate expression for the required number of replications to achieve an absolute error of β is

  n*_a(β) = min{ i ≥ n0 : t_{i−1,1−α/2} √(S²(n0)/i) ≤ β }

Obtaining a Specified Precision

Relative precision:
- Want to make n large enough that δ(α, n) / |Ȳ(n)| ≤ γ, where 0 < γ < 1.
- Make n0 replications of the simulation model and compute Ȳ(n0), S²(n0), and the half-width δ(α, n0).
- Assuming that the estimates of both the population mean and the population variance do not change appreciably, an approximate expression for the required number of replications to achieve a relative error of γ is

  n*_r(γ) = min{ i ≥ n0 : t_{i−1,1−α/2} √(S²(n0)/i) / |Ȳ(n0)| ≤ γ }
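Both stopping rules can be evaluated by scanning i upward from n0. In the sketch below, the Student-t quantile is approximated by the standard-normal quantile so that only the standard library is needed (an exact t quantile, e.g. from scipy.stats, would give slightly larger answers). The inputs reuse the earlier M/M/1 example (Ȳ = 2.64, S² = 3.96, α = 0.10):

```python
from math import sqrt
from statistics import NormalDist

def replications_needed(s2, alpha, n0, beta=None, gamma=None, ybar=None):
    """Smallest i >= n0 whose projected half-width meets the target.

    beta  -> absolute precision: half-width <= beta
    gamma -> relative precision: half-width / |ybar| <= gamma
    Approximation: t_{i-1,1-alpha/2} is replaced by the normal quantile.
    """
    if beta is None and gamma is None:
        raise ValueError("specify beta or gamma")
    z = NormalDist().inv_cdf(1 - alpha / 2)     # ~ t quantile for moderate i
    i = n0
    while True:
        hw = z * sqrt(s2 / i)                   # projected half-width δ(α, i)
        if beta is not None and hw <= beta:
            return i
        if gamma is not None and hw / abs(ybar) <= gamma:
            return i
        i += 1

# Absolute error beta = 0.5, and relative error gamma = 10%:
print(replications_needed(3.96, 0.10, n0=10, beta=0.5))              # 43
print(replications_needed(3.96, 0.10, n0=10, gamma=0.1, ybar=2.64))  # 154
```

Note how much more expensive the tighter relative target is: the half-width shrinks only like 1/√i, so halving the error roughly quadruples the replication count.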

Analysis for Steady-State Simulations

Objective: Estimate the steady-state mean

  ν = lim_{i→∞} E(Yi)

Basic question: Should you do many short runs or one long run?

[Diagram: "many short runs" shows replications X1, X2, X3, X4, X5; "one long run" shows a single replication X1]

Analysis for Steady-State Simulations

Advantages:
- Many short runs: simple analysis, similar to the analysis for terminating systems; the data from different replications are i.i.d.
- One long run: less initial bias; no restarts.

Disadvantages:
- Many short runs: initial bias is introduced several times.
- One long run: sample of size 1; difficult to get a good estimate of the variance.

Analysis for Steady-State Simulations

Make many short runs: The analysis is exactly the same as for terminating systems. The 100(1 − α)% C.I. is computed as before.

Problem: Because of initial bias, Ȳ(n) may no longer be an unbiased estimator for the steady-state mean ν.

Solution: Remove the initial portion of the data (the warm-up period), beyond which observations are in steady state. Specifically, pick l (the warm-up period) and n (the number of observations in one run) and average only the retained observations:

  Ȳ(n, l) = (Σ_{i=l+1}^{n} Yi) / (n − l)

Method of Moving Average for Removing the Initial Bias

Welch's method for choosing the warm-up period l:

- Make n replications of the model (n > 5), each of length m, where m is large. Let Yji be the ith observation from the jth replication (j = 1, 2, ..., n; i = 1, 2, ..., m).
- Let Ȳi = (Σ_{j=1}^{n} Yji) / n for i = 1, 2, ..., m.
- To smooth out the high-frequency oscillations in Ȳ1, Ȳ2, ..., Ȳm, define the moving average Ȳi(w) as follows (the window w is a positive integer such that w ≤ m/2):

  Ȳi(w) = (Σ_{s=−w}^{w} Ȳ_{i+s}) / (2w + 1)           if i = w+1, ..., m−w
  Ȳi(w) = (Σ_{s=−(i−1)}^{i−1} Ȳ_{i+s}) / (2i − 1)     if i = 1, ..., w

Method of Moving Average for Removing the Initial Bias

- Plot Ȳi(w) and choose l to be the value of i beyond which Ȳ1(w), Ȳ2(w), ... seem to have converged.
- Note: Perform this procedure for several values of w, and choose the smallest w for which the plot of Ȳi(w) looks reasonably smooth.
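The averaging and smoothing steps can be sketched as follows (the replication data are made-up illustration values; a real application would plot the result and pick l by eye):

```python
def welch_moving_average(reps, w):
    """Welch's smoothed averaged process.

    reps: list of n replications, each a list of m observations.
    Returns [Ȳ_1(w), ..., Ȳ_{m-w}(w)] per the two-case definition:
    a full window of width 2w+1 in the middle, a shrinking symmetric
    window of width 2i-1 near the start.
    """
    n, m = len(reps), len(reps[0])
    ybar = [sum(rep[i] for rep in reps) / n for i in range(m)]  # Ȳ_i across reps
    out = []
    for i in range(1, m - w + 1):                 # 1-based index i
        half = w if i > w else i - 1              # shrinking window near the start
        window = ybar[i - 1 - half : i + half]    # Ȳ_{i-half} ... Ȳ_{i+half}
        out.append(sum(window) / (2 * half + 1))
    return out

# Three made-up replications of length m = 6; window w = 1
reps = [[5, 4, 3, 2, 2, 2],
        [4, 3, 2, 1, 1, 1],
        [3, 2, 1, 0, 0, 0]]
print(welch_moving_average(reps, w=1))
```

Here the smoothed curve flattens out around i = 4, so one would take l = 4 and discard the first four observations of each replication.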

Analysis for Steady-State Simulations

Make one long run: Make just one long replication, so that the initial bias is only introduced once. This way, you will not be throwing out a lot of data.

Problem: How do you estimate the variance when there is only one run?

Solution: Several methods exist to estimate the variance:
- Batch means (the only approach to be discussed here)
- Time-series models
- Spectral analysis
- Standardized time series

Method of Batch Means

- Divide a run of length m into n adjacent batches of length k, where m = nk.
- Let Ȳj be the sample (batch) mean of the jth batch.

[Plot: observations Yi plotted against i, divided into adjacent batches with batch means Ȳ1, Ȳ2, Ȳ3, Ȳ4, Ȳ5; m = nk]

- The grand sample mean Ȳ is computed as

  Ȳ = (Σ_{j=1}^{n} Ȳj) / n

Method of Batch Means

The sample variance of the batch means is computed as

  S²_Ȳ(n) = Σ_{j=1}^{n} (Ȳj − Ȳ)² / (n − 1)

The approximate 100(1 − α)% confidence interval for ν is

  Ȳ ± t_{n−1,1−α/2} · S_Ȳ(n)/√n
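A minimal sketch of the batching computation (the observations and the t quantile t_{3,0.95} ≈ 2.353 are illustration values; a real run would be long and autocorrelated, with k chosen large enough that the batch means are roughly uncorrelated):

```python
from math import sqrt

def batch_means_ci(y, n_batches, t_quantile):
    """Batch-means point estimate and CI half-width from one long run."""
    k = len(y) // n_batches                       # batch length, m = n*k
    batches = [y[j * k:(j + 1) * k] for j in range(n_batches)]
    means = [sum(b) / k for b in batches]         # batch means Ȳ_j
    grand = sum(means) / n_batches                # grand sample mean Ȳ
    s2 = sum((mj - grand) ** 2 for mj in means) / (n_batches - 1)
    hw = t_quantile * sqrt(s2 / n_batches)        # half-width of the CI
    return grand, s2, hw

# One run of length m = 8, split into n = 4 batches of k = 2
y = [1, 2, 3, 4, 5, 6, 7, 8]
grand, s2, hw = batch_means_ci(y, n_batches=4, t_quantile=2.353)
print(grand, round(s2, 3))   # 4.5 6.667
```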

Method of Batch Means

Two important issues:

Issue 1: How do we choose the batch size k?
- Choose the batch size k large enough that the batch means, the Ȳj's, are approximately uncorrelated.
- Otherwise, the variance estimator S²_Ȳ(n) will be biased low, and the confidence interval will be too small, which means it will cover the mean with a probability lower than the desired 1 − α.

Method of Batch Means

Issue 2: How many batches n?
- Due to autocorrelation, splitting the run into a larger number of smaller batches degrades the quality of each individual batch. In practice, 20 to 30 batches are sufficient.

Multiple Measures of Performance

In most real-world simulation models, several measures of performance are considered simultaneously. Examples include:
- Throughput
- Average length of queue
- Utilization
- Average time in system

Each performance measure is typically estimated with a confidence interval, and any of the intervals could miss its expected performance measure. Must be careful about overall statements of coverage (i.e., that all intervals contain their expected performance measures simultaneously).

Multiple Measures of Performance

Suppose we have k performance measures, and the confidence interval for performance measure s (s = 1, 2, ..., k) is at confidence level 1 − αs. Then the probability that all k confidence intervals simultaneously contain their respective true measures satisfies

  P(all k intervals contain their respective performance measures) ≥ 1 − Σ_{s=1}^{k} αs

This is referred to as the Bonferroni inequality.

Multiple Measures of Performance

To ensure that the overall probability (of all k confidence intervals simultaneously containing their respective true means) is at least 100(1 − α) percent, choose the αs such that

  Σ_{s=1}^{k} αs = α

Can select αs = α/k for all s, or pick the αs differently, with smaller αs for the more important performance measures.
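The equal-split allocation is a one-liner; a quick sketch using the document's own k = 2, α = 0.10 case:

```python
def bonferroni_levels(alpha, k):
    """Per-measure confidence levels whose error probabilities sum to alpha."""
    alpha_s = alpha / k              # equal split across the k measures
    return [1 - alpha_s] * k

levels = bonferroni_levels(alpha=0.10, k=2)
print(levels)                        # two 95% intervals
# By the Bonferroni inequality, overall coverage is at least 1 - alpha = 0.90.
```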

Multiple Measures of Performance

Example: If k = 2 and we want the desired overall confidence level to be at least 90%, we can construct two 95% confidence intervals.

Difficulty: If there is a large number of performance measures and we want a reasonable overall confidence level (e.g., 90%), the individual αs's could become small, making the corresponding confidence intervals very wide. Therefore, it is recommended that the number of performance measures not exceed 10.

Analysis of Several Systems

Most simulation projects involve comparison of two or more systems or configurations:

With two alternative systems (e.g., changing the number of machines in some workcenters, or evaluating various job-dispatch policies such as FIFO, SPT, etc.), the goal may be to:
- test the hypothesis H0: μ1 = μ2, or
- build a confidence interval for μ1 − μ2

With k > 2 alternatives, the objective may be to:
- build simultaneous confidence intervals for various combinations of the differences μi1 − μi2
- select the best of the k alternatives
- select a subset of size m < k that contains the best alternative
- select the m best (unranked) of the alternatives

Analysis of Several Systems

To illustrate the danger of making only one run and eyeballing the results when comparing alternatives, consider the following example. Compare:
- Alternative 1: M/M/1 queue with mean interarrival time of 1 min. and one fast machine with mean service time of 0.9 min.
- Alternative 2: M/M/2 queue with mean interarrival time of 1 min. and two slow machines with mean service time of 1.8 min. each.

Analysis of Several Systems

If the performance measure of interest is the expected average delay in queue of the first 100 customers, with empty-and-idle initial conditions, then using queueing analysis the true expected average delays in queue are

  μ1 = 4.13 > 3.70 = μ2

Therefore, system 2 is better.

If we run each model just once, calculate the average delay Yi from each alternative, and select the system with the smallest Yi, then

  P(selecting system 1 (the wrong answer)) = 0.52

Reason: randomness in the output.

Analysis of Several Systems

Solution:
- Replicate each alternative n times.
- Let Yij = average delay from the jth replication of alternative i.
- Compute the average over all replications for alternative i:

  Ȳi = (Σ_{j=1}^{n} Yij) / n

- Select the alternative with the lowest Ȳi.

If we conduct this experiment many times, the following results are obtained:

  n    P(wrong answer)
  1    0.52
  5    0.43
  10   0.38
  20   0.34

Comparison of Two Alternative Systems

- Form a confidence interval for the difference between the performance measures of the two systems (i.e., for μ1 − μ2).
- If the interval misses 0, there is a statistically significant difference between the two systems.
- Confidence intervals are better than hypothesis tests, because if a difference exists, the confidence interval measures its magnitude, while a hypothesis test does not.
- There are two slightly different ways of constructing the confidence intervals:
  - Paired-t
  - Two-sample-t

Paired-t Confidence Interval

- Make n replications of each of the two systems. Let Yij be the jth observation from system i (i = 1, 2).
- Pair Y1j with Y2j and define Zj = Y1j − Y2j for j = 1, 2, ..., n.
- Then the Zj's are i.i.d. random variables, and ζ = E(Zj) is the quantity for which we want to construct a confidence interval. Let

  Z̄(n) = (Σ_{j=1}^{n} Zj) / n

  V̂ar[Z̄(n)] = Σ_{j=1}^{n} [Zj − Z̄(n)]² / [n(n − 1)]

- Then the approximate 100(1 − α) percent C.I. is

  Z̄(n) ± t_{n−1,1−α/2} √(V̂ar[Z̄(n)])
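A minimal sketch of the paired-t computation (the observations and the t quantile t_{2,0.975} ≈ 4.303 are illustration values; note that the pairing is exactly what lets Y1j and Y2j be correlated, e.g. under common random numbers):

```python
from math import sqrt

def paired_t_ci(y1, y2, t_quantile):
    """Paired-t CI for zeta = E(Y1j - Y2j) from paired replications."""
    n = len(y1)
    z = [a - b for a, b in zip(y1, y2)]               # Z_j = Y1j - Y2j
    zbar = sum(z) / n                                 # Z̄(n)
    var_zbar = sum((zj - zbar) ** 2 for zj in z) / (n * (n - 1))
    hw = t_quantile * sqrt(var_zbar)                  # half-width
    return zbar, hw

# Three paired replications (illustration data), t_{2,0.975} ≈ 4.303
zbar, hw = paired_t_ci([2.0, 3.0, 4.0], [1.0, 1.0, 2.0], t_quantile=4.303)
print(round(zbar, 3), round(hw, 3))
```

If the resulting interval zbar ± hw misses 0, the two systems differ significantly at the chosen level.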

Two-Sample-t Confidence Interval

- Make n1 replications of system 1 and n2 replications of system 2. Here n1 ≠ n2 is allowed.
- Again, for system i = 1, 2, let

  Ȳi(ni) = (Σ_{j=1}^{ni} Yij) / ni

  S²i(ni) = Σ_{j=1}^{ni} [Yij − Ȳi(ni)]² / (ni − 1)

- Estimate the degrees of freedom as

  f̂ = [S²1(n1)/n1 + S²2(n2)/n2]² / { [S²1(n1)/n1]²/(n1 − 1) + [S²2(n2)/n2]²/(n2 − 1) }

- Then the approximate 100(1 − α) percent C.I. is

  Ȳ1(n1) − Ȳ2(n2) ± t_{f̂,1−α/2} √(S²1(n1)/n1 + S²2(n2)/n2)
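The estimated-degrees-of-freedom formula (Welch's approximation) can be sketched as follows; the data are illustration values with deliberately unequal sample sizes:

```python
def two_sample_t(y1, y2):
    """Mean difference and estimated degrees of freedom f̂ for the
    two-sample-t interval (unequal sample sizes and variances allowed)."""
    def mean_var(y):
        n = len(y)
        m = sum(y) / n
        s2 = sum((v - m) ** 2 for v in y) / (n - 1)
        return n, m, s2

    n1, m1, s21 = mean_var(y1)
    n2, m2, s22 = mean_var(y2)
    a, b = s21 / n1, s22 / n2                      # per-system variance terms
    f_hat = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return m1 - m2, f_hat

diff, f_hat = two_sample_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0])
print(diff, round(f_hat, 2))   # -3.0 4.08
```

The interval is then diff ± t_{f̂,1−α/2} √(S²1/n1 + S²2/n2), with the t quantile looked up at the (generally non-integer) f̂.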

Contrasting the Two Methods

- The two-sample-t approach requires independence of the Y1j's and Y2j's, whereas in the paired-t approach the Y1j's and Y2j's do not have to be independent.
- Therefore, in the paired-t approach, common random numbers can be used to induce positive correlation between the observations on the different systems, to reduce the variance.
- In the paired-t approach, n1 = n2, whereas in the two-sample-t method, n1 ≠ n2 is allowed.

Confidence Intervals for Comparing More Than Two Systems

In the case of more than two alternative systems, there are two ways to construct confidence intervals on selected differences μi1 − μi2:
- Comparison with a standard, and
- All pairwise comparisons

NOTE: Since we are making c > 1 confidence intervals, in order to have an overall confidence level of 1 − α, we must make each interval at level 1 − α/c (Bonferroni).

Comparison with a Standard

In this case, one of the systems (perhaps the existing system or policy) is a standard. If system 1 is the standard and we want to compare systems 2, 3, ..., k to system 1, then k − 1 confidence intervals must be constructed for the k − 1 differences

  μ2 − μ1, μ3 − μ1, ..., μk − μ1

In order to achieve an overall confidence level of at least 1 − α, each of the k − 1 confidence intervals must be constructed at level 1 − α/(k − 1).

Can use the paired-t or two-sample-t methods described in the previous section to make the individual intervals.

All Pairwise Comparisons

In this case, each system is compared to every other system to detect and quantify any significant differences. Therefore, for k systems, we construct k(k − 1)/2 confidence intervals for the k(k − 1)/2 differences

  μ2 − μ1
  μ3 − μ1, μ3 − μ2
  ...
  μk − μ1, μk − μ2, ..., μk − μk−1

Each of the confidence intervals must be constructed at level 1 − α/[k(k − 1)/2], so that an overall confidence level of at least 1 − α can be achieved.

Again, we can use the paired-t or two-sample-t methods to make the individual confidence intervals.

Ranking and Selection

The goals of ranking and selection are different from, and more ambitious than, simply making a comparison between several alternative systems. Here, the goal may be to:
- Select the best of k systems
- Select a subset of size m containing the best of k systems
- Select the m best of k systems

Ranking and Selection

1. Selecting the best of k systems:
- Want to select one of the k alternatives as the best.
- Because of the inherent randomness in simulation modeling, we can't be sure that the selected system is the one with the smallest μi (assuming smaller is better). Therefore, we specify a correct-selection probability P* (like 0.90 or 0.95).
- We also specify an indifference zone d*, which means that if the best mean and the next-best mean differ by more than d*, we select the best one with probability at least P*.
- As an example, suppose that we have 5 alternative configurations and we want to identify the best system with probability at least 95%.

Ranking and Selection

2. Selecting a subset of size m containing the best of k systems:
- Want to select a subset of size m (< k) that contains the best system with probability at least P*.
- This approach is useful in initial screening of alternatives, to eliminate the inferior options.
- For example, suppose that we have 10 alternative configurations and we want to identify a subset of 3 alternatives that contains the best system with probability at least 95%.

Ranking and Selection

3. Selecting the m best of k systems:
- Want to select the m best (unranked) of the k systems, so that with probability at least P* the expected responses of the selected subset are the m smallest expected responses.
- This situation may be useful when we want to identify several good options, in case the best one is unacceptable for some reason.
- For example, suppose that we have 5 alternative configurations, we want to select the 3 best alternatives, and we want the probability of correct selection to be at least 90%.
