
Analysis of Simulation Experiments

Outline

Introduction
Classification of Outputs
DIDO vs. RIRO Simulation
Analysis of One System
Terminating vs. Steady-State Simulations
Analysis of Terminating Simulations
Obtaining a Specified Precision
Analysis of Steady-State Simulations
Method of Moving Average for Removing the Initial Bias
Method of Batch Means
Multiple Measures of Performance
Analysis of Several Systems
Comparison of Two Alternative Systems
Comparison of More than Two Systems
Ranking and Selection

Introduction

The greatest disadvantage of simulation:
- We don't get exact answers
- Results are only estimates

Careful design and analysis are needed to:
- Make these estimates as valid and precise as possible
- Interpret their meanings properly

Statistical methods are used to analyze the results of simulation experiments.

What Outputs to Watch?

Need to think ahead about what you would want to get out of the simulation:
- Average and worst (longest) time in system
- Average and worst time in queue(s)
- Average hourly production
- Standard deviation of hourly production
- Proportion of time a machine is up, idle, or down
- Maximum queue length
- Average number of parts in system

Classification of Outputs

There are typically two types of dynamic processes:

Discrete-time process: There is a natural first observation, second observation, etc., but we can only observe them when they happen.
If Wi = time in system for the ith part produced (i = 1, 2, ..., N), and there are N parts produced during the simulation, then the output is the sequence W1, W2, ..., WN.

[Plot: Wi plotted against i]

Classification of Outputs

Typical discrete-time output performance measures:
- Average time in system: W̄(N) = (Σ_{i=1}^{N} Wi) / N
- Maximum time in system
- Proportion of parts that were in the system for more than 1 hour
- Delay of the ith customer in queue
- Throughput during the ith hour

Classification of Outputs

Continuous-time process: Can jump into the system at any point in (real, continuous) time and take a snapshot of something; there is no natural first or second observation.
If Q(t) = number of parts in a particular queue at time t, for t in [0, T], we run the simulation for T units of simulated time.

[Plot: Q(t) plotted against t as a step function taking the values 0, 1, 2, 3]

Classification of Outputs

Typical continuous-time output performance measures:
- Time-average length of queue: Q̄(T) = (∫₀ᵀ Q(t) dt) / T
- Server utilization (proportion of time the server is busy): u(T) = (∫₀ᵀ B(t) dt) / T, where B(t) = 1 if the server is busy at time t and B(t) = 0 otherwise.

[Plot: B(t) plotted against t, alternating between 0 and 1]
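Both measures above are integrals of piecewise-constant sample paths, so they reduce to time-weighted sums over the intervals between events. A minimal sketch (the event times and levels below are made-up illustration data, not from the text):

```python
def time_average(times, values, T):
    """Time average of a piecewise-constant process over [0, T].

    values[i] holds on the interval [times[i], times[i+1]);
    the last value holds until time T.
    """
    total = 0.0
    for i, v in enumerate(values):
        end = times[i + 1] if i + 1 < len(times) else T
        total += v * (end - times[i])   # area under the step function
    return total / T

# Q(t): 0 on [0,1), 1 on [1,3), 2 on [3,4), 0 on [4,5] -> Q̄(5) = 4/5
q_bar = time_average([0, 1, 3, 4], [0, 1, 2, 0], T=5)

# B(t): server busy on [1,4), idle elsewhere -> utilization 3/5
util = time_average([0, 1, 4], [0, 1, 0], T=5)
print(q_bar, util)   # 0.8 0.6
```

The same routine handles any of the continuous-time measures, since each is an average of some indicator or count over simulated time.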

Classification of Outputs

Other continuous-time performance measures:
- Number of parts in the system at time t
- Number of machines down at time t
- Proportion of time that there were more than n parts in the queue

DIDO vs. RIRO Simulation

DIDO (deterministic in, deterministic out):

[Diagram: deterministic inputs (cycle times, interarrival times, batch sizes) → Simulation Model → deterministic outputs (hourly production, machine utilization)]

DIDO vs. RIRO Simulation

RIRO (random in, random out):

[Diagram: random inputs (cycle times, interarrival times, batch sizes) → Simulation Model → random outputs (hourly production, machine utilization)]

Analysis of One System

Single-server queue (M/M/1), replicated 10 times.

[Plots: average delay in queue, average number in queue, and server utilization, each plotted across the 10 replications; the results vary noticeably from replication to replication]

Analysis of One System

CAUTION: Because of the autocorrelation that exists in the output of virtually all simulation models, classical statistical methods don't work directly within a simulation run.

Time in system for individual jobs: Y1, Y2, Y3, ..., Yn
μ = E(average time in system)
Sample mean: Ȳ(n) = (Σ_{i=1}^{n} Yi) / n
Ȳ(n) is an unbiased estimator for μ, but how close is this sample mean to μ?
Need to estimate Var(Ȳ(n)) to get confidence intervals on μ.

Analysis of One System

Problem: Because of positive autocorrelation between the observations (Corr(Yi, Yi+l) > 0 for lags l ≥ 1), the sample variance is no longer an unbiased estimator of the population variance (unbiasedness of variance estimators can only be achieved if Y1, Y2, Y3, ..., Yn are independent).

As a result, the usual estimator of Var[Ȳ(n)],

  S²(n)/n = Σ_{i=1}^{n} [Yi − Ȳ(n)]² / [n(n − 1)]

may be severely biased. In fact, usually E[S²(n)/n] < Var[Ȳ(n)].

Implications: Understating variances causes us to have too much faith in our point estimates and believe the results too much.

Types of Simulations with Regard to Output Analysis

Terminating: A simulation where there is a specific starting and stopping condition that is part of the model.

Steady-state: A simulation where there are no specific starting and ending conditions. Here, we are interested in the steady-state behavior of the system.

The type of analysis depends on the goal of the study.

Examples of Terminating Simulations

A retail/commercial establishment (e.g., a bank) that operates from 9 to 5 daily and starts empty and idle at the beginning of each day. The output of interest may be the average wait time of the first 50 customers in the system.

A military confrontation between a blue force and a red force. The output of interest may be the probability that the red force loses half of its strength before the blue force loses half of its strength.

Examples of Steady-State Simulations

A manufacturing company that operates 16 hours a day. The system here is a continuous process where the ending condition for one day is the initial condition for the next day. The output of interest may be the expected long-run daily production.

A communication system where service must be provided continuously.

Analysis for Terminating Simulations

Objective: Obtain a point estimate and confidence interval for some parameter μ.

Examples:
- μ = E(average time in system for n customers)
- μ = E(machine utilization)
- μ = E(work-in-process)

Reminder: Cannot use classical statistical methods within a simulation run, because observations from one run are not independently and identically distributed (i.i.d.).

Analysis for Terminating Simulations

- Make n independent replications of the model.
- Let Yi be the performance measure from the ith replication:
  Yi = average time in system, or
  Yi = work-in-process, or
  Yi = utilization of a critical facility
- Performance measures from different replications, Y1, Y2, ..., Yn, are i.i.d.
- But only one sample is obtained from each replication.
- Apply classical statistics to the Yi's, not to observations within a run.
- Select a confidence level 1 − α (0.90, 0.95, etc.).

Analysis for Terminating Simulations

Approximate 100(1 − α)% confidence interval for μ:

  Ȳ(n) = (Σ_{i=1}^{n} Yi) / n    (unbiased estimator of μ)

  S²(n) = Σ_{i=1}^{n} [Yi − Ȳ(n)]² / (n − 1)    (unbiased estimator of Var(Yi))

  Ȳ(n) ± t_{n−1,1−α/2} · S(n)/√n

This interval covers μ with approximate probability 1 − α, and

  δ(n, α) = t_{n−1,1−α/2} · S(n)/√n

is the half-width expression.

Example

Consider a single-server (M/M/1) queue. The objective is to calculate a confidence interval for the delay of customers in the queue.
- n = 10 replications of a single-server queue
- Yi = average delay in queue from the ith replication
- Yi's: 2.02, 0.73, 3.20, 6.23, 1.76, 0.47, 3.89, 5.45, 1.44, 1.23
- For a 90% confidence interval, α = 0.10
- Ȳ(10) = 2.64, S²(10) = 3.96, t_{9,0.95} = 1.833
- Approximate 90% confidence interval: 2.64 ± 1.15, or [1.49, 3.79]
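The numbers in this example can be reproduced directly from the ten replication averages. A minimal sketch in Python (the t quantile t_{9,0.95} = 1.833 is taken from the example rather than computed):

```python
from math import sqrt

def ci_terminating(y, t_quantile):
    """Point estimate and half-width for a terminating-simulation CI."""
    n = len(y)
    ybar = sum(y) / n                                   # sample mean Ȳ(n)
    s2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)    # sample variance S²(n)
    hw = t_quantile * sqrt(s2 / n)                      # half-width δ(n, α)
    return ybar, s2, hw

y = [2.02, 0.73, 3.20, 6.23, 1.76, 0.47, 3.89, 5.45, 1.44, 1.23]
ybar, s2, hw = ci_terminating(y, t_quantile=1.833)      # t_{9,0.95}
print(round(ybar, 2), round(s2, 2), round(hw, 2))       # 2.64 3.96 1.15
print([round(ybar - hw, 2), round(ybar + hw, 2)])       # [1.49, 3.79]
```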

Analysis for Terminating Simulations

Interpretation: 100(1 − α)% of the time, the confidence interval formed in this way covers μ (μ is unknown).

Wrong interpretation: "I am 90% confident that μ is between 1.49 and 3.79."

Issue 1

- This confidence-interval method assumes the Yi's are normally distributed. In real life, this is almost never true.
- Because of the central-limit theorem, as the number of replications n grows, the coverage probability approaches 1 − α.
- In general, if the Yi's are averages of something, their distribution tends not to be too asymmetric, and the confidence-interval method shown above has reasonably good coverage.

Issue 2

The confidence interval may be too wide:
- In the M/M/1 queue example, the approximate 90% C.I. was 2.64 ± 1.15, or [1.49, 3.79].
- The half-width is 1.15, which is 44% of the mean (1.15/2.64).
- That means the C.I. is 2.64 ± 44%, which is not very precise.

To decrease the half-width:
- Increase n until δ(α, n) is small enough (this is called sequential sampling).

There are two ways of defining the precision in the estimate Ȳ:
- Absolute precision
- Relative precision

Obtaining a Specified Precision

Absolute precision:
- Want to make n large enough that δ(α, n) ≤ β, where δ(α, n) is the half-width and β > 0.
- Make n0 replications of the simulation model and compute Ȳ(n0), S²(n0), and the half-width δ(α, n0).
- Assuming that the estimate of the variance, S²(n0), does not change appreciably, an approximate expression for the required number of replications to achieve an absolute error of β is

  n*_a(β) = min{ i ≥ n0 : t_{i−1,1−α/2} √(S²(n0)/i) ≤ β }

Obtaining a Specified Precision

Relative precision:
- Want to make n large enough that δ(α, n) / |Ȳ(n)| ≤ γ, where 0 < γ < 1.
- Make n0 replications of the simulation model and compute Ȳ(n0), S²(n0), and the half-width δ(α, n0).
- Assuming that the estimates of both the population mean and the population variance do not change appreciably, an approximate expression for the required number of replications to achieve a relative error of γ is

  n*_r(γ) = min{ i ≥ n0 : t_{i−1,1−α/2} √(S²(n0)/i) / |Ȳ(n0)| ≤ γ }
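Both stopping rules can be evaluated by scanning i upward from n0. In the sketch below, the Student-t quantile is approximated by the standard-normal quantile so that only the standard library is needed (an exact t quantile, e.g. from scipy.stats, would give slightly larger answers). The inputs reuse the earlier M/M/1 example (Ȳ = 2.64, S² = 3.96, α = 0.10):

```python
from math import sqrt
from statistics import NormalDist

def replications_needed(s2, alpha, n0, beta=None, gamma=None, ybar=None):
    """Smallest i >= n0 whose projected half-width meets the target.

    beta  -> absolute precision: half-width <= beta
    gamma -> relative precision: half-width / |ybar| <= gamma
    Approximation: t_{i-1,1-alpha/2} is replaced by the normal quantile.
    """
    if beta is None and gamma is None:
        raise ValueError("specify beta or gamma")
    z = NormalDist().inv_cdf(1 - alpha / 2)     # ~ t quantile for moderate i
    i = n0
    while True:
        hw = z * sqrt(s2 / i)                   # projected half-width δ(α, i)
        if beta is not None and hw <= beta:
            return i
        if gamma is not None and hw / abs(ybar) <= gamma:
            return i
        i += 1

# Absolute error beta = 0.5, and relative error gamma = 10%:
print(replications_needed(3.96, 0.10, n0=10, beta=0.5))              # 43
print(replications_needed(3.96, 0.10, n0=10, gamma=0.1, ybar=2.64))  # 154
```

Note how much more expensive the tighter relative target is: the half-width shrinks only like 1/√i, so halving the error roughly quadruples the replication count.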

Analysis for Steady-State Simulations

Objective: Estimate the steady-state mean

  ν = lim_{i→∞} E(Yi)

Basic question: Should you do many short runs or one long run?

[Diagram: "many short runs" shows replications X1, X2, X3, X4, X5; "one long run" shows a single replication X1]

Analysis for Steady-State Simulations

Advantages:
- Many short runs: simple analysis, similar to the analysis for terminating systems; the data from different replications are i.i.d.
- One long run: less initial bias; no restarts.

Disadvantages:
- Many short runs: initial bias is introduced several times.
- One long run: sample of size 1; difficult to get a good estimate of the variance.

Analysis for Steady-State Simulations

Make many short runs: The analysis is exactly the same as for terminating systems. The 100(1 − α)% C.I. is computed as before.

Problem: Because of initial bias, Ȳ(n) may no longer be an unbiased estimator for the steady-state mean ν.

Solution: Remove the initial portion of the data (the warm-up period), beyond which observations are in steady state. Specifically, pick l (the warm-up period) and n (the number of observations in one run) and average only the retained observations:

  Ȳ(n, l) = (Σ_{i=l+1}^{n} Yi) / (n − l)

Method of Moving Average for Removing the Initial Bias

Welch's method for choosing the warm-up period l:

- Make n replications of the model (n > 5), each of length m, where m is large. Let Yji be the ith observation from the jth replication (j = 1, 2, ..., n; i = 1, 2, ..., m).
- Let Ȳi = (Σ_{j=1}^{n} Yji) / n for i = 1, 2, ..., m.
- To smooth out the high-frequency oscillations in Ȳ1, Ȳ2, ..., Ȳm, define the moving average Ȳi(w) as follows (the window w is a positive integer such that w ≤ m/2):

  Ȳi(w) = (Σ_{s=−w}^{w} Ȳ_{i+s}) / (2w + 1)           if i = w+1, ..., m−w
  Ȳi(w) = (Σ_{s=−(i−1)}^{i−1} Ȳ_{i+s}) / (2i − 1)     if i = 1, ..., w

Method of Moving Average for Removing the Initial Bias

- Plot Ȳi(w) and choose l to be the value of i beyond which Ȳ1(w), Ȳ2(w), ... seem to have converged.
- Note: Perform this procedure for several values of w, and choose the smallest w for which the plot of Ȳi(w) looks reasonably smooth.
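The averaging and smoothing steps can be sketched as follows (the replication data are made-up illustration values; a real application would plot the result and pick l by eye):

```python
def welch_moving_average(reps, w):
    """Welch's smoothed averaged process.

    reps: list of n replications, each a list of m observations.
    Returns [Ȳ_1(w), ..., Ȳ_{m-w}(w)] per the two-case definition:
    a full window of width 2w+1 in the middle, a shrinking symmetric
    window of width 2i-1 near the start.
    """
    n, m = len(reps), len(reps[0])
    ybar = [sum(rep[i] for rep in reps) / n for i in range(m)]  # Ȳ_i across reps
    out = []
    for i in range(1, m - w + 1):                 # 1-based index i
        half = w if i > w else i - 1              # shrinking window near the start
        window = ybar[i - 1 - half : i + half]    # Ȳ_{i-half} ... Ȳ_{i+half}
        out.append(sum(window) / (2 * half + 1))
    return out

# Three made-up replications of length m = 6; window w = 1
reps = [[5, 4, 3, 2, 2, 2],
        [4, 3, 2, 1, 1, 1],
        [3, 2, 1, 0, 0, 0]]
print(welch_moving_average(reps, w=1))
```

Here the smoothed curve flattens out around i = 4, so one would take l = 4 and discard the first four observations of each replication.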

Analysis for Steady-State Simulations

Make one long run: Make just one long replication, so that the initial bias is only introduced once. This way, you will not be throwing out a lot of data.

Problem: How do you estimate the variance when there is only one run?

Solution: Several methods exist to estimate the variance:
- Batch means (the only approach to be discussed here)
- Time-series models
- Spectral analysis
- Standardized time series

Method of Batch Means

- Divide a run of length m into n adjacent batches of length k, where m = nk.
- Let Ȳj be the sample (batch) mean of the jth batch.

[Plot: observations Yi plotted against i, divided into adjacent batches with batch means Ȳ1, Ȳ2, Ȳ3, Ȳ4, Ȳ5; m = nk]

- The grand sample mean Ȳ is computed as

  Ȳ = (Σ_{j=1}^{n} Ȳj) / n

Method of Batch Means

The sample variance of the batch means is computed as

  S²_Ȳ(n) = Σ_{j=1}^{n} (Ȳj − Ȳ)² / (n − 1)

The approximate 100(1 − α)% confidence interval for ν is

  Ȳ ± t_{n−1,1−α/2} · S_Ȳ(n)/√n
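A minimal sketch of the batching computation (the observations and the t quantile t_{3,0.95} ≈ 2.353 are illustration values; a real run would be long and autocorrelated, with k chosen large enough that the batch means are roughly uncorrelated):

```python
from math import sqrt

def batch_means_ci(y, n_batches, t_quantile):
    """Batch-means point estimate and CI half-width from one long run."""
    k = len(y) // n_batches                       # batch length, m = n*k
    batches = [y[j * k:(j + 1) * k] for j in range(n_batches)]
    means = [sum(b) / k for b in batches]         # batch means Ȳ_j
    grand = sum(means) / n_batches                # grand sample mean Ȳ
    s2 = sum((mj - grand) ** 2 for mj in means) / (n_batches - 1)
    hw = t_quantile * sqrt(s2 / n_batches)        # half-width of the CI
    return grand, s2, hw

# One run of length m = 8, split into n = 4 batches of k = 2
y = [1, 2, 3, 4, 5, 6, 7, 8]
grand, s2, hw = batch_means_ci(y, n_batches=4, t_quantile=2.353)
print(grand, round(s2, 3))   # 4.5 6.667
```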

Method of Batch Means

Two important issues:

Issue 1: How do we choose the batch size k?
- Choose the batch size k large enough that the batch means, the Ȳj's, are approximately uncorrelated.
- Otherwise, the variance estimator S²_Ȳ(n) will be biased low, and the confidence interval will be too small, which means it will cover the mean with a probability lower than the desired 1 − α.

Method of Batch Means

Issue 2: How many batches n?
- Due to autocorrelation, splitting the run into a larger number of smaller batches degrades the quality of each individual batch. In practice, 20 to 30 batches are sufficient.

Multiple Measures of Performance

In most real-world simulation models, several measures of performance are considered simultaneously. Examples include:
- Throughput
- Average length of queue
- Utilization
- Average time in system

Each performance measure is typically estimated with a confidence interval, and any of the intervals could miss its expected performance measure. Must be careful about overall statements of coverage (i.e., that all intervals contain their expected performance measures simultaneously).

Multiple Measures of Performance

Suppose we have k performance measures, and the confidence interval for performance measure s (s = 1, 2, ..., k) is at confidence level 1 − αs. Then the probability that all k confidence intervals simultaneously contain their respective true measures satisfies

  P(all k intervals contain their respective performance measures) ≥ 1 − Σ_{s=1}^{k} αs

This is referred to as the Bonferroni inequality.

Multiple Measures of Performance

To ensure that the overall probability (of all k confidence intervals simultaneously containing their respective true means) is at least 100(1 − α) percent, choose the αs such that

  Σ_{s=1}^{k} αs = α

Can select αs = α/k for all s, or pick the αs differently, with smaller αs for the more important performance measures.
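The equal-split allocation is a one-liner; a quick sketch using the document's own k = 2, α = 0.10 case:

```python
def bonferroni_levels(alpha, k):
    """Per-measure confidence levels whose error probabilities sum to alpha."""
    alpha_s = alpha / k              # equal split across the k measures
    return [1 - alpha_s] * k

levels = bonferroni_levels(alpha=0.10, k=2)
print(levels)                        # two 95% intervals
# By the Bonferroni inequality, overall coverage is at least 1 - alpha = 0.90.
```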

Multiple Measures of Performance

Example: If k = 2 and we want the desired overall confidence level to be at least 90%, we can construct two 95% confidence intervals.

Difficulty: If there is a large number of performance measures and we want a reasonable overall confidence level (e.g., 90%), the individual αs's could become small, making the corresponding confidence intervals very wide. Therefore, it is recommended that the number of performance measures not exceed 10.

Analysis of Several Systems

Most simulation projects involve comparison of two or more systems or configurations:

With two alternative systems (e.g., changing the number of machines in some workcenters, or evaluating various job-dispatch policies such as FIFO, SPT, etc.), the goal may be to:
- test the hypothesis H0: μ1 = μ2, or
- build a confidence interval for μ1 − μ2

With k > 2 alternatives, the objective may be to:
- build simultaneous confidence intervals for various combinations of the differences μi1 − μi2
- select the best of the k alternatives
- select a subset of size m < k that contains the best alternative
- select the m best (unranked) of the alternatives

Analysis of Several Systems

To illustrate the danger of making only one run and eyeballing the results when comparing alternatives, consider the following example. Compare:
- Alternative 1: M/M/1 queue with mean interarrival time of 1 min. and one fast machine with mean service time of 0.9 min.
- Alternative 2: M/M/2 queue with mean interarrival time of 1 min. and two slow machines with mean service time of 1.8 min. each.

Analysis of Several Systems

If the performance measure of interest is the expected average delay in queue of the first 100 customers, with empty-and-idle initial conditions, then using queueing analysis the true expected average delays in queue are

  μ1 = 4.13 > 3.70 = μ2

Therefore, system 2 is better.

If we run each model just once, calculate the average delay Yi from each alternative, and select the system with the smallest Yi, then

  P(selecting system 1 (the wrong answer)) = 0.52

Reason: randomness in the output.

Analysis of Several Systems

Solution:
- Replicate each alternative n times.
- Let Yij = average delay from the jth replication of alternative i.
- Compute the average over all replications for alternative i:

  Ȳi = (Σ_{j=1}^{n} Yij) / n

- Select the alternative with the lowest Ȳi.

If we conduct this experiment many times, the following results are obtained:

  n    P(wrong answer)
  1    0.52
  5    0.43
  10   0.38
  20   0.34

Comparison of Two Alternative Systems

- Form a confidence interval for the difference between the performance measures of the two systems (i.e., for μ1 − μ2).
- If the interval misses 0, there is a statistically significant difference between the two systems.
- Confidence intervals are better than hypothesis tests, because if a difference exists, the confidence interval measures its magnitude, while a hypothesis test does not.
- There are two slightly different ways of constructing the confidence intervals:
  - Paired-t
  - Two-sample-t

Paired-t Confidence Interval

- Make n replications of each of the two systems. Let Yij be the jth observation from system i (i = 1, 2).
- Pair Y1j with Y2j and define Zj = Y1j − Y2j for j = 1, 2, ..., n.
- Then the Zj's are i.i.d. random variables, and ζ = E(Zj) is the quantity for which we want to construct a confidence interval. Let

  Z̄(n) = (Σ_{j=1}^{n} Zj) / n

  V̂ar[Z̄(n)] = Σ_{j=1}^{n} [Zj − Z̄(n)]² / [n(n − 1)]

- Then the approximate 100(1 − α) percent C.I. is

  Z̄(n) ± t_{n−1,1−α/2} √(V̂ar[Z̄(n)])
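A minimal sketch of the paired-t computation (the observations and the t quantile t_{2,0.975} ≈ 4.303 are illustration values; note that the pairing is exactly what lets Y1j and Y2j be correlated, e.g. under common random numbers):

```python
from math import sqrt

def paired_t_ci(y1, y2, t_quantile):
    """Paired-t CI for zeta = E(Y1j - Y2j) from paired replications."""
    n = len(y1)
    z = [a - b for a, b in zip(y1, y2)]               # Z_j = Y1j - Y2j
    zbar = sum(z) / n                                 # Z̄(n)
    var_zbar = sum((zj - zbar) ** 2 for zj in z) / (n * (n - 1))
    hw = t_quantile * sqrt(var_zbar)                  # half-width
    return zbar, hw

# Three paired replications (illustration data), t_{2,0.975} ≈ 4.303
zbar, hw = paired_t_ci([2.0, 3.0, 4.0], [1.0, 1.0, 2.0], t_quantile=4.303)
print(round(zbar, 3), round(hw, 3))
```

If the resulting interval zbar ± hw misses 0, the two systems differ significantly at the chosen level.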

Two-Sample-t Confidence Interval

- Make n1 replications of system 1 and n2 replications of system 2. Here n1 ≠ n2 is allowed.
- Again, for system i = 1, 2, let

  Ȳi(ni) = (Σ_{j=1}^{ni} Yij) / ni

  S²i(ni) = Σ_{j=1}^{ni} [Yij − Ȳi(ni)]² / (ni − 1)

- Estimate the degrees of freedom as

  f̂ = [S²1(n1)/n1 + S²2(n2)/n2]² / { [S²1(n1)/n1]²/(n1 − 1) + [S²2(n2)/n2]²/(n2 − 1) }

- Then the approximate 100(1 − α) percent C.I. is

  Ȳ1(n1) − Ȳ2(n2) ± t_{f̂,1−α/2} √(S²1(n1)/n1 + S²2(n2)/n2)
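The estimated-degrees-of-freedom formula (Welch's approximation) can be sketched as follows; the data are illustration values with deliberately unequal sample sizes:

```python
def two_sample_t(y1, y2):
    """Mean difference and estimated degrees of freedom f̂ for the
    two-sample-t interval (unequal sample sizes and variances allowed)."""
    def mean_var(y):
        n = len(y)
        m = sum(y) / n
        s2 = sum((v - m) ** 2 for v in y) / (n - 1)
        return n, m, s2

    n1, m1, s21 = mean_var(y1)
    n2, m2, s22 = mean_var(y2)
    a, b = s21 / n1, s22 / n2                      # per-system variance terms
    f_hat = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return m1 - m2, f_hat

diff, f_hat = two_sample_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0])
print(diff, round(f_hat, 2))   # -3.0 4.08
```

The interval is then diff ± t_{f̂,1−α/2} √(S²1/n1 + S²2/n2), with the t quantile looked up at the (generally non-integer) f̂.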

Contrasting the Two Methods

- The two-sample-t approach requires independence of the Y1j's and Y2j's, whereas in the paired-t approach the Y1j's and Y2j's do not have to be independent.
- Therefore, in the paired-t approach, common random numbers can be used to induce positive correlation between the observations on the different systems, to reduce the variance.
- In the paired-t approach, n1 = n2, whereas in the two-sample-t method, n1 ≠ n2 is allowed.

Confidence Intervals for Comparing More Than Two Systems

In the case of more than two alternative systems, there are two ways to construct confidence intervals on selected differences μi1 − μi2:
- Comparison with a standard, and
- All pairwise comparisons

NOTE: Since we are making c > 1 confidence intervals, in order to have an overall confidence level of 1 − α, we must make each interval at level 1 − α/c (Bonferroni).

Comparison with a Standard

In this case, one of the systems (perhaps the existing system or policy) is a standard. If system 1 is the standard and we want to compare systems 2, 3, ..., k to system 1, then k − 1 confidence intervals must be constructed for the k − 1 differences

  μ2 − μ1, μ3 − μ1, ..., μk − μ1

In order to achieve an overall confidence level of at least 1 − α, each of the k − 1 confidence intervals must be constructed at level 1 − α/(k − 1).

Can use the paired-t or two-sample-t methods described in the previous section to make the individual intervals.

All Pairwise Comparisons

In this case, each system is compared to every other system to detect and quantify any significant differences. Therefore, for k systems, we construct k(k − 1)/2 confidence intervals for the k(k − 1)/2 differences

  μ2 − μ1
  μ3 − μ1, μ3 − μ2
  ...
  μk − μ1, μk − μ2, ..., μk − μk−1

Each of the confidence intervals must be constructed at level 1 − α/[k(k − 1)/2], so that an overall confidence level of at least 1 − α can be achieved.

Again, we can use the paired-t or two-sample-t methods to make the individual confidence intervals.

Ranking and Selection

The goals of ranking and selection are different from, and more ambitious than, simply making a comparison between several alternative systems. Here, the goal may be to:
- Select the best of k systems
- Select a subset of size m containing the best of k systems
- Select the m best of k systems

Ranking and Selection

1. Selecting the best of k systems:
- Want to select one of the k alternatives as the best.
- Because of the inherent randomness in simulation modeling, we can't be sure that the selected system is the one with the smallest μi (assuming smaller is better). Therefore, we specify a correct-selection probability P* (like 0.90 or 0.95).
- We also specify an indifference zone d*, which means that if the best mean and the next-best mean differ by more than d*, we select the best one with probability at least P*.
- As an example, suppose that we have 5 alternative configurations and we want to identify the best system with probability at least 95%.

Ranking and Selection

2. Selecting a subset of size m containing the best of k systems:
- Want to select a subset of size m (< k) that contains the best system with probability at least P*.
- This approach is useful in initial screening of alternatives, to eliminate the inferior options.
- For example, suppose that we have 10 alternative configurations and we want to identify a subset of 3 alternatives that contains the best system with probability at least 95%.

Ranking and Selection

3. Selecting the m best of k systems:
- Want to select the m best (unranked) of the k systems, so that with probability at least P* the expected responses of the selected subset are the m smallest expected responses.
- This situation may be useful when we want to identify several good options, in case the best one is unacceptable for some reason.
- For example, suppose that we have 5 alternative configurations, we want to select the 3 best alternatives, and we want the probability of correct selection to be at least 90%.
