
"The method that proceeds without analysis is like the groping of a blind man." (Socrates)

Questions Addressed
What are the issues involved in comparing two or more system configurations?
What is hypothesis testing?
How do common random numbers work?


Purpose
Purpose: comparison of alternative system designs.


Approach: discuss a few of many statistical
methods that can be used to compare two or more
system designs.

Statistical analysis is needed to discover whether observed differences are due to:
Differences in design, or
The random fluctuation inherent in the models.

Outline
For two-system comparisons:
Independent sampling.
Correlated sampling (common random numbers).

For multiple-system comparisons:
Bonferroni approach: confidence-interval
estimation, screening, and selecting the best.

Comparing Two Systems

The challenge is to determine whether differences are attributable to actual differences in performance and not to statistical variation.
Running multiple replications, or batches, is required.

Hypothesis Testing
Used when outcomes are close or when high precision is required.
A hypothesis is first formulated (e.g., that methods A and B both result in the same throughput), and then a test is made to see whether the results of the simulation lead us to reject the hypothesis.

What does it mean if you fail to reject a hypothesis?
1. The hypothesis may be correct,
OR
2. The variance in the observed outcomes is too high, given the number of replications, for the test to be conclusive (run more replications or use variance reduction techniques).

Types of Errors
Type I error: Reject True Hypothesis
Type II error: Accept False Hypothesis

Decision    Hypothesis True    Hypothesis False
Accept      (correct)          Type II error
Reject      Type I error       (correct)

Possible Confidence Intervals


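[Figure: three possible positions, (A), (B), and (C), of a confidence interval for μ1 - μ2 relative to μ1 - μ2 = 0; [------|------] denotes a confidence interval. An interval that contains 0 leads to "Fail to reject H0"; an interval lying entirely to one side of 0 leads to "Reject H0".]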


Comparison of two Buffer Strategies

Assume we have two candidate strategies, each believed to maximize the throughput of the system.
We present two methods based on the confidence-interval approach.
We seek to discover whether the mean throughputs of the system under Strategy 1 and Strategy 2 are significantly different.
We begin by estimating the mean performance of the two proposed strategies by simulating each strategy for 16 days past the warm-up period.
The simulation experiment was replicated 10 times for each strategy.
The throughput achieved by each strategy is shown next.

Comparison of two Buffer Strategies


(A)            (B)           (C)
Replication    Strategy 1    Strategy 2
               Throughput    Throughput
1              54.48         56.01
2              57.36         54.08
3              54.81         52.14
4              56.20         53.49
5              54.83         55.49
6              57.69         55.00
7              58.33         54.88
8              57.19         54.47
9              56.84         54.93
10             55.29         55.84
Mean           56.30         54.63
Std. Dev.      1.37          1.17
Variance       1.89          1.36


Welch Confidence Interval


Requires that the observations drawn from each simulated system be normally distributed and independent, both within a population and between populations.
It does not require the two populations to have equal variances.

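Welch degrees of freedom for the buffer-strategy data: $\nu \approx 17.5$, giving the critical value $t_{0.025,\,17.5} \approx 2.10$ (in Excel, =T.INV.2T(0.05, 17.5); note that Excel truncates a non-integer $\nu$ to 17 and so returns about 2.11).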
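The Welch interval for the buffer-strategy data can be sketched in Python as follows. This is a minimal illustration, not the course's own tool: the standard library has no inverse Student-t function, so the helper `t_quantile` (our own name) integrates the t density with Simpson's rule and inverts it by bisection, keeping the non-integer degrees of freedom as on the slide. With SciPy available, `scipy.stats.t.ppf` would normally replace the helper.

```python
import math
from statistics import mean, variance


def t_quantile(p, nu, n=2000):
    """Value t with P(T <= t) = p for 0.5 < p < 1; nu may be non-integer."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))

    def pdf(x):
        return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

    def cdf(x):  # 0.5 + integral of the density over [0, x] by Simpson's rule
        h = x / n
        s = pdf(0.0) + pdf(x) + sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, n))
        return 0.5 + s * h / 3

    lo, hi = 0.0, 100.0
    for _ in range(50):  # bisection on the monotone CDF
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def welch_ci(x1, x2, alpha=0.05):
    """Welch (unequal-variance) CI for mu1 - mu2; returns (difference, df, half-width)."""
    a, b = variance(x1) / len(x1), variance(x2) / len(x2)  # S_i^2 / n_i
    df = (a + b) ** 2 / (a ** 2 / (len(x1) - 1) + b ** 2 / (len(x2) - 1))
    diff = mean(x1) - mean(x2)
    hw = t_quantile(1 - alpha / 2, df) * math.sqrt(a + b)
    return diff, df, hw


strategy1 = [54.48, 57.36, 54.81, 56.20, 54.83, 57.69, 58.33, 57.19, 56.84, 55.29]
strategy2 = [56.01, 54.08, 52.14, 53.49, 55.49, 55.00, 54.88, 54.47, 54.93, 55.84]
diff, df, hw = welch_ci(strategy1, strategy2)
print(f"difference = {diff:.2f}, df = {df:.1f}, half-width = {hw:.2f}")
print(f"95% CI: [{diff - hw:.2f}, {diff + hw:.2f}]")
```

By our hand calculation this gives df of about 17.5, matching the slide, and an interval of roughly [0.47, 2.87]. Because that interval lies entirely to the right of 0, it points to a real difference in mean throughput.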


Paired-t CI for Comparing 2 Systems


Like the Welch CI method, the paired-t CI method requires that the observations drawn from each population be normally distributed and independent within a population.
However, the paired-t CI method does not require that the observations between populations be independent.


Paired-t CI for Comparing 2 Systems


This allows us to use common random numbers to induce a positive correlation between the two populations in order to reduce the half-width.
Finally, like the Welch method, the paired-t CI method does not require that the populations have equal variances.
Given n observations from each population ($n_1 = n_2 = n$), we pair the observations from the two populations ($x_{1j}$ and $x_{2j}$) to define a new random variable: $x_{(1-2)j} = x_{1j} - x_{2j}$, for $j = 1, 2, \dots, n$.

Paired-t Comparison
(A)            (B)               (C)               (D)
Replication    Strategy 1        Strategy 2        Difference (B - C)
(j)            Throughput x1j    Throughput x2j    x(1-2)j = x1j - x2j
1              54.48             56.01             -1.53
2              57.36             54.08              3.28
3              54.81             52.14              2.67
4              56.20             53.49              2.71
5              54.83             55.49             -0.66
6              57.69             55.00              2.69
7              58.33             54.88              3.45
8              57.19             54.47              2.72
9              56.84             54.93              1.91
10             55.29             55.84             -0.55
Mean                                                1.67
Standard Dev.                                       1.85
Variance                                            3.42
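The paired-t interval for the same ten replications can be sketched in a few lines. This is an illustration under the slide's assumptions; `paired_t_ci` and `T_CRIT_9` are our own names. The critical value $t_{0.025,9} = 2.262$ is taken from a standard t table ($n - 1 = 9$ degrees of freedom).

```python
import math
from statistics import mean, stdev

T_CRIT_9 = 2.262  # t_{0.025, 9} from a standard t table


def paired_t_ci(x1, x2, t_crit):
    """Paired-t CI for mu1 - mu2 built on the differences x_(1-2)j = x1j - x2j."""
    if len(x1) != len(x2):
        raise ValueError("the paired-t method needs n1 == n2")
    d = [a - b for a, b in zip(x1, x2)]
    d_bar = mean(d)
    hw = t_crit * stdev(d) / math.sqrt(len(d))  # t * s_D / sqrt(n)
    return d_bar, hw


strategy1 = [54.48, 57.36, 54.81, 56.20, 54.83, 57.69, 58.33, 57.19, 56.84, 55.29]
strategy2 = [56.01, 54.08, 52.14, 53.49, 55.49, 55.00, 54.88, 54.47, 54.93, 55.84]
d_bar, hw = paired_t_ci(strategy1, strategy2, T_CRIT_9)
print(f"mean difference = {d_bar:.2f}, half-width = {hw:.2f}")
print(f"95% CI: [{d_bar - hw:.2f}, {d_bar + hw:.2f}]")
```

With this particular data the half-width comes out slightly larger than the Welch half-width, roughly 1.32 versus 1.20. These replications were run independently (no CRN), so pairing induces no useful positive correlation, and the paired-t method also has fewer degrees of freedom (9 instead of about 17.5).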


Comparing More than 2 Systems


Bonferroni approach
ANOVA
Factorial design and optimization experiments


Factorial Design
Tests system response(s) when multiple factors are being manipulated.
Input variables are the factors.
Output measures are the responses.


Two-level Full-factorial Design


Define a high and low level setting for each factor.
Try every combination of factor settings.
Run detailed studies for factors that have the
greatest impact.
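A two-level full-factorial experiment can be sketched as follows, assuming coded levels -1 (low) and +1 (high). The function `fake_simulation` is a hypothetical stand-in for a simulation model, with made-up coefficients. In a real study each design point would be replicated and the averaged responses used. The main effect of a factor is estimated as the mean response at its high level minus the mean response at its low level.

```python
from itertools import product
from statistics import mean


def full_factorial(k):
    """All 2**k combinations of coded factor levels (-1 = low, +1 = high)."""
    return list(product((-1, 1), repeat=k))


def main_effects(design, responses):
    """Main effect of factor j = mean response at high level - mean at low level."""
    effects = []
    for j in range(len(design[0])):
        high = [y for x, y in zip(design, responses) if x[j] == 1]
        low = [y for x, y in zip(design, responses) if x[j] == -1]
        effects.append(mean(high) - mean(low))
    return effects


def fake_simulation(x):
    """Hypothetical response: factor 0 matters most, factor 2 not at all."""
    return 50 + 4 * x[0] + 1.5 * x[1] + 0 * x[2]


design = full_factorial(3)  # 2**3 = 8 runs
responses = [fake_simulation(x) for x in design]
print(main_effects(design, responses))
```

The printed effects rank the factors by impact. Those are the factors the slide suggests studying in more detail.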


Fractional-factorial Design
Strategically "screen out" factors that have little or no
impact on system performance.
On remaining factors, run a full-factorial experiment.
Run detailed studies for factors that have the greatest
impact.
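One standard way to build a fractional-factorial screening design is the $2^{3-1}$ half fraction with generator C = AB (defining relation I = ABC). It needs 4 runs instead of 8. The price is that each main effect is aliased with a two-factor interaction (for example, C with AB). This is a generic textbook construction shown for illustration, not necessarily the design used in this course.

```python
from itertools import product


def half_fraction_2_3():
    """2^(3-1) design: full factorial in A and B, with column C set by C = A*B."""
    return [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]


design = half_fraction_2_3()
for row in design:
    print(row)  # coded levels (A, B, C)
```

Each column is still balanced (two runs at each level), so the main effects remain estimable as long as the aliased interactions are negligible.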


Variance Reduction
Common Random Numbers (CRN)
Evaluates each system under exactly the same circumstances.
Helps ensure that observed differences between system designs are due to the differences in the designs and not to differences in experimental conditions.
Antithetic Random Numbers (ARN)

CRN Continued

Table 7.10. Comparison of two buffer allocation strategies using common random numbers.

(A)            (B)           (C)           (D)
Replication    Strategy 1    Strategy 2    Difference
               Throughput    Throughput    (B - C)
1              79.05         75.09         3.96
2              54.96         51.09         3.87
3              51.23         49.09         2.14
4              88.74         88.01         0.73
5              56.43         53.34         3.09
6              70.42         67.54         2.88
7              35.71         34.87         0.84
8              58.12         54.24         3.88
9              57.77         55.03         2.74
10             45.08         42.55         2.53

Mean difference: $\bar{x}_{(1-2)} = 2.67$
Standard deviation of the differences: $s_{(1-2)} = 1.16$
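The paired-t interval for the CRN data in Table 7.10 can be sketched the same way, using $t_{0.025,9} = 2.262$ from a t table. For contrast, the sketch also computes the standard error we would get by (incorrectly) treating the two columns as independent samples. The variable names are our own.

```python
import math
from statistics import mean, stdev, variance

T_CRIT_9 = 2.262  # t_{0.025, 9} from a standard t table

# Table 7.10 (common random numbers): throughput per replication.
s1 = [79.05, 54.96, 51.23, 88.74, 56.43, 70.42, 35.71, 58.12, 57.77, 45.08]
s2 = [75.09, 51.09, 49.09, 88.01, 53.34, 67.54, 34.87, 54.24, 55.03, 42.55]
R = len(s1)

d = [a - b for a, b in zip(s1, s2)]  # D_r = Y_r1 - Y_r2
d_bar, s_d = mean(d), stdev(d)
se_paired = s_d / math.sqrt(R)
hw = T_CRIT_9 * se_paired

# Not valid under CRN (the columns are correlated); shown only for contrast.
se_indep = math.sqrt(variance(s1) / R + variance(s2) / R)

print(f"mean difference = {d_bar:.2f}, s_D = {s_d:.2f}")
print(f"paired-t 95% CI: [{d_bar - hw:.2f}, {d_bar + hw:.2f}]")
print(f"s.e. paired = {se_paired:.3f}; s.e. if treated as independent = {se_indep:.3f}")
```

Throughput varies widely from replication to replication (about 35 to 89), but under CRN both strategies move together. The differences therefore have a small spread, and the paired interval (roughly [1.83, 3.50] by our calculation) excludes 0 comfortably.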

More Detail on Comparison of Two System Designs
Goal: compare two possible system configurations, e.g.:
two possible ordering policies in a supply-chain system,
two possible scheduling rules in a job shop.
Approach: the method of replications is used to analyze the output data.
The mean performance measure for system i is denoted by $\theta_i$ (i = 1, 2). The goal is to obtain point and interval estimates for the difference in mean performance, namely $\theta_1 - \theta_2$.

Comparison of Two System Designs


Vehicle-safety inspection example:
The station performs 3 jobs: (1) brake check, (2) headlight check, and (3) steering check.
Vehicle arrivals: Poisson with rate 9.5 per hour.
Present system:
Three stalls in parallel (one attendant makes all 3 inspections at each stall).
Service times for the 3 jobs: normally distributed with means 6.5, 6.0, and 5.5 minutes, respectively.
Alternative system:
Each attendant specializes in a single task, so each vehicle passes through three work stations in series.
The mean service time for each job decreases by 10% (5.85, 5.4, and 4.95 minutes).
Performance measure: mean response time per vehicle (total time from the vehicle's arrival to its departure).


Comparison of Two System Designs

From replication r of system i, the simulation analyst obtains an estimate $Y_{ri}$ of the mean performance measure $\theta_i$.
Assuming that the estimators $Y_{ri}$ are (at least approximately) unbiased:
$$\theta_1 = E(Y_{r1}),\; r = 1, \dots, R_1; \qquad \theta_2 = E(Y_{r2}),\; r = 1, \dots, R_2$$
Goal: compute a confidence interval for $\theta_1 - \theta_2$ to compare the two system designs.
Interpreting a confidence interval for $\theta_1 - \theta_2$:
If the c.i. lies entirely to the left of 0, there is strong evidence for the hypothesis that $\theta_1 - \theta_2 < 0$ ($\theta_1 < \theta_2$).
If the c.i. lies entirely to the right of 0, there is strong evidence for the hypothesis that $\theta_1 - \theta_2 > 0$ ($\theta_1 > \theta_2$).
If the c.i. contains 0, there is no strong statistical evidence that one system is better than the other.
If enough additional data were collected (i.e., $R_i$ increased), the c.i. would most likely shift and would shrink in length (its half-width is roughly proportional to $1/\sqrt{R_i}$), until a conclusion of $\theta_1 < \theta_2$ or $\theta_1 > \theta_2$ could be drawn (provided $\theta_1 \ne \theta_2$).

Comparison of Two System Designs


A two-sided $100(1-\alpha)\%$ c.i. for $\theta_1 - \theta_2$ always takes the form
$$\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2} \pm t_{\alpha/2,\,\nu}\,\mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2})$$
where $\bar{Y}_{\cdot i}$ is the sample mean performance measure for system i over all replications, and $\nu$ is the degrees of freedom.
To calculate the standard error, the analyst uses one of two statistical techniques (independent sampling or common random numbers).
Both techniques assume that the basic data $Y_{ri}$ are approximately normally distributed.
We will discuss these two methods next.


Comparison of Two System Designs


Statistically significant versus practically significant:
Statistical significance: is the observed difference $\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}$ larger than the variability in $\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}$?
Practical significance: is the true difference $\theta_1 - \theta_2$ large enough to matter for the decision we need to make?
Confidence intervals do not answer the question of practical significance directly. Instead, they bound the true difference within a range, which the analyst can then compare with the smallest difference that would matter for the decision.

Independent Sampling with Equal Variances


[Comparison of 2 systems]

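Different and independent random number streams are used to simulate the two systems.
All observations of simulated system 1 are statistically independent of all the observations of simulated system 2.
The variance of the sample mean, $\bar{Y}_{\cdot i}$, is:
$$V(\bar{Y}_{\cdot i}) = \frac{V(Y_{ri})}{R_i} = \frac{\sigma_i^2}{R_i}, \quad i = 1, 2$$
For independent samples:
$$V(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) = V(\bar{Y}_{\cdot 1}) + V(\bar{Y}_{\cdot 2}) = \frac{\sigma_1^2}{R_1} + \frac{\sigma_2^2}{R_2}$$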

Independent Sampling with Equal Variances


[Comparison of 2 systems]
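If it is reasonable to assume that $\sigma_1^2 = \sigma_2^2$ (approximately), or if $R_1 = R_2$, a two-sample-t confidence-interval approach can be used:
The point estimate of the mean performance difference is $\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}$.
The sample variance for system i is:
$$S_i^2 = \frac{1}{R_i - 1}\sum_{r=1}^{R_i}\left(Y_{ri} - \bar{Y}_{\cdot i}\right)^2 = \frac{1}{R_i - 1}\left(\sum_{r=1}^{R_i} Y_{ri}^2 - R_i\bar{Y}_{\cdot i}^2\right)$$
The pooled estimate of $\sigma^2$ is:
$$S_p^2 = \frac{(R_1 - 1)S_1^2 + (R_2 - 1)S_2^2}{R_1 + R_2 - 2}$$
with $\nu = R_1 + R_2 - 2$ degrees of freedom.
The c.i. is given by:
$$\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2} \pm t_{\alpha/2,\,\nu}\,\mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2})$$
Standard error:
$$\mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) = S_p\sqrt{\frac{1}{R_1} + \frac{1}{R_2}}$$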
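The pooled two-sample-t interval can be sketched directly from the formulas above. Purely as an illustration, it reuses the buffer-strategy replications from earlier ($R_1 = R_2 = 10$, so $\nu = 18$). The critical value $t_{0.025,18} = 2.101$ is taken from a standard t table, and `pooled_t_ci` is our own name.

```python
import math
from statistics import mean, variance

T_CRIT_18 = 2.101  # t_{0.025, 18} from a standard t table (R1 + R2 - 2 = 18 df)


def pooled_t_ci(y1, y2, t_crit):
    """Two-sample-t CI for theta1 - theta2 using the pooled variance S_p^2."""
    r1, r2 = len(y1), len(y2)
    sp2 = ((r1 - 1) * variance(y1) + (r2 - 1) * variance(y2)) / (r1 + r2 - 2)
    se = math.sqrt(sp2) * math.sqrt(1 / r1 + 1 / r2)
    diff = mean(y1) - mean(y2)
    return diff, se, (diff - t_crit * se, diff + t_crit * se)


y1 = [54.48, 57.36, 54.81, 56.20, 54.83, 57.69, 58.33, 57.19, 56.84, 55.29]
y2 = [56.01, 54.08, 52.14, 53.49, 55.49, 55.00, 54.88, 54.47, 54.93, 55.84]
diff, se, (lo, hi) = pooled_t_ci(y1, y2, T_CRIT_18)
print(f"difference = {diff:.2f}, s.e. = {se:.3f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```

When $R_1 = R_2$, the pooled standard error equals $\sqrt{S_1^2/R_1 + S_2^2/R_2}$ exactly. Only the degrees of freedom differ from the unequal-variance method, which is one reason the equal-sample-size case is forgiving.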

Independent Sampling with Unequal Variances


[Comparison of 2 systems]

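If the assumption of equal variances cannot safely be made, an approximate $100(1-\alpha)\%$ c.i. for $\theta_1 - \theta_2$ can be computed with
$$\mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) = \sqrt{\frac{S_1^2}{R_1} + \frac{S_2^2}{R_2}}$$
and degrees of freedom
$$\nu = \frac{\left(S_1^2/R_1 + S_2^2/R_2\right)^2}{\dfrac{\left(S_1^2/R_1\right)^2}{R_1 - 1} + \dfrac{\left(S_2^2/R_2\right)^2}{R_2 - 1}}, \quad \text{rounded to an integer.}$$
A minimum of $R_1 > 7$ and $R_2 > 7$ replications is recommended.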


Common Random Numbers (CRN)


[Comparison of 2 systems]

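For each replication, the same random numbers are used to simulate both systems.
For each replication r, the two estimates, $Y_{r1}$ and $Y_{r2}$, are correlated.
However, independent streams of random numbers are used on different replications, so the pairs $(Y_{r1}, Y_{s2})$ are mutually independent for $r \ne s$.
Purpose: induce a positive correlation between $Y_{r1}$ and $Y_{r2}$ (for each r) to reduce the variance of the point estimator $\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}$ of $\theta_1 - \theta_2$.
$$V(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) = V(\bar{Y}_{\cdot 1}) + V(\bar{Y}_{\cdot 2}) - 2\,\mathrm{cov}(\bar{Y}_{\cdot 1}, \bar{Y}_{\cdot 2}) = \frac{\sigma_1^2}{R} + \frac{\sigma_2^2}{R} - \frac{2\rho_{12}\sigma_1\sigma_2}{R}$$
where $\rho_{12} = \mathrm{corr}(Y_{r1}, Y_{r2})$ is positive under CRN.
Therefore, the variance of $\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}$ arising from CRN is less than that of independent sampling (with $R_1 = R_2 = R$).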
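The sample analogue of this variance decomposition can be checked numerically on the CRN data of Table 7.10. Per replication, $S_D^2 = S_1^2 + S_2^2 - 2S_{12}$ holds exactly, where $S_{12}$ is the sample covariance. The common divisor R cancels, so it does not appear. This is a minimal illustration; `sample_cov` is our own helper, written out to avoid depending on newer `statistics` functions.

```python
from statistics import mean, variance

# Table 7.10 (common random numbers): throughput per replication.
s1 = [79.05, 54.96, 51.23, 88.74, 56.43, 70.42, 35.71, 58.12, 57.77, 45.08]
s2 = [75.09, 51.09, 49.09, 88.01, 53.34, 67.54, 34.87, 54.24, 55.03, 42.55]


def sample_cov(x, y):
    """Sample covariance with the same (R - 1) divisor as statistics.variance."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


d = [a - b for a, b in zip(s1, s2)]
cov12 = sample_cov(s1, s2)
rho12 = cov12 / (variance(s1) * variance(s2)) ** 0.5

lhs = variance(d)                              # S_D^2
rhs = variance(s1) + variance(s2) - 2 * cov12  # S_1^2 + S_2^2 - 2 S_12
print(f"rho_12 = {rho12:.4f}")
print(f"S_D^2 = {lhs:.3f}; S_1^2 + S_2^2 - 2 S_12 = {rhs:.3f}")
print(f"without the covariance term: S_1^2 + S_2^2 = {variance(s1) + variance(s2):.1f}")
```

The estimated correlation is very close to 1, so the covariance term removes almost all of $S_1^2 + S_2^2$. This is the variance reduction the formula promises.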



Common Random Numbers (CRN)


[Comparison of 2 systems]

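The estimator based on CRN is more precise, leading to a shorter confidence interval for the difference.
Sample variance of the differences $\bar{D} = \bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}$:
$$S_D^2 = \frac{1}{R-1}\sum_{r=1}^{R}\left(D_r - \bar{D}\right)^2 = \frac{1}{R-1}\left(\sum_{r=1}^{R} D_r^2 - R\bar{D}^2\right)$$
where $D_r = Y_{r1} - Y_{r2}$ and $\bar{D} = \frac{1}{R}\sum_{r=1}^{R} D_r$, with $\nu = R - 1$ degrees of freedom.
Standard error:
$$\mathrm{s.e.}(\bar{D}) = \mathrm{s.e.}(\bar{Y}_{\cdot 1} - \bar{Y}_{\cdot 2}) = \frac{S_D}{\sqrt{R}}$$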

Common Random Numbers (CRN)


[Comparison of 2 systems]

It is never enough to simply use the same seed for the random-number generator(s):
The random numbers must be synchronized: each
random number used in one model for some purpose
should be used for the same purpose in the other
model.
e.g., if the ith random number is used to generate a
service time at work station 2 for the 5th arrival in
model 1, the ith random number should be used for
the very same purpose in model 2.
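Synchronization can be illustrated on the vehicle-safety inspection example. This is a sketch under stated assumptions, not a validated model. The slide does not give the service-time standard deviation, so `SIGMA = 0.5` minute is our assumption. We also assume FIFO queues, unlimited waiting space between stations, an initially empty system with no warm-up deletion, and 500 vehicles per replication. Every function name is hypothetical. One stream drives interarrival times and one stream drives each inspection job. As a result, the k-th job of vehicle v consumes the same random number in both models, which is the synchronization the slide requires.

```python
import heapq
import random
from statistics import mean, stdev

ARRIVAL_RATE = 9.5 / 60.0                          # 9.5 vehicles/hour, per minute
MEANS_PRESENT = (6.5, 6.0, 5.5)                    # brake, headlight, steering (min)
MEANS_ALT = tuple(0.9 * m for m in MEANS_PRESENT)  # 10% faster: 5.85, 5.4, 4.95
SIGMA = 0.5                                        # ASSUMPTION: not given on the slide


def draw_inputs(n, seed):
    """Pre-draw inputs for n vehicles: stream 0 = arrivals, streams 1-3 = the jobs."""
    arrivals = random.Random(10 * seed)
    jobs = [random.Random(10 * seed + k) for k in (1, 2, 3)]
    gaps = [arrivals.expovariate(ARRIVAL_RATE) for _ in range(n)]
    z = [tuple(rng.gauss(0.0, 1.0) for rng in jobs) for _ in range(n)]
    return gaps, z


def service(mean_k, z_k):
    return max(0.0, mean_k + SIGMA * z_k)  # truncate (very unlikely) negative draws


def present_system(gaps, z):
    """Three parallel stalls fed by one FIFO queue; each stall does all 3 jobs."""
    free_at = [0.0, 0.0, 0.0]  # min-heap: times at which the stalls become free
    t = total = 0.0
    for gap, zs in zip(gaps, z):
        t += gap
        work = sum(service(m, zk) for m, zk in zip(MEANS_PRESENT, zs))
        start = max(t, heapq.heappop(free_at))
        heapq.heappush(free_at, start + work)
        total += start + work - t
    return total / len(gaps)  # mean response time


def alternative_system(gaps, z):
    """Three single-server stations in series, one job per station."""
    free_at = [0.0, 0.0, 0.0]
    t = total = 0.0
    for gap, zs in zip(gaps, z):
        t += gap
        clock = t
        for k, (m, zk) in enumerate(zip(MEANS_ALT, zs)):
            clock = max(clock, free_at[k]) + service(m, zk)
            free_at[k] = clock
        total += clock - t
    return total / len(gaps)


def replicate(R=10, n=500, crn=True):
    """D_r = Y_r1 - Y_r2 for R replications (fresh seeds per replication)."""
    diffs = []
    for r in range(R):
        inputs1 = draw_inputs(n, seed=r)
        inputs2 = inputs1 if crn else draw_inputs(n, seed=1000 + r)
        diffs.append(present_system(*inputs1) - alternative_system(*inputs2))
    return diffs


for crn in (True, False):
    d = replicate(crn=crn)
    print(f"CRN={crn}: mean difference = {mean(d):.2f} min, s_D = {stdev(d):.2f}")
```

Because each purpose has its own stream, adding or removing random-number calls elsewhere in one model cannot shift the numbers that the service times use. You would typically expect $s_D$ to be smaller with `crn=True`, but that is an empirical outcome of the run, not a guarantee.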

Comparison of Several System Designs


Goal: compare K alternative system designs based on some specific performance measure, $\theta_i$, of system i, for $i = 1, 2, \dots, K$.
Procedures are classified as:
Fixed-sample-size procedures: a predetermined sample size is used to draw inferences via hypothesis tests or confidence intervals.
Sequential sampling (multistage): more and more data are collected until an estimator with a prespecified precision is achieved or until one of several alternative hypotheses is selected.

Some goals/approaches of system comparison:
Estimation of each parameter $\theta_i$.
Comparison of each performance measure $\theta_i$ to a control $\theta_1$.
All pairwise comparisons, $\theta_i - \theta_j$, for all $i \ne j$.
Selection of the best $\theta_i$.


Bonferroni Approach
[Multiple Comparisons]

Used to make statements about several parameters simultaneously (where all statements are true simultaneously).
Bonferroni inequality:
$$P(\text{all statements } S_j \text{ are true},\ j = 1, \dots, C) \ge 1 - \sum_{j=1}^{C}\alpha_j = 1 - \alpha_E$$
where $S_j$ is the statement that the jth $100(1-\alpha_j)\%$ confidence interval contains its parameter.
$\alpha_E = \sum_{j=1}^{C}\alpha_j$ is the overall error probability; it provides an upper bound on the probability of a false conclusion.
The smaller $\alpha_j$ is, the wider the jth confidence interval will be.
Major advantage: the inequality holds whether the models are run with independent sampling or CRN.
Major disadvantage: the width of each individual interval increases as the number of comparisons increases.
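The cost of the Bonferroni approach can be made concrete. The sketch below splits an overall $\alpha_E$ equally across C intervals, $\alpha_j = \alpha_E / C$. The inequality itself allows unequal splits; the equal split is our choice. It then prints the t multiplier $t_{\alpha_j/2,\nu}$ that scales each interval's standard error. The values $\alpha_E = 0.10$ and $\nu = 9$ are illustrative assumptions, and `t_quantile` is our own stdlib-only helper (SciPy's `scipy.stats.t.ppf` would normally be used instead).

```python
import math


def t_quantile(p, nu, n=2000):
    """Value t with P(T <= t) = p for 0.5 < p < 1 (Simpson-rule CDF + bisection)."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))

    def pdf(x):
        return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

    def cdf(x):
        h = x / n
        s = pdf(0.0) + pdf(x) + sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, n))
        return 0.5 + s * h / 3

    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


ALPHA_E = 0.10  # overall error probability (illustrative)
NU = 9          # degrees of freedom, e.g. R = 10 replications with CRN

for C in (1, 2, 3, 6, 10):
    alpha_j = ALPHA_E / C                   # equal split across the C intervals
    t_j = t_quantile(1 - alpha_j / 2, NU)   # multiplier on each interval's s.e.
    print(f"C = {C:2d}: alpha_j = {alpha_j:.4f}, t = {t_j:.3f}")
```

With these settings the multiplier grows from about 1.83 for a single interval to about 3.25 for ten intervals. That growth is the widening the slide lists as the major disadvantage.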

Bonferroni Approach

[Multiple Comparisons]

Should be used only when a small number of comparisons are made.
Practical upper limit: about 10 comparisons.

3 possible applications:
Individual c.i.s: construct a $100(1-\alpha_j)\%$ c.i. for parameter $\theta_i$, where the number of comparisons is K.
Comparison to an existing system: construct a $100(1-\alpha_j)\%$ c.i. for parameter $\theta_i - \theta_1$ ($i = 2, 3, \dots, K$), where the number of comparisons is $K - 1$.
All pairwise: for any 2 different system designs, construct a $100(1-\alpha_j)\%$ c.i. for parameter $\theta_i - \theta_j$; hence, the total number of comparisons is $K(K-1)/2$.
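The three applications differ mainly in how many intervals C they require. The hypothetical helper below counts C for K systems, assigns an equal share $\alpha_j = \alpha_E / C$ (an assumed split), and flags plans that exceed the practical limit of about 10 comparisons.

```python
def bonferroni_plan(K, alpha_E=0.10, max_comparisons=10):
    """Map each application to (C, alpha_j, within_practical_limit)."""
    counts = {
        "individual c.i.s": K,
        "comparison to existing system": K - 1,
        "all pairwise": K * (K - 1) // 2,
    }
    return {name: (C, alpha_E / C, C <= max_comparisons) for name, C in counts.items()}


for K in (4, 6):
    for name, (C, alpha_j, ok) in bonferroni_plan(K).items():
        note = "" if ok else "  (beyond the ~10-comparison practical limit)"
        print(f"K={K}, {name}: C={C}, alpha_j={alpha_j:.4f}{note}")
```

For K = 6, all-pairwise comparison already needs 15 intervals. This is why the slide recommends Bonferroni only for a small number of comparisons.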