Distributed 1/29/03
Robert H. Storer
http://www.lehigh.edu/~rhs2/ie305/ie305.html
Advantages of the method of batch means:

1. We will not have to throw away data from many initial transients, thus saving computer time in steady-state simulations.

2. We may be able to provide useful statistics to people who do not have the statistical background to understand replicates, correlation, etc.
Within a single run, the data points are not independent, but rather are correlated. The usual estimate S² will therefore be biased, and the resulting confidence interval wrong.
The idea is that if the batches are big enough, the batch averages
will not be highly correlated.
Once we have the batch averages, we can treat them in the same
fashion as if they were averages from independent replicates.
Of course we have to make sure that the batches are big enough so
that there is little correlation between batch averages.
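As a concrete illustration, here is a minimal Python sketch of the batch means procedure (the function name, the 20-batch default, and the use of a t quantile are my own choices, not from the notes):

```python
# Minimal sketch of the method of batch means: split one long run into
# equal-sized batches, average each batch, and treat the batch means as
# (approximately) independent observations.
import numpy as np
from scipy import stats

def batch_means_ci(data, n_batches=20, alpha=0.05):
    """Point estimate and CI half-width for the mean via batch means."""
    data = np.asarray(data, dtype=float)
    batch_size = len(data) // n_batches
    # Drop leftover observations so all batches have equal size.
    means = data[:n_batches * batch_size].reshape(n_batches, batch_size).mean(axis=1)
    half = (stats.t.ppf(1 - alpha / 2, n_batches - 1)
            * means.std(ddof=1) / np.sqrt(n_batches))
    return means.mean(), half
```

If the batches are long enough that the batch means are nearly uncorrelated, the resulting t-based interval is approximately valid.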
The means and variances of the two random variables alone do not
capture an important behavioral feature, namely the degree to
which the two random variables are related.
i.e. are they independent, or how dependent are they? There are
two essentially equivalent measures that attempt to capture this
relationship, covariance and correlation.
The covariance between two random variables X and Y is given by:

$$\mathrm{Cov}(X,Y) = E\{[X - E(X)][Y - E(Y)]\} = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y)\, f_{XY}(x,y)\, dx\, dy$$
Note that the covariance between a random variable and itself is just its variance: Cov(X, X) = V(X).

The correlation between X and Y is the covariance rescaled to lie between −1 and 1:

$$\rho_{XY} = \frac{\mathrm{Cov}(X,Y)}{\sqrt{V(X)\,V(Y)}}$$
[Figure: scatter plots of y versus x illustrating positive correlation (0 < ρ_XY < 1), negative correlation (−1 < ρ_XY < 0), and the perfectly linear cases ρ_XY = 1 and ρ_XY = −1.]
The sample correlation computed from N paired observations (X_i, Y_i) is:

$$r_{XY} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2 \; \sum_{i=1}^{N} (Y_i - \bar{Y})^2}}$$
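A quick sketch of this estimator in Python (the data values are made up for illustration):

```python
# Compute the sample correlation r_XY directly from the formula above,
# then check the result against NumPy's built-in estimator.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical paired data
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

dx, dy = x - x.mean(), y - y.mean()
r_xy = (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

print(r_xy)                      # formula above
print(np.corrcoef(x, y)[0, 1])   # NumPy gives the same value
```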
For a covariance-stationary output sequence, define the lag-k autocovariance γ_k = Cov(X_i, X_{i+k}) and the lag-k autocorrelation ρ_k = γ_k/γ_0, where γ_0 = V(X_i). The variance of the sample mean is then:

$$V(\bar{X}) = \frac{\gamma_0}{n}\left[\,1 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho_k\,\right]$$
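A direct translation of this formula into Python (the function name and argument layout are my own; it assumes the ρ_k are zero beyond the length of the array passed in):

```python
# Variance of the sample mean of an autocorrelated, covariance-stationary
# sequence: V(Xbar) = (γ0/n) [1 + 2 Σ_{k=1}^{n-1} (1 - k/n) ρ_k].
import numpy as np

def var_of_sample_mean(gamma0, rho, n):
    """gamma0: process variance γ0; rho: ρ_1, ρ_2, ...; n: sample size."""
    rho = np.asarray(rho, dtype=float)     # assumes len(rho) <= n - 1
    k = np.arange(1, len(rho) + 1)
    return (gamma0 / n) * (1 + 2 * np.sum((1 - k / n) * rho))
```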
Example
Suppose we have data on the time in queue for 100 consecutive
customers. Suppose the time in queue random variable has the
following parameters:
μ = 50
σ² = γ0 = 4
ρ1 = .94, ρ2 = .86, ρ3 = .71, ρ4 = .65, ρ5 = .55,
ρ6 = .43, ρ7 = .28, ρ8 = .19, ρ9 = .05
All other autocorrelations are 0.
Plugging these values into V(X̄) = (γ0/n)[1 + 2 Σ_{k=1}^{n−1}(1 − k/n)ρ_k] with n = 100 gives a correction factor of

1 + 2 Σ_{k=1}^{9} (1 − k/100) ρ_k = 9.987.

If we ignored the correlation, the naive 95% confidence interval would be

50 ± 1.96 √(4/100) = 50 ± 0.392.

Accounting for the correlation, the interval is

50 ± 1.96 √((4/100)(9.987)) = 50 ± 1.239,

more than three times wider.
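A quick check of these numbers (all values taken from the example above):

```python
# Verify the correction factor and both confidence-interval half-widths.
import numpy as np

gamma0, n = 4.0, 100
rho = np.array([.94, .86, .71, .65, .55, .43, .28, .19, .05])
k = np.arange(1, len(rho) + 1)

factor = 1 + 2 * np.sum((1 - k / n) * rho)
print(factor)                               # ≈ 9.987

print(1.96 * np.sqrt(gamma0 / n))           # naive half-width ≈ 0.392
print(1.96 * np.sqrt(gamma0 / n * factor))  # corrected half-width ≈ 1.239
```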
Aside: You may observe that the equation above will give a good, unbiased estimate of the true variance. We would have to plug estimates r̂_k of the autocorrelations into the equation in place of the true ρ_k (which are unknown). Using this variance estimate, we can construct a fairly valid confidence interval on the mean. This general approach forms the basis of what is called the Standardized Time Series approach to output analysis.
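A sketch of what that plug-in estimate might look like (the estimator below is a standard sample autocorrelation; the implementation and the max_lag truncation are my own, not anything specified in the notes):

```python
# Estimate ρ_k from a single run and plug the estimates into the
# variance-of-the-sample-mean formula.
import numpy as np

def estimated_var_of_mean(data, max_lag):
    data = np.asarray(data, dtype=float)
    n = len(data)
    dev = data - data.mean()
    gamma0_hat = np.mean(dev**2)
    # Sample autocorrelations r̂_k for k = 1..max_lag.
    rho_hat = np.array([np.mean(dev[:-k] * dev[k:]) / gamma0_hat
                        for k in range(1, max_lag + 1)])
    k = np.arange(1, max_lag + 1)
    return (gamma0_hat / n) * (1 + 2 * np.sum((1 - k / n) * rho_hat))
```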
From the correlogram (the plot of ρ_k versus lag k) we can see that after roughly lag 10, the correlation has died out. This gives us some indication of how big our batches have to be! Certainly a batch size of 10 is not big enough.
Rule of Thumb
The method of batch means works pretty much the same way for time-persistent variables as it does for observation-based (i.e., tally) variables.
To decide how long (i.e. how much time) each batch should be, the
rule of thumb is that the batch should be long enough so that the
variable changes value at least 40 times.
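For a time-persistent variable, each batch mean is a time-weighted average rather than a plain average. A sketch of that computation (the data format, a list of event times with the value held from each time onward, is an assumption of mine):

```python
# Time-weighted average of a piecewise-constant variable (e.g. queue
# length) over one batch interval [t_start, t_end].
import numpy as np

def time_weighted_mean(times, values, t_start, t_end):
    """times[i] is when the variable takes values[i]; assumes times is
    sorted and times[0] <= t_start."""
    total = 0.0
    for i in range(len(times)):
        lo = max(times[i], t_start)
        hi = min(times[i + 1] if i + 1 < len(times) else t_end, t_end)
        if hi > lo:
            total += values[i] * (hi - lo)
    return total / (t_end - t_start)
```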
Correlation Test and ARENA Output
ARENA divides the output data into 20 batches, and then uses the
20 batch means to calculate a confidence interval.
If your run is too short, so that 20 batches are not sufficient to wash out the correlation, ARENA will print a message under the confidence-interval half-width column indicating this.
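The flavor of such a check can be illustrated as follows (this is not ARENA's actual test, just a sketch of the idea; the 0.2 cutoff is an arbitrary choice of mine):

```python
# Form 20 batch means and inspect their lag-1 correlation; if it is
# large, the run is too short for the batch means to be uncorrelated.
import numpy as np

def batch_means_look_uncorrelated(data, n_batches=20, cutoff=0.2):
    data = np.asarray(data, dtype=float)
    size = len(data) // n_batches
    means = data[:n_batches * size].reshape(n_batches, size).mean(axis=1)
    dev = means - means.mean()
    lag1 = np.sum(dev[:-1] * dev[1:]) / np.sum(dev**2)
    return lag1 < cutoff
```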
Consider 12 consecutive observations:

2.1 5.6 8.0 7.5 7.2 7.0 7.4 7.8 8.3 8.8 8.4 8.0

Using the last 10 observations:

8.0, 7.5, 7.2, 7.0, 7.4, 7.8, 8.3, 8.8, 8.4, 8.0
average = 7.84
std dev = 0.56999

[Figure: the 10 observations plotted against observation number 1 through 10, with the vertical axis running from 0 to 10.]
Far from appearing random, the plot above would seem to indicate
that there exists positive correlation. A correlogram would verify
this.
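Checking that claim directly (the data are the 10 observations above; the autocorrelation estimator is the same plug-in form used earlier):

```python
# Sample autocorrelations (correlogram values) of the 10 observations.
import numpy as np

data = np.array([8.0, 7.5, 7.2, 7.0, 7.4, 7.8, 8.3, 8.8, 8.4, 8.0])
dev = data - data.mean()                 # mean = 7.84
gamma0 = np.mean(dev**2)

for k in range(1, 4):
    rho_k = np.mean(dev[:-k] * dev[k:]) / gamma0
    print(k, round(rho_k, 3))            # lag 1 is strongly positive (≈ 0.81)
```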