
CPS 424/552 Discrete-Event Simulation Techniques Spring 2012

Chapter 4.1 Sample Statistics


Zhongmei Yao Department of Computer Science University of Dayton 2-16-2012

Review: Chapter 1 Models


1.1 Introduction
Model characterization, development

1.2 A Single-Server Queue


1) Conceptual model
2) Specification model
3) Output statistics
4) Computational model

1.3 A Simple Inventory System


Conceptual model, specification model
Output statistics
Computational model

Textbook copyright 2006, Prentice Hall

Review: Chapter 2 RNG


2.1 Lehmer Random-Number Generators
Introduction

2.2 Lehmer Random-Number Generators


Implementation

2.3 Monte Carlo Simulation
2.4 Monte Carlo Simulation Examples
2.5 Finite-State Sequences


Review: Chapter 3 DES


3.1 Discrete-event simulation
Exponential random variate, geometric random variate

3.2 Multi-stream Lehmer RNGs


Streams, examples

3.3 Discrete-event simulation examples


SSQ with immediate feedback
Simple inventory systems with delivery lag
Single-server machine shop


Chapter 4 Statistics
4.1 Sample statistics
Sample mean, sample standard deviation, examples

4.2 Discrete-data histograms


Histograms, empirical cumulative distribution functions

4.3 Continuous-data histograms


Histograms, empirical cumulative distribution functions

4.4 Correlation


Chapter Overview
Discrete-event simulations generate a lot of experimental data
This chapter considers how to compress that data into meaningful statistics and how to interpret sample statistics
A sample is data collected from a much larger population
If the sample size is small, essentially all that can be done is compute the sample mean and standard deviation
Section 4.1

If the sample size is not small, a sample-data histogram can be computed and then used to analyze the distribution of the data in the sample
Sections 4.2 and 4.3

Sample Mean and Standard Deviation


How to collect data in DES?
Within-the-run: e.g., the job averages and time averages used to characterize the performance of an SSQ system
Between-the-run: simulate the system repeatedly, changing only the initial seed from run to run
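Between-the-run data collection can be sketched as follows. This is only an illustration: `run_once` is a hypothetical stand-in for a full simulation run, and it uses Python's built-in generator rather than the Lehmer generator from Chapter 2. The seed values are arbitrary.

```python
import random

def run_once(seed, n_jobs=1000, mean_service=1.0):
    """Hypothetical stand-in for one simulation run.

    Returns one within-the-run statistic: the average of n_jobs
    exponential service times drawn from a generator seeded with `seed`.
    """
    rng = random.Random(seed)  # separate generator for this replication
    total = sum(rng.expovariate(1.0 / mean_service) for _ in range(n_jobs))
    return total / n_jobs

# Between-the-run sample: same model, different initial seed per run.
seeds = [12345, 54321, 2121212, 987654321, 13579]  # arbitrary seeds
sample = [run_once(seed) for seed in seeds]
print(sample)
```

Each entry of `sample` is one observation, so the sample statistics of this section apply to the list directly.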

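Def. 4.1.1: Given a sample $x_1, x_2, \ldots, x_n$ (continuous or discrete)

Sample mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Sample variance: $s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$
The sample standard deviation: $s = \sqrt{s^2}$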
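A minimal sketch of these three statistics, assuming the 1/n form of the variance that these slides adopt. The function name `sample_stats` and the data values are illustrative only.

```python
import math

def sample_stats(xs):
    """Return (mean, variance, standard deviation) using the 1/n definition."""
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / n
    return mean, variance, math.sqrt(variance)

mean, var, std = sample_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mean, var, std)  # prints: 5.0 4.0 2.0
```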

Sample Mean and Standard Deviation


Sample mean: a measure of the central tendency of the data values
Sample variance and sample standard deviation: measures of dispersion
The spread of the data about the sample mean
If the unit of the data is sec, then the units of the sample mean and sample standard deviation are sec as well

[Figure: normal distribution density curves, labeled with mean and variance. From http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg]

Sample Variance
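A common alternative definition of the sample variance $s^2$:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$ rather than $s^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$

The $1/(n-1)$ version appears almost universally in statistics textbooks

With the $1/(n-1)$ version, $s^2$ is undefined for $n = 1$
The $1/(n-1)$ form is an unbiased estimate of the population variance (its expected value equals the population variance)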

Why consider the 1/n form?


The sample size n is typically large in simulations
If n is large, the difference is negligible
We will use the 1/n version
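To see how quickly the two forms agree, the sketch below computes both on samples of increasing size. The function names and the uniform test data are assumptions made for illustration. The relative difference works out to exactly 1/n.

```python
import random

def variance_n(xs):
    """Sample variance with the 1/n divisor (the form used in these slides)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def variance_n_minus_1(xs):
    """Sample variance with the 1/(n - 1) divisor; undefined for n = 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("the 1/(n - 1) form needs at least two data values")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

rng = random.Random(2012)  # arbitrary seed
for n in (5, 50, 5000):
    xs = [rng.random() for _ in range(n)]
    v_n = variance_n(xs)
    v_n1 = variance_n_minus_1(xs)
    rel_diff = (v_n1 - v_n) / v_n1  # algebraically equal to 1/n
    print(n, v_n, v_n1, rel_diff)
```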

Relating the Mean and Standard Deviation


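The root-mean-square (rms) function $d(x)$ measures dispersion about any value $x$:
$d(x) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - x)^2}$

Theorem 4.1.1
The sample mean $\bar{x}$ gives the smallest possible value of $d(x)$
The standard deviation $s$ is that smallest value: $d(\bar{x}) = \min_x d(x) = s$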
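Theorem 4.1.1 can be checked numerically. The sketch below evaluates the rms dispersion d(x), with the 1/n divisor, at a few trial values. The name `rms_about` and the data are illustrative assumptions.

```python
import math

def rms_about(xs, x):
    """Root-mean-square dispersion d(x) of the sample xs about the value x."""
    return math.sqrt(sum((xi - x) ** 2 for xi in xs) / len(xs))

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = sum(xs) / len(xs)
s = rms_about(xs, mean)  # d evaluated at the sample mean is s

trial_values = (3.0, 4.0, 5.0, 6.0, 7.0)
for x in trial_values:
    print(x, rms_about(xs, x))  # never smaller than s
```

The reason is the identity d(x)^2 = s^2 + (x̄ - x)^2, so any trial value other than the sample mean adds a positive term.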

Relating the Mean and Standard Deviation


Example 4.1.1:
Collect 50 observations
The sample mean is 1.095
The sample standard deviation is 0.354

The smallest value of d(x) is s, attained at the sample mean, as shown in the figure

Chebyshev's Inequality
To better understand how the sample mean and s are related, consider the number of points that lie within k standard deviations of the mean
The parameter k > 1

Let the set $S_k$ contain the points satisfying: $\bar{x} - ks < x_i < \bar{x} + ks$
This interval is 2ks wide

Let $p_k = |S_k| / n$ be the proportion of the $x_i$ that lie within $ks$ of the mean
Chebyshev's inequality states: $p_k \ge 1 - 1/k^2$
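The proportion p_k can be computed directly and compared with the Chebyshev bound 1 - 1/k^2. This sketch assumes s is computed with the 1/n divisor. The exponential test data and the name `proportion_within` are illustrative only.

```python
import math
import random

def proportion_within(xs, k):
    """Proportion p_k of points strictly within k standard deviations of the mean."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    inside = sum(1 for x in xs if mean - k * s < x < mean + k * s)
    return inside / n

rng = random.Random(424)  # arbitrary seed
xs = [rng.expovariate(1.0) for _ in range(1000)]
for k in (1.5, 2.0, 3.0):
    # Observed proportion next to the Chebyshev lower bound.
    print(k, proportion_within(xs, k), 1 - 1 / k ** 2)
```

For typical data the observed proportion is well above the bound, which holds for any sample whatsoever.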

Chebyshev's Inequality
For k = 2, Chebyshev's inequality gives $p_2 \ge 1 - 1/4 = 75\%$
For any sample, at least 75% of the data values lie within 2s of the sample mean
What is the bound on $p_k$ for k = 3?
Example 4.1.1: 95% of the points lie within 2s of the sample mean (an interval 4s wide)

Chebyshev's inequality is very conservative for k = 2

Chebyshev's inequality and practical experience suggest that the interval $\bar{x} \pm 2s$ is the effective width of a sample
Most (but not all) points will lie in this interval
Outliers must be viewed with suspicion

Linear Data Transformation


Often the output data generated by a simulation needs to be converted to different units
Example 4.1.2: Suppose $x_1, x_2, \ldots, x_n$ are measured in seconds. To convert to minutes, we let $x'_i = x_i / 60$

Let $x'_i = a x_i + b$ be the new data
Sample mean: $\bar{x}' = a \bar{x} + b$

Sample variance: $(s')^2 = a^2 s^2$

Sample standard deviation: $s' = |a| s$
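The effect of a linear transformation x' = a x + b on the sample statistics (mean becomes a x̄ + b, standard deviation becomes |a| s) can be verified with a short sketch. The data values and the helper `mean_and_std` are assumptions for illustration.

```python
import math

def mean_and_std(xs):
    """Sample mean and standard deviation (1/n divisor)."""
    n = len(xs)
    mean = sum(xs) / n
    return mean, math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

seconds = [30.0, 45.0, 60.0, 90.0, 120.0]  # illustrative data

# Seconds to minutes: a = 1/60, b = 0.
a, b = 1.0 / 60.0, 0.0
minutes = [a * x + b for x in seconds]

m, s = mean_and_std(seconds)
m_new, s_new = mean_and_std(minutes)
print(m_new, a * m + b)   # transformed mean vs. a * mean + b
print(s_new, abs(a) * s)  # transformed std  vs. |a| * std

# A negative scale factor shows why the absolute value is needed.
flipped = [-2.0 * x + 3.0 for x in seconds]
m_flip, s_flip = mean_and_std(flipped)
```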

Linear Data Transformation


Example 4.1.2: Suppose $x_1, x_2, \ldots, x_n$ are measured in seconds. To convert to minutes, we let $x'_i = x_i / 60$
Given $\bar{x}$ is 45 sec, what is $\bar{x}'$?
Given s is 15 sec, what is $s'$?

Example 4.1.3: Standardize data by subtracting the sample mean and dividing the result by s
For the sample $x_1, x_2, \ldots, x_n$, the standardized sample is $x'_i = (x_i - \bar{x}) / s$

Used to avoid issues with very large (or very small) valued data
What is $\bar{x}'$? What is $s'$?
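Standardization is the linear transformation with a = 1/s and b = -x̄/s, so the standardized sample has mean 0 and standard deviation 1. A minimal sketch, with illustrative data and a hypothetical helper name `standardize`:

```python
import math

def standardize(xs):
    """Subtract the sample mean and divide by the sample standard deviation (1/n form)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    if s == 0.0:
        raise ValueError("cannot standardize a constant sample (s = 0)")
    return [(x - mean) / s for x in xs]

z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
z_mean = sum(z) / len(z)
z_std = math.sqrt(sum((v - z_mean) ** 2 for v in z) / len(z))
print(z)
print(z_mean, z_std)
```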

Nonlinear Data Transformation


When data is used to generate a Boolean (1 or 0) outcome, we need a nonlinear data transformation
The value of $x_i$ is not as important as the effect
E.g., consider the effect "it will rain tomorrow": how much rain we will have is not important

Let A be a fixed set and
$x'_i = 1$ if $x_i \in A$, and $x'_i = 0$ otherwise

Let p be the proportion of the $x_i$ that fall in A

Then, $\bar{x}' = p$

and $s' = \sqrt{p(1 - p)}$
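For the 0/1 indicator sample, the sample mean equals the proportion p and the sample standard deviation (1/n form) equals sqrt(p(1 - p)). The sketch below checks this on made-up delay values, with A taken to be the positive numbers. Both the data and the function name are hypothetical.

```python
import math

def indicator_sample(xs, in_A):
    """Map each x to 1 if it falls in the set A (given as a predicate), else 0."""
    return [1 if in_A(x) else 0 for x in xs]

delays = [0.0, 1.2, 0.0, 3.4, 0.7, 0.0, 2.1, 0.5]  # hypothetical queueing delays
ind = indicator_sample(delays, lambda d: d > 0)     # A = positive numbers

n = len(ind)
p = sum(ind) / n                                    # sample mean of the 0/1 data
s = math.sqrt(sum((x - p) ** 2 for x in ind) / n)   # sample std of the 0/1 data
print(p, s, math.sqrt(p * (1 - p)))
```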

Nonlinear Data Transformation


Example 4.1.4: An SSQ system
Let $x_i = d_i$ be the queueing delay for job i
Let $A = R^+$ be the set of all positive numbers
Then $x'_i = 1$ if and only if $d_i > 0$
From Exercise 1.2.3, the proportion of jobs delayed is p = 0.723
Therefore, $\bar{x}' = 0.723$
What about $s'$?

Computational Considerations
Recall that the sample standard deviation is given by
$s = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

This requires two passes through the data

1. Compute the sample mean
2. Compute the sum of squared differences about the sample mean

The two-pass approach is undesirable for large n since all the data must be stored temporarily
Can we find a one-pass algorithm for computing s?

Conventional One-Pass Algorithm


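A one-pass equation for $s^2$:
$s^2 = \left( \frac{1}{n} \sum_{i=1}^{n} x_i^2 \right) - \bar{x}^2$

Thus, $s^2$ can be computed in one pass by accumulating these two partial sums: $\sum_{i=1}^{n} x_i$ and $\sum_{i=1}^{n} x_i^2$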
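A minimal sketch of the conventional one-pass approach, accumulating the sum of the data and the sum of the squared data as the values arrive. The function name is illustrative. Clamping the variance at zero is a design choice added here, not something the slides specify, because this formula is known to lose precision through cancellation when the mean is large relative to the standard deviation.

```python
import math

def one_pass_stats(stream):
    """Return (n, mean, std) from a single pass, storing only two running sums."""
    n = 0
    total = 0.0     # running sum of x_i
    total_sq = 0.0  # running sum of x_i squared
    for x in stream:
        n += 1
        total += x
        total_sq += x * x
    if n == 0:
        raise ValueError("stream must contain at least one value")
    mean = total / n
    # s^2 = (1/n) * sum(x_i^2) - mean^2, clamped to guard against round-off.
    variance = max(total_sq / n - mean * mean, 0.0)
    return n, mean, math.sqrt(variance)

n, mean, s = one_pass_stats(iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
print(n, mean, s)  # prints: 8 5.0 2.0
```

Because the input is consumed as an iterator, the data never needs to be stored, which is the point of the one-pass form.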

Next Time
Section 4.1
Welford's one-pass algorithm
Time-averaged sample statistics

Section 4.2
