Last Time

2.3 Monte Carlo Simulation
2.4 Monte Carlo Simulation Examples
2.5 Finite-State Sequences
Chapter 4 Statistics
4.1 Sample Statistics
Sample mean, sample standard deviation, examples
4.4 Correlation
Chapter Overview
Discrete-event simulations generate a lot of experimental data. This chapter considers how we can compress data into meaningful statistics and interpret sample statistics. A sample is data collected from a much larger population. If the size of the sample is small, essentially all that can be done is to compute the sample mean and standard deviation.
Section 4.1
If the size of the sample is not small, a sample-data histogram can be computed and then used to analyze the distribution of the data in the sample.
Sections 4.2 and 4.3
Within-the-run: gather statistics as a single simulation evolves (e.g., the performance of an SSQ system). Between-the-runs: simulate the system repeatedly by simply changing the initial seed from run to run.
[Figure: normal distribution PDFs, annotated with mean and variance. From http://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg]
Sample Variance

A common alternative definition of the sample variance $s^2$ divides by $n - 1$:

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

rather than

$s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$

For large $n$ the two definitions are essentially equivalent; either is a consistent estimator of the population variance (meaning that the sample variance converges to the population variance).
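As a quick check of the two conventions, here is a minimal Python sketch; the sample values are made up for illustration:

```python
# Compare the 1/n and 1/(n-1) conventions for the sample variance
# on a small made-up sample (illustrative values only).
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(xs)
mean = sum(xs) / n

ss = sum((x - mean) ** 2 for x in xs)   # sum of squared deviations
var_n = ss / n          # divide-by-n definition used in the text
var_nm1 = ss / (n - 1)  # common alternative (divide by n - 1)

print(mean, var_n, var_nm1)
```

For large n the two results differ only by the factor n/(n-1), which tends to 1.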
Theorem 4.1.1

Let $d(x) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - x)^2}$ denote the root-mean-square distance from a point $x$ to the sample. The sample mean $\bar{x}$ gives the smallest possible value for $d(x)$, and the standard deviation $s$ is that smallest value: $s = d(\bar{x})$.
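The theorem can be checked numerically; a small sketch (sample values made up for illustration):

```python
import math

# Numerically check that d(x) = sqrt((1/n) * sum (x_i - x)^2)
# is minimized at the sample mean, where its value equals s.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample
n = len(xs)
mean = sum(xs) / n
s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

def d(x):
    """RMS distance from the point x to the sample."""
    return math.sqrt(sum((xi - x) ** 2 for xi in xs) / n)

assert abs(d(mean) - s) < 1e-12          # minimum value is s
assert d(mean - 0.5) > s and d(mean + 0.5) > s  # nearby points are worse
```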
Chebyshev's Inequality

To better understand how the mean and $s$ are related, consider the number of points that lie within $k$ standard deviations of the mean, for a parameter $k > 1$. Let $S_k = \{\, x_i : \bar{x} - ks < x_i < \bar{x} + ks \,\}$, the set of points in an interval of width $2ks$ centered on the mean. Let $p_k = |S_k| / n$ be the proportion of the $x_i$ that lie within $ks$ of the mean. Chebyshev's inequality states: $p_k \ge 1 - 1/k^2$.
For $k = 2$, we have from Chebyshev's inequality that $p_k \ge 1 - 1/2^2 = 75\%$. For any sample, at least 75% of the data values lie within $2s$ of the sample mean. What is $p_k$ for $k = 3$? (At least $1 - 1/3^2 \approx 89\%$.) Example 4.1.1: 95% of the points lie within $2s$ of the sample mean.

Chebyshev's inequality and practical experience suggest that the interval $\bar{x} \pm 2s$, of width $4s$, is the effective width of a sample. Most (but not all) points will lie in this interval; outliers must be viewed with suspicion.
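A quick empirical comparison of $p_k$ with the Chebyshev bound, sketched in Python on a made-up random sample (the specific distribution and seed are illustrative choices, not from the text):

```python
import math
import random

# Compare p_k (fraction of points within k*s of the mean) with the
# Chebyshev bound 1 - 1/k^2 on a made-up sample of normal variates.
random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(10000)]
n = len(xs)
mean = sum(xs) / n
s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

for k in (2, 3):
    p_k = sum(1 for x in xs if abs(x - mean) < k * s) / n
    bound = 1 - 1 / k**2
    assert p_k >= bound   # Chebyshev's bound holds for any sample
    print(k, p_k, bound)
```

For a bell-shaped sample like this one, $p_k$ typically far exceeds the bound; Chebyshev is a worst-case guarantee over all samples.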
Sample variance: $s^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Example 4.1.3: Standardize data by subtracting the sample mean and dividing the result by $s$. For a sample $x_1, x_2, \ldots, x_n$, the standardized sample is

$x_i' = \frac{x_i - \bar{x}}{s}, \qquad i = 1, 2, \ldots, n$

Standardization is used to avoid numerical issues with very large (or small) valued data. What is $\bar{x}'$? What is $s'$? Then $\bar{x}' = 0$ and $s' = 1$.
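A minimal sketch of standardization in Python, verifying that the standardized sample has mean 0 and standard deviation 1 (sample values made up for illustration):

```python
import math

# Standardize a made-up sample and check the resulting mean and
# standard deviation (1/n convention, as in the text).
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(xs)
mean = sum(xs) / n
s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

zs = [(x - mean) / s for x in xs]   # standardized sample
z_mean = sum(zs) / n
z_s = math.sqrt(sum((z - z_mean) ** 2 for z in zs) / n)
print(z_mean, z_s)
```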
Computational Considerations

Recall that the sample standard deviation is given by

$s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

which appears to require two passes through the data: one to compute $\bar{x}$, and a second to sum the squared deviations from it. The two-pass approach is undesirable for large $n$ since we need to temporarily store all the data. Can we find a one-pass algorithm for computing $s$? One approach uses the partial sums $\sum_{i=1}^{n} x_i$ and $\sum_{i=1}^{n} x_i^2$, since

$s^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$
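The partial-sums idea can be sketched in Python as follows, with a two-pass computation as a cross-check (sample values made up for illustration):

```python
import math

# One-pass mean and standard deviation via the partial sums
# sum(x_i) and sum(x_i^2), using the 1/n convention.
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample

n = 0
sum_x = 0.0
sum_x2 = 0.0
for x in xs:            # each data value is visited exactly once
    n += 1
    sum_x += x
    sum_x2 += x * x

mean = sum_x / n
s = math.sqrt(sum_x2 / n - mean * mean)  # s^2 = (1/n)*sum x_i^2 - mean^2

# two-pass cross-check
mean2 = sum(xs) / len(xs)
s2 = math.sqrt(sum((x - mean2) ** 2 for x in xs) / len(xs))
assert abs(mean - mean2) < 1e-12 and abs(s - s2) < 1e-12
```

Note that the difference $\frac{1}{n}\sum x_i^2 - \bar{x}^2$ can lose precision to cancellation when the data values are large relative to their spread; Welford's one-pass algorithm (next time) avoids this problem.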
Next Time
Section 4.1
Welford's one-pass algorithm
Time-Averaged Sample Statistics
Section 4.2