9/20/2012
Simulation
Agenda
Motivation
Trace Driven Simulation
Synthetic Traffic
  Guessing a Distribution
  Simple analytic models for bursty traffic
  Inferring a Distribution
    Selecting the Family of the Distribution
    Parameter Estimation
    Statistical Inference, Testing of Hypotheses (Goodness-of-fit Test)
Nature of self-similar traffic
Self-similarity in data traffic
Self-similarity in simulation, performance issues
Input Modelling
The input data is the driving force for the simulation - the behavior of the simulation and all the results/conclusions that can be reached depend on appropriate inputs
Motivation
All performance techniques must make some assumptions regarding traffic or workload:
Queuing models: arrival and service processes
Simulation: traffic generators
Experimental: workload generators
Assumptions are captured in traffic models
Modern traffic models need to be used
Requirements
Specified in terms of services of the system under study: types, interarrival times, and resource demands of service requests
Representative
Relevant:
To find the maximum throughput you have to saturate the system
To test a congestion control algorithm you need congestion
To test the convergence speed of an adaptive algorithm you need workload variations
Reproducible
Overview
"Real-world" workloads typically have random components, e.g. random interarrival times
Two fundamental types of workloads:
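# Create a trace-file object and point it at the recorded trace
set tfile [new Tracefile]
$tfile filename <file>
# Create a trace-driven traffic source and attach the trace file to it
set src [new Application/Traffic/Trace]
$src attach-tracefile $tfile
<file>: binary format (native!), each record holding the inter-packet time (msec) and the packet size (bytes)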
Advantages of Trace Driven Simulation
The workload is "realistic", which increases the credibility of your results
The workload has full stochastic complexity: correlations, non-stationarity, changing distributions, ...
Disadvantages of Trace Driven Simulation
Trace files can grow large, making them clumsy to handle
You need several trace files to represent different workloads
A single trace represents just a single operational scenario of the system
How do you obtain independent replications?
Traces cannot react to feedback from the underlying system; they are open loop
For example, a trace would change the packets on the network, and TCP is in turn influenced by those packets
Even when more complex stochastic processes are used later on, the iid sequence is often a reasonable "first shot"
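As an illustration of such an iid sequence, the sketch below draws independent, exponentially distributed interarrival times and accumulates them into arrival instants. The exponential family, the mean of 2.0 time units, and the function names are assumptions chosen for the example, not part of the slides.

```python
import random

def iid_interarrival_times(n, mean, seed=None):
    """Draw n independent, identically distributed exponential interarrival times."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean) for _ in range(n)]

def arrival_instants(gaps):
    """Turn interarrival times into absolute arrival instants by cumulative summation."""
    t, instants = 0.0, []
    for gap in gaps:
        t += gap
        instants.append(t)
    return instants

# Hypothetical workload: 10000 requests with a mean gap of 2.0 time units.
gaps = iid_interarrival_times(10000, mean=2.0, seed=1)
times = arrival_instants(gaps)
print("empirical mean gap:", sum(gaps) / len(gaps))
```

Fixing the seed makes the workload reproducible, and changing it is one simple way to obtain independent replications.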
We don't have data at hand: guess a distribution
We have measured data at hand: infer a distribution
Guessing a Distribution (1/2)
Apply a-priori knowledge about the workload and certain heuristics:
Deterministic traffic is not bursty
The Poisson process (in continuous time) and the Bernoulli process (in discrete time) are bursty as single processes
Burstiness of Poisson and Bernoulli processes decreases when aggregating traffic sources
Other traffic sources exhibit more burstiness
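The effect of aggregation can be checked numerically. The sketch below superposes independent Bernoulli sources and reports the coefficient of variation (standard deviation over mean) of the number of arrivals per slot. Using the coefficient of variation as the burstiness measure, and the values p = 0.1 and 5000 slots, are assumptions made for this example.

```python
import random
import statistics

def slot_counts(num_sources, p, num_slots, rng):
    """Arrivals per slot when num_sources independent Bernoulli(p) sources are superposed."""
    return [sum(1 for _ in range(num_sources) if rng.random() < p)
            for _ in range(num_slots)]

def coefficient_of_variation(counts):
    """Standard deviation of the per-slot counts divided by their mean."""
    return statistics.pstdev(counts) / statistics.mean(counts)

rng = random.Random(42)
cv_by_sources = {}
for n in (1, 10, 100):
    cv_by_sources[n] = coefficient_of_variation(slot_counts(n, 0.1, 5000, rng))
    print(n, "sources: CV =", round(cv_by_sources[n], 3))
```

For a superposition of n such sources the per-slot count is binomial, so the coefficient of variation is sqrt((1 - p) / (n p)) and shrinks as n grows.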
Importance of Burstiness
Peak traffic demands on buffer resources can lead to overflow and lost traffic
Peak demands may create quality of service (QoS) problems in a network
Need to characterize burstiness for traffic sources in a QoS environment
Models
[Figure: bit rate versus time for bursty data, showing bursts separated by inactive intervals]
Batch Arrivals
Some sources transmit in packet bursts
May be better modeled by a batch arrival process (e.g., bursts of packets arriving according to a Poisson process)
The case for a batch model is weaker at queues after the first, because of shaping
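A batch arrival process of this kind can be sketched as follows: batches arrive according to a Poisson process and each batch carries a random number of packets. The geometric batch-size distribution and all parameter values are assumptions made for illustration.

```python
import random

def batch_poisson_arrivals(rate, mean_batch, horizon, seed=None):
    """Return (time, batch_size) pairs up to the time horizon.

    Batches arrive as a Poisson process with the given rate; each batch holds
    a geometrically distributed number of packets (at least one).
    """
    rng = random.Random(seed)
    p = 1.0 / mean_batch          # success probability of the geometric batch size
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return arrivals
        size = 1
        while rng.random() > p:   # keep adding packets with probability 1 - p
            size += 1
        arrivals.append((t, size))

# Hypothetical source: 2 batches per time unit, 4 packets per batch on average.
arrivals = batch_poisson_arrivals(rate=2.0, mean_batch=4.0, horizon=1000.0, seed=7)
packets = sum(size for _, size in arrivals)
print(len(arrivals), "batches,", packets, "packets")
```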
[Figure: interarrival times versus time]
The departure traffic from a bottleneck is more regular than the arrival traffic
FTP: high data rates during downloads but less activity between downloads
HTTP: activity after mouse clicks
Video streaming: high data rates in frame transmissions
Interactive voice: talk and silent periods
Bulk arrival processes
MMPPs
Model Modifications:
GMPP (1/2)
A Poisson arrival process, but with time-varying arrival rate $\lambda(t)$
Doubly stochastic process
Switched Poisson process
GMPP (2/2)
The rate is controlled by the modulating process (here a two-state Markov chain, but it could have more states)
The arrival process is Poisson for a given rate
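A minimal sketch of a two-state modulated Poisson source is given below. It assumes exponentially distributed sojourn times in each state (so the modulating process is a continuous-time Markov chain); the rates and sojourn means are hypothetical values.

```python
import random

def mmpp2_arrivals(rates, mean_sojourn, horizon, seed=None):
    """Arrival instants of a two-state Markov-modulated Poisson process.

    rates[s] is the Poisson arrival rate while in state s;
    mean_sojourn[s] is the mean of the exponential holding time of state s.
    """
    rng = random.Random(seed)
    t, state, arrivals = 0.0, 0, []
    while t < horizon:
        end = min(t + rng.expovariate(1.0 / mean_sojourn[state]), horizon)
        # Within one sojourn the arrivals form a Poisson process with the state's rate.
        if rates[state] > 0:
            nxt = t + rng.expovariate(rates[state])
            while nxt < end:
                arrivals.append(nxt)
                nxt += rng.expovariate(rates[state])
        t, state = end, 1 - state   # switch to the other state
    return arrivals

# Hypothetical source: a busy state (10 arrivals per time unit) and a quiet state (0.5).
arrivals = mmpp2_arrivals(rates=(10.0, 0.5), mean_sojourn=(1.0, 1.0),
                          horizon=2000.0, seed=3)
print(len(arrivals), "arrivals, average rate", len(arrivals) / 2000.0)
```

Restarting the exponential arrival clock at every state change is valid because of the memoryless property of the exponential distribution. Setting one rate to zero turns the sketch into an on-off source.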
On and Off period durations are random variables (e.g. exponentially distributed) Sometimes called
Source Types
Voice sources
Video sources
File transfers
Web traffic
Interactive traffic
Different application types have different QoS requirements, e.g., delay, jitter, loss, throughput, etc.
QoS Requirements and Models
Voice
  QoS requirements: delay < ~150 ms, jitter < ~30 ms, packet loss < ~1%
  Model:
  * Two-state (on-off) Markov Modulated Rate Process (MMRP)
  * Exponentially distributed time at each state
Video
  QoS requirements: delay < ~400 ms, jitter < ~30 ms, packet loss < ~1%
Interactive (FTP, telnet, web)
  QoS requirements: zero or near-zero packet loss; delay may be important
MPEG1 Video Source Model
The MPEG1 MMRP model can be extremely bursty, and has long-range dependence behavior due to the deterministic frame sequence
Diagram Source: Mark W. Garrett and Walter Willinger, Analysis, Modeling, and Generation of Self-Similar VBR Video Traffic, BELLCORE, 1994
Inferring a Distribution
There is data available and we want to find a well-known theoretical distribution (normal, exponential, binomial, Bernoulli, geometric, ...) that fits the data "best"
There are three steps involved, repeated if necessary:
1. Identify a candidate distribution (normal vs. exponential vs. ...)
2. Estimate the parameters from the observations
3. Determine how well the estimated distribution fits your data ("goodness-of-fit tests")
If the test is not satisfactory, go back to the first step
Problem: appropriate choice of the number of intervals $k$
Rule of thumb: $k = \mathrm{int}(1 + \log_2 n)$, where $n$ is the number of observations
Sometimes this doesn't work well. Recommendation:
Try several values
Choose one that gives a smooth histogram with a shape similar to the pdf of one of the standard distributions
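The rule of thumb and the "try several values" recommendation can be sketched as follows. Equal-width intervals spanning the data range are an assumption of this example, as is the small sample; the code also assumes the observations are not all identical.

```python
import math

def num_intervals(n):
    """Rule of thumb: k = int(1 + log2(n)) for n observations."""
    return int(1 + math.log2(n))

def histogram(data, k):
    """Count observations in k equal-width intervals spanning [min, max]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        idx = min(int((x - lo) / width), k - 1)  # the maximum goes into the last cell
        counts[idx] += 1
    return counts

# Hypothetical observations; compare the rule-of-thumb k with its neighbours.
data = [1, 2, 3, 4, 5, 6, 7, 8]
k0 = num_intervals(len(data))
for k in (k0 - 1, k0, k0 + 1):
    print(k, histogram(data, k))
```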
Sample Histograms
Parameter Estimation
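The sample mean, $\bar{X}$, is defined by
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \tag{1}$$
and the sample variance, $S^2$, by
$$S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1} \tag{2}$$
For data grouped into $k$ distinct values $X_j$ with observed frequencies $f_j$:
$$\bar{X} = \frac{1}{n} \sum_{j=1}^{k} f_j X_j \tag{3}$$
$$S^2 = \frac{\sum_{j=1}^{k} f_j X_j^2 - n\bar{X}^2}{n - 1} \tag{4}$$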
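Equations (1) to (4) translate directly into code. The sketch below computes the sample mean and variance from raw observations and from a frequency table, and shows that both routes agree. The small data set is hypothetical.

```python
from collections import Counter

def sample_stats(xs):
    """Sample mean and variance from raw observations, as in equations (1) and (2)."""
    n = len(xs)
    mean = sum(xs) / n
    var = (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)
    return mean, var

def grouped_stats(freq):
    """Sample mean and variance from a {value: frequency} table, as in equations (3) and (4)."""
    n = sum(freq.values())
    mean = sum(f * x for x, f in freq.items()) / n
    var = (sum(f * x * x for x, f in freq.items()) - n * mean ** 2) / (n - 1)
    return mean, var

data = [2, 3, 3, 5, 5, 5, 7]          # hypothetical observations
raw = sample_stats(data)
grouped = grouped_stats(Counter(data))
print(raw, grouped)
```

The subtraction in the variance formula can lose precision when the mean is large relative to the spread, so a two-pass computation may be preferable for real measurements.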
[Figure: hypothesis testing workflow: empirical observation, formulate hypotheses, chance, accept or reject hypothesis]
Quantile-Quantile Plots
If $X$ is a RV with CDF $F$, the $q$-quantile of $X$ is the value $\gamma$ such that $F(\gamma) = P(X \le \gamma) = q$
Raw data: $\{x_i\}$
Data rearranged by magnitude: $\{y_j\}$
Then $y_j$ is an estimate of the $(j - 1/2)/n$ quantile of $X$, i.e. $y_j \approx F^{-1}\left[(j - 1/2)/n\right]$
If $F$ is a member of an appropriate family, then a plot of $y_j$ vs. $F^{-1}\left[(j - 1/2)/n\right]$ is approximately a straight line
If $F$ also has the appropriate parameter values, the line has slope 1
Quantile-Quantile Plots: Example
Check whether the measured times follow a normal distribution (the observations are sorted in increasing order)
j    Value      j    Value      j    Value
1    99.55      6    99.98      11   100.26
2    99.56      7    100.02     12   100.27
3    99.62      8    100.06     13   100.33
4    99.65      9    100.17     14   100.41
5    99.79      10   100.21     15   100.47
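The points of the Q-Q plot for this example can be computed as sketched below. Fitting the normal distribution with the sample mean and sample standard deviation, and summarizing straightness by a correlation coefficient, are assumptions of this sketch. Plotting is left out to keep it dependency-free.

```python
from statistics import NormalDist, mean, stdev

# Measured times from the example (already sorted).
times = [99.55, 99.56, 99.62, 99.65, 99.79, 99.98, 100.02, 100.06,
         100.17, 100.21, 100.26, 100.27, 100.33, 100.41, 100.47]

def qq_points(data):
    """Pair each ordered observation y_j with the fitted quantile F^-1((j - 1/2)/n)."""
    y = sorted(data)
    n = len(y)
    fitted = NormalDist(mean(y), stdev(y))  # parameters estimated from the sample
    return [(fitted.inv_cdf((j - 0.5) / n), y[j - 1]) for j in range(1, n + 1)]

def correlation(points):
    """Pearson correlation of the plotted pairs; values near 1 suggest a straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

points = qq_points(times)
for q, y in points:
    print(f"{q:8.3f} {y:8.2f}")
print("correlation:", round(correlation(points), 3))
```

A visual check of the plotted pairs remains the main tool; the correlation is only a rough numerical summary.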
Testing of hypotheses
The hypotheses are:
H0: the random variable, X, conforms to the distributional assumption with the parameter(s) given by the parameter estimate(s)
HA: the random variable, X, does not conform
$\alpha$ - level of significance
$\chi^2$ Test
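Test statistic for $k$ class intervals with observed counts $N_1, \dots, N_k$ and expected counts $E_1, \dots, E_k$:
$$\chi_0^2 = \sum_{i=1}^{k} \frac{(N_i - E_i)^2}{E_i}$$
Under H0, $\chi_0^2$ approximately follows a chi-square distribution with $k - s - 1$ degrees of freedom, where $s$ is the number of estimated parameters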
Reject H0 if $\chi_0^2 > \chi^2_{\alpha,\, k-s-1}$
Since the number of automobiles is a discrete variable, and since there are ample data, the histogram can have a cell for each possible value in the range of the data. The resulting histogram is shown on the next slide
Since the histogram of the data appeared to follow a Poisson distribution, the parameter estimate, $\hat{\alpha} = 3.64$, was determined. Thus, the following hypotheses are formed:
H0: the random variable is Poisson distributed
H1: the random variable is not Poisson distributed
Resulting test statistic: $\chi_0^2 = 27.68$
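The chi-square computation for a Poisson hypothesis can be sketched as below. Everything specific here is an assumption, because the data table and the test level are not given in the text: the frequency table is a hypothetical reconstruction chosen so that its mean equals the stated estimate 3.64, the cells are merged at both tails to avoid very small expected counts, and the significance level 0.05 (tabulated critical value 11.07 for 5 degrees of freedom) is assumed.

```python
import math

# ASSUMPTION: hypothetical frequency table (arrivals per period -> number of periods).
observed = {0: 12, 1: 10, 2: 19, 3: 17, 4: 10, 5: 8,
            6: 7, 7: 5, 8: 5, 9: 3, 10: 3, 11: 1}
n = sum(observed.values())
alpha_hat = sum(x * f for x, f in observed.items()) / n  # Poisson parameter estimate

def poisson_pmf(x, a):
    """Probability of x arrivals for a Poisson distribution with mean a."""
    return math.exp(-a) * a ** x / math.factorial(x)

# ASSUMPTION: cells {0,1}, {2}, ..., {6}, {7 or more}; None marks the open upper cell.
cells = [(0, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, None)]

chi2 = 0.0
for lo, hi in cells:
    if hi is None:
        obs = sum(f for x, f in observed.items() if x >= lo)
        prob = 1.0 - sum(poisson_pmf(x, alpha_hat) for x in range(lo))
    else:
        obs = sum(observed.get(x, 0) for x in range(lo, hi + 1))
        prob = sum(poisson_pmf(x, alpha_hat) for x in range(lo, hi + 1))
    expected = n * prob
    chi2 += (obs - expected) ** 2 / expected

k, s = len(cells), 1                  # s = 1 because one parameter was estimated
dof = k - s - 1
CRITICAL_0_05_DF5 = 11.07             # ASSUMPTION: level 0.05, value taken from a table
reject = chi2 > CRITICAL_0_05_DF5
print("alpha_hat =", alpha_hat, "chi2 =", round(chi2, 2), "dof =", dof, "reject H0:", reject)
```

With unrounded expected counts this table gives a statistic slightly below the 27.68 quoted in the text; rounding the expected counts to one decimal accounts for differences of that size. Under the assumed level, H0 would be rejected.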