Simulation Languages - Lecture 15 - Traffic Models I

Traffic Models I
"Models are to be used, not believed." (H. Theil, `Principles of Econometrics')
9/20/2012
Simulation
Agenda
- Motivation
- Trace Driven Simulation
- Synthetic Traffic
  - Guessing a Distribution
  - Simple analytic models for bursty traffic
  - Inferring a Distribution
    - Selecting the Family of the Distribution
    - Parameter Estimation
    - Statistical Inference, Testing of Hypotheses (Goodness-of-fit Test)
- Modern Traffic Models
  - Nature of self-similar traffic
  - Self-similarity in data traffic
  - Self-similarity in simulation, performance issues
Input Modelling
The input data is the driving force of the simulation: the behavior of the simulation and all the results and conclusions that can be reached depend on appropriate inputs.
Motivation
- All performance evaluation techniques must make some assumptions regarding the traffic or workload:
  - Queuing models: arrival and service processes
  - Simulation: traffic generators
  - Experimental: workload generators
- These assumptions are captured in traffic models
- Modern traffic models need to be used
Requirements
- Specified in terms of the services of the system under study: types, interarrival times, and resource demands of service requests
- Representative
- Relevant:
  - To find the maximum throughput, you have to saturate the system
  - To test a congestion control algorithm, you need congestion
  - To test the convergence speed of an adaptive algorithm, you need workload variations
- Reproducible
Overview
- "Real-world" workloads typically have random components, e.g. random interarrival times
- Two fundamental types of workloads:
  - "Real workloads" / traces
  - Synthetic workloads: artificially generated stochastic processes
Trace Driven Simulation
Basic idea:
- Conduct a measurement at a real system
- Observe the service requests (type, arrival times, resource demands, further parameters)
- Save all this into a log file / trace file
- Feed the simulation from the trace file
The real system must be available and accessible to obtain traces.
Trace Driven Traffic in NS-2
Trace-driven source:
    set tfile [new Tracefile]
    $tfile filename <file>
    set src [new Application/Traffic/Trace]
    $src attach-tracefile $tfile
<file>:
- Binary format (native!)
- Each record holds the inter-packet time (µs) and the packet size (bytes)
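A minimal Python sketch for producing such a file, assuming the record layout described in the ns-2 manual for Application/Traffic/Trace: two unsigned 32-bit integers in network (big-endian) byte order, the interval until the next packet in microseconds followed by the packet size in bytes. The helper names and sample records are hypothetical.

```python
import struct

# Assumed ns-2 record layout: two unsigned 32-bit integers in network
# (big-endian) byte order: interval to next packet (microseconds), packet size (bytes).
RECORD = struct.Struct("!II")


def encode_trace(records):
    """Pack (interval_us, size_bytes) pairs into the binary trace format."""
    return b"".join(RECORD.pack(interval_us, size) for interval_us, size in records)


def decode_trace(blob):
    """Unpack a binary trace back into (interval_us, size_bytes) pairs."""
    return [RECORD.unpack_from(blob, offset) for offset in range(0, len(blob), RECORD.size)]


if __name__ == "__main__":
    # Hypothetical records: gaps of 1 ms, 2.5 ms and 0.5 ms with mixed packet sizes.
    records = [(1000, 1500), (2500, 40), (500, 576)]
    blob = encode_trace(records)
    print(len(blob), decode_trace(blob))
```

The resulting bytes can then be written to the trace file in binary mode (`open(path, "wb")`) and passed to `$tfile filename`.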
Advantages of Trace Driven Simulation
- The workload is "realistic", which increases the credibility of your results
- You don't need to find (and defend your choice of) appropriate parameters, as you would for synthetic workloads
- The workload has full stochastic complexity: correlations, non-stationarity, changing distributions, ...
Trace-driven simulation example
[Figure: trace-driven simulation with background traffic]
Disadvantages of Trace Driven Simulation
- Trace files can grow large, which makes handling them rather clumsy
- You need several trace files to represent different workloads
- A single trace represents just a single operational scenario of the system. How do you obtain independent replications?
- Traces cannot react to feedback from the underlying system; they are open loop
  Example: running TCP over a recorded Ethernet trace would change the packets on the network, and TCP is in turn influenced by those packets
Synthetic Traffic: Overview (1/3)
- Synthetic workloads are an alternative to traces
- The workload is generated by a dedicated piece of software (a simulation module), which creates a stochastic process out of random variables
- Compared to a trace, the stochastic processes used in workload generation can be characterized by very few parameters
  Example: a Poisson process is characterized by a single parameter, the rate λ > 0
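As a minimal sketch of such a generator (the function name and parameters are illustrative, not taken from any simulator), a Poisson process with rate λ can be produced by accumulating iid exponential interarrival times with mean 1/λ, here using Python's standard `random.expovariate`:

```python
import random


def poisson_arrivals(rate, horizon, seed=None):
    """Arrival times of a Poisson process with rate `rate` (lambda > 0) on [0, horizon).

    Interarrival times are iid exponential with mean 1 / rate.
    """
    if rate <= 0:
        raise ValueError("rate must be > 0")
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= horizon:
            return times
        times.append(t)


if __name__ == "__main__":
    arrivals = poisson_arrivals(rate=2.0, horizon=10_000.0, seed=1)
    print(len(arrivals) / 10_000.0)  # empirical rate, close to 2.0
```

The single parameter λ is all the generator needs, which is the contrast with a trace drawn above.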
Synthetic Traffic: Overview (2/3)
The most commonly used stochastic process in workload modeling is a sequence of iid random variables for the interarrival times (a renewal process).
- Advantages: conceptually simple, easy to implement
- Disadvantages:
  - Correlation is not modeled (network traffic is often long-range dependent)
  - If the parameters of the random variables have to be estimated from a trace (from observations), we have to assume iid observations, which is often not true
    Example: interarrival times of HTTP GET requests are grouped because of embedded objects
Even when more complex stochastic processes are used later on, the iid sequence is often a reasonable "first shot".
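The iid assumption can be illustrated with a small sketch: a renewal process is just a stream of independent draws from one interarrival distribution, so its lag-1 sample autocorrelation should be close to zero. That is exactly the structure that cannot reproduce, e.g., grouped HTTP requests. The helper names are illustrative.

```python
import random


def renewal_interarrivals(sample, n):
    """n iid interarrival times, each drawn by calling the zero-argument `sample`."""
    return [sample() for _ in range(n)]


def lag1_autocorrelation(xs):
    """Sample autocorrelation between consecutive values (lag 1)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


if __name__ == "__main__":
    rng = random.Random(7)
    gaps = renewal_interarrivals(lambda: rng.expovariate(1.0), 50_000)
    print(lag1_autocorrelation(gaps))  # close to 0 for an iid sequence
```

Any distribution can be plugged in through `sample`; the independence between draws is what the renewal model fixes.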
Synthetic Traffic: Overview (3/3)
Two possibilities for generating iid sequences:
- We don't have data at hand: guess a distribution
- We have measured data at hand: infer a distribution
Guessing a Distribution (1/2)
Apply a-priori knowledge about the workload and certain heuristics:
- Interarrival times are always non-negative and finite → the normal distribution is not a candidate
- The data is known to lie within some range [a, b] → the uniform distribution U(a, b) might be a candidate
- The range is finite ([a, b]) and a particular value in [a, b] occurs most frequently → a triangular distribution could be chosen
- The range is finite and you want to vary the mean and variance arbitrarily within this range → choose a Beta distribution
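A sketch of drawing from the bounded candidates above with Python's standard library: `random.uniform(a, b)`, `random.triangular(low, high, mode)`, and `random.betavariate(alpha, beta)`, the last rescaled from [0, 1] to [a, b]. The range, mode, and shape parameters are hypothetical values chosen for illustration.

```python
import random


def bounded_samples(a, b, n, seed=0, mode=None, alpha=2.0, beta=5.0):
    """n draws from each bounded candidate distribution on [a, b]."""
    rng = random.Random(seed)
    mode = (a + b) / 2 if mode is None else mode
    return {
        # Uniform U(a, b): every value in the range equally likely.
        "uniform": [rng.uniform(a, b) for _ in range(n)],
        # Triangular: density peaks at `mode`, the most frequent value.
        "triangular": [rng.triangular(a, b, mode) for _ in range(n)],
        # Beta(alpha, beta) on [0, 1], rescaled to [a, b];
        # mean a + (b - a) * alpha / (alpha + beta), shape set by alpha and beta.
        "beta": [a + (b - a) * rng.betavariate(alpha, beta) for _ in range(n)],
    }


if __name__ == "__main__":
    samples = bounded_samples(2.0, 10.0, 1000, seed=42, mode=4.0)  # hypothetical range and mode
    for name, xs in samples.items():
        print(name, round(min(xs), 2), round(max(xs), 2), round(sum(xs) / len(xs), 2))
```

All three stay inside [a, b]; only the Beta family lets the mean and variance be tuned independently via alpha and beta.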
Guessing a Distribution (2/2)
- The arrivals can be thought of as coming from a large population of independent individuals, each of which arrives with small probability (e.g. customers arriving at a bank) → Poisson arrivals, i.e. exponential interarrival times
- The quantity is the sum of a large number of more or less similar (and not necessarily independent) components → normal distribution (due to the central limit theorem)
- You want to compare simulation results with analytical results → exponential distribution
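To make the first heuristic concrete, here is a small check under assumed numbers: n independent individuals who each arrive with small probability p produce a binomial count, and that count is close to Poisson with mean n·p.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(k of n independent individuals arrive), each with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability of k arrivals when the mean is lam."""
    return exp(-lam) * lam**k / factorial(k)


# Hypothetical population: 1000 potential customers, each arriving in a given
# minute with probability 0.003, i.e. about 3 arrivals per minute on average.
n, p = 1000, 0.003
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, n * p)) for k in range(20))
print(f"largest pmf difference for k < 20: {max_gap:.5f}")
```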
Burstiness
A bursty source generates traffic in random clusters.
- Deterministic traffic is not bursty
- The Poisson process (in continuous time) and the Bernoulli process (in discrete time) are bursty as single processes
- The burstiness of Poisson and Bernoulli processes decreases when traffic sources are aggregated
- Other traffic sources exhibit more burstiness
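One simple way to see the aggregation effect, assuming burstiness is measured by the coefficient of variation (CV) of the number of arrivals per unit-length window: superposing N independent Poisson sources of rate λ gives Poisson counts with mean Nλ, so the CV is 1/sqrt(Nλ) and shrinks as N grows. The sketch estimates this by simulation; the measure and helper names are assumptions for illustration.

```python
import random
import statistics


def window_counts(num_sources, rate, windows, seed=0):
    """Arrivals per unit-length window from `num_sources` superposed Poisson sources."""
    rng = random.Random(seed)
    counts = [0] * windows
    for _ in range(num_sources):
        t = rng.expovariate(rate)
        while t < windows:
            counts[int(t)] += 1
            t += rng.expovariate(rate)
    return counts


def cv(xs):
    """Coefficient of variation: standard deviation divided by mean."""
    return statistics.pstdev(xs) / statistics.mean(xs)


for n in (1, 10, 100):
    # Theory for Poisson counts: CV = 1 / sqrt(n * rate)
    print(n, round(cv(window_counts(n, rate=1.0, windows=2000)), 3))
```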
Importance of Burstiness
- Peak traffic demands on buffer resources can lead to overflow and lost traffic
- Peak demands may create quality of service (QoS) problems in a network
- Burstiness of traffic sources needs to be characterized in a QoS environment
Models
[Figure: bit rate vs. time for three source types: bursty data (bursts during active intervals, separated by inactive intervals); constant bit rate (CBR) voice (talk spurts separated by silence intervals); variable bit rate (VBR) video]
Batch Arrivals
- Some sources transmit in packet bursts
- These may be better modeled by a batch arrival process (e.g., bursts of packets arriving according to a Poisson process)
- The case for a batch model is weaker at queues after the first, because of shaping
[Figure: packet batches along a time axis, with the interarrival times between batches marked]
Bottlenecks
Types of bottlenecks:
- At access points (flow control, prioritization, QoS enforcement needed)
- At points within the network core
  - Isolated (can be analyzed in isolation)
  - Interrelated (network or chain analysis needed)
Bottlenecks result from overloads caused by:
- High-load sessions, or
- Convergence of a sufficient number of moderate-load sessions at the same queue
Bottlenecks Cause Shaping
[Figure: traffic entering and leaving a bottleneck over time]
The departure traffic from a bottleneck is more regular than the arrival traffic.
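A minimal sketch of shaping, assuming a single FIFO bottleneck with a constant per-packet service time (a hypothetical M/D/1-style setup, not taken from the slides): each departure occurs at least one service time after the previous one, so the departure gaps are bounded below and vary less than the Poisson arrival gaps.

```python
import random
import statistics


def fifo_departures(arrivals, service_time):
    """Departure times from a single FIFO server with a constant service time."""
    departures, last = [], 0.0
    for a in arrivals:
        # Service starts when the packet has arrived and the previous one has left.
        last = max(a, last) + service_time
        departures.append(last)
    return departures


def gaps(times):
    """Differences between consecutive event times."""
    return [t2 - t1 for t1, t2 in zip(times, times[1:])]


def cv(xs):
    """Coefficient of variation: standard deviation divided by mean."""
    return statistics.pstdev(xs) / statistics.mean(xs)


rng = random.Random(3)
t, arrivals = 0.0, []
for _ in range(20_000):
    t += rng.expovariate(0.9)           # Poisson arrivals with rate 0.9
    arrivals.append(t)
departures = fifo_departures(arrivals, service_time=1.0)   # utilization 0.9
print("CV of arrival gaps:  ", round(cv(gaps(arrivals)), 3))
print("CV of departure gaps:", round(cv(gaps(departures)), 3))
```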
Models for Bursty Traffic
- The Poisson assumption for packet arrivals may be applicable to highly aggregated traffic (core networks), but otherwise traffic tends to be bursty:
  - FTP: high data rates during downloads but less activity between downloads
  - HTTP: activity after mouse clicks
  - Video streaming: high data rates during frame transmissions
  - Interactive voice: talk and silent periods
- Model modifications:
  - Bulk arrival processes
  - MMPPs
Bulk Arrival Process
- The queue length at arrival instants increases not by 1 but by a random variable B, the bulk size
- Parameter set of the model:
  - Bulk arrival process, e.g. exponential interarrival times with rate λ
  - Bulk-size distribution p_i = P(B = i) (e.g. geometric)
  - Service rate μ (per single packet)
- E.g. the M^(B)/M/1 queue with geometrically distributed B
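A sketch of the bulk arrival process described above: bulk epochs form a Poisson process with rate λ, and each bulk carries B packets with a geometric distribution on {1, 2, ...}, P(B = i) = (1 − q)^(i−1) q, so E[B] = 1/q and the long-run packet rate is λ/q. The parameter q and the function names are assumptions for illustration; the queue itself (service rate μ) is not simulated here.

```python
import random


def geometric(rng, q):
    """Bulk size B on {1, 2, ...} with P(B = i) = (1 - q)**(i - 1) * q, mean 1 / q."""
    size = 1
    while rng.random() >= q:
        size += 1
    return size


def bulk_arrivals(rate, q, horizon, seed=0):
    """(time, bulk size) pairs; bulk epochs form a Poisson process with the given rate."""
    rng = random.Random(seed)
    t, out = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= horizon:
            return out
        out.append((t, geometric(rng, q)))


events = bulk_arrivals(rate=0.5, q=0.25, horizon=20_000.0, seed=5)
packets = sum(size for _, size in events)
print(packets / 20_000.0)   # close to rate / q = 2.0 packets per unit time
```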
GMPP (1/2)
Generally Modulated Poisson Process (GMPP):
- A Poisson arrival process, but with a time-varying arrival rate λ(t)
- Doubly stochastic process
- Switched Poisson process
GMPP (2/2)
The rate is controlled by the modulating process (here a two-state Markov chain, but it could have more states). For a given rate, the arrival process is Poisson.
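A minimal sketch of the two-state case (commonly called a Markov-modulated Poisson process, MMPP), assuming exponential sojourn times in each state; all names and parameter values are illustrative. While the chain is in state s, arrivals are Poisson with rate `rates[s]`. Because exponential interarrival times are memoryless, restarting the arrival clock at each state switch keeps the process exact.

```python
import random


def mmpp2(rates, switch_rates, horizon, seed=0):
    """Arrival times of a two-state Markov-modulated Poisson process on [0, horizon).

    rates[s]        -- Poisson arrival rate while the modulating chain is in state s
    switch_rates[s] -- rate of leaving state s (exponential sojourn, mean 1 / switch_rates[s])
    """
    rng = random.Random(seed)
    t, state, arrivals = 0.0, 0, []
    while t < horizon:
        sojourn_end = min(t + rng.expovariate(switch_rates[state]), horizon)
        if rates[state] > 0:
            # Poisson arrivals at the current rate until the chain switches state.
            a = t + rng.expovariate(rates[state])
            while a < sojourn_end:
                arrivals.append(a)
                a += rng.expovariate(rates[state])
        t, state = sojourn_end, 1 - state
    return arrivals


if __name__ == "__main__":
    horizon = 20_000.0
    arr = mmpp2(rates=(10.0, 1.0), switch_rates=(1.0, 1.0), horizon=horizon, seed=2)
    print(len(arr) / horizon)   # long-run rate, close to (10 + 1) / 2 = 5.5
```

Over a long run the chain spends a fraction switch_rates[1] / (switch_rates[0] + switch_rates[1]) of the time in state 0, so with equal switching rates the average arrival rate is the plain average of the two state rates.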
On-Off Traffic Source
- There are two states, on and off
- The rate in the off state is 0, and the rate in the on state is greater than 0
- The on and off period durations are random variables (e.g. exponentially distributed)
- Sometimes called:
  - On/off source
  - Burst/silence model
  - Talkspurt/silence model (voice)
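A sketch of an on-off source, assuming exponentially distributed on and off periods and constant-rate packet emission while on (all names and values are illustrative). The long-run mean rate is the peak rate times the fraction of time spent on, peak · E[on] / (E[on] + E[off]).

```python
import random


def on_off_packets(peak_rate, mean_on, mean_off, horizon, seed=0):
    """Packet emission times of an on-off source on [0, horizon).

    While on, one packet is sent every 1 / peak_rate time units; nothing is sent
    while off. On and off durations are exponential with means mean_on and mean_off.
    """
    rng = random.Random(seed)
    t, packets = 0.0, []
    while t < horizon:
        on_end = min(t + rng.expovariate(1.0 / mean_on), horizon)
        k = 0
        while t + k / peak_rate < on_end:
            packets.append(t + k / peak_rate)
            k += 1
        t = on_end + rng.expovariate(1.0 / mean_off)   # silent off period
    return packets


if __name__ == "__main__":
    pk = on_off_packets(peak_rate=10.0, mean_on=10.0, mean_off=40.0, horizon=100_000.0, seed=4)
    print(len(pk) / 100_000.0)   # close to 10 * 10 / (10 + 40) = 2.0
```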

Source Types
- Voice sources
- Video sources
- File transfers
- Web traffic
- Interactive traffic
Different application types have different QoS requirements, e.g., delay, jitter, loss, throughput.
Source Type Properties

Voice
  Characteristics: alternating talkspurts and silence intervals; talkspurts produce constant packet-rate traffic
  QoS requirements: delay < ~150 ms, jitter < ~30 ms, packet loss < ~1%
  Model: two-state (on-off) Markov Modulated Rate Process (MMRP), exponentially distributed time in each state

Video
  Characteristics: highly bursty traffic (when encoded); long-range dependencies
  QoS requirements: delay < ~400 ms, jitter < ~30 ms, packet loss < ~1%
  Model: K-state (on-off) Markov Modulated Rate Process (MMRP)

Interactive (FTP, telnet, web)
  Characteristics: Poisson type; sometimes batch arrivals, bursty, or on-off
  QoS requirements: zero or near-zero packet loss; delay may be important
  Model: Poisson, Poisson with batch arrivals, two-state MMRP

After Prof. Dimitri P. Bertsekas, Department of Electrical Engineering, M.I.T.
Typical Voice Source Behavior
After Prof. Dimitri P. Bertsekas Department of Electrical Engineering M.I.T.
MPEG1 Video Source Model
The MPEG1 MMRP model can be extremely bursty and has long-range dependency behavior due to the deterministic frame sequence.
Diagram source: Mark W. Garrett and Walter Willinger, "Analysis, Modeling, and Generation of Self-Similar VBR Video Traffic", Bellcore, 1994
Inferring a Distribution
There is data available, and we want to find a well-known theoretical distribution (normal, exponential, binomial, Bernoulli, geometric, ...) that "fits the data best". The following steps are involved:
1. Identify a candidate distribution (normal vs. exponential vs. ...)
2. Estimate the parameters from the observations
3. Determine how well the estimated distribution fits your data ("goodness-of-fit tests")
4. If the test is not satisfactory, go back to the first step
Developing a Model of Input Data
Selecting the Family of the Distribution
- There are a large number of probability distributions, some generated from observations of the real world and some proposed as theoretical models
- What is known about the physical characteristics of the input process?
  - Is it naturally discrete or continuous valued?
  - Are the observable values inherently bounded, or is there no natural bound?
  - Can you infer a distribution from what you know about the process that generates the input values? For example:
    - A normal (Gaussian) variable arises from the sum of a large number of independent random variables
    - An Erlang variable is the sum of several iid exponential variables
    - A lognormal variable arises from the product of several component variables
    - A Poisson variable models the number of independent events that occur in a bounded period of time or area of space
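As a quick check of the Erlang statement above (a sketch with assumed parameters), summing k iid exponential variates with rate λ gives a variable with mean k/λ and variance k/λ²:

```python
import random
import statistics


def erlang(k, rate, rng):
    """Erlang-k variate: the sum of k iid exponential variates with the given rate."""
    return sum(rng.expovariate(rate) for _ in range(k))


rng = random.Random(11)
xs = [erlang(3, 2.0, rng) for _ in range(50_000)]
# Theory: mean k / rate = 1.5, variance k / rate**2 = 0.75
print(round(statistics.mean(xs), 3), round(statistics.variance(xs), 3))
```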
Identifying the Distribution
- A histogram may suggest a known pdf or pmf
  Example: exponential, normal, and Poisson distributions are frequently encountered and are not difficult to analyze
- Problem: the appropriate choice of the number of intervals k
  - Rule of thumb: k = int(1 + log2 n)
  - This sometimes doesn't work well
- Recommendation:
  - Try several values
  - Choose one that gives a smooth histogram with a shape similar to the pdf of one of the standard distributions
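A small sketch of the rule of thumb k = int(1 + log2 n) (commonly known as Sturges' rule) together with an equal-width histogram. The helper names are illustrative, and, as recommended above, k should still be varied and the resulting shapes compared.

```python
import math


def sturges_bins(n):
    """Rule of thumb from the slide: k = int(1 + log2(n))."""
    return int(1 + math.log2(n))


def histogram(data, k):
    """Counts in k equal-width cells spanning [min(data), max(data)]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k or 1.0           # avoid zero width when all values are equal
    counts = [0] * k
    for x in data:
        i = min(int((x - lo) / width), k - 1)   # the maximum falls into the last cell
        counts[i] += 1
    return counts


if __name__ == "__main__":
    print(sturges_bins(100), histogram(list(range(10)), 5))   # 7 [2, 2, 2, 2, 2]
```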
Sample Histograms
Coarse, ragged, and appropriate histograms of the same data (values 0 to 24):
(1) Original data, one cell per value: too ragged
(2) Adjacent cells combined into 0~7, 8~15, 16~24: too coarse
(3) Adjacent cells combined into 0~2, 3~5, 6~8, 9~11, 12~14, 15~17, 18~20, 21~24: appropriate
Parameter Estimation
The sample mean, X̄, is defined by
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad (1)$$
and the sample variance, S², is defined by
$$S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1} \qquad (2)$$
Parameter Estimation (cont.)
If the data are discrete and grouped in a frequency distribution, Eqs. (1) and (2) can be modified to provide much greater computational efficiency. The sample mean can be computed by
$$\bar{X} = \frac{1}{n}\sum_{j=1}^{k} f_j X_j \qquad (3)$$
and the sample variance, S², by
$$S^2 = \frac{\sum_{j=1}^{k} f_j X_j^2 - n\bar{X}^2}{n-1} \qquad (4)$$
where k is the number of distinct values of X and f_j is the observed frequency of the value X_j.
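Equations (3) and (4) can be checked on the vehicle-arrival frequencies used in the chi-square example of this lecture. The sketch below, with illustrative helper names, computes the grouped estimates and compares them with Python's `statistics.mean` and `statistics.variance` applied to the expanded raw data, which correspond to Eqs. (1) and (2):

```python
import statistics

# Vehicle-arrival data from the chi-square example in this lecture:
# value X_j (arrivals per 5-minute period) -> frequency f_j
freq = {0: 12, 1: 10, 2: 19, 3: 17, 4: 10, 5: 8, 6: 7, 7: 5, 8: 5, 9: 3, 10: 3, 11: 1}


def grouped_mean_var(freq):
    """Eqs. (3) and (4): sample mean and variance from a frequency table."""
    n = sum(freq.values())
    mean = sum(f * x for x, f in freq.items()) / n
    var = (sum(f * x * x for x, f in freq.items()) - n * mean**2) / (n - 1)
    return mean, var


mean, var = grouped_mean_var(freq)
raw = [x for x, f in freq.items() for _ in range(f)]   # the n individual observations
print(mean, var)                                        # about 3.64 and 7.63
print(statistics.mean(raw), statistics.variance(raw))   # Eqs. (1), (2) on raw data agree
```

Both routes give the same estimates; the grouped form only loops over the 12 distinct values instead of the 100 observations.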
Suggested Estimators for Distributions Often Used in Simulation
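For a few of the distributions discussed in this lecture, standard estimators are: Poisson, λ̂ = X̄; exponential, λ̂ = 1/X̄; normal, μ̂ = X̄ and σ̂² = S². The sketch below implements only these three, so it may not match the slide's table entry for entry; the function names are illustrative.

```python
import statistics


def estimate_poisson(xs):
    """Poisson: lambda_hat = sample mean."""
    return statistics.mean(xs)


def estimate_exponential(xs):
    """Exponential (rate parametrization): lambda_hat = 1 / sample mean."""
    return 1.0 / statistics.mean(xs)


def estimate_normal(xs):
    """Normal: mu_hat = sample mean, sigma2_hat = sample variance S^2 (divisor n - 1)."""
    return statistics.mean(xs), statistics.variance(xs)


if __name__ == "__main__":
    print(estimate_normal([1.0, 2.0, 3.0]))   # (2.0, 1.0)
```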
Statistical Inference: The Role of Chance
[Diagram: scientific knowledge draws on reason and intuition and on empirical observation; hypotheses are formulated, and data are collected to test them]
Statistical Inference: The Role of Chance (cont.)
[Diagram: formulate hypotheses → collect data to test hypotheses → accept or reject the hypothesis; systematic error and chance can both affect the outcome]
Random error (chance) can be controlled by statistical significance or by a confidence interval.

Quantile-Quantile Plots
- If X is a random variable with cdf F, the q-quantile of X is the value γ such that F(γ) = P(X ≤ γ) = q
- Raw data: {x_i}; data rearranged by magnitude: {y_j}
- Then y_j is an estimate of the (j − 1/2)/n quantile of X, i.e. y_j ≈ F^{-1}((j − 1/2)/n)
- If F is a member of an appropriate family of distributions, then a plot of y_j vs. F^{-1}((j − 1/2)/n) is approximately a straight line
- If F also has the appropriate parameter values, the line has slope 1
Quantile-Quantile Plots: Example
Check whether the measured times follow a normal distribution (the observations are sorted in increasing order):

j | Value  | j  | Value  | j  | Value
1 | 99.55  | 6  | 99.98  | 11 | 100.26
2 | 99.56  | 7  | 100.02 | 12 | 100.27
3 | 99.62  | 8  | 100.06 | 13 | 100.33
4 | 99.65  | 9  | 100.17 | 14 | 100.41
5 | 99.79  | 10 | 100.21 | 15 | 100.47
Quantile-Quantile Plots: Example (cont.)
The y_j are plotted versus F^{-1}((j − 0.5)/n), where F is a normal distribution with the sample mean (99.99 sec) and sample variance (0.2832² sec²).
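A sketch of computing the Q-Q plot coordinates for this example with Python's `statistics.NormalDist`, using the mean and standard deviation stated above (99.99 s and 0.2832 s). The correlation coefficient of the points is added as a hypothetical one-number summary of how straight the plot is; it is not part of the original method.

```python
from statistics import NormalDist

# Sorted measured times y_j (seconds) from the example table.
y = [99.55, 99.56, 99.62, 99.65, 99.79, 99.98, 100.02, 100.06,
     100.17, 100.21, 100.26, 100.27, 100.33, 100.41, 100.47]


def qq_points(sorted_data, dist):
    """Pairs (F^-1((j - 0.5) / n), y_j) for j = 1..n."""
    n = len(sorted_data)
    return [(dist.inv_cdf((j - 0.5) / n), yj) for j, yj in enumerate(sorted_data, start=1)]


def pearson_r(pairs):
    """Correlation of the Q-Q points; values near 1 mean a nearly straight line."""
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


# Normal F with the parameters stated on the slide: mean 99.99 s, standard deviation 0.2832 s.
points = qq_points(y, NormalDist(mu=99.99, sigma=0.2832))
for x, yj in points:
    print(f"{x:8.3f}  {yj:7.2f}")
print("r =", round(pearson_r(points), 3))
```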
Testing of Hypotheses
The hypotheses are:
- H0: the random variable X conforms to the distributional assumption with the parameter(s) given by the parameter estimate(s)
- HA: the random variable X does not conform
Testing of Hypotheses: Type I and Type II Errors
No study is perfect; there is always a chance of error.

Reality \ Decision | Accept H0 / reject HA                        | Reject H0 / accept HA
H0 true / HA false | OK, p = 1 − α                                | Type I error ("false alarm"), p = α
H0 false / HA true | Type II error ("missing the target"), p = β  | OK, p = 1 − β

α: level of significance
1 − β: power of the test
χ² Test
Goodness-of-Fit Test: Chi-Square Test
- i = histogram cell, i = 1, ..., k
- n = number of observations
- O_i = observed number of observations in cell i
- p_i = theoretical probability of cell i = F(a_i) − F(a_{i−1}), so n p_i is the expected number of observations in cell i
- a_i = upper bound of cell i, a_{i−1} = lower bound of cell i
- F(x) = cumulative distribution function (cdf)

The test statistic is
$$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - n p_i)^2}{n p_i}$$
The cdf is a good fit if
$$\chi_0^2 < \chi^2_{\alpha,\,k-s-1}$$
where s = number of parameters of the distribution estimated from the data.
Goodness-of-Fit Tests: Chi-Square Test (cont.)
Recommendations for the number of class intervals for continuous data:

Sample size n | Number of class intervals k
20            | Do not use the chi-square test
50            | 5 to 10
100           | 10 to 20
> 100         | √n to n/5
Goodness-of-Fit Test: Chi-Square Test (cont.)
The critical value χ²_{α,k−s−1} is found in statistical tables. The null hypothesis H0 is rejected if
$$\chi_0^2 > \chi^2_{\alpha,\,k-s-1}$$
Chi-Square Test Example
The number of vehicles arriving at the northwest corner of an intersection in a 5-minute period between 7:00 a.m. and 7:05 a.m. was monitored for five workdays over a 20-week period. The following table shows the resulting data. The first entry indicates that there were 12 5-minute periods during which zero vehicles arrived, 10 periods during which one vehicle arrived, and so on.

Arrivals per period | 0  | 1  | 2  | 3  | 4  | 5 | 6 | 7 | 8 | 9 | 10 | 11
Frequency           | 12 | 10 | 19 | 17 | 10 | 8 | 7 | 5 | 5 | 3 | 3  | 1

Since the number of automobiles is a discrete variable, and since there are ample data, the histogram can have a cell for each possible value in the range of the data. The resulting histogram is shown below.

[Figure: Histogram of number of arrivals per period; frequency (0 to 20) vs. number of arrivals per period (0 to 11)]
Since the histogram of the data appeared to follow a Poisson distribution, the parameter λ̂ = X̄ = 3.64 was determined. Thus, the following hypotheses are formed:
- H0: the random variable is Poisson distributed
- H1: the random variable is not Poisson distributed

The pmf for the Poisson distribution is
$$p(x) = \begin{cases} \dfrac{e^{-\lambda}\lambda^x}{x!}, & x = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$$
For λ = 3.64, the probabilities associated with the various values of x, obtained from the equation above, are:
p(0) = 0.026   p(1) = 0.096   p(2) = 0.174   p(3) = 0.211
p(4) = 0.192   p(5) = 0.140   p(6) = 0.085   p(7) = 0.044
p(8) = 0.020   p(9) = 0.008   p(10) = 0.003  p(11) = 0.001

Chi-square goodness-of-fit test for the example

x_i                | Observed frequency O_i | Expected frequency E_i             | (O_i − E_i)²/E_i
0, 1 (combined)    | 12 + 10 = 22           | 2.6 + 9.6 = 12.2                   | 7.87
2                  | 19                     | 17.4                               | 0.15
3                  | 17                     | 21.1                               | 0.80
4                  | 10                     | 19.2                               | 4.41
5                  | 8                      | 14.0                               | 2.57
6                  | 7                      | 8.5                                | 0.26
7 to 11 (combined) | 5 + 5 + 3 + 3 + 1 = 17 | 4.4 + 2.0 + 0.8 + 0.3 + 0.1 = 7.6  | 11.62
Total              | 100                    | 100.0                              | 27.68

With these probabilities, the table above is constructed. The value of E_1 is given by n p_1 = 100(0.026) = 2.6; the remaining E_i values are determined in a similar manner. Since E_1 = 2.6 < 5, E_1 and E_2 are combined (empirical rule: each expected frequency should be at least 5). In that case O_1 and O_2 are also combined, and k is reduced by one. The last five class intervals are also combined for the same reason, and k is further reduced by four.

The calculated χ²_0 is 27.68. The number of degrees of freedom for the tabulated value of χ² is k − s − 1 = 7 − 1 − 1 = 5. Here s = 1, since one parameter, λ, was estimated from the data. At α = 0.05, the critical value χ²_{0.05,5} is 11.1. Thus H0 is rejected at the 0.05 level of significance. The analyst must now search for a better-fitting model or use the empirical distribution of the data.
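The whole test can be reproduced with a short sketch (names are illustrative; the critical value 11.07 for χ²_{0.05,5} is taken from standard chi-square tables). One assumption differs from the worked table: the last cell is taken as x ≥ 7, so the cell probabilities sum to one, and the probabilities are not rounded to three decimals.

```python
from math import exp, factorial

# Observed number of 5-minute periods with x = 0..11 arrivals (example table).
observed = [12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1]
n = sum(observed)                                      # 100 periods
lam = sum(x * f for x, f in enumerate(observed)) / n   # lambda_hat = sample mean = 3.64


def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)


# Cells after combining as in the example: {0, 1}, 2, 3, 4, 5, 6 and a tail cell.
# Assumption: the tail cell is x >= 7 so the cell probabilities sum to one.
cells = [[0, 1], [2], [3], [4], [5], [6]]
obs = [sum(observed[x] for x in c) for c in cells] + [sum(observed[7:])]
probs = [sum(poisson_pmf(x, lam) for x in c) for c in cells]
probs.append(1.0 - sum(probs))                         # P(X >= 7)
expected = [n * p for p in probs]                      # E_i = n * p_i

chi2_0 = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
k, s = len(obs), 1            # 7 cells, 1 parameter (lambda) estimated from the data
df = k - s - 1                # 5 degrees of freedom
critical = 11.07              # chi^2_{0.05,5} from standard tables
print(round(chi2_0, 2), df, "reject H0" if chi2_0 > critical else "do not reject H0")
```

With unrounded probabilities χ²_0 comes out at about 27.5 instead of 27.68; either value is far above 11.07, so H0 is rejected in both cases.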
Kolmogorov-Smirnov frequency test