
Traffic Models - I

"Models are to be used, not believed."
H. Theil, Principles of Econometrics

9/20/2012

Simulation

Agenda
* Motivation
* Trace Driven Simulation
* Synthetic Traffic
  * Guessing a Distribution
  * Simple analytic models for bursty traffic
  * Inferring a Distribution
    * Selecting the Family of the Distribution
    * Parameter Estimation
    * Statistical Inference, Testing of Hypotheses (Goodness-of-fit Test)
* Modern Traffic Models
  * Nature of self-similar traffic
  * Self-similarity in data traffic
  * Self-similarity in simulation, performance issues

Input Modelling

The input data is the driving force for the simulation - the behavior of the simulation and all the results/conclusions that can be reached depend on appropriate inputs

Motivation
* All performance techniques must make some assumptions regarding traffic or workload
  * Queuing models: arrival and service processes
  * Simulation: traffic generators
  * Experimental: workload generators
* Assumptions are captured in traffic models
* Modern traffic models need to be used

Requirements
* Specified in terms of services of the system under study: types, interarrival times, and resource demands of service requests
* Representative
* Relevant:
  * To find the maximum throughput you have to saturate the system
  * To test a congestion control algorithm you need congestion
  * To test convergence speed of an adaptive algorithm you need workload variations
* Reproducible

Overview
* "Real-world" workloads typically have random components, e.g. random interarrival times
* Two fundamental types of workloads:
  * "Real workloads" / traces
  * Synthetic workloads: artificially generated stochastic processes


Trace Driven Simulation
Basic idea:
* Conduct a measurement at a real system and observe the service requests (type, arrival times, resource demands, further parameters)
* Save all this into a log file / trace file
* Feed the simulation from the trace file
* The real system must be available and accessible to obtain traces
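The basic idea can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the text trace format (one "interarrival_seconds size_bytes" pair per line), the function names `read_trace` and `replay`, and the single FIFO link are all assumptions made for the example.

```python
import io

def read_trace(stream):
    """Yield (arrival_time, size_bytes) from lines 'interarrival_seconds size_bytes'.

    The line format is an assumption for this sketch, not a standard.
    """
    clock = 0.0
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        gap, size = line.split()
        clock += float(gap)  # the trace stores gaps; the simulator needs absolute times
        yield clock, int(size)

def replay(stream, link_rate_bps):
    """Feed the trace into one FIFO link and return the departure time of each packet."""
    link_free_at = 0.0
    departures = []
    for arrival, size in read_trace(stream):
        start = max(arrival, link_free_at)  # wait while the link is still busy
        link_free_at = start + size * 8 / link_rate_bps
        departures.append(link_free_at)
    return departures

# Hypothetical three-packet trace on a 1 Mbit/s link.
trace = io.StringIO("0.010 1000\n0.001 1000\n0.050 500\n")
departures = replay(trace, link_rate_bps=1_000_000)
```

Note that the replayed arrivals never change in response to the queue they feed, which is exactly the open-loop property of traces.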

Trace Driven Traffic in NS-2
Trace driven:

    set tfile [new Tracefile]
    $tfile filename <file>
    set src [new Application/Traffic/Trace]
    $src attach-tracefile $tfile

<file>:
* Binary format (native!)
* Each record holds the inter-packet time (in microseconds, according to the ns-2 manual) and the packet size (bytes)
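A trace file in this binary format could be produced from Python roughly as follows. The sketch assumes the record layout described in the ns-2 manual (two unsigned 32-bit fields in network byte order: time until the next packet in microseconds, then packet length in bytes); the helper names `encode_trace` and `decode_trace` are hypothetical.

```python
import struct

# Assumed record layout: two unsigned 32-bit integers, network (big-endian) byte order.
RECORD = struct.Struct("!II")

def encode_trace(records):
    """Pack (gap_microseconds, size_bytes) pairs into the binary trace format."""
    return b"".join(RECORD.pack(gap_us, size) for gap_us, size in records)

def decode_trace(data):
    """Inverse of encode_trace, useful for checking a file before handing it to ns-2."""
    return [RECORD.unpack_from(data, off) for off in range(0, len(data), RECORD.size)]

records = [(10_000, 1000), (1_000, 1000), (50_000, 500)]
blob = encode_trace(records)
# To create the file named in `$tfile filename <file>`:
#     with open("example.tr", "wb") as f:   # file name is illustrative
#         f.write(blob)
```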

Advantages of Trace Driven Simulation
* The workload is "realistic", which increases the credibility of your results
* You don't need to find (and to defend your choice of) appropriate parameters, as you do for synthetic workloads
* The workload has full stochastic complexity: correlations, nonstationarity, changing distributions, ...

Trace-driven simulation example

[Figure: trace-driven simulation example with background traffic]

Disadvantages of Trace Driven Simulation
* Trace files can grow large, making their handling rather clumsy
* You need several trace files to represent different workloads
* A single trace represents just a single operational scenario of the system
* How to obtain independent replications?
* Traces cannot react to feedback from the underlying system; they are open loop
  * Example: running TCP over a recorded Ethernet trace would change the packets on the network, and TCP is in turn influenced by the packets


Synthetic Traffic: Overview (1/3)
* Synthetic workloads are an alternative to traces
* The workload is generated by a dedicated piece of software (simulation module) which creates a stochastic process out of random variables
* As compared to a trace, the stochastic processes used in workload generation can be characterized by very few parameters
  * Example: a Poisson process is characterized by a single parameter λ > 0 (the arrival rate)
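As an illustration of how few parameters a synthetic source needs, the following Python sketch generates a Poisson arrival process from its single rate parameter by summing iid exponential interarrival times. The function name `poisson_arrivals` and the chosen rate, horizon, and seed are assumptions made for the example.

```python
import random

def poisson_arrivals(lam, horizon, rng):
    """Arrival times of a Poisson process with rate lam on [0, horizon)."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(lam)  # iid exponential gaps with mean 1/lam
        if t >= horizon:
            return times
        times.append(t)

rng = random.Random(1)  # fixed seed so that the run is reproducible
arrivals = poisson_arrivals(lam=5.0, horizon=1000.0, rng=rng)
rate_estimate = len(arrivals) / 1000.0  # should be close to lam
```

Fixing the seed addresses the "reproducible" requirement, and changing it yields independent replications, which a single trace cannot provide.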

Synthetic Traffic: Overview (2/3)
* Most commonly used stochastic process in workload modeling: a sequence of iid random variables for interarrival times (renewal process)
* Advantages: conceptually simple, easy to implement
* Disadvantages:
  * Correlation is not modeled (network traffic is often long-range dependent)
  * If the parameters of the random variables have to be estimated from a trace (from observations), we have to assume iid observations, which is often not true
  * Example: interarrival times of http get requests are grouped due to embedded objects
* Even when more complex stochastic processes are used later on, the iid sequence is often a reasonable "first shot"
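One simple way to probe the iid assumption on measured interarrival times is the sample autocorrelation at a few lags. The sketch below is an assumption-laden illustration (the function name `autocorrelation` and both data sets are made up): values near zero are consistent with independence, while clearly non-zero values indicate the kind of grouping mentioned for http requests.

```python
import random

def autocorrelation(xs, lag):
    """Sample autocorrelation of the sequence xs at the given lag (lag >= 1)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var

# iid exponential gaps: autocorrelation should be close to 0 at every lag.
rng = random.Random(7)
iid_gaps = [rng.expovariate(1.0) for _ in range(5000)]
r_iid = autocorrelation(iid_gaps, 1)

# Hypothetical "page plus embedded objects" pattern: three short gaps, then a long one.
grouped_gaps = [0.1, 0.1, 0.1, 5.0] * 100
r_grouped = autocorrelation(grouped_gaps, 4)  # strong correlation at the period of the pattern
```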

Synthetic Traffic: Overview (3/3)
Two possibilities of generating iid sequences:
* We don't have data at hand: guess a distribution
* We have measured data at hand: infer a distribution


Guessing a Distribution (1/2)
Apply a-priori knowledge about the workload and certain heuristics:
* Interarrival times are always non-negative and finite → the normal distribution is no candidate
* Data is known to be within some range [a, b] → the uniform distribution U(a, b) might be a candidate
* The range is finite ([a, b]) and a particular value in [a, b] occurs most frequently → a triangular distribution could be chosen
* The range is finite and you want to vary mean and variance arbitrarily within this range → choose a Beta distribution
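For the Beta case, the shape parameters can be obtained from a desired mean and variance by the method of moments. The sketch below assumes a Beta distribution rescaled from [0, 1] to [a, b]; the helper names `beta_params` and `sample_scaled_beta` are hypothetical. It also makes one limit explicit: on a finite range the variance cannot be chosen completely freely, it must stay below (mean - a)(b - mean).

```python
import random

def beta_params(a, b, mean, var):
    """Shape parameters (alpha, beta) of a Beta distribution rescaled to [a, b]."""
    mu = (mean - a) / (b - a)   # mean rescaled to [0, 1]
    s2 = var / (b - a) ** 2     # variance rescaled to [0, 1]
    if not (0 < mu < 1) or not (0 < s2 < mu * (1 - mu)):
        raise ValueError("this mean/variance pair is not attainable on [a, b]")
    c = mu * (1 - mu) / s2 - 1
    return mu * c, (1 - mu) * c

def sample_scaled_beta(a, b, alpha, beta, rng):
    """Draw one value in [a, b] from the rescaled Beta distribution."""
    return a + (b - a) * rng.betavariate(alpha, beta)

alpha, beta = beta_params(a=2.0, b=10.0, mean=4.0, var=2.0)
rng = random.Random(3)
values = [sample_scaled_beta(2.0, 10.0, alpha, beta, rng) for _ in range(20000)]
sample_mean = sum(values) / len(values)
```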

Guessing a Distribution (2/2)
* The arrivals can be thought of as coming from a large population of independent individuals, each of which arrives with small probability (e.g. customers arriving at a bank) → Poisson arrivals, exponential interarrival times
* The sum of a large number of more or less similar (and not necessarily independent) components → Normal distribution (due to the central limit theorem)
* You want to compare simulation results with analytical results → Exponential distribution


Burstiness
* A bursty source generates traffic in random clusters
  * Deterministic traffic is not bursty
  * A Poisson process (in continuous time) and a Bernoulli process (in discrete time) are bursty as single processes
* Burstiness of Poisson and Bernoulli processes decreases when aggregating traffic sources
* Other traffic sources exhibit more burstiness
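Why aggregation smooths Poisson traffic can be seen from a short calculation. The slides do not fix a burstiness measure, so the coefficient of variation (CV) of the number of arrivals N(T) in a window of length T is used here as one possible choice; λ is the rate of a single source and m the number of independent, identical sources being aggregated.

```latex
% Single Poisson source with rate \lambda, counted over a window of length T:
\mathrm{E}[N(T)] = \lambda T, \qquad \mathrm{Var}[N(T)] = \lambda T, \qquad
\mathrm{CV}_1 = \frac{\sqrt{\lambda T}}{\lambda T} = \frac{1}{\sqrt{\lambda T}}

% The superposition of m independent Poisson sources is Poisson with rate m\lambda:
\mathrm{CV}_m = \frac{1}{\sqrt{m \lambda T}} = \frac{\mathrm{CV}_1}{\sqrt{m}}
```

Under this measure the relative fluctuation of the aggregate shrinks like 1/√m, which is the sense in which aggregated Poisson traffic becomes less bursty.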

Importance of Burstiness
* Peak traffic demands on buffer resources can lead to overflow and lost traffic
* Peak demands may create quality of service (QoS) problems in a network
* Need to characterize burstiness for traffic sources in a QoS environment

Models

[Figure: bit rate versus time for three source types: bursty data (bursts), constant bit rate (CBR) voice (talk spurts as active intervals, silence as inactive intervals), and variable bit rate (VBR) video]

Batch Arrivals
* Some sources transmit in packet bursts
* May be better modeled by a batch arrival process (e.g., bursts of packets arriving according to a Poisson process)
* The case for a batch model is weaker at queues after the first, because of shaping

[Figure: interarrival times along a time axis]

Bottlenecks
Types of bottlenecks:
* At access points (flow control, prioritization, QoS enforcement needed)
* At points within the network core
* Isolated (can be analyzed in isolation)
* Interrelated (network or chain analysis needed)

Bottlenecks result from overloads caused by:
* High load sessions, or
* Convergence of a sufficient number of moderate load sessions at the same queue

Bottlenecks Cause Shaping

The departure traffic from a bottleneck is more regular than the arrival traffic

Models for Bursty Traffic
* The Poisson assumption for packet arrivals may be applicable for highly aggregated traffic (core networks), but otherwise traffic tends to be bursty
  * High data rates in ftp download but less activity between downloads
  * http: activities after mouse-clicks
  * Video streaming: high data rates in frame transmissions
  * Interactive voice: talk and silent periods
* Model modifications:
  * Bulk arrival processes
  * MMPPs

Bulk Arrival Process
* The queue length at arrival instants increases not only by 1, but by a random variable B, the bulk size
* Parameter set of the model:
  * Bulk arrival process, e.g. exponential interarrival times with rate λ
  * Bulk-size distribution: p_i (e.g. geometric)
  * Service rate μ (single packet)
* E.g. M(B)/M/1 queue with geometrically distributed B
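The arrival side of such a model might be generated as in the sketch below. It assumes a geometric bulk size on {1, 2, ...} with parameter p (so E[B] = 1/p); the function names and numeric values are illustrative only. The offered load follows from the packet arrival rate λ·E[B] divided by the service rate μ.

```python
import random

def geometric(p, rng):
    """Bulk size B >= 1 with P(B = k) = (1 - p)**(k - 1) * p."""
    k = 1
    while rng.random() >= p:  # continue with probability 1 - p
        k += 1
    return k

def bulk_arrivals(lam, p, horizon, rng):
    """List of (arrival_time, bulk_size): Poisson bulk arrivals with geometric sizes."""
    t, events = 0.0, []
    while True:
        t += rng.expovariate(lam)
        if t >= horizon:
            return events
        events.append((t, geometric(p, rng)))

def offered_load(lam, p, mu):
    """rho = lam * E[B] / mu with E[B] = 1/p; the queue is stable only if rho < 1."""
    return lam / (p * mu)

rng = random.Random(11)
events = bulk_arrivals(lam=2.0, p=0.5, horizon=2000.0, rng=rng)
mean_bulk = sum(size for _, size in events) / len(events)
rho = offered_load(lam=2.0, p=0.5, mu=8.0)
```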

GMPP (1/2)

Generally Modulated Poisson Process (GMPP)
* A Poisson arrival process, but with time-varying arrival rate λ(t)
* Doubly stochastic process
* Switched Poisson process

GMPP (2/2)
* The rate is controlled by the modulating process (here a two-state Markov chain, but there could be more states)
* The arrival process is Poisson for a given rate
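A two-state modulated Poisson process of this kind can be simulated by alternating exponentially distributed sojourn times and generating Poisson arrivals at the rate of the current state. This is a sketch under stated assumptions: the function name `mmpp2`, the parameter names `rates` and `switch`, and all numeric values are made up for illustration.

```python
import random

def mmpp2(rates, switch, horizon, rng):
    """Arrival times of a two-state Markov-modulated Poisson process.

    rates[s]  : Poisson arrival rate while the modulating chain is in state s
    switch[s] : rate at which the chain leaves state s (exponential sojourn times)
    """
    t, state, arrivals = 0.0, 0, []
    while t < horizon:
        sojourn_end = min(t + rng.expovariate(switch[state]), horizon)
        if rates[state] > 0:
            # Restarting the exponential clock at a state change is valid
            # because the exponential distribution is memoryless.
            a = t + rng.expovariate(rates[state])
            while a < sojourn_end:
                arrivals.append(a)
                a += rng.expovariate(rates[state])
        t, state = sojourn_end, 1 - state
    return arrivals

rng = random.Random(5)
arrivals = mmpp2(rates=(10.0, 1.0), switch=(1.0, 1.0), horizon=5000.0, rng=rng)
mean_rate = len(arrivals) / 5000.0  # long-run average; 5.5 expected for these values
```

With equal switching rates the chain spends half of the time in each state, so the long-run rate is the plain average of the two state rates.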

On-Off Traffic Source
* There are two states: on and off
  * The rate in the off state is 0 and the rate in the on state is greater than 0
* On and off period durations are random variables (e.g. exponentially distributed)
* Sometimes called:
  * On/off source
  * Burst/silence model
  * Talkspurt/silence model (voice)
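A minimal on-off packet source might look like the following sketch. It assumes exponentially distributed on and off periods and a constant packet rate while on; the names `on_off_packets` and `mean_rate` and the numeric values are hypothetical. The long-run mean rate is the peak rate multiplied by the fraction of time spent in the on state.

```python
import random

def on_off_packets(peak_rate, mean_on, mean_off, horizon, rng):
    """Packet times: evenly spaced at peak_rate while ON, nothing while OFF."""
    t, times = 0.0, []
    while t < horizon:
        on_end = min(t + rng.expovariate(1.0 / mean_on), horizon)
        n = int((on_end - t) * peak_rate)  # number of full inter-packet slots in this ON period
        times.extend(t + i / peak_rate for i in range(n))
        t = on_end + rng.expovariate(1.0 / mean_off)  # skip the following OFF period
    return times

def mean_rate(peak_rate, mean_on, mean_off):
    """Long-run average rate: peak rate times the ON fraction."""
    return peak_rate * mean_on / (mean_on + mean_off)

rng = random.Random(9)
packets = on_off_packets(peak_rate=50.0, mean_on=1.0, mean_off=1.5, horizon=20000.0, rng=rng)
measured = len(packets) / 20000.0
expected = mean_rate(50.0, 1.0, 1.5)
```

The ratio of peak rate to mean rate, (mean_on + mean_off) / mean_on, is one simple indicator of how bursty such a source is.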

Source Types
* Voice sources
* Video sources
* File transfers
* Web traffic
* Interactive traffic
* Different application types have different QoS requirements, e.g., delay, jitter, loss, throughput, etc.

Source Type Properties

Voice
  Characteristics:
  * Alternating talkspurts and silence intervals
  * Talkspurts produce constant packet-rate traffic
  QoS requirements: Delay < ~150 ms, Jitter < ~30 ms, Packet loss < ~1%
  Model:
  * Two-state (on-off) Markov Modulated Rate Process (MMRP)
  * Exponentially distributed time at each state

Video
  Characteristics:
  * Highly bursty traffic (when encoded)
  * Long range dependencies
  QoS requirements: Delay < ~400 ms, Jitter < ~30 ms, Packet loss < ~1%
  Model:
  * K-state (on-off) Markov Modulated Rate Process (MMRP)

Interactive (FTP, telnet, web)
  Characteristics:
  * Poisson type
  * Sometimes batch arrivals, or bursty, or sometimes on-off
  QoS requirements: Zero or near-zero packet loss; delay may be important
  Model:
  * Poisson, Poisson with batch arrivals, two-state MMRP

After Prof. Dimitri P. Bertsekas Department of Electrical Engineering M.I.T.

Typical Voice Source Behavior

After Prof. Dimitri P. Bertsekas Department of Electrical Engineering M.I.T.

MPEG1 Video Source Model

The MPEG1 MMRP model can be extremely bursty, and has long range dependency behavior due to the deterministic frame sequence

Diagram Source: Mark W. Garrett and Walter Willinger, Analysis, Modeling, and Generation of Self-Similar VBR Video Traffic, BELLCORE, 1994


Inferring a Distribution
There is data available and we want to find a well known theoretical distribution (normal, exponential, binomial, Bernoulli, geometric, ...) that fits the data "best". The following steps are involved:
1. Identify a candidate distribution (normal vs. exponential vs. ...)
2. Estimate the parameters from the observations
3. Determine how well the estimated distribution fits your data ("goodness-of-fit tests")
4. If the test is not satisfactory, go back to the first step

Developing a Model of Input Data



Selecting the Family of the Distribution
* There are a large number of probability distributions
  * generated from observations of the real world
  * proposed as theoretical models
* What is known about the physical characteristics of the input process?
  * Is it naturally discrete or continuous valued?
  * Are the observable values inherently bounded or is there no natural bound?
* Can you infer a distribution from what you know about the process that generates input values?
  * Normal (Gaussian) process is derived from the sum of a large number of independent random variables
  * Erlang process is the sum of several exponential processes
  * Lognormal process is derived from the product of several component processes
  * Poisson process models the number of independent events that occur in a bounded period of time or area in space

Identifying the Distribution
* A histogram may suggest a known pdf or pmf
  * Example: Exponential, Normal, and Poisson distributions are frequently encountered, and less difficult to analyze
* Problem: appropriate choice of the number of intervals k
  * Rule of thumb: k = int(1 + log2 n)
  * Sometimes doesn't work well
* Recommendation:
  * Try several values
  * Choose one that gives a smooth histogram with a shape similar to the pdf of one of the standard distributions
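The rule of thumb (often called Sturges' rule) and the "try several values" recommendation can be sketched as follows. The function names and the equal-width binning are assumptions for the example, and the data must contain at least two distinct values so that the cell width is non-zero.

```python
import math

def rule_of_thumb_bins(n):
    """k = int(1 + log2 n), the rule of thumb for the number of cells."""
    return int(1 + math.log2(n))

def histogram(data, k):
    """Counts per cell for k equal-width cells spanning [min(data), max(data)]."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        idx = min(int((x - lo) / width), k - 1)  # the maximum value goes into the last cell
        counts[idx] += 1
    return counts

data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
k0 = rule_of_thumb_bins(len(data))
# Compare a few cell counts around the rule-of-thumb value and inspect the shapes.
candidates = {k: histogram(data, k) for k in (k0 - 1, k0, k0 + 1)}
```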

Sample Histograms

Coarse, ragged, and appropriate histogram

[Figure (1): original data, one cell per value from 0 to 24: too ragged]
[Figure (2): combining adjacent cells (0~7, 8~15, 16~24): too coarse]
[Figure (3): combining adjacent cells (0~2, 3~5, 6~8, 9~11, 12~14, 15~17, 18~20, 21~24): appropriate]


Parameter Estimation
The sample mean, $\bar{X}$, is defined by

\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \tag{1} \]

And the sample variance, $S^2$, is defined by

\[ S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1} \tag{2} \]

Parameter Estimation (cont.)

If the data are discrete and grouped in a frequency distribution, Eq. (1) and Eq. (2) can be modified to provide for much greater computational efficiency. With $k$ distinct values $X_j$ and observed frequencies $f_j$, the sample mean can be computed by

\[ \bar{X} = \frac{\sum_{j=1}^{k} f_j X_j}{n} \tag{3} \]

And the sample variance, $S^2$, by

\[ S^2 = \frac{\sum_{j=1}^{k} f_j X_j^2 - n\bar{X}^2}{n - 1} \tag{4} \]
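Equations (3) and (4) translate directly into code. The sketch below applies them to the frequency table of the vehicle-arrival example that appears later in these slides; the function name `grouped_mean_var` is an assumption.

```python
def grouped_mean_var(values, freqs):
    """Sample size, mean (Eq. 3) and variance (Eq. 4) from a frequency table."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    var = (sum(f * x * x for x, f in zip(values, freqs)) - n * mean ** 2) / (n - 1)
    return n, mean, var

# Arrivals per period (0..11) and their frequencies from the chi-square example.
values = list(range(12))
freqs = [12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1]
n, mean, var = grouped_mean_var(values, freqs)
```

For a Poisson candidate the sample mean is the usual estimate of the rate parameter, which is where the value 3.64 in the later example comes from.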

Suggested Estimators for distr. often used in Simulation



Statistical inference. Role of chance.

[Diagram: scientific knowledge; reason and intuition; empirical observation; formulate hypotheses; collect data to test hypotheses]

[Diagram: formulate hypotheses; collect data to test hypotheses, subject to systematic error and to CHANCE; accept hypothesis or reject hypothesis]

Random error (chance) can be controlled by statistical significance or by confidence interval

Quantile-Quantile Plots
* If X is a RV with CDF F, the q-quantile of X is the value $\gamma$ such that $F(\gamma) = P(X \le \gamma) = q$
* Raw data: $\{x_i\}$
* Data rearranged by magnitude: $\{y_j\}$
* Then $y_j$ is an estimate of the $(j - 1/2)/n$ quantile of X, i.e. $y_j \approx F^{-1}[(j - 1/2)/n]$
* If F is a member of an appropriate family, then a plot of $y_j$ vs. $F^{-1}[(j - 1/2)/n]$ is approximately a straight line
* If F also has the appropriate parameter values, the line has slope 1

Quantile-Quantile Plots: Example
Check whether the measured times follow a normal distribution (the observations are sorted in increasing order)

j   Value   | j   Value   | j   Value
1   99.55   | 6   99.98   | 11  100.26
2   99.56   | 7   100.02  | 12  100.27
3   99.62   | 8   100.06  | 13  100.33
4   99.65   | 9   100.17  | 14  100.41
5   99.79   | 10  100.21  | 15  100.47

Quantile-Quantile Plots: Example

The $y_j$ are plotted versus $F^{-1}((j - 0.5)/n)$, where F is the cdf of a normal distribution with the sample mean (99.99 sec) and sample variance ($0.2832^2$ sec$^2$)
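The points of such a Q-Q plot can be computed with the Python standard library, as sketched below. The sketch uses only the 15 values that appear in the table and fits the normal distribution to exactly those values, so its sample mean and standard deviation are computed here rather than taken from the slide. The variable names are assumptions.

```python
from statistics import NormalDist, mean, stdev

# The 15 sorted observations listed in the table.
y = [99.55, 99.56, 99.62, 99.65, 99.79,
     99.98, 100.02, 100.06, 100.17, 100.21,
     100.26, 100.27, 100.33, 100.41, 100.47]
n = len(y)

# Normal distribution with the sample mean and sample standard deviation of y.
fitted = NormalDist(mean(y), stdev(y))

# Theoretical quantiles F^{-1}((j - 0.5) / n) for j = 1..n.
theoretical = [fitted.inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]

# Points to plot: x = theoretical quantile, y = observed value.
qq_points = list(zip(theoretical, y))
```

If the normal assumption is reasonable, plotting `qq_points` should give points scattered closely around the line y = x.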

Testing of hypotheses
The hypotheses are:
* H0: the random variable, X, conforms to the distributional assumption with the parameter(s) given by the parameter estimate(s)
* HA: the random variable, X, does not conform

Testing of hypotheses: Type I and Type II Errors

No study is perfect; there is always the chance for error.

Decision \ "Reality"    | H0 true / HA false                      | H0 false / HA true
Accept H0 / reject HA   | OK, p = 1 - α                           | Type II error (β), "missing target", p = β
Reject H0 / accept HA   | Type I error (α), "false alarm", p = α  | OK, p = 1 - β

α: level of significance
1 - β: power of the test

χ² Test

Goodness-of-fit Test, Chi-Square Test
* i = histogram cell, i = 1, ..., k
* n = number of observations
* $O_i$ = observed number of observations in cell i
* $p_i$ = theoretical probability of cell i = $F(a_i) - F(a_{i-1})$, so that $n p_i$ is the theoretical (expected) number of observations in cell i
* $a_i$ = upper bound of cell i
* $a_{i-1}$ = lower bound of cell i
* F(x) = cumulative distribution function (cdf)
* s = number of parameters of the distribution

\[ \chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - n p_i)^2}{n p_i} \]

Then the cdf is a good fit if

\[ \chi_0^2 < \chi^2_{\alpha,\,k-s-1} \]

Goodness-of-Fit Tests, Chi-Square Test (cont.)

Recommendations for the number of class intervals for continuous data

Sample size, n | Number of class intervals, k
20             | Do not use the chi-square test
50             | 5 to 10
100            | 10 to 20
> 100          | √n to n/5

Goodness-of-fit Test, Chi-Square Test

The critical value $\chi^2_{\alpha,\,k-s-1}$ is found in statistical tables. The null hypothesis, H0, is rejected if

\[ \chi_0^2 > \chi^2_{\alpha,\,k-s-1} \]

Chi-Square Test Example (1/8)

The number of vehicles arriving at the northwest corner of an intersection in a 5-minute period between 7:00 a.m. and 7:05 a.m. was monitored for five workdays over a 20-week period. The following table shows the resulting data. The first entry in the table indicates that there were 12 five-minute periods during which zero vehicles arrived, 10 periods during which one vehicle arrived, and so on.

Chi-Square Test Example (2/8)

Arrivals per period | Frequency | Arrivals per period | Frequency
0                   | 12        | 6                   | 7
1                   | 10        | 7                   | 5
2                   | 19        | 8                   | 5
3                   | 17        | 9                   | 3
4                   | 10        | 10                  | 3
5                   | 8         | 11                  | 1

Since the number of automobiles is a discrete variable, and since there are ample data, the histogram can have a cell for each possible value in the range of data. The resulting histogram is shown on the next slide.

Chi-Square Test Example (3/8)

[Figure: histogram of the number of arrivals per period (x-axis: number of arrivals per period, 0 to 11; y-axis: frequency, 0 to 20)]

Chi-Square Test Example (4/8)

Since the histogram of the data appeared to follow a Poisson distribution, the parameter estimate $\hat{\lambda} = 3.64$ was determined. Thus, the following hypotheses are formed:
* H0: the random variable is Poisson distributed
* H1: the random variable is not Poisson distributed

Chi-Square Test Example (5/8)

The pmf for the Poisson distribution is

\[ p(x) = \begin{cases} \dfrac{e^{-\lambda}\lambda^{x}}{x!}, & x = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases} \]

For $\lambda = 3.64$, the probabilities associated with the various values of x are obtained using the equation above, with the following results.

p(0) = 0.026 | p(4) = 0.192 | p(8) = 0.020
p(1) = 0.096 | p(5) = 0.140 | p(9) = 0.008
p(2) = 0.174 | p(6) = 0.085 | p(10) = 0.003
p(3) = 0.211 | p(7) = 0.044 | p(11) = 0.001

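Chi-Square Test Example (6/8)

Chi-square goodness-of-fit test for the example

xi    | Observed Oi | Expected Ei | Combined Oi | Combined Ei | (Oi - Ei)^2 / Ei
0     | 12          | 2.6         | 22          | 12.2        | 7.87
1     | 10          | 9.6         |             |             |
2     | 19          | 17.4        | 19          | 17.4        | 0.15
3     | 17          | 21.1        | 17          | 21.1        | 0.80
4     | 10          | 19.2        | 10          | 19.2        | 4.41
5     | 8           | 14.0        | 8           | 14.0        | 2.57
6     | 7           | 8.5         | 7           | 8.5         | 0.26
7     | 5           | 4.4         | 17          | 7.6         | 11.62
8     | 5           | 2.0         |             |             |
9     | 3           | 0.8         |             |             |
10    | 3           | 0.3         |             |             |
11    | 1           | 0.1         |             |             |
Total | 100         | 100.0       | 100         | 100.0       | 27.68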

Chi-Square Test Example (7/8)

With these probabilities, the table on the previous slide is constructed. The value of E1 is given by np1 = 100 (0.026) = 2.6. In a similar manner, the remaining Ei values are determined. Since E1 = 2.6 < 5, E1 and E2 are combined (empirical rule: each expected frequency should be at least 5). In that case O1 and O2 are also combined and k is reduced by one. The last five class intervals are also combined for the same reason and k is further reduced by four.

Chi-Square Test Example (8/8)

The calculated $\chi_0^2$ is 27.68. The degrees of freedom for the tabulated value of $\chi^2$ is k - s - 1 = 7 - 1 - 1 = 5. Here, s = 1, since one parameter was estimated from the data. At α = 0.05, the critical value $\chi^2_{0.05,5}$ is 11.1. Thus, H0 would be rejected at level of significance 0.05. The analyst must now search for a better-fitting model or use the empirical distribution of the data.
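The whole example can be retraced in a short Python sketch. It takes the observed frequencies, the rounded Poisson probabilities, and the critical value 11.1 directly from the slides; the helper name `merge_small_cells` and its left-to-right merging strategy are assumptions (they happen to reproduce the seven cells used above). Because the terms are not rounded before summing, the statistic comes out near 27.69 rather than exactly 27.68.

```python
def merge_small_cells(observed, expected, minimum=5.0):
    """Combine adjacent cells, left to right, until every expected count is >= minimum."""
    merged_o, merged_e = [], []
    acc_o = acc_e = 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
            acc_o = acc_e = 0.0
    if acc_e > 0 and merged_e:  # a leftover tail joins the last cell
        merged_o[-1] += acc_o
        merged_e[-1] += acc_e
    return merged_o, merged_e

observed = [12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1]
# Rounded Poisson(3.64) probabilities as listed on slide 5/8.
probs = [0.026, 0.096, 0.174, 0.211, 0.192, 0.140,
         0.085, 0.044, 0.020, 0.008, 0.003, 0.001]
n = sum(observed)
expected = [n * p for p in probs]

obs_cells, exp_cells = merge_small_cells(observed, expected)
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_cells, exp_cells))

s = 1                              # one parameter estimated from the data
dof = len(obs_cells) - s - 1       # k - s - 1
critical = 11.1                    # tabulated value for alpha = 0.05, 5 degrees of freedom
reject_h0 = chi2 > critical
```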

Kolmogorov-Smirnov frequency test
