
Minimizing End-to-End Delay in High-Speed

Networks with a Simple Coordinated Schedule


Matthew Andrews
Lisa Zhang
Bell Laboratories
600-700 Mountain Avenue
Murray Hill, NJ 07974
{andrews,ylz}@research.bell-labs.com
Abstract—We study the problem of providing end-to-end delay guarantees in connection-oriented networks. In this environment, multiple-hop sessions coexist and interfere with one another.

Parekh and Gallager showed that the Weighted Fair Queueing (WFQ) scheduling discipline provides a worst-case delay guarantee comparable to $\frac{1}{\rho_i}\cdot K_i$ for a session with rate $\rho_i$ and $K_i$ hops. Such delays can occur since a session-$i$ packet can wait for time $\frac{1}{\rho_i}$ at every hop.

We describe a work-conserving scheme that guarantees an additive delay bound of approximately $\frac{1}{\rho_i} + K_i$. This bound is smaller than the multiplicative bound $\frac{1}{\rho_i}\cdot K_i$ of WFQ, especially when the hop count $K_i$ is large. We call our scheme COORDINATED-EARLIEST-DEADLINE-FIRST (CEDF) since it uses an earliest-deadline-first approach in which simple coordination is applied to the deadlines for consecutive hops of a session. The key to the bound is that once a packet has passed through its first server, it can pass through all its subsequent servers quickly.

We conduct simulations to compare the delays actually produced by the two scheduling disciplines. In many cases, these actual delays are comparable to their analytical worst-case bounds, implying that CEDF outperforms WFQ.

I. I NTRODUCTION
The provision of end-to-end delay guarantees in high-speed
networks remains one of the most important and widely studied Quality-of-Service (QoS) issues. Many real-time audio and video applications rely on the ability of the network to provide small delays. One key mechanism for achieving this aim
is scheduling at the outputs of the switches. In this paper, we
attempt to minimize end-to-end delay using a novel scheduling
scheme.
Before we introduce our scheme we first recall the delay
bounds for the much studied Weighted Fair Queueing (WFQ)
scheduling discipline, also known as Packet-by-Packet Generalized Processor-Sharing (PGPS). In their seminal papers [1], [2],
Parekh and Gallager showed that WFQ achieves the following
session-i delay bound for Rate Proportional Processor Sharing
(RPPS).1

$$\frac{\sigma_i + (K_i - 1)L_i}{\rho_i} + \sum_{m=1}^{K_i} \frac{L_{\max}}{r^m}. \qquad (1)$$

Under uniform packet sizes and uniform server rates this simplifies to $\frac{\sigma_i + K_i - 1}{\rho_i} + K_i$. Hence, for a small burst size, e.g. $\sigma_i = 1$, the multiple-hop delay is essentially $\frac{1}{\rho_i}\cdot K_i$, and the single-hop delay is essentially $\frac{1}{\rho_i}$. Moreover, it is possible to construct an example in which this bound is achieved, since a packet can wait for time $\frac{1}{\rho_i}$ at every hop. This illustrates our earlier observation.
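To make the growth of this bound concrete, here is a small sketch (with illustrative parameter values that are not from the paper) evaluating (1) and its uniform-case simplification:

```python
# Evaluate the WFQ/RPPS end-to-end delay bound (1) and its simplification
# under uniform packet sizes (L_i = 1) and server rates (r^m = 1).
# All parameter values here are illustrative, not taken from the paper.

def wfq_bound(sigma, rho, L, K, L_max, rates):
    """Bound (1): (sigma + (K-1)*L)/rho + sum_m L_max/r^m."""
    assert len(rates) == K
    return (sigma + (K - 1) * L) / rho + sum(L_max / r for r in rates)

def wfq_bound_uniform(sigma, rho, K):
    """Uniform case L_i = 1, r^m = 1: (sigma + K - 1)/rho + K."""
    return (sigma + K - 1) / rho + K

# With a small burst (sigma = 1) the bound grows like K/rho: each
# extra hop can contribute a full 1/rho of waiting time.
for K in (1, 10, 40):
    print(K, wfq_bound_uniform(sigma=1.0, rho=0.1, K=K))
```

For $\rho_i = 0.1$ the bound grows from roughly 11 at one hop to roughly 440 at forty hops, which is the multiplicative behavior discussed above.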
In this paper, we demonstrate with both analysis and simulation that even for small burst sizes, a bound of $\frac{1}{\rho_i}\cdot K_i$ is not necessary, i.e. the $K$-hop delay does not have to be $K$ times the 1-hop delay. Indeed, in the case of uniform packet sizes, uniform service rates and small burst sizes, [3] showed that each session $i$ can achieve a delay bound,2

$$O\!\left(\frac{1}{\rho_i} + K_i\right),$$

using a centralized scheme. The same paper also proposed a simple distributed protocol with a slightly weaker bound,

$$O\!\left(\frac{1}{\rho_i} + K_i \log\frac{n}{\rho_{\min}}\right).$$

For session $i$, $L_i$ is the maximum packet size, $K_i$ is the number of servers and $r^m$ is the service rate of the $m$th server. The maximum packet size over all sessions is $L_{\max}$. Session $i$ is leaky-bucket constrained with burst size $\sigma_i$ and rate $\rho_i$. Throughout this paper, we assume that all service is non-cut-through and non-preemptive.
1 We briefly review the definitions of WFQ and RPPS in Section II.

To understand the delay guarantee of (1) better, we compare the delay bound when session $i$ has a single hop ($K_i = 1$) with the bound when session $i$ has multiple hops ($K_i > 1$). We observe the following. When the burst size $\sigma_i$ is large then the multiple-hop delay bound is much less than $K_i$ times the single-hop delay bound. However, when $\sigma_i$ is small then the multiple-hop delay can be approximately $K_i$ times the single-hop delay. To see this, let us assume a uniform packet size for all sessions ($L_i = 1$) and a uniform service rate for all servers ($r^m = 1$). The delay bound of (1) now becomes,

$$\frac{\sigma_i + K_i - 1}{\rho_i} + K_i.$$

Here, $n$ is the number of servers in the network and $\rho_{\min}$ is the minimum session rate.
Our Results In Section III we generalize the above simple protocol to accommodate arbitrary packet sizes and arbitrary server

2 The bound $O(\frac{1}{\rho_i} + K_i)$ is best possible up to a constant factor. To see this, note that under non-cut-through service all sessions must suffer delay $K_i$. Moreover, examples can be constructed in which some sessions must suffer delay $\frac{1}{\rho_i}$.

Fig. 1. A plot of the multiplicative delay bound $\frac{1}{\rho_i}\cdot K_i$. Each curve represents a different value of $\rho_i$, from 0.03 to 0.7. The delays are plotted against $K_i$.

rates. We derive the following exact delay bound, which allows us to provide a direct comparison with (1).

$$\frac{\sigma_i + 4L_i/\varepsilon}{\rho_i} + \beta\sum_{m=1}^{K_i}\frac{L_{\max}}{r^m}\log(\cdot). \qquad (2)$$

The parameter $\varepsilon$ is the server utilization factor defined later. The logarithmic term, although small, is somewhat involved. We give the full definition later. In Section IV we provide simulation results to compare the actual performance of our protocol and WFQ.
The basic ideas of our protocol are an earliest-deadline-first approach coupled with randomization and coordination. We assign a deadline for every server through which a packet passes. By introducing some randomness, the deadlines can be sufficiently spread out so that all the packets can meet all their deadlines. By introducing simple coordination among the deadlines, we can ensure that once a packet has passed through its first server, it can pass through all its subsequent servers quickly. We refer to our protocol as COORDINATED-EARLIEST-DEADLINE-FIRST (CEDF).

The traffic lights in Manhattan provide an intuitive analogue to CEDF. Since the lights are coordinated, when one traffic light turns green, many lights further down the street turn green also. This means that once a car waits through one red light it can then drive through many green lights quickly. In this way, delay does not have to accumulate at every light.

From now on, we refer to a delay bound of the form $\frac{1}{\rho_i}\cdot K_i$ as a multiplicative bound and a bound of the form $\frac{1}{\rho_i} + K_i$ as an additive bound. In Figures 1 and 2, we plot these bounds for different values of $K_i$ and $\rho_i$. The curves for the multiplicative bound have different slopes for different $\rho_i$, whereas the curves for the additive bound all have the same slope. We can see that in general it is desirable to have an additive bound. We note that the bound (2) of CEDF is close to an additive bound. (It does not contain a term $K_i/\rho_i$.) Apart from the bound in reference [3] we know of no previous end-to-end delay bound that is close to an additive bound.
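As a quick numeric illustration of the two shapes (the values below are chosen arbitrarily, not taken from Figures 1 and 2):

```python
# Illustrative comparison of the two bound shapes: multiplicative
# (1/rho)*K versus additive 1/rho + K, for a session with rate rho
# traversing K hops. Values are hypothetical.

def multiplicative_bound(rho, K):
    return K / rho

def additive_bound(rho, K):
    return 1.0 / rho + K

# For a slow session (rho = 0.05) over 40 hops the multiplicative bound
# is more than an order of magnitude larger than the additive one.
rho, K = 0.05, 40
print(multiplicative_bound(rho, K), additive_bound(rho, K))
```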
In our simulations, we observe that the actual delays under WFQ and CEDF are often comparable to their analytical

Fig. 2. A plot of the additive delay bound $\frac{1}{\rho_i} + K_i$.

bounds. In many scenarios, the former exhibits the behavior of


a multiplicative bound, and the latter exhibits the behavior of
an additive bound. For these scenarios, CEDF produces significantly lower delays. In other scenarios where there is less contention between sessions, both protocols exhibit the behavior of
an additive bound.
CEDF has other desirable properties. First, we do not need
traffic reshaping between hops. Second, we only need to do
per-session processing at the points where the sessions enter the
network. That is, we do no per-session processing within the
network.
Previous Work The Earliest-Deadline-First (EDF) scheduling
discipline when applied to a single server has received much attention. For example, Ferrari and Verma [4] and Verma, Zhang
and Ferrari [5] showed that it can provide delay bounds and
delay-jitter bounds. Georgiadis, Guerin and Parekh [6] and
Liebeherr, Wrege and Ferrari [7] proved that EDF is delayoptimal in the sense that if a set of delay bounds is achievable
then it can be achieved by EDF. Necessary and sufficient conditions for a set of delay bounds to be achievable were given.
Liebeherr et al. also presented schemes with low implementation complexity that approximate EDF [7], [8]. For networks,
Georgiadis, Guerin, Peris and Sivarajan [9] showed that EDF
can be sub-optimal. Nevertheless they proved that if the traffic
is correctly reshaped after each node then EDF can outperform
Weighted Fair Queueing. However, the best explicit bound on
end-to-end delay given in [9] is the same as Equation (1). General techniques for calculating end-to-end delay bounds were
obtained by Goyal, Lam and Vin [10] and Goyal and Vin [11].
A number of papers have simulated end-to-end delay performance. Simulation results for EDF are presented in [4],
[5]. Clark, Shenker and Zhang [12] used simulation to compare WFQ with variants of FIFO. Yates, Kurose, Towsley and
Hluchyj [13] examined end-to-end delay distributions for WFQ,
FIFO and Golestani's Stop-and-Go Fair Queueing [14], [15].
They found that the analytic delay bounds can be too pessimistic. Grossglauser and Keshav [16] showed that FIFO
can outperform the Weighted Round Robin (WRR) and Round
Robin (RR) disciplines for CBR traffic.

Our protocol CEDF is motivated by techniques of Leighton,


Maggs and Rao [17] and Leighton, Maggs and Richa [18] for
static packet scheduling. In this static setting, all packets are
present in the network initially. Similar techniques were used
by Rabani and Tardos [19] and Ostrovsky and Rabani [20]. For
an overview of different scheduling disciplines, see [21], [22].
The rest of the paper is divided into sections as follows. We
define our model and briefly review WFQ and RPPS in Section II. Our protocol CEDF is described and analyzed in Section III. The simulation results are presented in Section IV. We
give our conclusions in Section V. The Appendix provides the
details of the proofs.

WFQ is a non-preemptive scheme that emulates GPS on a packet-by-packet basis. In particular, if a server needs to select a packet for transmission at time $t$ then it selects the first packet that would complete service under GPS if no additional packets were to arrive after time $t$.

In this paper we restrict our attention to a special case of WFQ known as Rate Proportional Processor Sharing (RPPS) in which $\phi_i^m = \rho_i$ for all sessions $i$ and servers $m$. The end-to-end delay bound for RPPS derived in [2] is stated in Equation (1).

II. M ODEL AND D EFINITIONS

The basic idea of COORDINATED-EDF is very simple. For each packet $p$, we assign deadlines $D_1, D_2, \ldots, D_K$ for the servers $m_1, m_2, \ldots, m_K$ through which $p$ passes. The deadlines at a server $m$ are defined using a parameter $G_m$, where $G_m$ is essentially $\frac{L_{\max}}{r^m}\log(\cdot)$. (We define the logarithmic term in $G_m$ later.) In particular, $D_1$ is $rand + G_{m_1}$ time after $p$'s injection, where $rand$ is a random number chosen from an appropriate range. Each subsequent deadline $D_{k+1}$ is $D_k + G_{m_{k+1}}$. CEDF gives priority to the packet with the earliest deadline if more than one packet is waiting for a server. Ties are broken arbitrarily.

Note that randomness is only added to the first deadline of each packet. This randomness has the important effect of spreading out the deadlines. If $rand$ is chosen from a large enough range, i.e. proportional to $L_i/\rho_i$ for session $i$, then deadlines from different sessions do not cluster together. In this way, packets do not compete for the same server simultaneously, and hence all packets are able to meet all their deadlines.

The $G_m$'s provide coordination among the deadlines. We point out that the values of the $G_m$'s are usually small, especially in high-speed networks where the server rates $r^m$ are large. This means that once a packet passes through its first server, it passes through all its subsequent servers quickly. As an analogy to our strategy, consider the traffic lights on an avenue in Manhattan. If a car is stopped at a red light then once that light turns green, many of the subsequent lights turn green also. In other words, the coordination of the lights means that once the car has passed through one light, it can quickly travel through many lights in succession.

We emphasize that the $G_m$'s are independent of the session rates. Under CEDF, session-$i$ packets do not accumulate a delay of $\frac{1}{\rho_i}$ for each server that they pass through. Hence, CEDF does not have a multiplicative term of the form $\frac{1}{\rho_i}\cdot K_i$ in its delay bound. This provides a significant contrast with the delay bound of WFQ. We discuss in more detail the advantages of CEDF in Section III-B.

We consider a packet-based connection-oriented network. We equate each link in the network with the server that schedules the sessions on the link. Each session is specified by a fixed path through the network. Let $K_i$ be the number of servers along the path of session $i$, and let $m_1^{(i)}, m_2^{(i)}, \ldots, m_{K_i}^{(i)}$ be these servers. When it causes no confusion, we drop the superscript $(i)$. We define $L_i$ to be the maximum size of a session-$i$ packet in bits. Let $L_{\max} = \max_i L_i$ and $L_{\min} = \min_i L_i$.

We use the $(\sigma, \rho)$ traffic model introduced by Cruz [23], [24] in which the traffic entering the network is leaky-bucket constrained. The session-$i$ traffic is characterized by a burst size $\sigma_i$ and a session rate $\rho_i$. If $A_i(t_1, t_2]$ denotes the amount of session-$i$ traffic entering the network during the time interval $(t_1, t_2]$, then,

$$A_i(t_1, t_2] \le \sigma_i + \rho_i(t_2 - t_1) \qquad \forall\, t_2 \ge t_1 \ge 0.$$
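The $(\sigma, \rho)$ constraint can be checked mechanically. The following sketch (an illustrative checker, not part of the paper's protocol) tests an arrival trace for leaky-bucket conformance:

```python
# Check (sigma, rho)-conformance of an arrival trace: over every interval
# (t1, t2], the bits admitted must not exceed sigma + rho*(t2 - t1).
# Arrivals are (time, bits) pairs; this is an illustrative sketch.

def conforms(arrivals, sigma, rho):
    arrivals = sorted(arrivals)
    times = [t for t, _ in arrivals]
    # It suffices to check intervals that start just before some arrival
    # and end at an arrival, so the sum below includes the arrival at t1.
    for i, t1 in enumerate(times):
        total = 0
        for t2, bits in arrivals[i:]:
            total += bits
            if total > sigma + rho * (t2 - t1):
                return False
    return True

# Two 1-bit packets at time 0 conform to (sigma=2, rho=1) ...
print(conforms([(0, 1), (0, 1), (3, 1)], sigma=2, rho=1))   # True
# ... but a 3-packet burst at one instant violates sigma = 2.
print(conforms([(0, 1), (0, 1), (0, 1)], sigma=2, rho=1))   # False
```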

Let $r^m$ be the service rate of server $m$, i.e. $m$ can service at most $r^m(t_2 - t_1)$ bits during the interval $(t_1, t_2]$. Let $I(m)$ be the set of sessions served by server $m$. We require the following stability condition,

$$\sum_{i \in I(m)} \rho_i \le (1 - \varepsilon)\, r^m \qquad \text{for some } \varepsilon > 0.$$

The parameter $\varepsilon$ is a server utilization factor. It is crucial in allowing us to use coordination to achieve low delay bounds.

We adopt the non-cut-through and non-preemptive convention for scheduling. First, no packet is eligible for service until its last bit has arrived. Second, once a server begins serving a packet, it must continue until the whole packet has been serviced.
Review of Weighted Fair Queueing Since we refer frequently to Weighted Fair Queueing, we now provide a brief definition. For details see [25], [1], [2]. WFQ attempts to emulate the Generalized Processor Sharing (GPS) scheme, in which all backlogged sessions receive service simultaneously. In particular, if session $i$ is backlogged at server $m$ then under GPS it receives service at rate,

$$\frac{\phi_i^m}{\sum_{j \in B_m} \phi_j^m}\, r^m,$$

where $B_m$ is the set of backlogged sessions at server $m$ and the $\phi_i^m$ are a set of allocated weights.
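The GPS rate formula can be sketched directly; the session names and weights below are hypothetical (under RPPS the weight $\phi_i^m$ is simply the session rate $\rho_i$):

```python
# Instantaneous GPS service rates at one server: a backlogged session i
# receives phi_i / (sum of phi_j over backlogged j) of the link rate r^m.
# Session names and weights are illustrative.

def gps_rates(weights, backlogged, r_m):
    """weights: {session: phi}; backlogged: set of backlogged sessions."""
    total = sum(weights[j] for j in backlogged)
    return {i: weights[i] / total * r_m for i in backlogged}

# Session "c" is idle, so "a" and "b" split the link in proportion
# to their weights: approximately 0.25 and 0.75 of r_m = 1.0.
rates = gps_rates({"a": 0.1, "b": 0.3, "c": 0.2}, {"a", "b"}, r_m=1.0)
print(rates)
```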

III. A NALYTICAL B OUND


A. Overview

B. Protocol and Analysis


Parameters We define parameters $T_i$ and $M$ for generating random numbers. Roughly speaking, $M$ serves as the period of the deadlines. Once the deadlines are defined in an interval of length $M$, all deadlines are defined. The parameter $T_i$ is the size of the intervals from which the random numbers for session $i$ are chosen. When $T_i$ is about $\frac{2L_i}{\varepsilon\rho_i}$, the amount of randomness is sufficient to spread out the deadlines. We choose to write $T_i$ in the following (slightly complicated) form, because it ensures that $M$ is an integral multiple of all the $T_i$'s. For reasons that will become clear later, we also define $S_i$ such that $S_i/T_i$ is slightly greater than the session rate $\rho_i$. Let,

$$T_i = 2^{\lceil \log_2 \frac{2L_i}{\varepsilon\rho_i} \rceil}, \qquad M = \max_i T_i, \qquad S_i = T_i \rho_i (1 + \varepsilon/2).$$

We define $G_m$ for each server $m$, which determines how the deadline for a packet is incremented when it advances from one server to the next. Let,

$$G_m = \beta\, \frac{L_{\max}}{r^m} \log_e\!\left(\frac{nMr^m\varepsilon}{L_{\min}}\right),$$

where $L_{\max} = \max_i L_i$ and $L_{\min} = \min_i L_i$. The parameter $\beta = O(\varepsilon^{-3}\log\frac{1}{1-p_{suc}})$, where $p_{suc}$ is the success probability of the protocol. (We discuss this success probability in the Remarks section.) Note that $\beta$ is independent of $L_i$, $\sigma_i$, $\rho_i$, $K_i$ and $r^m$.
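The definitions of $T_i$, $M$ and $S_i$ can be computed directly. The sketch below uses made-up session parameters to show that each $T_i$ is a power of two, so that $M = \max_i T_i$ is an integral multiple of every $T_i$:

```python
import math

# Compute the CEDF parameters T_i, M and S_i from Section III-B.
# The session packet sizes and rates below are illustrative.

def T(L_i, rho_i, eps):
    """T_i = 2^ceil(log2(2 L_i / (eps * rho_i)))."""
    return 2 ** math.ceil(math.log2(2 * L_i / (eps * rho_i)))

eps = 0.2
sessions = {"a": (1.0, 0.1), "b": (1.0, 0.4)}   # session -> (L_i, rho_i)

Ts = {i: T(L, rho, eps) for i, (L, rho) in sessions.items()}
M = max(Ts.values())
Ss = {i: Ts[i] * rho * (1 + eps / 2) for i, (L, rho) in sessions.items()}

# Each T_i is a power of two >= 2 L_i/(eps rho_i), hence M is an integral
# multiple of every T_i, and S_i/T_i = rho_i (1 + eps/2) slightly exceeds
# the session rate.
print(Ts, M)
```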

Tokens We use tokens to define deadlines. For session $i$, let $\tau_1, \tau_2, \ldots, \tau_{M/T_i}$ be numbers chosen uniformly at random from each of the intervals $[0, T_i), [T_i, 2T_i), \ldots, [M - T_i, M)$. Session-$i$ tokens appear periodically with period $M$ at the following times.

$$\begin{array}{cccc} \tau_1 & \tau_2 & \cdots & \tau_{M/T_i} \\ \tau_1 + M & \tau_2 + M & \cdots & \tau_{M/T_i} + M \\ \tau_1 + 2M & \tau_2 + 2M & \cdots & \tau_{M/T_i} + 2M \\ \tau_1 + 3M & \tau_2 + 3M & \cdots & \tau_{M/T_i} + 3M \\ & & \vdots & \end{array}$$

Deadlines Let $m_1, m_2, \ldots, m_{K_i}$ be the servers on the path of session $i$. For each session-$i$ packet, we define a sequence of deadlines $D_1, D_2, \ldots, D_{K_i}$ for traversing the servers.

When a packet of size $\ell$ bits obtains a token, it consumes $\ell$ bits from that token. At most $S_i$ bits can be consumed from each session-$i$ token. Suppose a session-$i$ packet $p$ is injected at time $t_{inj}$ and has $\ell_p$ bits. Suppose also that the session-$i$ packet injected immediately before $p$ obtains its token at time $t_{prev}$. Packet $p$ obtains the first session-$i$ token after $\max\{t_{inj}, t_{prev}\}$ that has at least $\ell_p$ bits unconsumed. Let $\tau$ be the time that the token appears. The deadlines are defined as follows.

$$D_1 = \tau + G_{m_1}, \qquad D_j = D_{j-1} + G_{m_j}.$$

Now that all deadlines are defined, each server gives priority to the packet that has the earliest deadline.
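The token placement and the iterative deadline definition can be sketched as follows; the values of $T_i$, $M$ and the per-hop increments $G_m$ here are illustrative stand-ins, not computed from the paper's formulas:

```python
import random

# Sketch of CEDF token placement and deadline assignment (Section III-B).
# T_i, M, the G_m values and the token index used are illustrative.

def make_token_times(T_i, M, periods, rng):
    """Draw tau_1..tau_{M/T_i} uniformly from [0,T_i), [T_i,2T_i), ...,
    then repeat them with period M for the given number of periods."""
    taus = [k * T_i + rng.uniform(0, T_i) for k in range(M // T_i)]
    return [tau + p * M for p in range(periods) for tau in taus]

def assign_deadlines(token_time, G_path):
    """D_1 = tau + G_{m_1}; D_j = D_{j-1} + G_{m_j}."""
    deadlines = []
    d = token_time
    for g in G_path:
        d += g
        deadlines.append(d)
    return deadlines

rng = random.Random(0)
tokens = make_token_times(T_i=4, M=8, periods=2, rng=rng)
# Deadlines for a packet whose token appears at the first token time,
# crossing three servers with identical increments G_m = 0.5.
print(assign_deadlines(tokens[0], G_path=[0.5, 0.5, 0.5]))
```

Note how randomness enters only through the token time; every later deadline is a deterministic increment, which is the coordination that lets a packet clear its remaining servers quickly.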
Remarks
1. The only coordination required comes from the above iterative definition of the deadlines. This coordination can be achieved simply by stamping each packet with its current deadline.3 Each server can then update the deadlines of its pending packets autonomously, i.e. we do not require explicit communication among servers.
3 This can be done using techniques similar to the protocols of [26].
2. We do not place tokens at times $T_i, 2T_i, 3T_i$ etc., but rather we introduce some randomness. This randomness is essential for spacing out the deadlines so that not many deadlines contend for the same server simultaneously. Once the tokens are chosen, the deadlines are chosen deterministically.
3. We emphasize that our protocol is work conserving and requires no traffic shaping. As long as some packets are waiting for a server, the packet with the earliest deadline gets serviced. In particular, a packet can be serviced before it obtains a token. The concept of a packet obtaining/consuming a token is merely a method of counting for the purpose of assigning deadlines.
4. The only per-session processing is the determination of which token a packet obtains. This can be done at the point on the edge of the network where the session enters. Once the token has been obtained, the deadlines for the packet are independent of its session parameters. This means that we need no per-session state within the network.
5. We say that the protocol is successful if all the packets meet all their deadlines. The success of the protocol is equivalent to the successful placing of a finite number of tokens due to the periodicity of the token placement. Hence, we can use a Chernoff-bound argument to analyze the success probability.
6. To prove the desired end-to-end delay bound, we prove two statements in the Appendix. First, with high probability the protocol is successful. (See Lemmas 2 and 3.) Second, $\tau$ is at most $t_{inj} + \frac{\sigma_i}{\rho_i} + \frac{4L_i}{\varepsilon\rho_i}$ for each session-$i$ packet, where $t_{inj}$ is the injection time of that packet. (See Lemma 4.) Therefore,
Theorem 1: With high probability, the end-to-end delay guarantee for session $i$ is

$$\frac{\sigma_i + 4L_i/\varepsilon}{\rho_i} + \beta \sum_{k=1}^{K_i} \frac{L_{\max}}{r^{m_k}} \log_e\!\left(\frac{nMr^{m_k}\varepsilon}{L_{\min}}\right).$$

We emphasize that when the protocol is successful, every packet meets all of its deadlines, i.e. the bound in Theorem 1 is a worst-case delay bound.
7. The factor $1/\varepsilon$ in the term $4L_i/\varepsilon$ is needed in the proof of Lemma 4. However, we conjecture that in many situations it will be possible to obtain a delay bound in which the term $4L_i/\varepsilon$ is replaced by $4L_i$.
8. We now compare the bound of WFQ with the bound of CEDF when $r^m$ is large, e.g. in high-speed networks. Here, the terms containing $1/r^m$ are negligible. The bound for WFQ becomes,

$$\frac{\sigma_i + 2(K_i - 1)L_i}{\rho_i},$$

and the bound for CEDF becomes,

$$\frac{\sigma_i + 4L_i/\varepsilon}{\rho_i}.$$

We note that the bound for CEDF does not contain $K_i$.
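A quick arithmetic comparison of these two simplified bounds, with illustrative parameters that are not from the paper:

```python
# Compare the high-speed-network bounds of Remark 8 (terms with 1/r^m
# dropped). All parameter values are illustrative.

def wfq_highspeed(sigma, L, K, rho):
    return (sigma + 2 * (K - 1) * L) / rho

def cedf_highspeed(sigma, L, eps, rho):
    return (sigma + 4 * L / eps) / rho

sigma, L, rho, eps = 1.0, 1.0, 0.1, 0.2
for K in (5, 40):
    print(K, wfq_highspeed(sigma, L, K, rho), cedf_highspeed(sigma, L, eps, rho))
# The CEDF bound is independent of K; the WFQ bound grows linearly in K,
# so CEDF wins once the hop count is large enough.
```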
IV. S IMULATION R ESULTS
Our experiments simulate a simple situation with uniform
packet sizes and uniform server rates. Since CEDF involves

Fig. 3. Mean delay of the long session due to WFQ. Each curve corresponds to a long-session rate between 0.03 and 0.7, plotted against session length.

Fig. 5. 98%-percentile delay of the long session due to WFQ.

many parameters, we simulate a simplified version, SIMPLE-CEDF, which nevertheless contains the essence of CEDF. Under S-CEDF, the deadline for the first server is chosen randomly (without reference to periodic tokens). Every subsequent deadline is the deadline for the previous server incremented by one packet service time. (See Figure 11.) As we shall see, the performance of S-CEDF corresponds to the analytical bounds of Section III.


Fig. 4. Mean delay of the long session due to S-CEDF.

Fig. 6. 98%-percentile delay of the long session due to S-CEDF.

packet size: 1000b    link speed: 1Mb/sec    packet service time: 1ms    buffer size: large

A. Single Long Session


We begin with a very simple configuration as illustrated in Figure 12. The network consists of a line of $N$ links. A long session of $N$ hops travels through the network, sharing each hop with a short session of 1 hop. These short sessions provide the cross-traffic for the long session. The length $N$ of the long session varies from 5 to 40. The link utilization is set to 0.8 (i.e. $\varepsilon = 0.2$). The rate of the long session $\rho_\ell$ varies in the range from 0.03 to 0.7. The rate of each short session $\rho_s$ is set to $0.8 - \rho_\ell$. Experiments with a similar setup were conducted in other simulation studies, e.g. [16], [27], [5].

Fig. 11. S-CEDF, the SIMPLE-CEDF protocol.
  p: a session-$i$ packet.  $t_{inj}$: injection time of $p$.  $D_k$: deadline of $p$ at its $k$th hop.
  1. $D_1$ := randomly chosen from $[t_{inj},\, t_{inj} + \frac{1}{\rho_i}]$.
  2. $D_k := D_{k-1}$ + one packet service time.
  3. Each link gives priority to the packet with the earliest deadline.
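The earliest-deadline-first service rule of Figure 11 at a single link amounts to popping from a deadline-ordered priority queue. A minimal sketch, with hypothetical packet names and deadlines:

```python
import heapq

# Minimal sketch of the per-link EDF rule of Figure 11: among the queued
# packets, always serve the one with the earliest deadline. The packets
# and deadline values below are illustrative.

def edf_service_order(packets):
    """packets: list of (deadline, name). Returns names in service order."""
    heap = list(packets)
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order

# A long-session packet whose deadline has become early (it is deep into
# its path) overtakes later-deadline 1-hop cross-traffic.
print(edf_service_order([(7.0, "short-1"), (3.5, "long"), (5.0, "short-2")]))
# ['long', 'short-2', 'short-1']
```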



We compare the performance of WFQ and SIMPLE-CEDF (S-CEDF) using the mean end-to-end delay and the 98%-percentile end-to-end delay. We use the following simulation parameters. The link speed is set to 1Mb/sec and all packets have a size of 1000 bits. The packet service time on each link is therefore 1ms. The end-to-end delay consists of the packet service time and the queueing time, i.e. the time that the packet spends waiting in a buffer. Buffers have a large size and no packet is dropped in any experiment.

Fig. 12. Session 0 is the long session with 5 hops. Sessions 1 through 5 are the
1-hop sessions.

We first use a deterministic injection model that conforms to the $(\sigma, \rho)$ traffic model with $\sigma = 1$ for each session. Figures 3, 4, 5 and 6 illustrate the end-to-end delay experienced by the long session. We note the striking resemblance between the curves for these actual delays and the curves for the analytical

Fig. 7. Probabilistic on-off source. Mean delay due to WFQ.

Fig. 8. Probabilistic on-off source. Mean delay due to S-CEDF.

Fig. 9. Probabilistic on-off source. 98%-percentile delay due to WFQ.

delay bounds. (Recall Figures 1 and 2.) These plots demonstrate that for small values of $\rho_\ell$, S-CEDF has a significant advantage over WFQ in terms of the end-to-end delay of the long session. The two disciplines present similar behavior for larger values of $\rho_\ell$.

We take a closer look at the behavior of the long session for small $\rho_\ell$. Under WFQ, packets from the long session are frequently delayed by packets from the 1-hop sessions, since $\rho_s$ is much larger than $\rho_\ell$. Furthermore, a packet from the long session suffers from a similar amount of queueing delay at each link. This behavior of WFQ supports the analytical bound of the multiplicative form $\frac{1}{\rho_\ell}\cdot K$.

Under S-CEDF, the long session behaves differently. When traversing the first few links, a packet from the long session is likely to queue in the buffers. This is because the initial deadline is chosen from the range $[t_{inj},\, t_{inj} + \frac{1}{\rho_\ell}]$. When $\rho_\ell$ is smaller than $\rho_s$, the long session is likely to have later deadlines than the interfering 1-hop sessions at the beginning of its path, and hence its packets are delayed. However, as the packet from the long session moves further along its path, its deadline becomes earlier in comparison to the deadlines of the 1-hop sessions, and hence it suffers less delay. This behavior of S-CEDF supports the analytical bound of the additive form $\frac{1}{\rho_\ell} + K$.

Despite the fact that the long sessions with small $\rho_\ell$ have much smaller end-to-end delay under S-CEDF than under WFQ, the 1-hop sessions do not suffer a great deal under S-CEDF. The following table summarizes the mean delay of the 1-hop sessions.

ρ_ℓ       0.03   0.1    0.2    0.3    0.4    0.5    0.6    0.7
ρ_s       0.77   0.7    0.6    0.5    0.4    0.3    0.2    0.1
WFQ       1.0    1.0    1.0    1.0    1.4    1.5    1.8    2.2
S-CEDF    1.06   1.16   1.26   1.3    1.42   1.53   1.79   2.15

Fig. 10. Probabilistic on-off source. 98%-percentile delay due to S-CEDF.

Variations of the above experiments are conducted. We first vary the configuration of the network and the sessions. For example, instead of having one 1-hop session at each link, we use multiple 1-hop sessions at each link, where the total rates of these 1-hop sessions add up to $0.8 - \rho_\ell$. As another example, we experiment with a ring of 40 nodes and 40 links. Multiple long sessions wrap around the ring, interfering with one another in addition to the 1-hop sessions on each link. These experiments yield similar results to those shown in Figures 3-6. (We omit the plots here.)

We also vary the injection patterns at the source for the single long session configuration shown in Figure 12. Experiments with a larger burst size, e.g. $\sigma = 10$, yield plots similar to Figures 3-6. A probabilistic on-off source with exponentially distributed on and off times yields the plots in Figures 7-10.

We have results for similar experiments using the FIFO discipline. In this setting, the delays produced by FIFO are close to the delays of WFQ, i.e. the delays can be approximated by a multiplicative formula. (We omit the plots here.)

Fig. 13. Multiple long sessions. Mean delay due to WFQ.

Fig. 14. Multiple long sessions. Mean delay due to S-CEDF.

Fig. 15. Multiple long sessions. 98%-percentile delay due to WFQ.

B. Multiple Long Sessions


We now consider a more complicated configuration. We use
a ring of 40 nodes, where neighboring nodes are connected by
8 links. Sessions with 1, 5, 10, 15, 20, 25, 30, 35 and 40 hops coexist and interfere with one another in this network. The paths
and rates of these sessions are chosen as follows. We first choose
a set of 40-hop paths. Each path begins with a random node and
then follows the ring. Each hop of the path between two neighboring nodes can follow any of the 8 links between these nodes.
The choice is made randomly subject to the constraint that the
number of paths going through each link is the same. We now
cut some of these 40-hop paths into shorter paths. Some 40-hop
paths are divided into a 5-hop path and a 35-hop path, others
are divided into a 10-hop path and 30-hop path, etc. After this
process, the network has paths with lengths 5, 10, 15, ..., 40. We
also have some 1-hop paths. All sessions have the same rate. By
varying the number of the original 40-hop paths, we achieve the
desired session rates. Figures 13-16 summarize the performance
of WFQ and S-CEDF. As we can see, the curves for WFQ have
the multiplicative characteristic, although it is less pronounced
than in Figures 3 and 5. The curves for S-CEDF have the additive characteristic. We also observe that long sessions perform
better under S-CEDF than under WFQ, whereas short sessions
perform marginally better under WFQ.
We finally note that the analytical bound for WFQ is a worstcase bound, and therefore can be overly conservative. In our
experiments, we have encountered situations in which WFQ behaves in a similar manner to S-CEDF, i.e. the additive form of

Fig. 16. Multiple long sessions. 98%-percentile delay due to S-CEDF.

$\frac{1}{\rho} + K$ is more apparent. In one such experiment, we consider a line of 41 nodes and 80 links, where neighboring nodes are connected by double links. All sessions have 40 hops, starting from the node on the left end and finishing at the node on the right end. Each hop along the path of a session can follow either the upper or the lower link. The choice is made randomly, subject to the constraint that each link has an equal number of sessions passing through it. All sessions have the same injection rate. We vary the number of sessions in order to achieve the desired session rate. Figures 17 and 19 illustrate the end-to-end delays due to WFQ, averaged over all the 40-hop sessions. These delays show little multiplicative behavior. This is because in this network there is little contention among packets. S-CEDF produces similar end-to-end delays.
V. CONCLUSION
We have described a work-conserving scheduling discipline, COORDINATED-EARLIEST-DEADLINE-FIRST, with end-to-end delay bound,

$$\frac{\sigma_i + 4L_i/\varepsilon}{\rho_i} + \beta\sum_{k=1}^{K_i}\frac{L_{\max}}{r^{m_k}}\log(\cdot).$$

CEDF uses randomization and simple coordination to ensure that once a packet passes through its first server it can pass through all its subsequent servers quickly. Under CEDF, a session-$i$ packet does not accumulate a delay of $\frac{L_i K_i}{\rho_i}$ over its $K_i$ hops, and therefore its delay bound is smaller than that of the

Fig. 17. Double-link network. Mean delay due to WFQ.

Fig. 19. Double-link network. 98%-percentile delay due to WFQ.

Weighted Fair Queueing discipline. We have also presented simulation results showing that the delays actually produced by CEDF and WFQ can be comparable to their analytical worst-case bounds.
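To make the additive-versus-multiplicative contrast concrete, the following sketch compares the shapes of the two worst-case bounds numerically. All parameter values (unit packet length and burst, $\varepsilon = 0.5$, a unit per-hop term) are illustrative assumptions, not values taken from the paper.

```python
def wfq_bound(rho, K, L=1.0):
    # Multiplicative: roughly L/rho delay can accrue at each of the K hops.
    return (L / rho) * K

def cedf_bound(rho, K, L=1.0, sigma=1.0, eps=0.5, per_hop=1.0):
    # Additive: the burst/rate term is paid once, plus a small per-hop term.
    return sigma / rho + 4 * L / (eps * rho) + per_hop * K

# For a long session the additive bound is far smaller; for a short
# session the multiplicative bound can win, matching the simulations.
rho = 0.05
for K in (5, 20, 40):
    print(K, wfq_bound(rho, K), cedf_bound(rho, K))
```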
The major open problem is to reduce the delay bound still further. The ultimate goal is a simple protocol with a delay bound,

Ki L
X

i + Li +
max :
mk
i
r
k=1

ACKNOWLEDGMENTS

We thank Antonio Fernandez, Mor Harchol-Balter and Tom Leighton for their help in earlier stages of this work. Antonio Fernandez also provided many detailed comments on a preliminary draft of this paper. We thank Jorg Liebeherr for his insight on implementation issues.
REFERENCES
[1] A. K. Parekh and R. G. Gallager. A generalized processor sharing approach to flow control in integrated services networks: The single-node case. IEEE/ACM Transactions on Networking, 1(3):344-357, 1993.
[2] A. K. Parekh and R. G. Gallager. A generalized processor sharing approach to flow control in integrated services networks: The multiple-node case. IEEE/ACM Transactions on Networking, 2(2):137-150, 1994.
[3] M. Andrews, A. Fernandez, M. Harchol-Balter, T. Leighton, and L. Zhang. Dynamic packet routing with per-packet delay guarantees of O(distance + 1/session rate). In Proceedings of the 38th Annual Symposium on Foundations of Computer Science, pages 294-302, Miami Beach, FL, October 1997.
[4] D. Ferrari and D. Verma. A scheme for real-time channel establishment in wide-area networks. IEEE Journal on Selected Areas in Communications, 8(3):368-379, April 1990.

Fig. 18. Double-link network. Mean delay due to S-CEDF.

Fig. 20. Double-link network. 98%-percentile delay due to S-CEDF.
[5] D. Verma, H. Zhang, and D. Ferrari. Guaranteeing delay jitter bounds in packet switching networks. In Proceedings of Tricomm '91, Chapel Hill, NC, April 1991.
[6] L. Georgiadis, R. Guerin, and A. Parekh. Optimal multiplexing on a single link: delay and buffer requirements. IEEE Transactions on Information Theory, 43(5):1518-1535, September 1997.
[7] J. Liebeherr, D. Wrege, and D. Ferrari. Exact admission control for networks with a bounded delay service. IEEE/ACM Transactions on Networking, 4(6):885-901, December 1996.
[8] D. Wrege and J. Liebeherr. A near-optimal packet scheduler for QoS networks. In Proceedings of IEEE INFOCOM '97, 1997.
[9] L. Georgiadis, R. Guerin, V. Peris, and K. Sivarajan. Efficient network QoS provisioning based on per node traffic shaping. In Proceedings of IEEE INFOCOM '96, pages 102-110, 1996.
[10] P. Goyal, S. Lam, and H. Vin. Determining end-to-end delay bounds in heterogeneous networks. In Proceedings of the Fifth International Workshop on Network and Operating System Support for Digital Audio and Video, pages 287-298, Durham, NH, April 1995.
[11] P. Goyal and H. Vin. Generalized guaranteed rate scheduling algorithms: A framework. Technical Report TR-95-30, University of Texas, Austin, September 1995.
[12] D. Clark, S. Shenker, and L. Zhang. Supporting real-time applications in an integrated services packet network: Architecture and mechanism. In Proceedings of ACM SIGCOMM '92, pages 14-26, August 1992.
[13] D. Yates, J. Kurose, D. Towsley, and M. Hluchyj. On per-session end-to-end delay distributions and the call admission problem for real time applications with QOS requirements. In Proceedings of ACM SIGCOMM '93, pages 2-12, 1993.
[14] S. J. Golestani. A framing strategy for congestion management. IEEE Journal on Selected Areas in Communications, 9(7):1064-1077, September 1991.
[15] S. J. Golestani. Congestion-free communication in high-speed packet networks. IEEE Transactions on Communications, 39(12):1802-1812, December 1992.
[16] M. Grossglauser and S. Keshav. On CBR service. In Proceedings of IEEE INFOCOM '96, pages 129-136, 1996.
[17] F. T. Leighton, B. M. Maggs, and S. B. Rao. Packet routing and job-shop scheduling in O(congestion + dilation) steps. Combinatorica, 14(2):167-186, 1993.
[18] F. T. Leighton, B. M. Maggs, and A. W. Richa. Fast algorithms for finding O(congestion + dilation) packet routing schedules. Technical Report CMU-CS-96-152, Carnegie Mellon University, 1996.
[19] Y. Rabani and E. Tardos. Distributed packet switching in arbitrary networks. In Proceedings of the 28th Annual ACM Symposium on Theory of Computing, Philadelphia, PA, May 1996.
[20] R. Ostrovsky and Y. Rabani. Local control packet switching algorithm. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing, May 1997.
[21] S. Keshav. An Engineering Approach to Computer Networking. Addison-Wesley, Reading, MA, 1997.
[22] H. Zhang. Service disciplines for guaranteed performance service in packet-switching networks. Proceedings of the IEEE, October 1995.
[23] R. L. Cruz. A calculus for network delay, Part I: Network elements in isolation. IEEE Transactions on Information Theory, pages 114-131, 1991.
[24] R. L. Cruz. A calculus for network delay, Part II: Network analysis. IEEE Transactions on Information Theory, pages 132-141, 1991.
[25] A. Demers, S. Keshav, and S. Shenker. Analysis and simulation of a fair queueing algorithm. Journal of Internetworking: Research and Experience, 1:3-26, 1990.
[26] A. Banerjea, D. Ferrari, B. Mah, M. Moran, D. Verma, and H. Zhang. The Tenet real-time protocol suite: Design, implementation, and experiences. IEEE/ACM Transactions on Networking, 4(1):1-11, February 1996.
[27] D. Stiliadis. Traffic scheduling in packet-switched networks: analysis, design and implementation. PhD thesis, UCSC, 1996.

APPENDIX: PROOF OF THEOREM 1


Consider a server $m$ and a time interval $I$. Let $P$ be the set of packets that have a deadline for server $m$ in interval $I$. If the total size of the packets in $P$ is $x$, then we say that $I$ services $x$ bits at server $m$.

Lemma 2: Consider any server $m$ and any time interval $I = (t - G_m, t]$, where $t$ is a potential deadline for some session at server $m$. With high probability, any such interval $I$ services fewer than $G_m r_m$ bits at server $m$.

Proof: Let $X_i$ be the number of session-$i$ bits that $I$ services at server $m$. The expectation of $X_i$, $E[X_i]$, is at most $\frac{S_i}{T_i} G_m$. This is because one session-$i$ token is placed at random in each of the intervals $[0, T_i)$, $[T_i, 2T_i)$, etc., and the deadlines for each session are a fixed amount of time after the tokens. In addition, each token is consumed by at most $S_i$ bits. Let $N$ be the set of sessions whose paths pass through $m$. By linearity of expectation,

$$\sum_{i \in N} E[X_i] \le \sum_{i \in N} \frac{S_i}{T_i} G_m \le (1 - \varepsilon/2)\, r_m G_m.$$

A Chernoff-type argument shows that $\Pr\big[\sum_{i \in N} X_i \ge r_m G_m\big]$ is small. (We omit details here, since the calculation is standard.) In particular,

$$\Pr\Big[\sum_{i \in N} X_i \ge r_m G_m\Big] \le e^{-\varepsilon^3 (1-\varepsilon)\, r_m G_m / (48 L_{\max})}.$$

Let $\Lambda = \frac{48}{\varepsilon^3(1-\varepsilon)} \log_e \frac{1}{1 - p_{suc}}$. Recall that $G_m = \Lambda \frac{L_{\max}}{r_m} \log_e \frac{nMr_m\varepsilon}{L_{\min}}$. Since the token placement is periodic with period $M$, we only need to consider a fixed time period of length $M$. For each server $m$, only $M/T_i$ intervals $I = (t - G_m, t]$ can have $t$ as a deadline for a session-$i$ packet in that time period. There are $n$ servers in the network. Hence, the total number of such intervals $I$ is,

$$n \sum_i \frac{M}{T_i} = n \sum_i \frac{M \varepsilon \rho_i}{2 L_i} \le \frac{n M r_m \varepsilon}{2 L_{\min}}.$$

By a union bound argument, the probability that some server $m$ services at least $G_m r_m$ bits during some interval $I$ is at most,

$$\frac{nMr_m\varepsilon}{2L_{\min}}\; e^{-\varepsilon^3(1-\varepsilon)\, r_m G_m/(48L_{\max})} \le \frac{nMr_m\varepsilon}{2L_{\min}} \Big(\frac{L_{\min}}{nMr_m\varepsilon}\Big)^{\Lambda \varepsilon^3(1-\varepsilon)/48} \le 1 - p_{suc}.$$

We can choose $p_{suc}$, the success probability of the protocol, to be close to 1.

Lemma 3: If the assumption in Lemma 2 holds, then every packet meets all its deadlines.

Proof: For the purpose of contradiction, let $D$ be the first deadline that is missed. This implies that all deadlines earlier than $D$ are met. Let $p$ be the packet that misses deadline $D$ for server $m$. Suppose that packet $p$ has length $\ell_p$. Since packet $p$ meets its previous deadlines, it must be waiting at server $m$ at time $D - G_m$. Hence, server $m$ is servicing other packets from time $D - G_m$ to $D - \ell_p/r_m$. Let $p'$ be such a packet; then $p'$ must have a deadline $D' \le D$ by the definition of EDF. Moreover, $D' > D - G_m$, since $D$ is the first deadline missed and a packet that met an earlier deadline would have been fully serviced by that deadline. Hence, the total size of packets that have deadlines in $(D - G_m, D]$ is at least $r_m G_m$. This contradicts the assumption of Lemma 2.

Lemma 2 and Lemma 3 imply that each session-$i$ packet $p$ reaches its destination by time $\tau + \sum_{j=1}^{K_i} G_{m_j}$. To complete our analysis, we upper bound $\tau$ as follows.

Lemma 4: For each session-$i$ packet $p$ injected at $t_{inj}$, we have $\tau \le t_{inj} + \frac{\sigma_i}{\rho_i} + \frac{4L_i}{\varepsilon \rho_i}$.

Proof: Let $t_0$ be the last time before $t_{inj}$ at which no session-$i$ packet is waiting to obtain a token. During $(t_0, \tau]$ every session-$i$ token must consume packets injected during $(t_0, t_{inj}]$ only, and each token must consume more than $S_i - L_i$ bits. Otherwise, either $(t_0, t_{inj})$ contains a time when no session-$i$ packet is waiting, or $p$ would obtain a token before $\tau$. The total number of bits injected during $(t_0, t_{inj}]$ is at most,

$$\sigma_i + (t_{inj} - t_0)\rho_i.$$

The total number of session-$i$ tokens during $(t_0, \tau]$ is at least $\frac{\tau - t_0 - T_i}{T_i}$. Therefore, the total number of session-$i$ bits consumed during $(t_0, \tau]$ is at least,

$$\frac{\tau - t_0 - T_i}{T_i}(S_i - L_i).$$

Hence,

$$\frac{\tau - t_0 - T_i}{T_i}(S_i - L_i) \le \sigma_i + (t_{inj} - t_0)\rho_i$$
$$\Rightarrow\; \frac{\tau - t_0 - T_i}{T_i}(\rho_i T_i + L_i - L_i) \le \sigma_i + (t_{inj} - t_0)\rho_i$$
$$\Rightarrow\; \tau \le t_{inj} + \frac{\sigma_i}{\rho_i} + \frac{4L_i}{\varepsilon \rho_i}.$$
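The EDF argument in Lemma 3 (a missed deadline implies an overloaded interval of deadlines) can be illustrated with a small single-server simulation. This is a hedged sketch: it assumes a preemptive EDF server with a given rate, whereas the paper's servers transmit each packet non-preemptively; the feasibility logic is the same.

```python
import heapq

def edf_schedule(packets, rate=1.0):
    """Simulate a single work-conserving, preemptive EDF server.

    packets: list of (arrival_time, size_bits, deadline).
    Returns the completion time of each packet, in input order.
    """
    order = sorted(range(len(packets)), key=lambda i: packets[i][0])
    done = [None] * len(packets)
    ready = []  # heap of (deadline, index, remaining_bits)
    t, k = 0.0, 0
    while k < len(order) or ready:
        if not ready:
            t = max(t, packets[order[k]][0])  # jump to next arrival
        # Admit every packet that has arrived by time t.
        while k < len(order) and packets[order[k]][0] <= t:
            i = order[k]
            heapq.heappush(ready, (packets[i][2], i, packets[i][1]))
            k += 1
        d, i, rem = heapq.heappop(ready)  # earliest deadline first
        finish = t + rem / rate
        next_arr = packets[order[k]][0] if k < len(order) else float("inf")
        if finish <= next_arr:
            t = finish
            done[i] = t
        else:
            # Pause at the next arrival and re-evaluate deadlines.
            heapq.heappush(ready, (d, i, rem - (next_arr - t) * rate))
            t = next_arr
    return done

# Three packets whose deadlines are feasible: EDF meets all of them.
pkts = [(0.0, 2.0, 5.0), (0.0, 2.0, 2.0), (1.0, 1.0, 6.0)]
finish = edf_schedule(pkts)
assert all(f <= d for f, (_, _, d) in zip(finish, pkts))
```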
