
Transport Layer

Process to Process Delivery


The transport layer is responsible for process-to-process delivery of the entire message. A process is an
application program running on a host.

At the transport layer, we need a transport layer address, called a port number, to choose among multiple
processes running on the destination host. The destination port number is needed for delivery; the source
port number is needed for the reply. In the Internet model, the port numbers are 16-bit integers between 0
and 65,535. The client program defines itself with a port number, chosen randomly by the transport layer
software running on the client host. This is the ephemeral port number.
IANA Ranges
The IANA (Internet Assigned Numbers Authority) has divided the port numbers into three ranges: well-known, registered, and dynamic (or private).
Well-known ports. The ports ranging from 0 to 1023 are assigned and controlled by IANA. These are the well-known ports.
Registered ports. The ports ranging from 1024 to 49,151 are not assigned or controlled by IANA. They can only be registered with IANA to prevent duplication.
Dynamic ports. The ports ranging from 49,152 to 65,535 are neither controlled nor registered. They can be used by any process. These are the ephemeral ports.
Socket Addresses
Process-to-process delivery needs two identifiers, IP address and the port number, at each end to make a
connection. The combination of an IP address and a port number is called a socket address. The client
socket address defines the client process uniquely just as the server socket address defines the server
process uniquely.
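In the socket API exposed by most operating systems, a socket address is written directly as this (IP address, port) pair. A minimal sketch in Python (the addresses and ports below are made-up examples):

import socket

# A socket address is the combination of an IP address and a port number.
server_address = ("192.0.2.10", 80)       # server: well-known port
client_address = ("198.51.100.7", 52014)  # client: ephemeral port

# A server claims its socket address by binding to it.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.bind(("0.0.0.0", 80))  # port 80 on all local IP addresses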
Multiplexing and Demultiplexing
The addressing mechanism allows multiplexing and demultiplexing by the transport layer.

Multiplexing
At the sender site, there may be several processes that need to send packets. However, there is only one
transport layer protocol at any time. This is a many-to-one relationship and requires multiplexing. The
protocol accepts messages from different processes, differentiated by their assigned port numbers. After
adding the header, the transport layer passes the packet to the network layer.
Demultiplexing
At the receiver site, the relationship is one-to-many and requires demultiplexing. The transport layer
receives datagrams from the network layer. After error checking and dropping of the header, the transport
layer delivers each message to the appropriate process based on the port number.
Connectionless Versus Connection-Oriented Service
A transport layer protocol can either be connectionless or connection-oriented.
Connectionless Service
In a connectionless service, the packets are sent from one party to another with no need for connection
establishment or connection release. The packets are not numbered; they may be delayed or lost or may
arrive out of sequence. There is no acknowledgment either. We will see shortly that one of the transport
layer protocols in the Internet model, UDP, is connectionless.
Connection-Oriented Service
In a connection-oriented service, a connection is first established between the sender and the receiver.
Data are transferred. At the end, the connection is released. We will see shortly that TCP and SCTP are
connection-oriented protocols.
Reliable Versus Unreliable
The transport layer service can be reliable or unreliable. If the application layer program needs reliability,
we use a reliable transport layer protocol by implementing flow and error control at the transport layer.
This means a slower and more complex service. On the other hand, if the application program does not
need reliability because it uses its own flow and error control mechanism or it needs fast service or the
nature of the service does not demand flow and error control (real-time applications), then an unreliable
protocol can be used. In the Internet, there are three common different transport layer protocols, as we
have already mentioned. UDP is connectionless and unreliable; TCP and SCTP are connection oriented
and reliable. These three can respond to the demands of the application layer programs.
USER DATAGRAM PROTOCOL (UDP)
The User Datagram Protocol (UDP) is called a connectionless, unreliable transport protocol. It does not add anything to the services of IP except to provide process-to-process communication instead of host-to-host communication. Also, it performs very limited error checking.
UDP is a very simple protocol using a minimum of overhead. If a process wants to send a small message and does not care much about reliability, it can use UDP. Sending a small message by using UDP takes much less interaction between the sender and receiver than using TCP or SCTP.
Well-Known Ports for UDP
7 Echo (echoes a received datagram back to the sender)
9 Discard (discards any datagram that is received)
11 Users (active users)
13 Daytime (returns the date and the time)
69 TFTP (Trivial File Transfer Protocol)

Source port number. This is the port number used by the process running on the source host. It is 16 bits
long, which means that the port number can range from 0 to 65,535. If the source host is the client (a
client sending a request), the port number, in most cases, is an ephemeral port number requested by the
process and chosen by the UDP software running on the source host. If the source host is the server (a
server sending a response), the port number, in most cases, is a well-known port number.
Destination port number. This is the port number used by the process running on the destination host. It
is also 16 bits long. If the destination host is the server (a client sending a request), the port number, in
most cases, is a well-known port number. If the destination host is the client (a server sending a response),
the port number, in most cases, is an ephemeral port number. In this case, the server copies the ephemeral
port number it has received in the request packet.
Length. This is a 16-bit field that defines the total length of the user datagram, header plus data. The 16
bits can define a total length of 0 to 65,535 bytes. However, the total length needs to be much less because
a UDP user datagram is stored in an IP datagram with a total length of 65,535 bytes.
UDP length = IP length - IP header's length.
Checksum. This field is used to detect errors over the entire user datagram (header plus data).
The UDP checksum calculation is different from the one for IP and ICMP. Here the checksum includes
three sections: a pseudoheader, the UDP header, and the data coming from the application layer.
The pseudoheader is the part of the header of the IP packet in which the user datagram is to be
encapsulated with some fields filled with 0s.
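As a concrete illustration, the checksum over the pseudoheader, UDP header, and data is computed with 16-bit one's complement arithmetic. A minimal sketch in Python, following the field layout of RFC 768 (the addresses, ports, and payload are made-up examples):

import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    if len(data) % 2:                  # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total

def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload):
    length = 8 + len(payload)          # UDP header (8 bytes) plus data
    # Pseudoheader: source IP, destination IP, zero byte, protocol (17), UDP length.
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 17, length))
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum field = 0
    return ~ones_complement_sum(pseudo + header + payload) & 0xFFFF

print(hex(udp_checksum("192.0.2.1", "192.0.2.2", 52014, 7, b"hello")))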
UDP Operation
Connectionless Services
As mentioned previously, UDP provides a connectionless service. This means that each user datagram
sent by UDP is an independent datagram. There is no relationship between the different user datagrams
even if they are coming from the same source process and going to the same destination program. The
user datagrams are not numbered. Also, there is no connection establishment and no connection
termination, as is the case for TCP. This means that each user datagram can travel on a different path.
Flow and Error Control
UDP is a very simple, unreliable transport protocol. There is no flow control and hence no window
mechanism. The receiver may overflow with incoming messages. There is no error control mechanism in
UDP except for the checksum. This means that the sender does not know if a message has been lost or
duplicated. When the receiver detects an error through the checksum, the user datagram is silently
discarded. The lack of flow control and error control means that the process using UDP
should provide these mechanisms.
Encapsulation and Decapsulation
To send a message from one process to another, the UDP protocol encapsulates and decapsulates
messages in an IP datagram.
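In practice the operating system performs this encapsulation; an application simply hands UDP a message and a destination socket address. A minimal echo sketch in Python (the port number is an arbitrary example):

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # a UDP socket
sock.bind(("0.0.0.0", 9999))               # claim a port for incoming datagrams
data, client_addr = sock.recvfrom(65535)   # each datagram arrives independently
sock.sendto(data, client_addr)             # UDP encapsulates the reply in an IP datagram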
Queuing

In UDP, queues are associated with ports. At the client site, when a process starts, it requests a port number from the operating system. Some implementations create both an incoming and an outgoing queue associated with each process. Other implementations create only an incoming queue associated with each process. Note that even if a process wants to communicate with multiple processes, it obtains only one port number and eventually one outgoing and one incoming queue. The queues opened by the client are, in most cases, identified by ephemeral port numbers. The queues function as long as the process is running. When the process terminates, the queues are destroyed.

The client process can send messages to the outgoing queue by using the source port number specified in the request. UDP removes the messages one by one and, after adding the UDP header, delivers them to IP. An outgoing queue can overflow. If this happens, the operating system can ask the client process to wait before sending any more messages.

When a message arrives for a client, UDP checks to see if an incoming queue has been created for the port number specified in the destination port number field of the user datagram. If there is such a queue, UDP sends the received user datagram to the end of the queue. If there is no such queue, UDP discards the user datagram and asks the ICMP protocol to send a port unreachable message to the server. All the incoming messages for one particular client program, whether coming from the same or a different server, are sent to the same queue. An incoming queue can overflow. If this happens, UDP drops the user datagram and asks for a port unreachable message to be sent to the server.

Use of UDP
The following lists some uses of the UDP protocol:
UDP is suitable for a process that requires simple request-response communication with little concern for flow and error control. It is not usually used for a process such as FTP that needs to send bulk data.
UDP is suitable for a process with internal flow and error control mechanisms. For example, the Trivial File Transfer Protocol (TFTP) process includes flow and error control. It can easily use UDP.
UDP is a suitable transport protocol for multicasting. Multicasting capability is embedded in the UDP software but not in the TCP software.
UDP is used for management processes such as SNMP.
UDP is used for some route updating protocols such as the Routing Information Protocol (RIP).

Transmission Control Protocol (TCP)

RELIABLE BYTE STREAM


TCP is called a connection-oriented, reliable transport protocol. It adds connection-oriented and
reliability features to the services of IP.
TCP Services
Process-to-Process Communication
Like UDP, TCP provides process-to-process communication using port numbers.
Well-known port numbers used by TCP
20 FTP (data) File Transfer Protocol (data connection)
21 FTP (control) File Transfer Protocol (control connection)
23 TELNET Terminal Network
25 SMTP Simple Mail Transfer Protocol
53 DNS Domain Name Server
67 BOOTP Bootstrap Protocol
80 HTTP Hypertext Transfer Protocol
111 RPC Remote Procedure Call
Stream Delivery Service
TCP allows the sending process to deliver data as a stream of bytes and allows the receiving process to
obtain data as a stream of bytes. TCP creates an environment in which the two processes seem to be
connected by an imaginary "tube" that carries their data across the Internet.

End-to-End Issues
TCP supports logical connections between processes that are running on any two computers in the
Internet. This means that TCP needs an explicit connection establishment phase and an explicit
connection teardown phase. TCP connections are likely to have widely different round-trip times.
TCP assumes that each packet has a maximum lifetime. The exact lifetime, known as the maximum
segment lifetime (MSL), is an engineering choice. The amount of resources dedicated to any one TCP connection is highly variable, especially considering that any one host can potentially support hundreds of TCP connections at the same time. The sending side of a TCP connection has no idea what links will be traversed to reach the destination.

Segment Format

Source port address. This is a 16-bit field that defines the port number of the application program in the
host that is sending the segment. This serves the same purpose as the source port address in the UDP
header.
Destination port address. This is a 16-bit field that defines the port number of the application program in
the host that is receiving the segment. This serves the same purpose as the destination port address in the
UDP header.
Sequence number. This 32-bit field defines the number assigned to the first byte of data contained in this
segment. As we said before, TCP is a stream transport protocol. To ensure connectivity, each byte to be
transmitted is numbered. The sequence number tells the destination which byte in this sequence
comprises the first byte in the segment. During connection establishment, each party uses a random
number generator to create an initial sequence number (ISN), which is usually different in each direction.
Acknowledgment number. This 32-bit field defines the byte number that the receiver of the segment is
expecting to receive from the other party. If the receiver of the segment has successfully received byte
number x from the other party, it defines x+1 as the acknowledgment number. Acknowledgment and data
can be piggybacked together.
Header length. This 4-bit field indicates the number of 4-byte words in the TCP header. The length of the
header can be between 20 and 60 bytes. Therefore, the value of this field can be between 5 (5 × 4 = 20) and 15 (15 × 4 = 60).
Reserved. This is a 6-bit field reserved for future use.
Flag Description
URG The value of the urgent pointer field is valid.
ACK The value of the acknowledgment field is valid.
PSH Push the data.
RST Reset the connection.
SYN Synchronize sequence numbers during connection.
FIN Terminate the connection.
Window size. This field defines the size of the window, in bytes, that the other party must maintain. Note
that the length of this field is 16 bits, which means that the maximum size of the window is 65,535 bytes.
This value is normally referred to as the receiving window (rwnd) and is determined by the receiver. The
sender must obey the dictation of the receiver in this case.
Checksum. This 16-bit field contains the checksum. The calculation of the checksum for TCP follows the
same procedure as the one described for UDP. However, the inclusion of the checksum in the UDP
datagram is optional, whereas the inclusion of the checksum for TCP is mandatory.
Urgent pointer. This 16-bit field, which is valid only if the urgent flag is set, is used when the segment
contains urgent data. It defines the number that must be added to the sequence number to obtain the

number of the last urgent byte in the data section of the segment. This will be discussed later in this
chapter.
Options. There can be up to 40 bytes of optional information in the TCP header.

Connection Management
Connection Establishment and Termination
A TCP connection begins with a client (caller) doing an active open to a server (callee). Assuming that the
server had earlier done a passive open, the two sides engage in an exchange of messages to establish the
connection. Only after this connection establishment phase is over do the two sides begin sending data.
Likewise, as soon as a participant is done sending data, it closes one direction of the connection, which
causes TCP to initiate a round of connection termination messages.
Three-Way Handshake
The algorithm used by TCP to establish and terminate a connection is called a three-way handshake. The
three-way handshake involves the exchange of three messages between the client and the server. The idea is that two parties want to agree on a set of parameters, which, in the case of opening a TCP connection, are the starting sequence numbers the two sides plan to use for their respective byte streams. In general, the parameters might be any facts that each side wants the other to know about. First, the client (the active participant) sends a segment to the server (the passive participant) stating the initial sequence number it plans to use (Flags = SYN, SequenceNum = x). The server then responds with a single segment that both acknowledges the client's sequence number (Flags = ACK, Ack = x+1) and states its own beginning sequence number (Flags = SYN, SequenceNum = y). That is, both the SYN and ACK bits are set in the Flags field of this second message. Finally, the client responds with a third segment that acknowledges the server's sequence number (Flags = ACK, Ack = y+1).
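The handshake itself is carried out by the operating system's TCP implementation; an application merely triggers it with a passive open on the server and an active open on the client. A minimal sketch in Python (the address and port are made-up examples, and the two halves would run in separate processes):

import socket

# Server side: passive open; waits in the LISTEN state for an incoming SYN.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("0.0.0.0", 8080))
srv.listen()
conn, addr = srv.accept()   # returns once the three-way handshake completes

# Client side (separate process): active open; sends SYN, gets SYN+ACK, sends ACK.
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("192.0.2.10", 8080))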

State-Transition Diagram
TCP is complex enough that its specification includes a state-transition diagram. This diagram
shows only the states involved in opening a connection (everything above ESTABLISHED) and in
closing a connection (everything below ESTABLISHED). Each circle denotes a state that one end of a
TCP connection can find itself in. All connections start in the CLOSED state. As the connection
progresses, the connection moves from state to state according to the arcs. Each arc is labeled with a tag
of the form event/action. Thus, if a connection is in the LISTEN state and a SYN segment arrives (i.e., a
segment with the SYN flag set), the connection makes a transition to the SYN RCVD state and takes
the action of replying with an ACK+SYN segment.

Two kinds of events trigger a state transition: (1) a segment arrives from the peer (e.g., the event on the
arc from LISTEN to SYN RCVD), or (2) the local application process invokes an operation on TCP (e.g.,
the active open event on the arc from CLOSED to SYN SENT). The arrival of the SYN+ACK segment causes the client to move to the ESTABLISHED state and to send an ACK back to the server. When this ACK arrives, the server finally moves to the ESTABLISHED state.
On any one side there are three combinations of transitions that get a connection from the
ESTABLISHED state to the CLOSED state:
This side closes first: ESTABLISHED → FIN WAIT 1 → FIN WAIT 2 → TIME WAIT → CLOSED.
The other side closes first: ESTABLISHED → CLOSE WAIT → LAST ACK → CLOSED.
Both sides close at the same time: ESTABLISHED → FIN WAIT 1 → CLOSING → TIME WAIT → CLOSED.

Flow Control
Reliable and Ordered Delivery
TCP on the sending side maintains a send buffer. This buffer is used to store data that has been sent but
not yet acknowledged, as well as data that has been written by the sending application but not transmitted.
On the receiving side, TCP maintains a receive buffer. This buffer holds data that arrives out of order, as
well as data that is in the correct order (i.e., there are no missing bytes earlier in the stream) but that the
application process has not yet had the chance to read.

In a sliding window protocol, the size of the window sets the amount of data that can be sent without
waiting for acknowledgment from the receiver. Thus, the receiver throttles the sender by advertising a
window that is no larger than the amount of data that it can buffer. Observe that TCP on the receive side
must keep
LastByteRcvd − LastByteRead ≤ MaxRcvBuffer
to avoid overflowing its buffer. It therefore advertises a window size of
AdvertisedWindow = MaxRcvBuffer − ((NextByteExpected − 1) − LastByteRead)
which represents the amount of free space remaining in its buffer.
TCP on the send side must then adhere to the advertised window it gets from the receiver. This means that at any given time, it must ensure that
LastByteSent − LastByteAcked ≤ AdvertisedWindow
Said another way, the sender computes an effective window that limits how much data it can send:
EffectiveWindow = AdvertisedWindow − (LastByteSent − LastByteAcked)
Clearly, EffectiveWindow must be greater than 0 before the source can send more data. All the while this is going on, the send side must also make sure that the local application process does not overflow the send buffer; that is, that
LastByteWritten − LastByteAcked ≤ MaxSendBuffer
If the sending process tries to write y bytes to TCP, but
(LastByteWritten − LastByteAcked) + y > MaxSendBuffer
then TCP blocks the sending process and does not allow it to generate more data.
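These rules translate directly into arithmetic on the byte counters. A minimal sketch with made-up values:

# Receive side: advertise the free space remaining in the buffer.
MaxRcvBuffer = 65536
LastByteRead, NextByteExpected = 1000, 3001
AdvertisedWindow = MaxRcvBuffer - ((NextByteExpected - 1) - LastByteRead)  # 63536

# Send side: limit unacknowledged data to the advertised window.
LastByteSent, LastByteAcked = 5000, 3000
EffectiveWindow = AdvertisedWindow - (LastByteSent - LastByteAcked)        # 61536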

Adaptive Retransmission
Because TCP guarantees the reliable delivery of data, it retransmits each segment if an ACK is not
received in a certain period of time. TCP sets this timeout as a function of the RTT it expects between the
two ends of the connection. Unfortunately, given the range of possible RTTs between any pair of hosts in
the Internet, as well as the variation in RTT between the same two hosts over time, choosing an
appropriate timeout value is not that easy. To address this problem, TCP uses an adaptive retransmission
mechanism. We now describe this mechanism and how it has evolved over time as the Internet
community has gained more experience using TCP.
Original Algorithm
The idea is to keep a running average of the RTT and then to compute the timeout as a function of this
RTT. Specifically, every time TCP sends a data segment, it records the time. When an ACK for that
segment arrives, TCP reads the time again, and then takes the difference between these two times as a
SampleRTT. TCP then computes an EstimatedRTT as a weighted average between the previous estimate
and this new sample. That is,
EstimatedRTT = α × EstimatedRTT + (1 − α) × SampleRTT

The parameter α is selected to smooth the EstimatedRTT. A small α tracks changes in the RTT but is perhaps too heavily influenced by temporary fluctuations. On the other hand, a large α is more stable but perhaps not quick enough to adapt to real changes. The original TCP specification recommended a setting of α between 0.8 and 0.9. TCP then uses EstimatedRTT to compute the timeout in a rather conservative way:
TimeOut = 2 × EstimatedRTT
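A minimal sketch of the original computation (α = 0.85 is an arbitrary choice within the recommended 0.8 to 0.9 range):

ALPHA = 0.85

def update_rtt(estimated_rtt: float, sample_rtt: float) -> tuple[float, float]:
    # Weighted average of the old estimate and the new sample.
    estimated_rtt = ALPHA * estimated_rtt + (1 - ALPHA) * sample_rtt
    timeout = 2 * estimated_rtt   # conservative timeout
    return estimated_rtt, timeout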
Karn/Partridge Algorithm
The problem was that an ACK does not really acknowledge a transmission; it actually acknowledges the receipt of data. When a segment is retransmitted and then an ACK arrives at the sender, it is impossible to determine whether this ACK should be associated with the first or the second transmission of the segment for the purpose of computing an accurate SampleRTT. As illustrated in the diagram, if you assume that the ACK is for the original transmission but it was really for the second, then the SampleRTT is too large (a); if you assume that the ACK is for the second transmission but it was actually for the first, then the SampleRTT is too small (b).

The solution, which was proposed in 1987, is simple. Whenever TCP retransmits a segment, it stops
taking samples of the RTT; it only measures SampleRTT for segments that have been sent only once.
This solution is known as the Karn/Partridge algorithm.
Jacobson/Karels Algorithm
In the new approach, the sender measures a new SampleRTT as before. It then folds this new sample into
the timeout calculation as follows:
Difference = SampleRTT − EstimatedRTT
EstimatedRTT = EstimatedRTT + (δ × Difference)
Deviation = Deviation + δ × (|Difference| − Deviation)
where δ is a fraction between 0 and 1. That is, we calculate both the mean RTT and the variation in that mean.
TCP then computes the timeout value as a function of both EstimatedRTT and Deviation as follows:
TimeOut = μ × EstimatedRTT + φ × Deviation
where, based on experience, μ is typically set to 1 and φ is set to 4. Thus, when the variance is small, TimeOut is close to EstimatedRTT; a large variance causes the Deviation term to dominate the calculation.
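A minimal sketch of this calculation (δ = 0.125 is a typical choice, an assumption here since the text leaves it unspecified):

DELTA, MU, PHI = 0.125, 1, 4

def update_timeout(estimated_rtt, deviation, sample_rtt):
    difference = sample_rtt - estimated_rtt
    estimated_rtt += DELTA * difference                  # update the mean RTT
    deviation += DELTA * (abs(difference) - deviation)   # update the variation
    timeout = MU * estimated_rtt + PHI * deviation
    return estimated_rtt, deviation, timeout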

Congestion
An important issue in a packet-switched network is congestion. Congestion in a network may occur if the load on the network (the number of packets sent to the network) is greater than the capacity of the network (the number of packets a network can handle).
Congestion control refers to the mechanisms and techniques to control the congestion and keep the load
below the capacity.

Backpressure is a node-to-node congestion control that starts with a node and propagates, in the opposite
direction of data flow, to the source.
A choke packet is a packet sent by a node to the source to inform it of congestion. The warning is from
the router, which has encountered congestion, to the source station directly.
In implicit signaling, there is no communication between the congested node or nodes and the source. The source guesses that there is congestion somewhere in the network from other symptoms.
Explicit Signaling
The node that experiences congestion can explicitly send a signal to the source or destination.
Backward Signaling A bit can be set in a packet moving in the direction opposite to the congestion. This
bit can warn the source that there is congestion and that it needs to slow down to avoid the discarding of
packets.
Forward Signaling A bit can be set in a packet moving in the direction of the congestion. This bit can
warn the destination that there is congestion. The receiver in this case can use policies, such as slowing
down the acknowledgments, to alleviate the congestion.
TCP CONGESTION CONTROL
The essential strategy of TCP is to send packets into the network without a reservation and then to react to observable events that occur. TCP assumes only FIFO queuing in the network's routers, but also works with fair queuing.
Congestion collapse occurs when hosts send their packets into the Internet as fast as the advertised window allows, congestion occurs at some router (causing packets to be dropped), and the hosts time out and retransmit their packets, resulting in even more congestion.
The idea of TCP congestion control is for each source to determine how much capacity is available in the network, so
that it knows how many packets it can safely have in transit. Once a given source has this many packets in
transit, it uses the arrival of an ACK as a signal that one of its packets has left the network and that it is
therefore safe to insert a new packet into the network without adding to the level of congestion. By using
ACKs to pace the transmission of packets, TCP is said to be self-clocking.
Additive Increase/Multiplicative Decrease
TCP maintains a new state variable for each connection, called CongestionWindow, which is used by the source to limit how much data it is allowed to have in transit at a given time. The congestion window is congestion control's counterpart to flow control's advertised window.
TCP's effective window is revised as follows:
MaxWindow = MIN(CongestionWindow, AdvertisedWindow)
EffectiveWindow = MaxWindow − (LastByteSent − LastByteAcked)
That is, MaxWindow replaces AdvertisedWindow in the calculation of EffectiveWindow. Thus, a TCP source is allowed to send no faster than the slowest component (the network or the destination host) can accommodate. The TCP source sets the CongestionWindow based on the level of congestion it perceives to exist in the network. This involves decreasing the congestion window when the level of congestion goes up and increasing the congestion window when the level of congestion goes down. Taken together, the mechanism is commonly called additive increase/multiplicative decrease (AIMD). TCP interprets timeouts as a sign of congestion and reduces the rate at which it is transmitting. Specifically, each time a timeout occurs, the source sets CongestionWindow to half of its previous value. This halving of the CongestionWindow for each timeout corresponds to the multiplicative decrease part of AIMD. CongestionWindow is not allowed to fall below the size of a single packet, or in TCP terminology, the maximum segment size (MSS). Every time the source successfully sends a CongestionWindow's worth of packets (that is, each packet sent out during the last round-trip time (RTT) has been ACKed), it adds the equivalent of 1 packet to CongestionWindow. This linear increase is illustrated as additive increase.

The congestion window is incremented as follows each time an ACK arrives:
Increment = MSS × (MSS / CongestionWindow)
CongestionWindow = CongestionWindow + Increment
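A minimal sketch of both halves of AIMD, with the window kept in bytes (MSS = 1460 is a common Ethernet-derived value, used here as an example):

MSS = 1460

def on_ack(congestion_window: float) -> float:
    # Additive increase: roughly one MSS per window's worth of ACKs.
    return congestion_window + MSS * (MSS / congestion_window)

def on_timeout(congestion_window: float) -> float:
    # Multiplicative decrease, never below one segment.
    return max(congestion_window / 2, MSS)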
Slow Start
Slow start is used to increase the congestion window rapidly from a cold start. Slow start effectively increases the congestion window exponentially, rather than linearly. Specifically, the source starts out by setting CongestionWindow to one packet. When the ACK for this packet arrives, TCP adds 1 to CongestionWindow and then sends two packets. Upon receiving the corresponding two ACKs, TCP increments CongestionWindow by 2 (one for each ACK) and next sends four packets. The end result is that TCP effectively doubles the number of packets it has in transit every RTT.
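A minimal sketch of the per-ACK growth rule. The threshold variable that switches from exponential to additive growth (commonly called ssthresh) is standard TCP practice, though it is not named in the text above:

def on_ack(congestion_window: float, ssthresh: float, mss: int = 1460) -> float:
    if congestion_window < ssthresh:
        return congestion_window + mss  # slow start: window doubles each RTT
    return congestion_window + mss * (mss / congestion_window)  # additive increase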

There are actually two different situations in which slow start runs. The first is at the very beginning of a
connection, at which time the source has no idea how many packets it is going to be able to have in transit

at a given time. In this situation, slow start continues to double CongestionWindow each RTT until there
is a loss, at which time a timeout causes multiplicative decrease to divide CongestionWindow by 2.
The second situation in which slow start is used is a bit more subtle; it occurs when the connection goes dead while waiting for a timeout to occur. The source then uses slow start to restart the flow of data rather than dumping a whole window's worth of data on the network all at once. Although the source is using slow start again, it now knows more information than it did at the beginning of a connection.
The quick-start mechanism allows a TCP sender to ask for an initial sending rate greater than slow start would allow, by putting a requested rate in its SYN packet as an IP option. Routers along the path can
examine the option, evaluate the current level of congestion on the outgoing link for this flow, and decide
if that rate is acceptable, if a lower rate would be acceptable, or if standard slow start should be used. By
the time the SYN reaches the receiver, it will contain either a rate that was acceptable to all routers on the
path or an indication that one or more routers on the path could not support the quick-start request. In the
former case, the TCP sender uses that rate to begin transmission; in the latter case, it falls back to standard
slow start.
Fast Retransmit and Fast Recovery
Fast retransmit is a heuristic that sometimes triggers the retransmission of a dropped packet sooner than
the regular timeout mechanism. The fast retransmit mechanism does not replace regular timeouts; it just
enhances that facility. The idea of fast retransmit is straightforward. Every time a data packet arrives at
the receiving side, the receiver responds with an acknowledgment, even if this sequence number has
already been acknowledged. Thus, when a packet arrives out of order (when TCP cannot yet acknowledge the data the packet contains because earlier data has not yet arrived), TCP resends the same acknowledgment it sent the last time. This second transmission of the same acknowledgment is called a
duplicate ACK. When the sending side sees a duplicate ACK, it knows that the other side must have
received a packet out of order, which suggests that an earlier packet might have been lost. Since it is also
possible that the earlier packet has only been delayed rather than lost, the sender waits until it sees some
number of duplicate ACKs and then retransmits the missing packet. In practice, TCP waits until it has
seen three duplicate ACKs before retransmitting the packet.
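A minimal sketch of the duplicate-ACK counting on the sending side (the retransmit callback is a hypothetical stand-in for the sender's retransmission routine):

DUP_ACK_THRESHOLD = 3

def on_ack(ack_num: int, last_ack: int, dup_count: int, retransmit) -> tuple[int, int]:
    if ack_num == last_ack:
        dup_count += 1                  # same ACK again: a duplicate
        if dup_count == DUP_ACK_THRESHOLD:
            retransmit(ack_num)         # resend the segment the receiver is missing
    else:
        dup_count = 0                   # new data acknowledged; reset the counter
    return ack_num, dup_count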

When the fast retransmit mechanism signals congestion, rather than drop the congestion window all the
way back to one packet and run slow start, it is possible to use the ACKs that are still in the pipe to clock
the sending of packets. This mechanism, which is called fast recovery, effectively removes the slow start
phase that happens between when fast retransmit detects a lost packet and additive increase begins.

CONGESTION-AVOIDANCE MECHANISMS
The idea is to predict when congestion is about to happen and then to reduce the rate at which hosts send data just before packets start being discarded. We call such a strategy congestion avoidance, to distinguish it from congestion control.
DECbit
The first mechanism was developed for use on the Digital Network Architecture (DNA), a connectionless
network with a connection-oriented transport protocol. This mechanism could, therefore, also be applied
to TCP and IP.
Each router monitors the load it is experiencing and explicitly notifies the end nodes when congestion is
about to occur. This notification is implemented by setting a binary congestion bit in the packets that flow
through the router, hence the name DECbit. The destination host then copies this congestion bit into the
ACK it sends back to the source. Finally, the source adjusts its sending rate so as to avoid congestion. The
following discussion describes the algorithm in more detail, starting with what happens in the router.
A single congestion bit is added to the packet header. A router sets this bit in a packet if its average queue
length is greater than or equal to 1 at the time the packet arrives. This average queue length is measured
over a time interval that spans the last busy+idle cycle, plus the current busy cycle.

Random Early Detection (RED)


A second mechanism, called random early detection (RED), is similar to the DECbit scheme in that each
router is programmed to monitor its own queue length and, when it detects that congestion is imminent, to
notify the source to adjust its congestion window. RED is most commonly implemented such that it
implicitly notifies the source of congestion by dropping one of its packets. The source is, therefore,
effectively notified by the subsequent timeout or duplicate ACK. As the early part of the RED acronym suggests, the gateway drops the packet earlier than it would have to, so as to notify the source that it should decrease its congestion window sooner than it would normally have. The second difference between RED and DECbit is in the details of how RED decides when to drop a packet and what packet it decides to drop. RED computes an average queue length using a weighted running average similar to the one used in the original TCP timeout computation. That is, AvgLen is computed as
AvgLen = (1 − Weight) × AvgLen + Weight × SampleLen, where 0 < Weight < 1
and SampleLen is the length of the queue when a sample measurement is made. RED has two queue
length thresholds that trigger certain activity: MinThreshold and MaxThreshold. When a packet arrives at
the gateway, RED compares the current AvgLen with these two thresholds, according to the following
rules:
if AvgLen ≤ MinThreshold
  queue the packet
if MinThreshold < AvgLen < MaxThreshold
  calculate probability P
  drop the arriving packet with probability P
if MaxThreshold ≤ AvgLen
  drop the arriving packet
If the average queue length is smaller than the lower threshold, no action is taken, and if the average
queue length is larger than the upper threshold, then the packet is always dropped. If the average queue
length is between the two thresholds, then the newly arriving packet is dropped with some probability P.
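A minimal sketch of these rules. The linear interpolation used to compute P between the two thresholds follows the original RED design; the parameter values here are arbitrary examples:

import random

WEIGHT, MIN_TH, MAX_TH, MAX_P = 0.002, 5, 15, 0.02

def red_arrival(avg_len: float, sample_len: int) -> tuple[float, bool]:
    avg_len = (1 - WEIGHT) * avg_len + WEIGHT * sample_len  # weighted running average
    if avg_len <= MIN_TH:
        drop = False                                        # queue the packet
    elif avg_len < MAX_TH:
        p = MAX_P * (avg_len - MIN_TH) / (MAX_TH - MIN_TH)  # probability grows with AvgLen
        drop = random.random() < p
    else:
        drop = True                                         # always drop
    return avg_len, drop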

Quality of Service
Application Requirements
Four types of characteristics are attributed to a flow: reliability, delay, jitter, and bandwidth
Reliability
Reliability is a characteristic that a flow needs. Lack of reliability means losing a packet or
acknowledgment, which entails retransmission.
Delay
Source-to-destination delay is another flow characteristic. Again, applications can tolerate delay in different degrees. In this case, telephony, audio conferencing, video conferencing, and remote log-in need minimum delay, while delay in file transfer or e-mail is less important.
Jitter
Jitter is the variation in delay for packets belonging to the same flow. For example, if four packets depart at times 0, 1, 2, 3 and arrive at 20, 21, 22, 23, all have the same delay, 20 units of time. On the other hand, if the above four packets arrive at 21, 23, 21, and 28, they will have different delays: 21, 22, 19, and 25. For applications such as audio and video, the first case is completely acceptable; the second case is not. For these applications, it does not matter if the packets arrive with a short or long delay as long as the delay is the same for all packets. High jitter means the difference between delays is large; low jitter means the variation is small.
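The arithmetic behind this example is simply the difference between arrival and departure times:

# Per-packet delay = arrival time - departure time, for the two cases above.
departures = [0, 1, 2, 3]
case1 = [20, 21, 22, 23]
case2 = [21, 23, 21, 28]
print([a - d for a, d in zip(case1, departures)])  # [20, 20, 20, 20]: no jitter
print([a - d for a, d in zip(case2, departures)])  # [21, 22, 19, 25]: high jitter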
Bandwidth
Different applications need different bandwidths. In video conferencing we need to send millions of bits
per second to refresh a color screen while the total number of bits in an e-mail may not reach even a
million.
Approaches to QoS Support
Considering this rich space of application requirements, what we need is a richer service model that meets
the needs of any application. This leads us to a service model with not just one class (best effort), but with
several classes, each available to meet the needs of some set of applications. Towards this end, we are
now ready to look at some of the approaches that have been developed to provide a range of qualities of
service. These can be divided into two broad categories:
Fine-grained approaches, which provide QoS to individual applications or flows
Coarse-grained approaches, which provide QoS to large classes of data or aggregated traffic
In the first category, we find Integrated Services, a QoS architecture developed in the IETF and often associated with the Resource Reservation Protocol (RSVP); ATM's approach to QoS was also in this category. In the second category lies Differentiated Services, which is probably the most widely deployed QoS mechanism at the time of writing.
