A Publication of the IEEE Communications Society
in cooperation with the
IEEE Computer Society and the
Internet Society
Special Issue
Implications and Control of Middleboxes in the Internet
Guest Editors: Xiaoming Fu, Martin Stiemerling, and Henning Schulzrinne
EDITOR’S NOTE
Dear readers, welcome to the September 2008 issue of IEEE Network.

The sound of trucks, the heavy duty disposal bins on the curb, and the thud and bang of construction are all elements of a “quiet” summer, full of renovations, in my neighborhood. Resisting this Siren’s call is difficult even if I swore off any renovations for the rest of my life, given past experience. I naively thought that this time it wouldn’t be that bad. After all, this time it looks like a much smaller job than last time. Of course, I neglected a key conservation law: if a job is small, the additional delays for various reasons will expand it to be roughly equal to the total time of a “big” job (more professionally managed one might argue, and hence with much less slack). What I was not prepared to experience is the shift in attitudes caused by the widespread adoption of many “information” appliances in today’s household.

I should have spotted the shift when my contractor warned that he would need to turn off the power to our house, only to qualify it with “If that’s okay with your gear, right?” noticing that there were maybe a tad too many devices, computers, firewalls, servers, and bridges spread around the house. He was concerned that some might develop bad hiccups after the switch was turned off and on again. He had himself experienced some “unhealthy” side effects to his equipment under similar circumstances, so his concern was genuine. I thought for a moment of explaining the benefits of statelessness and how, I would hope, most of my gear could survive power being cut and restored later (no, I don’t have a UPS; I believe in luck). I decided not to expand on the topic, just agreeing that it was okay to cut the power to the house.

Things indeed went as planned, although it should have struck me as odd that he did not ask about other things that might be influenced by cutting the power. A few days later, while I was at work, the contractor stumbled on a dilemma. He had to run an industrial-strength vacuum cleaner to pick up lots of debris. Having pulled down walls and removed several wall outlets left him with no choice but to run an extension cord to the nearest outlet he could find still standing. It happened that this was an outlet already fully populated by two cords, one connecting a refrigerator we keep in the basement, and one connecting a NAT/firewall box, a nearby server, and a cable modem. Without any hesitation, he removed the one least likely to create a hassle: the refrigerator!

In comparison to a NAT box, a refrigerator is low tech and almost stateless, if not for its volatile contents. His choice was reasonable. He was not expecting to keep it for more than an hour this way. But human nature conspired. The contractor forgot to plug in the refrigerator when he was done. The packets were running smoothly while our frozen veggies were thawing. To make matters worse, and blame my own human nature here, I did not notice the “failure” until late in the evening (okay, so I do keep some beer there too). Had it been the firewall malfunctioning I would have spotted it in minutes. I spent a good part of the evening deciding what had to be thrown away and what to keep (luckily this was not a warm day) and laughing at our priorities: mine and the contractor’s.

The fact that everyday people think of consumer-grade networking and information appliances as possibly the most sensitive objects in a house reflects what they have learned from their own experience in the recent past. After all, a lost file can be a major blow, while a pound of rotten spinach is, well, compost. A handful of remarkable technologies made it into these everyday devices, and one that is still a topic of research, extension, and overall controversy is Network Address Translation (NAT). NAT is no longer just a way to establish a home user’s little kingdom of an Internet-connected private network (while guilt-free of hoarding IP addresses). NAT boxes are increasingly active participants as the “middle-point” of communication paths, and this has led to the use of a new term, “middlebox,” to describe the particular class of technologies.

This special issue, entitled “Implications and Control of Middleboxes in the Internet,” provides a timely review of where we are in middlebox evolution and how they might further evolve. I would like to thank the guest editors, Xiaoming Fu, Martin Stiemerling, and Henning Schulzrinne, as well as the liaison editor of this issue, Jon Crowcroft, for their excellent work in putting this issue together. I would also like to welcome a new member to our editorial board: Dr. Admela Jukan. Dr. Jukan received her Ph.D. degree from Vienna University of Technology in Austria, and is currently a W3 Professor of Electrical and Computer Engineering at the Technical University Carolo-Wilhelmina of Brunswick (Braunschweig), Germany. Dr. Jukan served between 2002 and 2004 as Program Director in Computer and Networks System Research at the National Science Foundation (NSF), responsible for funding and coordinating US-wide university research and education activities in the area of network technologies and systems.

As always, your feedback regarding the direction and substance of the magazine is invaluable and always appreciated. Please contact me, by e-mail, at yannis@cs.ualberta.ca, to let me know what you think about the editorial comments, what type of content might be more interesting to you, and in what ways the magazine’s distinct character could be improved or further publicized.

Frank Magee, Consultant, USA
Ioanis Nikolaidis, U. of Alberta, Canada
Georgios I. Papadimitriou, Aristotle Univ., Greece
Mohammad Peyravian, IBM Corporation, USA
Kazem Sohraby, U. of Arkansas, USA
James Sterbenz, Univ. of Kansas, USA
Joe Touch, USC/ISI, USA
Vittorio Trecordi, CEFRIEL, Italy
Guoliang Xue, Arizona State Univ., USA
Raj Yavatkar, Intel, USA
Bulent Yener, Rensselaer Polytechnic Institute, USA

Feature Editors
Olivier Bonaventure, "Software Tools for Networking"
U. Catholique de Louvain, Belgium
Olivier Bonaventure, "New Books & Multimedia"
U. Catholique de Louvain, Belgium

IEEE Production Staff
Joseph Milizzo, Assistant Publisher
Eric Levine, Associate Publisher
Susan Lange, Digital Production Manager
Catherine Kemelmacher, Associate Editor
Jennifer Porcello, Publications Coordinator
Devika Mittra, Publications Assistant

2008 IEEE Communications Society Officers
Doug Zuckerman, President
Andrzej Jajszczyk, VP–Technical Activities
Mark Karol, VP–Conferences
Byeong Gi Lee, VP–Member Relations
Sergio Benedetto, VP–Publications
Nim Cheung, Past President
Stan Moyer, Treasurer
John M. Howell, Secretary

Board of Governors
The officers above plus Members-at-Large:
Class of 2008
Thomas M. Chen, Andrea Goldsmith
Khaled Ben-Letaief, Peter J. McLane
Class of 2009
Thomas LaPorta, Theodore Rappaport
Catherine Rosenberg, Gordon Stuber
Class of 2010
Fred Bauer, Victor Frost
Stefano Galli, Lajos Hanzo

2008 IEEE Officers
Lewis M. Terman, President
John R. Vig, President-Elect
Barry L. Shoop, Secretary
David G. Green, Treasurer
Leah H. Jamieson, Past President
Jeffry W. Raynes, Executive Director
Curtis A. Siller, Jr., Director, Division III
The New Books and Multimedia column contains brief reviews of new books in the computer communications field. Each review includes a highly abstracted description of the contents, relying on the publisher’s descriptive materials, minus advertising superlatives, and checked for accuracy against a copy of the book. The reviews also comment on the structure and the target audience of each book. Publishers wishing to have their books listed in this manner should contact Olivier Bonaventure by email.

Olivier Bonaventure
Université Catholique de Louvain, Belgium
bonaventure@ieee.org

LAN Switch Security: What Hackers Know About Your Switches
Eric Vyncke and Christopher Pagen, Cisco Press, 2008, ISBN-10: 1-58705-256-3, Softbound, 360 pages

Ethernet is now the default fixed local area network technology. Ethernet LANs are found in all enterprise environments, and in more and more home networks. Ethernet was designed in the 1970s when security was not a concern. Since then, Ethernet has evolved with the introduction of hubs and switches. Many network administrators are aware that hubs are a security concern since they broadcast Ethernet frames, and some of them assume that switches are more secure. Unfortunately, hackers have learned the limitations of Ethernet switches and have developed several tools that can be used to exploit them.

This book describes the current state of the art in securing Ethernet switches. The authors take a practical approach by using different types of Cisco switches and freely available tools to demonstrate the security problems and their solutions. Despite its focus on a single vendor, this book is an interesting reference for system administrators who are willing to better understand how to secure their Ethernet networks. This is particularly important in environments such as schools where uncontrolled laptops are often connected.

The first part discusses the basic security problems that affect Ethernet switches: the learning bridge process and the implications of the limited size of the MAC table on Ethernet switches. It also discusses configurations to mitigate these problems. Then the book analyzes several protocols and their security implications: the spanning tree protocol, the 802.1q VLANs, DHCP, IPv4 ARP, and IPv6 Neighbor Discovery, but also surprising electrical security issues with power over Ethernet. The second part focuses on techniques that can be used on switches to sustain denial-of-service attacks, from both forwarding and control plane viewpoints. The last part analyzes recent techniques that can be used to improve the security of Ethernet switches, such as 802.1x or 802.1AE and access control lists.

Principles of Protocol Design
Robin Sharp, Springer Verlag, 2008, ISBN: 978-3-540-77540-9, Hardbound, 402 pages

This book takes an unusual path to describe computer network protocols. While most standard networking texts mainly focus on a textual description of the different protocols and mechanisms, Robin Sharp starts from formal description techniques. More precisely, he chooses the Communicating Sequential Processes (CSP) notation proposed by Hoare. CSP is a process algebra that allows one to model the interactions among communicating processes. The book starts with a detailed description of CSP and then uses the CSP formalism to describe several mechanisms such as flow and error control, fault-tolerant broadcast, and two-phase commits. An advantage of using CSP is that the book contains proofs of several of the described mechanisms. However, as CSP does not contain complex data types, it is difficult to completely model complex protocols in detail. Surprisingly, the author did not consider more powerful formal description techniques that evolved from CSP such as LOTOS.

The second part of the book is more heterogeneous. Several security protocols are discussed, and the BAN logic is introduced. Then the author briefly discusses real protocols. The discussion considers both open system interconnection (OSI) protocols and Internet protocols. This part is less interesting than the first part, where the CSP models could be of interest to readers who are more interested in the application of formal description techniques to network protocols.

Patterns in Network Architecture: A Return to Fundamentals
John Day, Prentice Hall, 2008, ISBN-10: 0132252422, Hardbound, 464 pages

The architecture of today’s Internet was mainly designed together with the TCP and IP protocols in the 1970s and early 1980s. In recent years, researchers and funding organizations in America, Europe, and Asia have started to work on different alternative architectures for the Internet. Some consider an evolutionary approach where the Internet architecture would be incrementally modified in a backward compatible manner, while others believe a completely new architecture should be developed to take into account the requirements of today’s and tomorrow’s Internet.

John Day’s book is a must read for researchers interested in the evolution of the Internet architecture. The book is composed of two main parts. The first part is mainly a history of the evolution of computer network architectures in the 1970s and 1980s. John Day participated actively in this research on both the Internet side and the OSI side. He explains the reasons for some of the design choices and discusses alternatives that were considered but not selected. The discussion considers several of the key elements of a computer network architecture, including the protocol elements, layering, naming, and addressing.

The second part describes John Day’s vision of an alternative network architecture. For this, he starts by reconsidering network-based InterProcess Communication (IPC) and shows that a distributed IPC should be at the core of a computer network architecture. This discussion is interesting, but the author does explain in detail how it could be realized in practice. The second part ends with two chapters on topological addressing influenced by Mike O’Dell’s GSE proposal, and a discussion of the impact of multicast and multihoming on the architecture.
GUEST EDITORIAL

Middleboxes in the Internet have been explored, sometimes quite controversially, in operations, standardization, and the research community for more than 10 years. The main concern in the past has been that middleboxes contradict the Internet’s end-to-end principle, which is often understood to posit that “intelligence” is placed in end systems and that network elements just forward packets. Middleboxes introduce functions beyond forwarding in the data path between a source and destination, as described, for example, in RFC 3234. RFC 3234 describes a wide range of middleboxes, from TCP performance enhancing proxies to transcoders.

On the other hand, middleboxes were introduced in the Internet for various reasons: NATs intend to decouple the internal IP addressing from the public address space while allowing multiple hosts to share a single public IP address, for the purpose of preserving the IP address space; firewalls are used by administrators to enforce policies on the data traffic at administrative borders with the intention of preventing their networks from being attacked or monitored; application level gateways (ALGs) are typically used to assist applications in their operations.

The implications of the emergence and popularity of middleboxes are complicated. With middleboxes it is difficult to even provide basic end-to-end connectivity for many applications. For example, Internet hosts behind NATs can only initiate a TCP connection with another host, but cannot accept a connection request. Unlike in the past, when the vast majority of applications followed the client-server design pattern, and most hosts behind NATs were clients anyway (e.g., your browser accessing a Web server), a variety of new applications today, such as voice-over-IP, gaming, and peer-to-peer file sharing, cause an enormous list of issues. Hosts behind NATs are not reachable from any other host anymore, which becomes particularly troublesome for VoIP and other peer-to-peer applications. Likewise, firewalls are usually statically configured to block certain TCP ports or do not understand non-TCP protocols, making it difficult to deploy new applications and protocols. This results in a number of issues to be considered in the design and development of new protocols and applications.

To mitigate the negative impacts of these issues, quite a number of techniques have been developed, which can be categorized as explicit control and implicit control of firewalls and NATs. For explicit control, an entity, either the end host or a proxy in the network, has a relationship with the middlebox and controls its behavior (e.g., the set of policies or filter rules loaded). Examples of explicit control are universal plug and play (UPnP), Internet Engineering Task Force (IETF) Middlebox Communications (MIDCOM), and IETF Next Steps in Signaling (NSIS). On the other hand, implicit control is the traditional way of traversing middleboxes. Implicit control does not have any control relationship with the middlebox, because end hosts, probably with the support of other end hosts, are using hole punching techniques to get a working middlebox traversal. Examples of implicit control are the IETF’s Session Traversal Utilities for NAT (STUN), Traversal Using Relays around NAT (TURN), and Interactive Connectivity Establishment (ICE). In addition, there have been some recent attempts to design or use certain types of middleboxes, such as various application proxies.

In this special issue we are pleased to introduce a series of state-of-the-art articles on this specific area. These articles cover the subject from a variety of perspectives, offering the readers an understanding of the issues and implications of various middleboxes in the Internet, including their control mechanisms. A total of eight articles, selected from 26 submissions based on a strict peer review process, cover a broad range in the field of implications and control of middleboxes in the Internet. While some articles present more general issues with middleboxes, understanding their behaviors and implications, others focus on new approaches to controlling and using middleboxes.

NATs, an unplanned reality, have posed complications to the Internet architecture and applications. The first article, “A Retrospective View of NAT” by Lixia Zhang, takes readers back to the early days of middleboxes. It gives a historic review of NATs and the lessons learned, including how they impeded standardization and deployment of IPv6, and an expected solution for addressing the Internet address depletion problem. Without a timely standardization of NAT, today there are a number of different NAT implementations, and it is vital to understand their behaviors due to their nearly ubiquitous presence.

The second article, “Behavior and Classification of NAT Devices and Implications for NAT Traversal” by Andreas Müller, Andreas Klenk, and Georg Carle, provides a comprehensive overview of NAT behaviors and currently available
NAT traversal techniques. The article presents a new categorization approach based on an analytical abstraction of NAT traversal, which classifies NAT traversal services into four distinct types and deduces the corresponding NAT behaviors. This may help developers of new protocols and applications to determine applicable techniques for NAT traversal.

While the first two articles describe the history, behavior, and classification of NAT, the next article by Dilip Joseph and Ion Stoica, “Modeling Middleboxes,” proposes a formal and generic model for deducing middlebox functionalities and behaviors. Using this model, the article illustrates how different middleboxes process packets, and how four common middleboxes (firewall, NAT, layer 4, and layer 7 load balancers) may be depicted. As such, the article provides an initial step for relevant designers, users, and researchers to understand and refine the behaviors and implications of various middleboxes.

Existing middleboxes mostly consider TCP and UDP in their implementations, and typically do not support other protocols, such as the Stream Control Transmission Protocol (SCTP). In the fourth article, Michael Tüxen et al. describe the extensions required to support NAT for SCTP. The analysis presented in this article may be useful as a general lesson in the near future, as several other protocols after SCTP, including DCCP, XCP, and HIP, use similar techniques such as multihoming, rehoming, and handshake cookies.

Applications using the Session Initiation Protocol (SIP) or a peer-to-peer way of operation (P2PSIP or just normal P2P applications) are among those that suffer most from the middlebox traversal issue. The fifth article, “Distributed Connectivity Service for a SIP Infrastructure” by Luigi Ciminiera et al., examines this issue and presents an alternative approach to the current STUN/TURN/ICE approach to middlebox traversal. The approach distributes the rendezvous and relay functions among SIP user agents, which discover their peers autonomously and maintain a P2P overlay to ensure connectivity across NATs and firewalls in a SIP infrastructure without relying on a centralized server.

The remaining three articles address new applications of middleboxes. The sixth article, “Dial M for Middlebox Managed Mobility” by Stephen Herborn and Aruna Seneviratne, describes a new usage type of middleboxes for mobility support via the concept of virtual private “personal networks.” Such a network is created and maintained by way of HIP combined with IPsec and supported by middlebox state, which may be interesting (at least to some extent) for the recent research efforts on network virtualization, as they use today’s technologies directly.

An increasing number of home users today are using NATs to connect their home IP devices with the Internet. Choongul Park et al. discuss this issue in their article “Issues in the Remote Management of Home Network Devices.” By extending SNMP and using additional management objects (MOs) to gather NAT binding information, the authors attempt to address the NAT traversal problem under a symmetric NAT, based on their observations in Korea. While the success rate of NAT traversal could be a potential issue outside Korea, the article provides an insight into what home networking standards may have to deal with.

Yet another type of middlebox function, intelligent route control (IRC) for multihomed sites and subscribers, has been recently identified as a key issue in efficient network operations. The final article, “Improving the Performance of Route Control Middleboxes in a Competitive Environment” by Marcelo Yannuzzi et al., addresses this issue and introduces an IRC approach for competitive environments, by blending randomization with adaptive filtering techniques.

We hope that these articles will help to clarify and explain the state-of-the-art advances on middlebox issues in the Internet, providing current visions of how the behaviors, implications, and control of middleboxes may be analyzed, encompassed, and utilized. In preparing this special issue, we wish to thank all the peer reviewers for their efforts in carefully reviewing the manuscripts to meet the tight deadlines. We are grateful to our liaison editor Jon Crowcroft for his constructive feedback, and Editor-in-Chief Ioanis Nikolaidis for his timely and critical suggestions.

Biographies

XIAOMING FU [M’02] (fu@cs.uni-goettingen.de) received his Ph.D. degree in computer science from Tsinghua University, Beijing, China, in 2000. After almost two years of postdoctoral work at Technical University Berlin, he joined the University of Göttingen as an assistant professor, leading a team working on networking research. Since April 2007 he has been a professor and head of the Computer Networks Group at the University of Göttingen. During 2003–2005 he also served as an expert on the ETSI Specialist Task Forces on Internet Protocol Testing; he was also a visiting scientist at the University of Cambridge and Columbia University. In the research fields of architectures, protocols, and applications for QoS, firewalls, p2p overlay, and mobile networking as well as related security issues, he (co-)authored more than 50 refereed papers as well as several RFCs/I-Ds. He has served as TPC member and session chair for several conferences, including IEEE INFOCOM, ICNP, ICDCS, GLOBECOM, and ICC. He was also founding chair of the ACM Workshop on Mobility in the Evolving Internet Architecture (MobiArch) and is TPC Co-Chair of IEEE GLOBECOM 2009 Next Generation Networking and Internet Symposium. He is currently a member of the editorial board of Computer Communications Journal (Elsevier).

MARTIN STIEMERLING [M’00] (stiemerling@cs.uni-goettingen.de) received his M.Sc. degree (Diploma) in electrical engineering with a focus on IP networking technologies from the Polytechnic University of Applied Sciences in Cologne in 2000. After that he joined the NEC Laboratories Europe, Heidelberg, Germany, where he is currently a senior researcher. His areas of research interest are Internet architecture, Internet signaling protocols, network management, and overlay/peer-to-peer systems. He has published several papers in these areas, and served as a TPC member of IEEE IPOM 2007. In the IETF he is active as working document editor in the MIDCOM, MMUSIC, and NSIS working groups, as well as in other IETF working groups and IRTF research groups. He is co-chair of the IETF Next Steps in Signaling (NSIS) working group, and secretary of the IP over DVB (IPDVB) working group, and a co-author of RFC 3816, RFC 3989, and RFC 4540, as well as RTSPng.

HENNING SCHULZRINNE [F’06] (hgs@cs.columbia.edu) received his Ph.D. from the University of Massachusetts in Amherst, Massachusetts. He was a member of technical staff at AT&T Bell Laboratories, Murray Hill, New Jersey, and an associate department head at GMD-Fokus (Berlin) before joining the Computer Science and Electrical Engineering Departments at Columbia University, New York. He is currently a professor and chair of the Department of Computer Science. He has been a member of the Board of Governors of the IEEE Communications Society and is vice chair of ACM SIGCOMM, former chair of the IEEE Communications Society Technical Committees on Computer Communications and the Internet, has been technical program chair of Global Internet, INFOCOM, NOSSDAV, and IPTCOMM, and was General Chair of ACM Multimedia 2004. He has also been a member of the Internet Architecture Board. Protocols co-developed by him, such as RTP, RTSP, and SIP, are now Internet standards, used by almost all Internet telephony and multimedia applications. His research interests include Internet multimedia systems, ubiquitous computing, mobile systems, quality of service, and performance evaluation.
A Retrospective View of
Network Address Translation
Lixia Zhang, University of California, Los Angeles
Abstract
Today, network address translators, or NATs, are everywhere. Their ubiquitous
adoption was not promoted by design or planning but by the continued growth of
the Internet, which places an ever-increasing demand not only on IP address space
but also on other functional requirements that network address translation is per-
ceived to facilitate. This article presents a personal perspective on the history of
NATs, their pros and cons in a retrospective light, and the lessons we can learn
from the NAT experience.
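As a rough illustration of the translation step this article discusses (an internal host behind a NAT box, one shared public address, per-flow mapping state that times out when idle), the following Python sketch may help. It is a simplified model and not any vendor's implementation: the class name `Nat`, its method names, the 60-second idle timeout, the starting public port 40000, and the example addresses are all hypothetical. It also assumes that the NAT rewrites ports as well as addresses and that a mapping does not depend on the remote endpoint.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Nat:
    """Toy NAT box: one public address, per-flow mappings with an idle timeout."""
    public_ip: str
    idle_timeout: float = 60.0  # hypothetical; real values are vendor-specific
    next_port: int = 40000      # hypothetical start of the public port pool
    # (proto, internal ip, internal port) -> public port
    out_map: dict = field(default_factory=dict)
    # (proto, public port) -> [internal ip, internal port, time last used]
    in_map: dict = field(default_factory=dict)

    def outbound(self, proto, src_ip, src_port, now=None):
        """Rewrite the source of a packet leaving the private network."""
        now = time.monotonic() if now is None else now
        self._expire(now)
        key = (proto, src_ip, src_port)
        port = self.out_map.get(key)
        if port is None:
            # First packet of this flow: create the mapping state.
            port = self.next_port
            self.next_port += 1
            self.out_map[key] = port
            self.in_map[(proto, port)] = [src_ip, src_port, now]
        else:
            self.in_map[(proto, port)][2] = now  # refresh the idle timer
        return self.public_ip, port

    def inbound(self, proto, dst_port, now=None):
        """Return the internal (ip, port) for an arriving packet, or None."""
        now = time.monotonic() if now is None else now
        self._expire(now)
        entry = self.in_map.get((proto, dst_port))
        if entry is None:
            return None  # no state: the packet cannot be delivered
        entry[2] = now
        return entry[0], entry[1]

    def _expire(self, now):
        """Drop mappings that have been idle longer than idle_timeout."""
        stale = [k for k, v in self.in_map.items()
                 if now - v[2] > self.idle_timeout]
        for proto, port in stale:
            ip, int_port, _ = self.in_map.pop((proto, port))
            del self.out_map[(proto, ip, int_port)]


nat = Nat(public_ip="203.0.113.1")
print(nat.outbound("tcp", "10.0.0.5", 5000, now=0.0))  # ('203.0.113.1', 40000)
print(nat.inbound("tcp", 40000, now=1.0))              # ('10.0.0.5', 5000)
print(nat.inbound("tcp", 40001, now=1.0))              # None
print(nat.inbound("tcp", 40000, now=120.0))            # None
```

The last two calls show the two consequences the article attributes to this state: a packet arriving without a mapping that an internal host set up earlier has nowhere to go, and once a mapping is lost (here through the idle timeout) the exchange has to be restarted from the inside.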
[...] public IP destination address D located in the global Internet, the packet is routed to N. N translates the private source IP address in P’s header to N’s public IP address and adds an entry to its internal table that keeps track of the mapping between the internal host and the outgoing packet. This entry represents a piece of state, which enables subsequent packet exchanges between H and D. For example, when D sends a packet P’ in response to P, P’ arrives at N, and N can find the corresponding entry from its mapping table and replace the destination IP address (which is its own public IP address) with the real destination address H, so that P’ will be delivered to H. The mapping entry times out after a certain period of idleness that is typically set to a vendor-specific value. In the process of changing the IP address carried in the IP header of each passing packet, a NAT box also must recalculate the IP header checksum, as well as the checksum of the transport protocol if it is calculated based on the IP address, as is the case for Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) checksums.

From this brief description, it is easy to see the major benefit of a NAT: one can connect a large number of hosts to the global Internet by using a single public IP address. A number of other benefits of NATs also became clear over time, which I will discuss in more detail later.

At the same time, a number of drawbacks to NATs also can be identified immediately. First and foremost, the NAT changed the end-to-end communication model of the Internet architecture in a fundamental way: instead of allowing any host to talk directly to any other host on the Internet, the hosts behind a NAT must go through the NAT to reach others, and all communications through a NAT box must be initiated by an internal host to set up the mapping entries on the NAT. In addition, because ongoing data exchange depends on the mapping entry kept at the NAT box, the box represents a single point of failure: if the NAT box crashes, it could lose all the existing state, and the data exchange between all of the internal and external hosts must be restarted. This is in contrast to the original goal of IP of delivering packets to their destinations, as long as any physical connectivity exists between the source and destination hosts. Furthermore, because a NAT alters the IP addresses carried in a packet, all protocols that are dependent on IP addresses are affected. In certain cases, such as the TCP checksum, which includes IP addresses in the calculation, the NAT box can hide the address change by recalculating the TCP checksum when forwarding a packet. For some of the other protocols that make direct use of IP addresses, such as IPSec [7], the protocols can no longer operate on the end-to-end basis as originally designed; for some application protocols, for example, File Transfer Protocol (FTP) [8], that embed IP addresses in the application data, application-level gateways are required to handle the IP address rewrite. As discussed later, NAT also introduced other drawbacks that surfaced only recently.

A Recall of the History of NATs
I started my Ph.D. studies in the networking area at the Massachusetts Institute of Technology at the same time as RFC [...]

RFC 1287 also discussed three possible directions to extend IP address space. The first one pointed to a direction similar to current NATs:

    Replace the 32-bit field with a field of the same size but with a different meaning. Instead of being globally unique, it would be unique only within some smaller region. Gateways on the boundary would rewrite the address as the packet crossed the boundary.

RFC 1335 [3], published shortly after RFC 1287, provided a more elaborate description of the use of internal IP addresses (i.e., private IP addresses) as a solution to IP address exhaustion. The first article describing the NAT idea, “Extending the IP Internet through Address Reuse” [10], appeared in the January 1993 issue of ACM Computer Communication Review and was published a year later as RFC 1631 [11]. Although these RFCs can be considered forerunners in the development of NAT, as explained later, for various reasons the IETF did not take action to standardize NAT.

The invention of the Web further accelerated Internet growth in the early 1990s. The explosive growth underlined the urgency to take action toward solving both the routing scalability and the address shortage problems. The IETF took several follow-up steps, which eventually led to the launch of the IPng development effort. I believe that the expectation at the time was to develop a new IP within a few years, followed by a quick deployment. However, the actual deployment during the next ten years took a rather unexpected path.

The Planned Solution
As pointed out in RFC 1287, the continued growth of the Internet exposed strains on the original design of the Internet architecture, the two most urgent of which were routing system scalability and the exhaustion of IP address space. Because long-term solutions require a long lead time to develop and deploy, efforts began to develop both a short-term and a long-term solution to those problems.

Classless inter-domain routing, or CIDR, was proposed as a short-term solution. CIDR removed the class boundaries embedded in the IP address structure, thus enabling more efficient address allocation, which helped extend the lifetime of IP address space. CIDR also facilitated routing aggregation, which slowed down the growth of the routing table size. However, as stated in RFC 1481 [12], IAB Recommendation for an Intermediate Strategy to Address the Issue of Scaling: “This strategy (CIDR) presumes that a suitable long-term solution is being addressed within the Internet technical community.” Indeed, a number of new IETF working groups started in late 1992 and aimed at developing a new IP as a long-term solution; the Internet Engineering Steering Group (IESG) set up a new IPng area in 1993 to coordinate the efforts, and the IPng Working Group (later renamed to IPv6) was established in the fall of 1994 to develop a new version of IP [13].

CIDR was rolled out quickly, which effectively slowed the growth of the global Internet routing table. Because it is a quick fix, CIDR did not address emerging issues in routing scalability, in particular the issue of site multihoming. A multihomed site should be reachable through any of its multiple provider networks. In the existing routing architecture, this
791 [9], the Internet Protocol Specification, was published in requirement translates to having the prefix, or prefixes, of the
September 1981. Thus I was fortunate to witness the fascinat- site listed in the global routing table, thereby rendering
ing unfolding of this new system called the Internet. During provider-based prefix aggregation ineffective. Interested read-
the next ten years, the Internet grew rapidly. RFC 1287 [2], ers are referred to [14] for a more detailed description on
Towards the Future Internet Architecture, was published in 1991 multihoming and its impact on routing scalability.
and was probably the first RFC that raised a concern about IP The new IP development effort, on the other hand, took
address space exhaustion in the foreseeable future. much longer than anyone expected when the effort first
began. The IPv6 working group finally completed all of the protocol development effort in 2007, 13 years after its establishment. The IPv6 deployment also is slow in coming. Until recently, there were relatively few IPv6 trial deployments; there is no known commercial user site that uses IPv6 as the primary protocol for its Internet connectivity.

If one day someone writes an Internet protocol development history, it would be very interesting to look back and understand the major reasons for the slow development and adoption of IPv6. But even without doing any research, one could say with confidence that NATs played a major role in meeting the IP address requirement that arose out of the Internet growth and at least deferred the demand for a new IP to provide the much needed address space to enable the continued growth of the Internet.

The Unplanned Reality
Although largely unexpected, NATs have played a major role in facilitating the explosive growth of Internet access. Nowadays, it is common to see multiple computers, or even multiple LANs, in a single home. It would be unthinkable for every home to obtain an IP address block, however small it may be, from its network service provider. Instead, a common implementation for home networking is to install a NAT box that connects one home network or multiple home networks to a local provider. Similarly, most enterprise networks deploy NATs as well. It also is well known that countries with large populations, such as India and China, have most of their hosts behind NAT boxes; the same is true for countries that connected to the Internet only recently. Without NATs, the IPv4 address space would have been exhausted a long time ago.

For reasons discussed later, the IETF did not standardize NAT implementation or operations. However, despite the lack of standards, NATs were implemented by multiple vendors, and the deployment spread like wildfire. This is because NATs have several attractions, as we describe next.

Why NATs Succeeded
NATs started as a short term solution while waiting for a new IP to be developed as the long-term solution. The first recognized NAT advantages were stated in RFC 1918 [1]:

    With the described scheme many large enterprises will need only a relatively small block of addresses from the globally unique IP address space. The Internet at large benefits through conservation of globally unique address space, which will effectively lengthen the lifetime of the IP address space. The enterprises benefit from the increased flexibility provided by a relatively large private address space.

The last point deserves special emphasis. Indeed, anyone can use a large block of private IP addresses (up to 16 million, without asking for permission) and then connect to the rest of the Internet by using only a single public IP address. A big block of private IP addresses provides the much needed room for future growth. On the other hand, for most if not all user sites, it is often difficult to obtain an IP address block that is beyond their immediate requirements.

Today, NAT is believed to offer advantages well beyond the above. Essentially, the mapping table of a NAT provides one level of indirection between hosts behind the NAT and the global Internet. As the popular saying goes, “Any problem in computer science can be solved with another layer of indirection.” This one level of indirection means that one never need worry about renumbering the internal network when changing providers, other than renumbering the public IP address of the NAT box.

Similarly, a NAT box also makes multihoming easy. One NAT box can be connected to multiple providers and use one IP address from each provider. Not only does the NAT box shelter the connectivity to multiple ISPs from all the internal hosts, but it also does not require any of its providers to “punch a hole” in the routing announcement (i.e., make an ISP de-aggregate its address block). Such a hole punch would be required if the multihomed site takes an IP address block from one of its providers and asks the other providers to announce the prefix.

Furthermore, this one level of indirection also is perceived as one level of protection because external hosts cannot directly initiate communication with hosts behind a NAT, nor can they easily figure out the internal topology.

Besides all of the above, two additional factors also contributed greatly to the quick adoption of NATs. First, NATs can be unilaterally deployed by any end site without any coordination by anybody else. Second, the major gains from deploying a NAT were realized on day one, whereas its potential drawbacks were revealed only slowly and recently.

The Other Side of the NAT
A NAT prevents the hosts behind it from being reached by an external host and hence prevents them from acting as servers. However, in the early days of NAT deployment, many people believed that they would have no need to run servers behind a NAT. Thus, this architectural constraint was viewed as a security feature and believed to have little impact on users or network usage. As an example, the following four justifications for the use of private addresses are quoted directly from RFC 1335 [3].
• In most networks, the majority of the traffic is confined to its local area networks. This is due to the nature of networking applications and the bandwidth constraints on inter-network links.
• The number of machines that act as Internet servers, that is, run programs waiting to be called by machines in other networks, is often limited and certainly much smaller than the total number of machines.
• There are an increasingly large number of personal machines entering the Internet. The use of these machines is primarily limited to their local environment. They also can be used as clients such as ftp and telnet to access other machines.
• For security reasons, many large organizations, such as banks, government departments, military institutions, and some companies, allow only a very limited number of their machines to have access to the global Internet. The majority of their machines are purely for internal use.

As time goes on, however, the above reasoning has largely been proven wrong.

First, network bandwidth is no longer a fundamental constraint today. On the other hand, voice over IP (VoIP) has become a popular application over the past few years. VoIP changed the communication paradigm from client-server to a peer-to-peer model, meaning that any host may call any other host. Given the large number of Internet hosts that are behind NAT, several NAT traversal solutions have been developed to support VoIP. A number of other peer-to-peer applications, such as BitTorrent, also have become popular recently, and each must develop its own NAT traversal solutions.

In addition to the change of application patterns, a few other problems also arise due to the use of non-unique, private IP addresses with NATs. For instance, a number of business acquisitions and mergers have run into situations where two networks behind NATs were required to be interconnected, but unfortunately, they were running on the same private address block, resulting in address conflicts. Yet another problem emerged more recently. The largest allocated private address block is 10.0.0.0/8, commonly referred to as net-10. The business growth of some provider and enterprise networks is leading to, or already has resulted in, the net-10 address exhaustion. An open question facing these networks is what to do next. One provider network migrated to IPv6; a number of others simply decided on their own to use another unallocated IP address block [15].

It is also a common misperception that a NAT box makes an effective firewall. This may be due partly to the fact that in places where NAT is deployed, the firewall function often is implemented in the NAT box. A NAT box alone, however, does not make an effective firewall, as evidenced by the fact that numerous home computers behind NAT boxes have been compromised and have been used as launch pads for spam or distributed denial of service (DDoS) attacks. Firewalls establish control policies on both incoming and outgoing packets to minimize the chances of internal computers being compromised or abused. Making a firewall serve as a NAT box does not make it more effective in fencing off malicious attacks; good control policies do.

Why the Opportunity of Standardizing NAT Was Missed
During the decade following the deployment of NATs, a big debate arose in the IETF community regarding whether NAT should, or should not, be deployed. Due to its use of private addresses, NAT moved away from the basic IP model of providing end-to-end reachability between hosts, thus representing a fundamental departure from the original Internet architecture. This debate went on for years. As late as 2000, messages posted to the IETF mailing list by individual members still argued that NAT was architecturally unsound and that the IETF should in no way endorse its use or development. Such a position was shared by many people during that time.

These days most people would accept the position that the IETF should have standardized NAT early on. How did we miss the opportunity? A simple answer could be that the crystal ball was cloudy. I believe that a little digging would reveal a better understanding of the factors that clouded our eyes at the time. As I see it from my personal viewpoint, the following factors played a major role.

First, the feasibility of designing and deploying a brand new IP was misjudged, as were the time and effort required for such an undertaking. Those who were opposed to standardizing NAT had hoped to develop a new IP in time to meet the needs of a growing Internet. Unfortunately, the calculation was way off. While the development of a new IP was taking its time, Internet growth did not wait. Network address translation is simply an inevitable consequence that was not clearly recognized at the time.

Second, the community faced a difficult question regarding how strictly one should stick to architectural principles, and what can be acceptable engineering trade-offs. Architectural principles are guidelines for problem solving; they help guide us toward developing better overall solutions. However, when the direct end-to-end reachability model was interpreted as an absolute rule, it ruled out network address translation as a feasible means to meet the instant high demand for IP addresses at the time. Furthermore, sticking to the architectural model in an absolute way also contributed to the one-sided view of the drawbacks of NATs, hence the lack of a full appreciation of the advantages of NATs as we discussed earlier, let alone any effort to develop a NAT-traversal solution that can minimize the impact of NATs on end-to-end reachability.

Yet another factor was that given that network address translation could be deployed unilaterally by a single party alone, there was not an apparent need for standardization. This seemingly valid reasoning missed an important fact: a NAT box does not stand alone; rather it interacts both directly with surrounding IP devices, as well as indirectly with remote devices through IP packet handling. The need for standardizing network address translation behavior has since been well recognized, and a great effort has been devoted to developing NAT standards in recent years [16].

Unfortunately, the early misjudgment on NAT already has cost us dearly. While the big debate went on through the late 1990s and early part of the first decade of this century, NAT deployment was widely rolled out, and the absence of a standard led to a number of different behaviors among various NAT products. A number of new Internet protocols also were developed or finalized during the same time period, such as IPSec, Session Announcement Protocol (SAP), and Session Initiation Protocol (SIP), to name a few. Their designs were based on the original model of IP architecture, wherein IP addresses are assumed to be globally unique and globally reachable. When those protocols became ready for deployment, they faced a world that was mismatched with their design. Not only were they required to solve the NAT traversal problem, but the solutions also were required to deal with a wide variety of NAT box behaviors.

Although NAT is accepted as a reality today, the lessons to learn from the past are yet to be clarified. One example is the recent debate over Class-E address block usage [17]. Class-E refers to the IP address block 240.0.0.0/4 that has been on reserve until now. As such, many existing router and host implementations block the use of Class-E addresses. Putting aside the issue of required router and host changes to enable Class-E usage, the fundamental debate has been about whether this Class-E address block should go into the public address allocation pool or into the collection of private address allocations. The latter would give those networks that face net-10 exhaustion a much bigger private address block to use. However, this gain is also one of the main arguments against it, as the size limitation of private addresses is considered a pressure to push those networks facing the limitation to migrate to IPv6, instead of staying with NAT. Such a desire sounds familiar; similar arguments were used against NAT standardization in the past. However, if the past is any indication of the future, we know that pressures do not dictate new protocol deployment; rather, economic feasibility does. This statement does not imply that migrating to IPv6 brings no economic feasibility. On the contrary, it does, especially in the long run. New efforts are being organized both in protocol and tools development to smooth and ease the transition from IPv4 to IPv6 and in case studies and documentation to show clearly the short- and long-term gains from deploying IPv6.

Looking Back and Looking Forward
The IPv4 address space exhaustion predicted long ago is finally upon us today, yet the IPv6 deployment is barely visible on the horizon. What can and should be done now to enable the Internet to grow along the best path forward? I hope this review of NAT history helps shed some light on the answer.
First, we should recognize not only the fact that IPv4 network address translation is widely deployed today, but also recognize its perceived benefits to end users as we discussed in a previous section. We should have a full appraisal of the pros and cons of NAT boxes; the discussion in this article merely serves as a starting point.

Second, it is likely that some forms of network address translation boxes will be with us forever. Hopefully, a full appraisal of the pros and cons of network address translation would help correct the view that all network address translation approaches are a “bad thing” and must be avoided at all costs. Several years ago, an IPv4 to IPv6 transition scheme called Network Address Translation-Protocol Translation (NAT-PT; see [18]) was developed but later classified to historical status,1 mainly due to the concerns that:
• NAT-PT works in much the same way as an IPv4 NAT box.
• NAT-PT does not handle all the transition cases.
However, in view of IPv4 NAT history, it seems worthwhile to revisit that decision. IPv4, together with IPv4 NAT, will be with us for years to come. NAT-PT seems to offer a unique value in bridging IPv4-only hosts and applications with IPv6-enabled hosts and networks. There also have been discussions of the desire to perform address translations between IPv6 networks as a means to achieve several goals, including insulating one’s internal network from the outside. This question of “Whither IPv6 NAT?” deserves further attention. Instead of repeating the mistakes with IPv4 NAT, the Internet would be better off with well-engineered standards and operational guidelines for traversing IPv4 and IPv6 NATs that aim at maximizing interoperability.

Furthermore, accepting the existence of network address translation in today’s architecture does not mean we simply take the existing NAT traversal solutions as given. Instead, we should fully explore the NAT traversal design space to steer the solution development toward restoring the end-to-end reachability model in the original Internet architecture. A new effort in this direction is the NAT traversal through tunneling (NATTT) project [19]. Contrary to most existing NAT traversal solutions that are server-based or protocol-specific, NATTT aims to restore end-to-end reachability among Internet hosts in the presence of NATs, by providing generic, incrementally deployable NAT-traversal support for all applications and protocols.

Last, but not least, I believe it is important to understand that successful network architectures can and should change over time. All new systems start small. Once successful, they grow larger, often by multiple orders of magnitude as is the case of the Internet. Such growth brings the system to an entirely new environment that the original designers may not have envisioned, together with a new set of requirements that must be met, hence the necessity for architectural adjustments.

To properly adjust a successful architecture, we must have a full understanding of the key building blocks of the architecture, as well as the potential impact of any changes to them. I believe the IP address is this kind of key building block that touches, directly or indirectly, all other major components in the Internet architecture. The impact of IPv4 NAT, which changed IP address semantics, provides ample evidence. During IPv6 development, much of the effort also involved a change in IP address semantics, such as the introduction of new concepts like that of the site-local address. The site-local address was later abolished and partially replaced by unique local IPv6 unicast addresses (ULA) [20], another new type of IP address. The debate over the exact meaning of ULA is still going on.

The original IP design clearly defined an IP address as being globally unique and globally reachable and as identifying an attachment point to the Internet. As the Internet continues to grow and evolve, recent years have witnessed an almost universal deployment of middleboxes of various types. NATs and firewalls are dominant among deployed middleboxes, though we also are seeing increasing numbers of SIP proxies and other proxies to enable peer-to-peer-based applications. At the same time, proposals to change the original IP address definition, or even redefine it entirely, continue to arise. What should be the definition, or definitions, of an IP address today, especially in the face of various middleboxes? I believe an overall examination of the role of the IP address in today’s changing architecture deserves special attention at this critical time in the growth of the Internet.

1 Historical status means that a protocol is considered obsolete and is thus removed from the Internet standard protocol set.

Acknowledgments
I sincerely thank Mirjam Kuhne and Wendy Rickard for their help with an earlier version of this article that was posted in the online IETF Journal of October 2007. I also thank the co-editors and reviewers of this special issue for their invaluable comments.

References
[1] Y. Rekhter et al., “Address Allocation for Private Internets,” RFC 1918, 1996.
[2] D. Clark et al., “Towards the Future Internet Architecture,” RFC 1287, 1991.
[3] Z. Wang and J. Crowcroft, “A Two-Tier Address Structure for the Internet: A Solution to the Problem of Address Space Exhaustion,” RFC 1335, 1992.
[4] J. Rosenberg et al., “STUN: Simple Traversal of User Datagram Protocol (UDP) through Network Address Translators (NATs),” RFC 3489, 2003.
[5] J. Rosenberg, R. Mahy, and P. Matthews, “Traversal Using Relays around NAT (TURN),” draft-ietf-behave-turn-08, 2008.
[6] C. Huitema, “Teredo: Tunneling IPv6 over UDP through Network Address Translations (NATs),” RFC 4380, 2006.
[7] S. Kent and R. Atkinson, “Security Architecture for the Internet Protocol,” RFC 2401, 1998.
[8] J. Postel and J. Reynolds, “File Transfer Protocol (FTP),” RFC 959, 1985.
[9] J. Postel, “Internet Protocol Specification,” RFC 791, 1981.
[10] P. Tsuchiya and T. Eng, “Extending the IP Internet through Address Reuse,” ACM SIGCOMM Computer Commun. Review, Sept. 1993.
[11] K. Egevang and P. Francis, “The IP Network Address Translator (NAT),” RFC 1631, 1994.
[12] C. Huitema, “IAB Recommendation for an Intermediate Strategy to Address the Issue of Scaling,” RFC 1481, 1993.
[13] R. M. Hinden, “IP Next Generation Overview,” http://playground.sun.com/ipv6/INET-IPng-Paper.html, 1995.
[14] L. Zhang, “An Overview of Multihoming and Open Issues in GSE,” IETF J., Sept. 2006.
[15] L. Vegoda, “Used but Unallocated: Potentially Awkward /8 Assignments,” Internet Protocol J., Sept. 2007.
[16] http://www.ietf.org/html.charters/behave-charter.html; IETF BEHAVE Working Group develops requirements documents and best current practices to enable NATs to function in a deterministic way, as well as advises on how to develop applications that discover and reliably function in environments with the presence of NATs.
[17] http://www.ietf.org/mail-archive/web/int-area/current/msg01299.html; see the message dated 12/5/07 with subject line “240/4” and all the follow-up.
[18] G. Tsirtsis and P. Srisuresh, “Network Address Translation-Protocol Translation (NAT-PT),” RFC 2766, 2000.
[19] E. Osterweil et al., “NAT Traversal through Tunneling (NATTT),” http://www.cs.arizona.edu/~bzhang/nat/
[20] R. M. Hinden and B. Haberman, “Unique Local IPv6 Unicast Addresses,” RFC 4193, 2005.

Biography
LIXIA ZHANG (lixia@cs.ucla.edu) received her Ph.D. in computer science from the Massachusetts Institute of Technology. She was a member of research staff at the Xerox Palo Alto Research Center before joining the faculty of the UCLA Computer Science Department in 1995. In the past she served as vice chair of ACM SIGCOMM and co-chair of the IEEE ComSoC Internet Technical Committee. She is currently serving on the Internet Architecture Board.
Abstract
For a long time, traditional client-server communication was the predominant com-
munication paradigm of the Internet. Network address translation devices emerged
to help with the limited availability of IP addresses and were designed with the
hypothesis of asymmetric connection establishment in mind. But with the growing
success of peer-to-peer applications, this assumption is no longer true. Consequent-
ly network address translation traversal became a field of intensive research and
standardization for enabling efficient operation of new services. This article pro-
vides a comprehensive overview of NAT and introduces established NAT traversal
techniques. A new categorization of applications into four NAT traversal service
categories helps to determine applicable techniques for NAT traversal. The interac-
tive connectivity establishment framework is categorized, and a new framework is
introduced that addresses scenarios that are not supported by ICE. Current results
from a field test on NAT behavior and the success ratio of NAT traversal tech-
niques support the feasibility of this classification.
Port preservation source transport address of the packet. As long as the destina-
No port preservation tion transport address of a packet matches an existing state,
Port binding
Port overloading
the packet is forwarded. With Address Restricted Filtering, the
Port multiplexing
NAT forwards only packets coming from the same host
(matching IP address) to which the initial packet was sent.
Endpoint-independent Address and Port Restricted Filtering also compares the source
NAT binding Address- (port)-dependent
port of the inbound packet in addition to address restricted
Connection-dependent
filtering.
Independent
Endpoint filtering Address restricted NAT Traversal Problem
Address and port restricted
To work properly, the NAT must have access to the protocol
n Table 1. NAT behavior categories and possible NAT properties. headers at layers 3 and 4 (in case of a network address port
translation [NAPT]). Additionally, for every incoming packet,
the NAT must already have a state listed in its table. Other-
wise, it cannot find the related internal host to which the
issues. If an application works with one particular NAT, this packet belongs. According to RFC 3027 [8], the NAT traver-
does not imply that it always works in a NATed environment. sal problem can be separated into three categories, which are
Therefore, it is very important to understand and classify presented in this section. In addition to the three problems,
existing NAT implementations in order to design applications we identified Unsupported Protocols as a new category.
that can work in combination with current NATs. The classifi- The first problem occurs if a protocol uses Realm-Specific
cation in this article is mainly derived from simple traversal of IP Addresses in its payload. That is, if an application layer pro-
User Datagram Protocol (UDP) through NAT (STUN) [4], tocol such as the Session Initiation Protocol (SIP) uses a
whereas the address binding and mapping behavior follows transport address from the private realm within its payload
the terminology used in RFC 4787 [5]. This section covers signalizing where it expects a response. Because regular NATs
only topics that are required for the understanding of this do not operate above layer 4, application layer protocols typi-
article. A detailed discussion and further information (includ- cally fail in such scenarios. A possible solution is the use of an
ing test results) is given in [6] (for TCP) and [5] (for UDP). application layer gateway (ALG) that extends the functionali-
Binding covers “context based packet translation” [7], which ty of a NAT for specific protocols. However, an ALG sup-
describes the strategy the NAT uses to assign a public trans- ports only the application layer protocols that are specifically
port address (combination of IP address and port) to a new implemented and may fail when encryption is used.
state in the NAT. Filtering, or packet discard, shows how the The second category is P2P Applications. The traditional
NAT handles (or discards) packets trying to use an existing Internet consists of servers located in the public realm and
mapping. Table 1 shows the different categories and their pos- clients that actively establish connections to these servers.
sible properties. Port binding describes the strategy a NAT This structure is well suited for NATs because for every con-
uses for the assignment. With port preservation, the NAT nection attempt (e.g., a TCP SYN) coming from an internal
assigns an external port to a new connection; it attempts to client, the NAT can add a mapping to its table. But unlike
preserve the local port number if possible. Port overloading is client-server applications, a P2P connection can be initiated
problematic and rarely occurs. A new connection takes over the binding, and the old connection is dropped. Port multiplexing is a very common strategy where ports are demultiplexed based on the destination transport address. Incoming packets can now carry the same destination port and are distinguished by the source transport address.
NAT binding deals with the reuse of existing bindings. That is, if an internal host closes a connection and establishes a new one from the same source port, NAT binding describes the assignment strategy for the new connection. As shown in Table 1, the NAT binding is organized into three categories. With Endpoint Independent, the external port is only dependent on the source transport address of the connection. As long as a host establishes a connection from the same source IP address and port, the mapping does not change. The assignment is dependent on the internal and the external transport address with the Address (Port) Dependent strategy. As long as consecutive connections from the same source to the same destination are established, the mapping does not change. As soon as we use a different destination, the NAT changes the external port. With a Connection Dependent binding, the NAT assigns a new port to every connection. We distinguish between NATs that increase the new port number by a specific (and well predictable) delta and NATs that assign random port numbers to the new mappings.
Endpoint filtering describes how existing mappings can be used by external hosts and how a NAT handles incoming connection attempts that are not part of a response. Independent Filtering allows inbound connections independent of the
by any of the peers regardless of their location. However, if a peer in the private realm tries to act as a traditional server (e.g., listening for a connection on a socket), the NAT is unaware of incoming connections and drops all packets. A solution could be that the peer located in the private domain always establishes the connection. But what if two peers, both behind a NAT, want to establish a connection to each other? Even if the security policy would allow the connection, it cannot be established.
The third category is a combination of the first two. Bundled Session Applications, such as File Transfer Protocol (FTP) or SIP/Session Description Protocol (SDP), carry realm-specific IP addresses in their payload to establish an additional session. The first session is usually referred to as the control session, whereas the newly created session is called the data session. The problem here is not only the realm-specific IP addresses, but the fact that the data session often is established from the public Internet toward the private host, a direction the NAT does not permit (e.g., active FTP).
Unsupported Protocols are typically newly developed transport protocols such as the SCTP or the DCCP that cause problems with NATs even if an internal host initiates the connection establishment. This is because current NATs do not have built-in support for these protocols. The unsupported protocols also cover protocols that cannot work with NATs because their layer 3 or layer 4 header is not available for translation. This happens when using encryption protocols such as IPSec.
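The three NAT binding strategies described above (Endpoint Independent, Address (Port) Dependent, and Connection Dependent) can be illustrated with a small sketch. This is only a toy model under stated assumptions: the class name `NatBinding`, the strategy strings, the starting port 40000, and the `delta` parameter are hypothetical names chosen for illustration and do not come from any NAT implementation.

```python
import random
from typing import Dict, Optional, Tuple

Transport = Tuple[str, int]  # (IP address, port)


class NatBinding:
    """Toy model of how a NAT picks the external port for a new connection."""

    def __init__(self, strategy: str, first_port: int = 40000,
                 delta: Optional[int] = 1) -> None:
        # strategy: "endpoint_independent", "address_dependent", or
        # "connection_dependent" (hypothetical names for this sketch).
        # delta: fixed port increment; None selects random assignment.
        self.strategy = strategy
        self.delta = delta
        self._next_port = first_port
        self._bindings: Dict[object, int] = {}

    def _allocate(self) -> int:
        if self.delta is None:
            return random.randint(1024, 65535)  # unpredictable port
        port = self._next_port
        self._next_port += self.delta           # predictable increment
        return port

    def external_port(self, src: Transport, dst: Transport) -> int:
        if self.strategy == "connection_dependent":
            return self._allocate()             # new port for every connection
        if self.strategy == "endpoint_independent":
            key: object = src                   # depends on the source only
        else:
            key = (src, dst)                    # depends on source and destination
        if key not in self._bindings:
            self._bindings[key] = self._allocate()
        return self._bindings[key]
```

With an endpoint-independent binding, two connections from the same source transport address to different destinations share one external port, whereas the address-dependent variant hands out a second port as soon as the destination changes.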
Figure 1. NAT traversal service categories for applications: a) RNT; b) GSP; c) SPPS; d) SSP.
NAT Traversal Service Categories
Instead of classifying the NAT behavior (see classification in STUN [4]), we defined four NAT traversal service categories, each making different assumptions about the purpose of the connection establishment and the infrastructure that is available. Our categorization emphasizes that the applicability of many NAT traversal techniques depends on the support of a combination of the requester, the responder, globally reachable infrastructure nodes, and the role of the application. On the one hand, server applications set up a socket and wait for connections (which also applies to P2P applications). On the other hand, client applications such as VoIP clients actively initiate a connection and wait for an answer on a different port (bundled session applications). Other applications work across NATs only if both ends participate in the connection establishment (unsupported protocols). Thus, we differentiate between supporting a service and supporting a client. In this article, the client is called the requester because it actively initiates a connection.
The behavior of the NAT is important because it allows or prohibits certain NAT traversal techniques within one service category. If only one end implements NAT traversal support (e.g., by running a stand-alone framework or by built-in NAT traversal functionality), NAT traversal techniques that rely on a collaboration of both ends (e.g., ICE) are not applicable.
Our first category, requester side NAT traversal (RNT), covers scenarios where only the requester side supports NAT traversal (e.g., the application or the NAT itself). RNT helps applications that actively participate in the connection establishment and still suffer from the existence of NATs. Typical examples are applications that have problems with realm-specific IP addresses in their payload. This applies to protocols using in-band signaling on the application layer, which is related to bundled session applications with asymmetric connection establishment (e.g., VoIP using SIP/SDP).
The second category, global service provisioning (GSP), assumes that the host providing the service implements NAT traversal support, helping to make a service globally accessible. This is done by creating and maintaining a NAT mapping that then accepts multiple connections from previously unknown clients (Fig. 1). This is the main difference from RNT, which only creates a NAT mapping for one particular session (e.g., one call in the case of VoIP).
The last two categories assume support at both ends, the service and the requester. On the one side, NAT traversal is required to make a service behind a NAT globally accessible, whereas on the other side, the support at the requester allows the use of sophisticated techniques through coordinated action. Thus, service provisioning using pre-signaling (SPPS) extends the GSP category by the assumption that both hosts have interoperable frameworks (e.g., ICE [9]; NAT, URIs, Tunnels, SIP, and STUNT [NUTSS] [10]; NATBlaster [11]; or NatTrav [12]) running. This allows a selection from all available NAT traversal solutions, which leads to a high success rate of NAT traversal. In Fig. 1, the two hosts use a rendezvous point to agree on a NAT traversal technique. After creating the mapping in step 2, the service is accessible by any host, depending on the selected NAT traversal technique and the filtering strategy of the NAT. SPPS supports all types of services where a one-to-one connection is sufficient and pre-signaling is available.
The last category, secure service provisioning (SSP), is an extension of SPPS and addresses scenarios that require authorization of the remote party before initiating the NAT traversal process. The channel established in this way must be accessible only by the authorized remote party. This requires additional functionality that enforces this policy and only allows authorized users to access the service. The policy enforcement can be done at the NAT itself, at a data relay, or at a firewall.
Table 2 depicts all four service categories with popular NAT traversal techniques and shows the implications for automated NAT traversal and required signaling. First we distinguish between the service and the requester. “Support at the service” means, for example, that a framework must be deployed at the same host providing the service. The same applies to the requester. “RP” means that a rendezvous point is required for relaying data back and forth. “Signaling messages” means that some sort of signaling protocol is used for NAT traversal. Again, we differentiate between signaling at the service and signaling at the requester. A rendezvous point for signaling messages is required in case of pre-signaling. Finally, “stream independent” describes the requirement for consecutive connections. For example, a port forwarding entry must be created only once, whereas hole punching [13] requires sending a new hole punching packet for every new stream (with restricted filtering).
Table 2 shows the main differences of our service categories. RNT deals with bundled session applications that wait on a port after initiating a session (e.g., via a SIP INVITE). GSP requires only support of the service and aims to make a service globally reachable for multiple clients. SPPS and SSP combine these categories and require support at both ends. The requester initiates pre-signaling to exchange information about a global end point. The service then creates a mapping in the NAT that can be used by the client.
Table 2. Service categories and their implications for automated NAT traversal; RP denotes rendezvous point.
GSP    Hole punching (independent filtering)          X X X X
SPPS   Hole punching (independent binding)            X X X X X X
SPPS   UPnP                                           X X X X X X
SPPS   Closed/open data relay (e.g., TURN, Skype)     X X X X X X
SPPS   Hole punching (restricted filtering)           X X X X X X
Applicability of NAT Traversal Techniques for NAT Traversal Service Categories
There are many different techniques for solving the NAT traversal problem in specific scenarios, but none of them provides a solution that works well with all NATs, applications, and network topologies. Another article explains many of the available protocols for NAT traversal [14] in general. This section describes the applicability of existing techniques from the application's point of view.
RNT is required for protocols using in-band signaling (bundled session applications). Therefore, one common approach is to integrate RNT into these applications (e.g., the VoIP client), to establish port bindings on the fly. One possibility is the integration of a universal plug and play (UPnP) client. Another option is to use ALGs that are integrated in the NAT, interpreting in-band signaling and establishing mappings accordingly. ALGs are not a general solution because the NAT must implement the required logic for each protocol, and end-to-end security prohibits the interpretation of the signaling by the NAT.
GSP depends on NAT traversal techniques that allow unrestricted access to a public end point. A control protocol can be used to directly establish a port forwarding entry in the mapping tables of the NAT, for instance, with UPnP [15]. Port forwarding entries created by UPnP are easy to maintain and work independently from NAT behavior. However, UPnP only works if the NAT is in the local network on the path to the other end point. Thus, nested NATs are not allowed, and path changes break the connectivity.
Hole punching is an alternative if UPnP is not applicable and works for NATs with an independent filtering strategy. The mapping must be refreshed periodically, for instance, by sending keep-alive packets. For NATs other than full-cone, hole punching for GSP cannot be used because the source port of the request is unknown in advance.
SPPS makes no assumption about the accessibility of a created mapping, thus all possible techniques are applicable. Different from GSP, hole punching for SPPS works as long as port prediction is possible. For NATs implementing restricted filtering, pre-signaling helps to create the appropriate mapping because the five-tuple of the connection is exchanged. Pre-signaling also enables the establishment of a UDP tunnel, allowing the encapsulation of unsupported protocols. SPPS also can use UPnP to establish port forwarding entries for one session.
SSP is an extension to SPPS that allows only authorized hosts to allocate and to use a mapping. Protocols that authorize requests and assume control over the middlebox, such as middlebox communication (MIDCOM) [16] or the NAT/Firewall Next Steps in Signaling (NSIS) Signaling Layer Protocol [17], qualify for SSP. The advantage of NSIS is that it can discover and configure multiple middleboxes along the data path, thus supporting complex scenarios with nested NATs and multipath routing. However, if one NAT on the path does not support the protocol, NSIS fails. Using NSIS and MIDCOM for SSP requires restrictive rules that allow only authorized clients to use the mapping, for instance, by opening pinholes for IP five-tuples. UPnP is not useful for SSP because it forwards inbound packets without considering the source transport address. Hole punching can be used with SSP only if the NAT implements a restricted filtering strategy. All cases discussed previously rely on additional measures to prohibit IP spoofing. The use of secure tunnels impedes IP spoofing and allows secure NAT traversal, even for unsupported protocols (e.g., IPSec, SCTP, DCCP). SSP also can be achieved by using Traversal Using Relays around NAT (TURN) with authentication, authorization, and secure communication (e.g., via transport layer security [TLS]).
ICE [9] is under standardization by the IETF and strives to combine several techniques into a framework flexible enough to work with all network topologies. Because ICE requires both peers to have an ICE implementation running, it can be seen as a technique for SPPS or SSP, depending on the accessibility and the security policies of the public endpoint.
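The applicability rules discussed in this section can be summarized as a small lookup, sketched here under stated assumptions. The names `NatProfile` and `candidate_techniques` and the three boolean fields are hypothetical; "restricted filtering" is approximated as "not independent filtering", and control protocols such as MIDCOM or NSIS as well as ALGs are left out for brevity. The sketch is not the authors' selection logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NatProfile:
    upnp: bool                   # NAT accepts UPnP port forwarding requests
    independent_filtering: bool  # inbound packets are accepted from any source
    port_predictable: bool       # external port can be predicted in advance


def candidate_techniques(category: str, nat: NatProfile) -> List[str]:
    """Return techniques that the text above considers usable for a category."""
    found: List[str] = []
    # UPnP ignores the source transport address, so it never qualifies for SSP.
    if category in ("RNT", "GSP", "SPPS") and nat.upnp:
        found.append("UPnP")
    # GSP: the requester is unknown in advance, so filtering must be independent.
    if category == "GSP" and nat.independent_filtering:
        found.append("hole punching")
    # SPPS: pre-signaling exchanges the five-tuple; port prediction suffices.
    if category == "SPPS" and nat.port_predictable:
        found.append("hole punching")
    # SSP: only a restricted filtering NAT keeps other hosts out of the mapping.
    if category == "SSP" and not nat.independent_filtering:
        found.append("hole punching")
    if category == "SPPS":
        found.append("data relay")
    if category == "SSP":
        found.append("authenticated relay")
    return found
```

For a NAT with UPnP and independent filtering, the sketch returns UPnP and hole punching for GSP but only the authenticated relay for SSP, which mirrors the argument that an openly reachable mapping cannot serve a secure endpoint.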
As with ICE, the same is true for solutions such as TURN [18]. TURN is a promising candidate for SPPS, because it provides a relay with a public transport address allowing the exchange of data packets between a TURN client and a public host.
Why Unilateral Solutions Exist
Despite the great flexibility of SPPS and SSP, both categories involve a number of assumptions that are not always satisfied. The most important one is the requirement for both ends (and sometimes also the infrastructure) to support compatible versions of the NAT traversal framework. It remains to be seen if the future will bring a sufficiently big deployment of one framework on which to rely for arbitrary applications. The chances are better within homogeneous problem domains, like telecommunication, where such frameworks can be integrated with the applications and be distributed in large numbers. For instance, the adoption of ICE is occurring mainly within the VoIP/SIP community and focusing on VoIP-specific use cases. These drawbacks are the reason why RNT and GSP exist as unilateral solutions for the NAT traversal problem. It is easier to enhance an infrastructure under one responsibility than to rely on a solution that requires a global deployment. However, unilateral solutions are limited to the middleboxes in the given domain. They fail to provide solutions to scenarios with nested NATs and depend on the network topology.
Coalescing Unilateral and Cooperative Approaches for NAT Traversal
When investigating existing NAT traversal techniques, we determined that none of them can be used in all scenarios. For example, UPnP only supports globally accessible end points, whereas ICE requires both hosts to run the framework. In [19], we proposed a new framework that aims toward providing an advanced NAT traversal service (ANTS) supporting all four service categories. The concept of ANTS is based on the idea of reusing previously obtained knowledge about the topology of the network and the capability of the NAT. A small component of ANTS, the NAT tester, is responsible for gathering this information and will be presented (together with some test results) in the next section.
If a user decides that a particular application should be reachable from the public Internet, the user registers it at a session manager that keeps track of all applications requesting NAT traversal support. With the session manager, ANTS can provide GSP and RNT directly. Whenever an application is added and associated with GSP or RNT, the session manager calls the NAT traversal logic and asks to allocate an appropriate mapping in the NAT. This also requires ANTS to have sufficient knowledge about the applicability of the integrated techniques regarding the service categories. For example, UPnP cannot be used for SSP because it violates the idea of an endpoint that is accessible only by authenticated hosts.
Figure 2 shows a decision tree that ANTS uses to establish a mapping in the NAT. First, we distinguish between requester initiated NAT traversal on the one hand and the access to a service on the other hand. Then, we must know which ends actually implement ANTS. If both hosts have the framework running, pre-signaling is possible, which leads to a wide choice of techniques depending on the security considerations of the mapping. If only one end supports ANTS, only techniques belonging to GSP or RNT are applicable.
Table 3. Results of the field test: success rates of NAT traversal techniques depending on service categories.
Service category   Protocol   Condition                          Success rate
RNT                UDP        UPnP or HP-UDP                     90.27%
RNT                TCP        UPnP or HP-TCP                     77.84%
GSP                UDP        Full Cone and HP-UDP               27.03%
GSP                TCP        Full Cone and HP-TCP               17.30%
GSP                UDP        UPnP or (Full Cone and HP-UDP)     50.27%
GSP                TCP        UPnP or (Full Cone and HP-TCP)     44.32%
SPPS               UDP        HP-UDP                             88.65%
SPPS               TCP        HP-TCP                             71.35%
SPPS               TCP        HP-TCP or HP-UDP                   94.59%
SPPS               UDP        UPnP or HP-UDP                     90.27%
SPPS               TCP        UPnP or HP-TCP                     77.84%
SPPS               TCP        UPnP or HP-TCP or HP-UDP           95.14%
SSP                UDP        Restricted NAT and HP-UDP          48.65%
SSP                TCP        Restricted NAT and HP-TCP          38.38%
Despite some unsolved issues such as the question of how to connect legacy applications to ANTS (e.g., by using a library or a traversal of UDP through NAT [TUN]-based approach), the idea of a knowledge-based framework seems
to be the right answer. Thus once implemented, ANTS can help many existing services by integrating several techniques and making its choice based on knowledge about the NAT and the requirements of the application.
Field Test on NAT Traversal
To prove that existing techniques can be adapted to our service categories, we implemented a NAT tester that acts as a cornerstone for our new framework. This section presents the results of a field test investigating 185 NATs in the wild. For a detailed description including all results, see our Web site: http://nettest.net.in.tum.de.
The first test queries a public STUN server to determine the type of the NAT. Afterward, the NAT tester performs the following connection tests and tries to establish a connection to the host behind the NAT: UPnP, hole punching, and connecting to a data relay (each for both protocols, UDP and TCP) (Table 3).
We then adapted the test results to our work and evaluated the success rates of the individual techniques regarding our defined service categories. Table 3 shows the categories and the conditions that must be met according to the considerations made previously. For example, GSP requires the use of UPnP or hole punching support in combination with a full-cone NAT to make a service globally accessible. Therefore, 50.27 percent of our tested NATs supported a direct connection for UDP and category GSP (44.32 percent for TCP). In all other cases (the remaining percentages), an external relay must be used to provide GSP.
For SPPS, which makes no security assumptions, we divided our results into two categories. First we determined the success rates without considering UPnP. With 88.65 percent of all NATs, we were able to establish a direct UDP connection to the host behind the NAT (71.35 percent for TCP). This rate increased slightly (to 90.27 percent for UDP and 77.84 percent for TCP) when UPnP was an option. The highest success rate for TCP NAT traversal (95.14 percent) was discovered when we also allowed the tunneling of TCP packets through UDP.
SSP allows only authorized hosts to create and to use a mapping. Therefore, a suitable technique for SSP is hole punching in combination with a NAT implementing a restricted filtering strategy. This was supported by 48.65 percent of the NATs for UDP and 38.38 percent for TCP.
The success rate for RNT depends on the effort that is made for the specific protocol. For example, if we assume that we can inspect each signaling packet on the application layer thoroughly, we could adopt the results from SPPS to RNT. If we only modified the packets in a way that the internal port is reachable by any client, the success rate of GSP would apply to RNT. Finally, we did not measure the effect of NATs with integrated ALGs in this field test.
between support by service, client, and infrastructure and listed applicable NAT traversal techniques for each category. Our findings from a field test showed that there are a number of prospective NAT traversal techniques that enable connectivity for each NAT traversal service category. We emphasized how to build upon this categorization to develop a knowledge-based NAT traversal framework. Future frameworks that aspire to support the typical connectivity scenarios of current applications should support all four service categories.
References
[1] K. Egevang and P. Francis, “The IP Network Address Translator (NAT),” IETF RFC 1631, May 1994.
[2] IETF, “Behavior Engineering for Hindrance Avoidance (behave),” http://www.ietf.org
[3] P. Srisuresh and M. Holdrege, “IP Network Address Translator (NAT) Terminology and Considerations,” IETF RFC 2663, Aug. 1999.
[4] J. Rosenberg et al., “STUN: Simple Traversal of User Datagram Protocol (UDP) through Network Address Translators (NATs),” IETF RFC 3489, Mar. 2003.
[5] F. Audet and C. Jennings, “NAT Behavioral Requirements for Unicast UDP,” IETF RFC 4787, Jan. 2007.
[6] S. Guha and P. Francis, “Characterization and Measurement of TCP Traversal through NATs and Firewalls,” Proc. ACM Internet Measurement Conf., Berkeley, CA, Oct. 2005.
[7] G. Huston, “Anatomy: A Look Inside Network Address Translators,” The Internet Protocol J., vol. 7, 2004, pp. 2–32.
[8] M. Holdrege and P. Srisuresh, “Protocol Complications with the IP Network Address Translator,” IETF RFC 3027, Jan. 2001.
[9] J. Rosenberg, “Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols,” IETF Internet draft, work in progress, Oct. 2007.
[10] P. Francis, S. Guha, and Y. Takeda, “NUTSS: A SIP-based Approach to UDP and TCP Network Connectivity,” Cornell Univ., Panasonic Commun., tech. rep., 2004.
[11] A. Biggadike et al., “NATBLASTER: Establishing TCP Connections between Hosts behind NATs,” ACM SIGCOMM Asia Wksp., Beijing, China, 2005.
[12] J. Eppinger, “TCP Connections for P2P Applications: A Software Approach to Solving the NAT Problem,” Carnegie Mellon Univ., Pittsburgh, PA, tech. rep., 2005.
[13] B. Ford, P. Srisuresh, and D. Kegel, “Peer-to-Peer Communication across Network Address Translation,” MIT, tech. rep., 2005.
[14] H. Khlifi, J. Gregoire, and J. Phillips, “VoIP and NAT/Firewalls: Issues, Traversal Techniques, and a Real-World Solution,” IEEE Commun. Mag., July 2006.
[15] UPnP Forum, “Internet Gateway Device (IGD) Standardized Device Control Protocol,” Nov. 2001.
[16] P. Srisuresh et al., “Middlebox Communication Architecture and Framework,” IETF RFC 3303, Aug. 2002.
[17] M. Stiemerling et al., “NAT/Firewall NSIS Signaling Layer Protocol (NSLP),” IETF Internet draft, Feb. 2008.
[18] J. Rosenberg, R. Mahy, and P. Matthews, “Traversal Using Relays around NAT (TURN),” IETF Internet draft, work in progress, June 2008.
[19] A. Müller, A. Klenk, and G. Carle, “On the Applicability of Knowledge-Based NAT-Traversal for Future Home Networks,” Proc. IFIP Networking 2008, Springer, Singapore, May 2008.
Biographies
ANDREAS MÜLLER (mueller@net.in.tum.de) received his diploma degree in computer science from the University of Tübingen, Germany in 2007. Currently, he is a research assistant and Ph.D. candidate at the Network Architecture and Services Department at the Technical University of Munich. His research interests include middleboxes, P2P systems, and autonomic networking.
Modeling Middleboxes
Dilip Joseph and Ion Stoica, University of California at Berkeley
Abstract
The lack of a concise and standard language to describe diverse middlebox func-
tionality and deployment configurations adversely affects current middlebox deploy-
ment, as well as middlebox-related research. To alleviate this problem, we present
a simple middlebox model that succinctly describes how different middleboxes pro-
cess packets and illustrate it by representing four common middleboxes. We set up
a pilot online repository of middlebox models and prototyped model inference and
validation tools.
set(A, key → val) Stores the specified key-value pair in zone A’s state database
S : get?(A, key) Returns true and assigns val to S if key → val is present in zone A’s state database
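The two state operations listed above can be sketched as a per-zone key-value store. This is a minimal illustration, not the authors' tooling: the class name `StateDatabases` is hypothetical, and because Python cannot assign to a caller's variable S, the `get` method returns a `(found, value)` pair in place of `S : get?(A, key)`.

```python
from typing import Any, Dict, Hashable, Tuple


class StateDatabases:
    """One key-value state database per middlebox zone."""

    def __init__(self) -> None:
        self._zones: Dict[str, Dict[Hashable, Any]] = {}

    def set(self, zone: str, key: Hashable, val: Any) -> None:
        # set(A, key -> val): store the pair in zone A's state database.
        self._zones.setdefault(zone, {})[key] = val

    def get(self, zone: str, key: Hashable) -> Tuple[bool, Any]:
        # get?(A, key): report whether the key is present and, if so, its value.
        db = self._zones.get(zone, {})
        if key in db:
            return True, db[key]
        return False, None
```

A NAT model, for example, could call `set("int", (si, sp), newport)` and `set("ext", newport, (si, sp))` to record a mapping in both directions.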
Internet and zone srvr representing the Web server farm. The load balancer spreads out packets received at zone inet to Web server instances in zone srvr.
We assume that the mapping between interfaces and zones is pre-determined by the middlebox vendor or configured during middlebox initialization. Frames reaching an interface belonging to multiple zones are distinguished by their virtual local area network (VLAN) tags, IP addresses, and/or transport port numbers.
Input Preconditions
Input preconditions specify the types of packets that are accepted by a middlebox for processing. For example, a transparent firewall processes all packets received by it, whereas a load balancer in a single-legged configuration processes a packet arriving at its inet zone only if the packet is explicitly addressed to it at layers 2, 3, and 4. Similarly, a NAT processes all packets received at its int zone, but requires those received at its ext zone to be addressed to it at layers 2 and 3.
Input preconditions are represented using a clause of the form I(P, p), which is true if the headers and contents of packet p match the pattern P. For example, the firewall has the input precondition I(<*>, p), and the load balancer has I(<dm = MAC_LB, di = IP_LB, dp = 80>, p) for its inet zone, where MAC_LB and IP_LB are the layer-2 and layer-3 addresses of the load balancer. Although I(<*>, p) is a tautology, we still explicitly specify it in the firewall model to enhance model clarity.
grained measurement can cause discrepancies between the model-predicted behavior of a middlebox and its actual operations: the model only predicts the behavior, so the prediction may differ from what the middlebox actually does. As we illustrate in the next section, we use special processing rules to flag such possible discrepancies.
Processing Rules
Processing rules model the core functionality of a middlebox. A processing rule specifies the action taken by a middlebox when a particular condition becomes true. For example, the processing of an incoming packet is represented by a rule of the general form:
Z(A, p) ∧ I(P, p) ∧ C(p) ⇒ Z(B, T(p)) ∧ state ops
The above rule indicates that a packet p reaching zone A of the middlebox is transformed to T(p) and emitted out through zone B, if it satisfies the input precondition I(P, p) and a middlebox-specific condition C(p). In addition, the middlebox may update state associated with the TCP flow or application session to which the packet belongs. We now present concrete examples of processing rules for common middleboxes.
Firewall: First, consider a simple stateless layer-4 firewall that either drops a packet received on its red zone or relays it unmodified to the green zone. This behavior can be represented using the following two rules:
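As an illustration only, and not a reproduction of the authors' rule box, the general rule form Z(A, p) ∧ I(P, p) ∧ C(p) ⇒ Z(B, T(p)) can be sketched in Python and instantiated for a stateless firewall. Everything named here is an assumption made for the sketch: the `Packet` and `Rule` types, the `process` function, and in particular the acceptance policy `accept` (destination port 80), which merely stands in for a middlebox-specific condition C(p). State operations are omitted because the firewall is stateless.

```python
from typing import Callable, Dict, NamedTuple, Optional, Sequence, Tuple


class Packet(NamedTuple):
    dm: str               # destination MAC address
    di: str               # destination IP address
    dp: int               # destination port
    payload: bytes = b""


class Rule(NamedTuple):
    in_zone: str                           # zone A
    pattern: Dict[str, object]             # pattern P; {} plays the role of <*>
    condition: Callable[[Packet], bool]    # middlebox-specific condition C(p)
    out_zone: Optional[str]                # zone B, or None to drop the packet
    transform: Callable[[Packet], Packet]  # transformation T(p)


def matches(pattern: Dict[str, object], p: Packet) -> bool:
    """Input precondition I(P, p): every field named in P must match."""
    return all(getattr(p, name) == value for name, value in pattern.items())


def process(rules: Sequence[Rule], zone: str,
            p: Packet) -> Optional[Tuple[str, Packet]]:
    """Fire the first rule whose left-hand side holds; None means dropped."""
    for rule in rules:
        if rule.in_zone == zone and matches(rule.pattern, p) and rule.condition(p):
            if rule.out_zone is None:
                return None
            return rule.out_zone, rule.transform(p)
    return None


def accept(p: Packet) -> bool:
    return p.dp == 80  # hypothetical policy: only Web traffic is relayed


def identity(p: Packet) -> Packet:
    return p           # the firewall relays packets unmodified


firewall = [
    Rule("red", {}, accept, "green", identity),             # relay to green
    Rule("red", {}, lambda p: not accept(p), None, identity),  # drop
]
```

Calling `process(firewall, "red", Packet("aa:bb", "192.0.2.10", 80))` yields the green zone with the unmodified packet, while a packet to port 23 is dropped.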
Figure 1. Zones of different middleboxes: a) firewall; b) NAT; and c) load balancer in single-legged configuration.
models like the Unified Firewall Model [2] to construct the appropriate C clauses. Rules for packets in the green → red direction are similar.
NAT: Next, consider another very common middlebox, a NAT. Unlike the firewall in the previous example, a NAT rewrites packet headers and maintains per-flow state. We first describe the processing rules (rule box 1) for a full cone NAT and then, with minor modifications, change it to represent a symmetric NAT.
Rule (i) describes how a full cone NAT processes a packet [hd] with a previously unseen [si, sp] pair received at its int zone. It allocates a new port number using a standard mechanism like random or sequential selection, or using a custom mechanism beyond the scope of our general model. It stores [si, sp] → newport and newport → [si, sp] in the state databases of zone int and zone ext, respectively. It rewrites the packet header h by applying the source NAT (SNATfwd) transformation function: the source medium access control (MAC) and IP addresses are replaced with the publicly visible addresses of the NAT, the source port with the newly allocated port number, and the destination MAC with the next hop IP gateway of the NAT. The packet with the rewritten header and unmodified payload is then emitted out through the ext zone. Rule (ii) specifies that the NAT emits a packet with a previously seen [si, sp] pair through zone ext, after applying SNATfwd with the port number recorded in rule (i). Rule (iii) describes how the NAT processes a packet reaching the ext zone. It retrieves the newport → [si, sp] state recorded in rule (i) using the destination port number of the packet, applies the reverse source NAT transformation function (SNATrev), and then emits the modified packet through zone int. Rule (iv) and Rule (v) flag discrepancies resulting from the inaccuracy of the model in tracking state expiration. The NAT may drop a packet arriving at its int or ext zone because the state associated with the packet expired without the knowledge of the model.
Unlike a full cone NAT, a symmetric NAT allocates a separate port for each [si, sp, di, dp] tuple seen at its int zone, rather than for each [si, sp] pair. Thus, for a symmetric NAT, the zone int state set in rule (i) and retrieved in rules (ii) and (iv), is keyed by [h.si, h.sp, h.di, h.dp] rather than by just [h.si, h.sp]. A symmetric NAT is also more restrictive than a full cone NAT. It relays a packet with header [IPs, IPNAT, PORTs, PORTd] from the ext zone only if it had earlier received a packet destined to IPs:PORTs at the int zone and had rewritten its source port to PORTd. This restrictive behavior is captured by keying the zone ext state set in rule (i) and retrieved in rules (iii) and (v) with [h.di, h.dp, newport] rather than with just newport. Other NAT types, like restricted cone and port restricted cone, can be easily represented with similar minor modifications.
Rule (i) describes how the load balancer processes the first packet of a new flow received at its inet zone. The load balancer dynamically selects a Web server instance Wi for the flow and records it in the state database of the inet zone. It rewrites the destination IP and MAC addresses of the packet to Wi using the destination NAT (DNATfwd) transformation function and then emits it out through the srvr zone. It also records this flow in the state database of the srvr zone, keyed by the five-tuple of the packet expected there in the reverse flow direction. Rule (ii) specifies that subsequent packets of the flow simply will be emitted out after rewriting the destination IP and MAC addresses to those of the recorded Web server instance. Rule (iii) describes how the load balancer processes a packet received from a Web server. It verifies the existence of flow state for the packet and then emits it out through the inet zone after applying the reverse DNAT transformation, that is, rewriting the source IP and MAC addresses to those of the load balancer and the destination MAC to the next hop IP gateway.
Although the Web server instance selection mechanism is beyond the scope of our general model, the load balancer model easily can be augmented with primitives to represent common selection mechanisms like least loaded and round robin. In the previous example, we assumed that the load balancer was set as the default IP gateway at each Web server. Other load balancer deployment configurations (e.g., direct server return or source NAT) can be represented with minor modifications.
Layer-7 Load Balancer: We now present our most complex example, a layer-7 SSL offload-capable load balancer. This example illustrates how our model describes a middlebox whose processing spans both packet headers and contents and is not restricted to one-to-one packet transformations. The layer-7 load balancer is the end point of the TCP connection from a client (the CL connection). Because accurately modeling TCP is very hard, we abstract it using a black box TCP state machine tcpCL and buffer the data received from the client in a byte queue DCL. The I clauses are similar to those in the layer-4 load balancer and hence not repeated in rule box 3.
Rule (i) specifies that the load balancer creates tcpCL and DCL and records them along with the packet header on receiving the first packet of a new flow from a client at the inet zone. Rule (ii) specifies how the TCP state and data queue of the CL connection are updated as the packets of an existing flow arrive from the client. Rule (iii), triggered when tcpCL has data or acknowledgments to send, specifies that packets from the load balancer to the client will have header hrevCL (with appropriate sequence numbers filled in by tcpCL) and payload read from the DLS queue, if it was already created by the firing of rule (iv). Rule (iv), triggered when the data collected in DCL is sufficient to parse the HTTP request URL and/or cookies, specifies that the load balancer selects a Web server instance Wi and opens a TCP connection to it, that is,
Figure 2. Middlebox model inference tool analyzing a load balancer.

box-specific models like the Unified Firewall Model as described earlier, although at the expense of reducing model simplicity and conciseness. The desire for simplicity and conciseness also limits our model from capturing accurate timing and causality between triggering of different processing rules.
On the other hand, our model may not be general enough to describe all possible current and future middleboxes. Although we represented many common middleboxes in our model and are not aware of any existing middleboxes that cannot be represented, we are unable to formally prove that our model covers all possible middleboxes.
The model for a particular middlebox consists of a small number (typically < 10) of processing rules. However, constructing the model itself is a non-trivial task even with support from our model inference and validation tools. We expect models to be constructed by experts and shared through an online model repository, thus making them easily available to all, without requiring widespread model construction skills.

Related Work
The middlebox model described in this article is placed at an intermediate level in between related work on very general network communications models and very specific middlebox models.
An axiomatic basis for communication [7] presents a general network communications model that axiomatically formulates packet forwarding, naming, and addressing. This article presents a model tailored to represent middlebox functionality and operations. The processing rules and state database in our model are similar to the forwarding primitives and local switching table in [7]. As part of future work, we plan to investigate the integration of the two models and thus combine the practical benefits of our middlebox model (e.g., middlebox model inference and validation tools, model repository) and the theoretical benefits of the general communications model (e.g., formal validation of packet forwarding correctness through chains of middleboxes).
Predicate routing [8] attempts to unify security and routing by declaratively specifying network state as a set of Boolean expressions dictating the packets that can appear on various links connecting together end nodes and routers. This approach can be extended to represent a subset of our middlebox model. For example, Boolean expressions on the ports and links (as defined by predicate routing) of a middlebox can specify the input preconditions of our model and indirectly hint at the processing rules and transformation functions. From a different perspective, middlebox models from our repository can aid the definition of the Boolean expressions in a network implementing predicate routing.
Reference [9] uses statistical rule mining to automatically group together commonly occurring flows and learn the underlying communication rules in a network. Our work has a narrower and more detailed focus on how middleboxes operate. Reference [10] uses detailed measurement techniques to evaluate the performance and reliability of production middlebox deployments. We plan to investigate how the techniques described in these papers can enhance our model inference and validation tools.
RFC 3234 [1] presents a taxonomy of middleboxes. Our model goes well beyond a taxonomy and describes middlebox packet processing in more detail using a concise and standard language. In addition, our model can naturally induce a more fine-grained taxonomy on middleboxes (e.g., “middleboxes that rewrite the destination IP and port number” versus “middleboxes operating at the transport layer”). Our model does not currently consider the middlebox failover modes and functional versus optimizing roles identified by RFC 3234.
The Unified Firewall Model [2] and IETF BEHAVE [3] working group characterize the functionality and behavior of specific middleboxes (firewalls and NATs in this case). Guided by these efforts, we construct a general model that applies to a wide range of middleboxes and enables us to compare different middleboxes and study their interactions. Furthermore, these specific models can be plugged into our general model and alleviate the limitations of model generality.

Conclusion
In this article, we presented a simple middlebox model and illustrated how various commonly used middleboxes can be described by it. The model guides middlebox-related research and aids middlebox deployments. Our work is only an initial step in this direction and calls for the support of the middlebox research and user communities to further refine the model and to contribute model instances for the many different kinds of middleboxes that exist today.

References
[1] “Middleboxes: Taxonomy and Issues,” RFC 3234.
[2] G. J. Nalepa, “A Unified Firewall Model for Web Security,” Advances in Intelligent Web Mastering.
[3] “Behavior Engineering for Hindrance Avoidance”; http://www.ietf.org/html.charters/behave-charter.html
[4] D. Joseph, A. Tavakoli, and I. Stoica, “A Policy-Aware Switching Layer for Data Centers,” Proc. SIGCOMM, 2008.
[5] P. Godefroid, N. Klarlund, and K. Sen, “DART: Directed Automated Random Testing,” Proc. PLDI, 2005.
[6] M. Walfish et al., “Middleboxes No Longer Considered Harmful,” Proc. OSDI, 2004.
[7] M. Karsten et al., “An Axiomatic Basis for Communication,” Proc. SIGCOMM ’07.
[8] T. Roscoe et al., “Predicate Routing: Enabling Controlled Networking,” SIGCOMM Comp. Commun. Rev., vol. 33, no. 1, 2003.
[9] S. Kandula, R. Chandra, and D. Katabi, “What’s Going On? Learning Communication Rules in Edge Networks,” Proc. SIGCOMM, 2008.
[10] M. Allman, “On the Performance of Middleboxes,” Proc. IMC, 2003.

Biographies
DILIP JOSEPH (dilip@cs.berkeley.edu) received his B.Tech. degree in computer science from the Indian Institute of Technology, Madras, in 2004 and his M.S. degree in computer science from the University of California at Berkeley in 2006. He is currently a Ph.D. candidate at the University of California at Berkeley. His research interests include data center networking, middleboxes, and new Internet architectures.

ION STOICA (istoica@cs.berkeley.edu) received his Ph.D. from Carnegie Mellon University in 2000. He is an associate professor in the EECS Department at the University of California at Berkeley, where he does research on peer-to-peer network technologies in the Internet, resource management, and network architectures. He is the recipient of the 2007 Rising Star Award, a Sloan Foundation Fellowship (2003), a Presidential Early Career Award for Scientists and Engineers (PECASE) (2002), and the ACM doctoral dissertation award (2001). In 2006 he co-founded Conviva, a startup company to commercialize peer-to-peer technology for video distribution.
Abstract
Network address translation is widely deployed in the Internet and supports the
Transmission Control Protocol and the User Datagram Protocol as transport layer
protocols. Although it is part of the kernels of all recent Linux distributions as well as
of the FreeBSD 7 and Solaris 10 operating systems, the new Internet Engineering
Task Force transport protocol, the Stream Control Transmission Protocol (SCTP), is not
yet supported on most NAT middleboxes. This article discusses the deficiencies of
using existing NAT methods for SCTP and describes a new SCTP-specific NAT concept.
This concept is analyzed in detail for several important network scenarios,
including peer-to-peer, transport layer mobility, and multihoming.
Figure 2. Using basic NAT. (Figure labels: clients 10.1.0.1:52001, 10.1.0.2:52002, and 10.1.0.3:52003; NAT address 120.10.2.1; Internet; server 100.4.5.1:8080; mappings 120.10.2.1:52001 => 100.4.5.1:8080, 120.10.2.1:52002 => 100.4.5.1:8080, and 120.10.2.1:52003 => 100.4.5.1:8080.)

port number to a global IP address and port number in the TCP or UDP header, respectively. This method is called NAPT. Thereby, the NAT middlebox chooses the port numbers from a pool and makes sure that no two connections to the same server obtain the same port numbers.
As the transport layer checksum of the TCP and UDP packets covers the transport header that includes the port numbers, it must be modified according to the port number change. However, the checksum used for TCP or UDP has the property that the change of the checksum can be computed only from the change of the port numbers. So this can be done very efficiently by a simple set of additions and subtractions.
It should be noted that the behavior of NAT middleboxes varies dramatically because there were no standards describing how to build them. The Behavior Engineering for Hindrance Avoidance (BEHAVE) working group of the IETF develops best current practice (BCP) documents giving requirements for NAT middlebox behavior and protocols to help applications to run over networks with NAT middleboxes.
Considering only single-homed SCTP clients and servers, it is also possible to use this NAPT concept for SCTP because it has the same port number concept as TCP and UDP. However, the transport layer checksum used by SCTP is different from the one used by UDP and TCP. This checksum does not allow the computing of the checksum change based only on the port number change. Therefore, the NAT middlebox must compute the new SCTP checksum again, based on the complete SCTP packet. This requires a substantial amount of computing power that might be reduced when the computation is performed directly by hardware.
For multihomed SCTP clients and servers, reusing the techniques from TCP and UDP becomes much harder. As we mentioned earlier, hosts can be multihomed, which means that they can simultaneously use multiple network addresses and thus can be attached to multiple networks. Therefore, the traffic of one SCTP association, in general, passes through different NAT middleboxes on different paths. Because each SCTP end point can use only one SCTP port number on all paths, the NAT middleboxes cannot change the port number independently. To apply the existing NAT concept, the NAT middleboxes involved would have to synchronize the port numbers to assign a common number for the association. This is very hard to achieve.
Based on this discussion, it seems desirable to use a NAT mechanism for SCTP that does not require a change to the SCTP header at all and hence to the port numbers, which avoids synchronization among NAT middleboxes and the recomputation of the SCTP checksum.

UDP-Based Tunneling
Currently, most NAT middleboxes support only protocols running on top of TCP or UDP. A standard technique for all other protocols is to encapsulate these packets into UDP instead of IP. Because both UDP and IP provide an unreliable packet delivery service, this is feasible. This also works for SCTP, as described in [3], and is currently implemented in the SCTP kernel extension for Mac OS X.
It should be noted that NAT middleboxes on different paths are not synchronized, and therefore, the UDP port number might be different on different paths.
One drawback of using UDP encapsulation is that Internet Control Message Protocol (ICMP) messages might not contain enough information to be processed by the SCTP layer. Another drawback is that the simple peer-to-peer solution described in the sections about peer-to-peer communication and multihoming with a rendezvous server does not work because the UDP port numbers might be changed by NAT-middleboxes.
Tunneling SCTP over UDP must handle the same problems as any other UDP-based communication for NAT traversal. However, this is the only possibility for SCTP-based communication through a NAT middlebox without modifying it to add SCTP support.

An SCTP-Specific Variant of NAT
In the NAPT method described previously, the NAT middlebox controls the 16-bit source port number of outgoing TCP connections to distinguish multiple TCP connections of all clients behind the NAT middlebox to the same server. The basic idea for the SCTP-specific method is instead to use the combination of the source port number and the verification tag. For single-homed hosts, this method is described in [2].
If NAT middleboxes use the verification tags together with the addresses and the port numbers to identify an association, the probability that two hosts end up with the same combination decreases to a tolerable level.

A Simple Association Setup
The main task of a NAT middlebox is to substitute the source address of each packet with the public address used by the NAT middlebox and to keep the corresponding IP addresses in a table. First, we consider an association setup between a single-homed client and a single-homed server. Neither the INIT nor the INIT-ACK chunk contain any IP addresses. This leads to a scheme as described in Fig. 3.
In the first message of the handshake, the verification tag in the common header must be set to 0, but the initiate tag (initTag) in the INIT chunk holds a 32-bit random number that is supposed to be the verification tag (VTag) of the incoming packets. Hence, at the beginning of the handshake, only one verification tag is known. The NAT middlebox keeps track of this information and takes the local private address (Local-Address) and the officially registered destination IP address (Global-Address) from the IP header of the SCTP packet and saves them in the NAT table (Fig. 3). The local source port (Local-Port) and the destination port (Global-Port) are obtained the same way.
The initiate tag of the INIT chunk, which the client has chosen for its communication, is also extracted from the INIT chunk header and saved as Local-VTag. The Global-VTag that eventually will be chosen by the communication partner is not known yet. Before forwarding the packet, the NAT middlebox exchanges the source address of the IP header with the NAT address (Nat-Global-Address) and sends the packet toward the other end point.

Figure 3. Four-way handshake for the SCTP association setup with NAT table.

The other SCTP end point receiving the packet containing the INIT chunk answers the request with a message containing the INIT-ACK chunk. This message is addressed to the NAT-Global-Address and the Local-Port. Its verification tag in the common header must be identical to the initiate tag of the INIT chunk, whereas the initiate tag of the INIT-ACK chunk will be used as the verification tag for all packets that are sent by the initiating end point (client 10.1.0.1 in the figure) of the association. For an incoming INIT-ACK chunk, the NAT middlebox searches the table entries for the corresponding combination of Local-Port, Global-Address, Global-Port, and the Local-VTag and adds the Global-VTag. Thus, after the reception of the INIT-ACK chunk, both verification tags are known. Now the NAT middlebox sets the destination address to the Local-Address found in the table entry and delivers the packet. To complete the handshake, a packet with a COOKIE-ECHO chunk is sent that is acknowledged with a message containing a COOKIE-ACK chunk.

NAT Table
The NAT table consists of several entries. Each entry is a tuple consisting of:
1) Local-Address
2) Global-Address
3) Local-Port
4) Global-Port
5) Local-VTag
6) Global-VTag
In addition to the procedure to modify the table given in the next subsection, a timer must be used to remove entries that have not been used for a certain amount of time. This time should be long enough such that the SCTP path supervision procedure prevents the table entries from timing out.

Modifications to the NAT Table
The basic procedure for handling INIT and INIT-ACK chunks was described previously. If the INIT or INIT-ACK chunk contains a list of addresses, then for each address in the list, an entry is added to the table.
If an ASCONF chunk is received to add the wildcard address, an entry to the NAT table is made for that address. Because both verification tags must be added, a parameter must be included in the ASCONF chunk that contains the verification tag that is not present in the common header.

Behavior of the SCTP End Points
Because multiple clients behind the NAT middlebox might choose the same local port when connecting to the same server, the restart procedure would result in a loss of an SCTP association. Therefore, the INIT chunk sent by the clients should contain a parameter indicating that the server should not follow the restart procedure. Instead it should use the verification tag to distinguish between the associations. This is what most SCTP implementations already do.
Furthermore, the SCTP end points must not include non-global addresses in the INIT or INIT-ACK chunk.
If an SCTP end point is multihomed and has non-global addresses, it should set up the association single-homed and then add the other addresses after the association has been established by sending an SCTP packet containing an ASCONF chunk for each address. To add such an address, the ASCONF should contain only the wildcard address and the parameter providing the required verification tag. The source address of the packet containing the ASCONF chunk will be added to the association.
To remove an address, an ASCONF chunk is sent with the wildcard address. Then, all addresses except the source address of the packet containing the ASCONF chunk are deleted from the association.

Communication between the NAT Middleboxes and the SCTP End Points
If a NAT middlebox receives an INIT chunk that would result in adding an entry to the NAT table that conflicts with an already existing entry, it should not insert this entry and may send an ABORT chunk back to the SCTP end point. In the ABORT chunk, an M-bit should be set that indicates that it has been generated by a middlebox. This happens if two different clients choose the same local port number and initiate tag and try to connect to the same server. On reception of such an ABORT chunk, the end point can try to choose a different initiate tag and try setting up the association again.
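The table handling described in the preceding sections (entry creation on an outgoing INIT, completion on the incoming INIT-ACK, and conflict detection) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the class name SctpNatTable, the method names on_init and on_init_ack, and the returned action tuples are hypothetical, chunk parsing is omitted, and the idle timer that removes unused entries is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Lookup key as seen from the global side:
# (Local-Port, Global-Address, Global-Port, Local-VTag)
Key = Tuple[int, str, int, int]


@dataclass
class Entry:
    """One NAT table entry with the six fields listed in the text."""
    local_addr: str
    global_addr: str
    local_port: int
    global_port: int
    local_vtag: int
    global_vtag: Optional[int] = None  # unknown until the INIT-ACK arrives


class SctpNatTable:
    """Hypothetical sketch of the SCTP-specific NAT table."""

    def __init__(self, nat_global_addr: str):
        self.nat_global_addr = nat_global_addr
        self.entries: Dict[Key, Entry] = {}

    def on_init(self, src_addr, src_port, dst_addr, dst_port, init_tag):
        """Outgoing INIT: record the association or report a conflict."""
        key = (src_port, dst_addr, dst_port, init_tag)
        existing = self.entries.get(key)
        if existing is None:
            self.entries[key] = Entry(src_addr, dst_addr, src_port,
                                      dst_port, init_tag)
        elif existing.local_addr != src_addr:
            # Another client already uses this port and initiate tag toward
            # the same server: do not insert, answer with ABORT (M-bit set).
            return ("ABORT", {"m_bit": True})
        # Only the IP source address is rewritten; the SCTP header is
        # left untouched, so no checksum recomputation is needed.
        return ("FORWARD", {"src_addr": self.nat_global_addr})

    def on_init_ack(self, src_addr, src_port, dst_port, vtag, init_tag):
        """Incoming INIT-ACK: add Global-VTag and find the private address."""
        entry = self.entries.get((dst_port, src_addr, src_port, vtag))
        if entry is None:
            # No state for this packet: discard it and optionally send an
            # ERROR chunk with the M-bit set.
            return ("ERROR", {"m_bit": True})
        entry.global_vtag = init_tag
        return ("FORWARD", {"dst_addr": entry.local_addr})
```

With the addresses used in the figures, a call such as `SctpNatTable("120.10.2.1").on_init("10.1.0.1", 52001, "100.4.5.1", 8080, 1111)` would create the entry and rewrite only the IP source address. A second client that picks the same port and initiate tag toward the same server would receive the ABORT action and could retry with a different initiate tag.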
Figure 4. Building the NAT table for the single-homed client with a multihomed server. (Figure labels: client 10.1.0.1:52001; NAT; Router 1 and Router 2; Internet; server 100.4.5.1:8080 and 100.5.5.1:8080.)
If the NAT middlebox receives an SCTP packet that cannot be processed because there is no entry in the NAT table, the NAT middlebox should discard the packet and can send back an ERROR chunk. An M-bit must be set to indicate that the chunk is generated by a middlebox, and an error cause should indicate that the NAT middlebox does not have the required information to process the packet. On reception of such an ERROR chunk, the end point should use an ASCONF chunk to provide the required information to the NAT middlebox.

New SCTP Protocol Elements
Clients require a new parameter to be included in the INIT chunk to indicate that they will use the procedures described in this article. This parameter also is included in the INIT-ACK chunk to indicate that the receiver also supports it. Another new parameter is required that can contain a verification tag and is included in an ASCONF chunk.
Both the ERROR chunk and the ABORT chunk must have an M-bit indicating that the packet containing the chunk is generated by a middlebox instead of the peer.
Two additional error causes are introduced, one to be included in the ERROR chunk to indicate that the NAT middlebox misses some state, and one to be included in the ABORT chunk to indicate a conflict in the NAT table.

Examples
This section provides a detailed discussion of several network scenarios involving NAT middleboxes. The proposed NAT mechanisms were verified in all these scenarios using an SCTP simulation in the INET framework for the OMNeT++ simulation kernel described in [10].
Furthermore, a group of the Center for Advanced Internet Architecture at Swinburne University is implementing this method for the FreeBSD operating system. This project,
[Figure: client 10.1.0.1:52001, router, NAT, new NAT, Internet, server 100.4.5.1:8080, and rendezvous server. Message labels: ERROR 100.4.5.1:8080 => 120.10.2.1:52001 (cause: NAT state missing); ASCONF 120.10.2.1:52001 => 100.4.5.1:8080 (VTag: 12345); ASCONF-ACK 100.4.5.1:8080 => 120.10.2.1:52001. Address labels: 10.1.0.1 => 100.4.5.1, 100.4.5.1 => 10.1.0.1, 140.1.1.1 => 100.4.5.1, 100.4.5.1 => 140.1.1.1.]
This chunk provides the required information to the NAT middlebox, NAT 2.

Multihomed Transport Layer Mobility
Previously, we discussed the procedure for a case when a client moves and hence changes its source address and the corresponding NAT middlebox as well. During the transition from one cell to another in a host mobility scenario, there is likely to be a zone where both cells are active, and thus, two addresses can be in use. Adding the new address results in a temporarily multihomed client. We propose to handle this situation in a way similar to the case explained in the last section. The new address is added by the sending of a message containing an ASCONF chunk. But as the old address is completely replaced by the new one as soon as the previous cell is left, another parameter must be added that indicates that the primary path should be set to the new address. This causes the server to send the next packets to the new address.

Multihoming with Rendezvous Server
The final step in increasing the complexity of the NAT scenario is the communication between two multihomed peers that are behind different NAT middleboxes.
Just like in the single-homed case, the rendezvous server must gather the peer information to fill its table. This time the table must be enlarged by the additional addresses. The peers first set up an association with the rendezvous server. Using this server the peers can obtain each other’s addresses and port numbers.
At this point, the peers must set up an association via initialization collision to provide a path by using hole punching. To also use the second path, on the way, the NAT middleboxes must obtain the required information. By sending messages containing ASCONF chunks almost simultaneously, the NAT middleboxes are notified to allow packets arriving from the opposite direction to pass through. Unfortunately, the mechanism described earlier to request information by sending a message containing an ERROR chunk does not work when coming from the global side of the network because only the host behind the NAT middlebox can provide the data to fill the NAT table. So when the message containing an ASCONF chunk arrives at the opposite NAT middlebox before a hole is punched, the packet is discarded, but its retransmission might be successful. After both NAT tables receive the appropriate entries, the secondary paths also can be used.

Conclusion
In this article, we proposed a comprehensive solution for the support of SCTP in NAT middleboxes. We motivated the necessity for a specific NAT concept with NAPT functionality, where the verification tags provided by SCTP are used to distinguish between associations. The NAT middleboxes can request information from the SCTP end points and give hints to improve the overall procedure.
Furthermore, several scenarios were analyzed to explain the manipulation of the NAT table in single-homed, multihomed, and mobility environments. The peer-to-peer communication with a preregistration was taken into account as well.
Generalizing the SCTP-specific variant of NAT, the following is important. For supporting a transport protocol with multipath support, a connection identifier makes connection tracking possible without a requirement to rely on the port numbers. This avoids the requirement of changing the port numbers and possibly synchronizing them between different NAT middleboxes. A feature of dynamic address reconfiguration can be used to avoid having IP addresses in the transport layer, which is problematic for the processing in NAT middleboxes. For peer-to-peer communications, it is helpful if the transport layer supports simultaneous connection setups. Finally, it might be preferable to use simple algorithms involving random numbers with a small chance of collision instead of more complex deterministic algorithms without collision.
The solution presented in this article will be included in a future version of our Internet drafts to be considered for standardization in the BEHAVE working group of the IETF.

References
[1] Q. Xie et al., “SCTP NAT Traversal Considerations,” draft-xie-behave-sctp-nat-cons-03.txt (work in progress), Nov. 2007.
[2] R. Stewart and M. Tüxen, “Stream Control Transmission Protocol (SCTP) Network Address Translation,” draft-stewart-behave-sctpnat-03.txt (work in progress), Nov. 2007.
[3] M. Tüxen and R. Stewart, “UDP Encapsulation of SCTP Packets,” draft-tuexen-sctp-udp-encaps-02.txt (work in progress), Nov. 2007.
[4] P. Srisuresh and M. Holdrege, “IP Network Address Translator (NAT) Terminology and Considerations,” RFC 2663, Aug. 1999.
[5] R. Stewart, “Stream Control Transmission Protocol,” RFC 4960, Sept. 2007.
[6] R. Stewart et al., “Stream Control Transmission Protocol (SCTP) Dynamic Address Reconfiguration,” RFC 5061, Sept. 2007.
[7] M. Riegel and M. Tüxen, “Mobile SCTP Transport Layer Mobility Management for the Internet,” Proc. SoftCOM 2002, Int’l. Conf. Software, Telecommunications and Computer Networks, Split, Croatia, 2002, pp. 305–09.
[8] B. Ford and P. Srisuresh, “Peer-to-Peer Communication across Network Address Translators,” USENIX Annual Technical Conf., Anaheim, CA, Apr. 2005.
[9] R. Stewart and Q. Xie, Stream Control Transmission Protocol (SCTP): A Reference Guide, Addison-Wesley, Oct. 2001.
[10] I. Rüngeler, M. Tüxen, and E. Rathgeb, “Integration of SCTP in the OMNeT++ Simulation Environment,” Int’l. Developers Wksp. OMNeT++ (OMNeT++ 2008), Mar. 2008.

Biographies
ERWIN P. RATHGEB (erwin.rathgeb@iem.uni-due.de) received his Dipl.-Ing. and Ph.D. degrees in electrical engineering from the University of Stuttgart, Germany, in 1985 and 1991, respectively. He has been a full professor at the University Duisburg-Essen since 1999 and holds the Alfried Krupp von Bohlen und Halbach Chair for Computer Networking Technology at the Institute for Experimental Mathematics. From 1991 to 1998 he held various positions at Bellcore, Bosch Telekom, and Siemens. His current research interests include concepts and protocols for next-generation Internets with a focus on network security. He is a member of IFIP, GI, and ITG, where he is chairman of the expert group on network security.

IRENE RÜNGELER (i.ruengeler@fh-muenster.de) received her diplomas in computer science and economics at the University of Hagen in 1992 and 2000, respectively. She joined the Münster University of Applied Sciences in 2002, where she works as a research staff member. Her research interests include innovative transport protocols, especially, SCTP and their performance analysis, signaling transport over IP-based networks, and fault-tolerant systems.

RANDALL STEWART (randall.stewart@trgworld.com) works for TRG Holdings as chief development officer. His current duties include integrating software solutions for call center applications using both SCTP and RSerPool. Previously, he was a distinguished engineer at Cisco systems. He also has worked for Motorola, NYNEX S&T, Nortel, and AT&T Communications. Throughout his career he has focused on operating system development, fault tolerance, and call-control signaling protocols. He is also a FreeBSD committer with responsibility for the SCTP reference implementation within FreeBSD.

MICHAEL TÜXEN (tuexen@fh-muenster.de) studied mathematics at the University of Göttingen and received a Dipl.Math. degree in 1993 and a Dr.rer.nat. degree in 1996. He has been a professor in the Department of Electrical Engineering and Computer Science of Münster University of Applied Sciences since 2003. In 1997 he joined the Systems Engineering group of ICN WN CS of Siemens AG in Munich. His research interests include innovative transport protocols, especially SCTP, IP-based networks, and highly available systems. At the IETF, he participates in the Signaling Transport, Reliable Server Pooling, and Transport Area Working Groups.
Abstract
Because of the constant reduction of available public network addresses and the
necessity to secure networks, middleboxes such as network address translators and
firewalls have become quite common. Because they are designed around the
client-server paradigm, they break connectivity when protocols based on different
paradigms are used (e.g., VoIP or P2P applications). Centralized solutions for
middlebox traversal are not an optimal choice because they introduce bottlenecks and
single points of failure. To overcome these issues, this article presents a distributed
connectivity service solution that integrates relay functionality directly in user nodes.
Although the article focuses on applications using the Session Initiation Protocol (SIP),
the proposed solution is general and can be extended to other application
scenarios.
with limited connectivity, the central server must handle about 50,000 keep-alive messages per second.
This article proposes a distributed architecture, referred to as DIStributed COnnectivity Service (DISCOS), for ensuring connectivity across NATs and firewalls in a SIP infrastructure. This solution overcomes the limitations of the current centralized solution by creating a gossip-based P2P network and integrating the previously described rendezvous and relay functionalities in the UAs. Each globally reachable UA with enough resources can provide such services to UAs with limited connectivity. A major emphasis is given to the overlay design, as it is a key point for ensuring a fast “service lookup” (i.e., to find a peer that still has enough resources for offering the connectivity service), which is instrumental for providing an adequate quality of service to the users. In particular, we show how a scale-free topology can fit this requirement, and we propose an overlay construction model that can be used to build such topology.
DISCOS is somewhat orthogonal to P2P-SIP [8], although both are based on P2P technologies. In fact, P2P-SIP is a solution mainly for distributed lookup, whereas DISCOS offers a solution for middlebox traversal.
The idea of distributing such functionalities among end systems is also one of the characteristics of Skype, a well-known VoIP application. However, Skype uses secret and proprietary protocols that cannot be studied and evaluated by third parties, therefore limiting the ability to understand exactly how these problems are solved. For example, in the Skype analysis presented in [9] and [10], the authors could give only partial explanations about its NAT and firewall traversal mechanisms. Their experiments pointed out that nodes with enough resources can become supernodes and provide support for NAT and firewall traversal. In particular, they offer relay functionalities and probably run a sort of STUN server that other nodes use to discover the presence (and to determine the type) of NAT and firewall in front of them. Therefore, it is clear that a node behind NAT must connect to a super node to be part of the Skype network, but no information could be provided about the super node discovery and selection policies. Also, super node overlay topology is almost completely unknown. Thus, there is no way to evaluate the effectiveness of these solutions. On the other hand, here we propose a distributed architecture for middlebox traversal

ed connectivity for receiving SIP messages) and media relay. Connectivity peers also can offer support to the hole-punching procedure for media session establishment, thus operating as a distributed rendezvous server. In addition, connectivity peers also provide support for middlebox behavior discovery [7]. UAs with limited connectivity can locate and attach to an available peer whenever they require one of these services.
Connectivity peers are organized in a P2P overlay, and their knowledge is spread through proper advertisement messages, thus building an unstructured gossip-based network. Structured networks, characterized by additional overhead due to the maintenance of the structure, are not considered because their excellent lookup properties are not required. In fact, DISCOS uses the overlay to find only the first available connectivity peer and not for locating a precise resource.
Note that because DISCOS distributes existing middlebox traversal functionalities among peers, it is also totally compatible with current middleboxes and their traversal techniques. This enables a smooth deployment of the proposed solution.

Overlay Topology
In order to enable DISCOS to locate an available peer for UAs with limited connectivity in the shortest time possible, peers should have a deep knowledge of the network: the greater the number of known peers, the higher the probability of finding an available peer in a short time, especially if known peers are lightly loaded. In gossip-based networks, the spread of information is based on flooding, thus the overlay topology has a deep impact on the network efficiency. For instance, the greater the average path length between nodes, the higher the depth of the flooding (hence the load on the network) that is required for an adequate spread of the information. Thus, an overlay topology that ensures a small average path length is required. However, this is not sufficient for enabling peers to know a large set of suitable connectivity peers from which to choose when a UA asks for the connectivity service. In fact, nodes maintain a cache that should be kept small to reduce the overhead required to manage all the entries. This limits the number of peers known at each instant. The limited cache size can be compensated by frequently refreshing its contents so that the set of known peers changes frequently, resulting in a sort of round robin among peers: different connectivity peers can always be provided to UAs that
whose scalability and robustness are discussed and evaluated. request the service at different instants, thus increasing the
In addition, the solution was engineered and validated by sim- opportunity for a queried connectivity peer to suggest avail-
ulation on a SIP infrastructure, but the solution is more gen- able ones when it cannot provide the service itself. Frequent
eral, and it can be seen as a mechanism to cope with cache refresh also is useful for ensuring that nodes store up-
middlebox traversal, thus opening the path to a wider adop- to-date information about existing peers. Such a policy can be
tion. efficiently adopted if the overlay results in a scale-free network
[11], an interesting topology that ensures small average path
Operating Principles length and features scalability and robustness. In a scale-free
network, few nodes (referred to in the following as hubs) have
Distributed Connectivity Service a high degree, whereas the others have a low one. The degree
DISCOS extends current centralized NAT and firewall traver- of a node is the sum of all its incoming (i.e., the in-degree)
sal solutions by distributing rendezvous and relay functionali- and outgoing (i.e., the out-degree) links. In the DISCOS over-
ties among UAs. Relaying and hole-punching service for lay, the out-degree of a node is limited by the cache size
media flows is implemented by integrating a STUN/TURN whereas the in-degree is the number of other peers that have
server in each UA. The TURN server also is used to support that node in their cache. Thus, nodes can be considered hubs
relaying SIP messages. However, DISCOS can be modified when they are in the cache of several peers, that is, when they
easily to offer the relaying of SIP messages by integrating SIP are highly popular. Hubs frequently receive advertisement
proxy functionalities in each UA, leading to a distributed messages from a large set of different nodes, so they frequent-
implementation of [3]. ly update their cache. In particular, if advertisement messages
A UA with enough resources (e.g., a public network contain nodes that are low in popularity, hubs can discover
address, a wideband Internet connection, and free CPU peers, which being low in popularity, are lightly loaded with
cycles) becomes what we define as a connectivity peer and high probability. The key is to make searches through hubs
starts to offer a connectivity service. In particular, connectivity because they potentially know a large variety of lightly loaded
peers can act as both SIP relay (leveraged by UAs with limit- peers. Thus, the proposed solution essentially exploits — and
[Figure 1 not reproduced. Recoverable labels: connectivity peers (CP), the bootstrap service, a NAT, and UA A. Annotation: “The UA behind NAT queries a node (possibly a hub) for service.”]
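As a point of reference for the topology discussion, the preferential-attachment rule of the Barabási-Albert model [11] can be sketched in a few lines. This is only an illustration and not the DISCOS procedure: it assumes the global knowledge of node degrees that DISCOS deliberately avoids, and the function name `grow_preferential` and its parameters are hypothetical.

```python
import random

def grow_preferential(n_nodes, links_per_node=2, seed=0):
    """Grow a graph by preferential attachment and return the degree of each node."""
    rng = random.Random(seed)
    core = links_per_node + 1      # small fully connected starting core
    degree = [0] * n_nodes
    endpoints = []                 # one entry per link end, so sampling is degree-proportional
    for a in range(core):
        for b in range(a + 1, core):
            degree[a] += 1
            degree[b] += 1
            endpoints += [a, b]
    for new in range(core, n_nodes):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(endpoints))  # popular nodes are drawn more often
        for t in targets:
            degree[new] += 1
            degree[t] += 1
            endpoints += [new, t]
    return degree

degrees = grow_preferential(1000)
print(max(degrees), sum(degrees) / len(degrees))
```

Sampling from the list of link ends is a common way to obtain degree-proportional selection without computing probabilities explicitly. In DISCOS the degree is not known globally, so the popularity estimate built from received advertisement messages plays this role instead.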
generalizes to the case of a single resource provided by many nodes, the results achieved by Adamic et al. [12] about random walk searches in unstructured P2P overlays. They demonstrated that searches in scale-free networks are extremely scalable (their cost grows sublinearly with the size of the network), also proving that searches toward hubs perform better than random searches because hubs have pointers to a larger number of resources. In DISCOS, the benefit of searching through hubs comes from the high frequency with which pointers to connectivity peers change in their cache. These properties are obtained at the expense of a non-uniform distribution of the number of messages handled by nodes: the higher the popularity of a node, the larger the number of advertisement messages received. However, a proper hub selection policy and a reasonable advertisement rate could mitigate the effects of this disparity. These aspects are analyzed in more detail in the following section.

The Barabási-Albert [11] model was proposed to create scale-free graphs. In this model, a few nodes are immediately available and when a new node arrives, it connects to one of the existing nodes with a probability that is proportional to the degree of popularity of such a node (preferential attachment); in other words, the model assumes a global knowledge of nodes and their degree, which is clearly inapplicable in a real network scenario. A first step to implement such a model in our overlay is to make M peers available to other nodes through a bootstrap service. When a node joins the overlay for the first time, it queries the bootstrap service for a subset of these M registered nodes. However, preferential attachment is not possible with the mechanism described so far because all incoming peers:
• Can learn only the nodes provided by the bootstrap service
• Cannot compute the popularity of a node

An adequate spread of the network knowledge can address the first issue, but there is no way to enable a node to learn the in-degree (i.e., the precise metric of node popularity) of the others. In our case, the popularity is computed autonomously by each node through a simple approximated metric based on the number of received advertisement messages that contain such a node. In our approximated model, preferential attachment is implemented by forcing peers to evaluate the popularity of nodes through the previously mentioned mechanism and then to include some of the most popular peers in the advertisement messages they send. This allows nodes to insert highly popular peers (hubs) in their cache, thus building and maintaining the scale-free topology.

In summary, new nodes use the peers known through the bootstrap service as “bootstrap” nodes; then they learn the most popular ones through the received advertisement messages and start to perform preferential attachment. Furthermore, incoming nodes that already know peers discovered during their previous visits can avoid the bootstrap procedure by attaching directly to them. The resulting topology is shown in Fig. 1.

It is worth noting that different bootstrap services can be used to create disjoint overlays because joining peers that fetch nodes from different bootstrap services start to exchange advertisement messages with different connectivity peers. This enables the possibility of deploying different DISCOS overlays in different geographical areas of a SIP domain. If a location-aware bootstrap service selection policy is adopted, users can find a connectivity peer that is close to them, thus preserving the user-relay latency achieved by current centralized solutions, where different servers can be used at different locations.

The implementation of the bootstrap service is highly customizable. A possible solution consists in deploying M static peers and preconfiguring their addresses on each UA. A more flexible approach (considered in the following) consists in deploying multiple bootstrap servers reachable through appropriate Domain Name System service (DNS SRV) location entries configured in the DNS. Each bootstrap server stores information about M connectivity peers that spontaneously register themselves when they join the overlay. Multiple boot-
[Figure 2 flow charts not reproduced. Recoverable decision and action labels: “Has node limited connectivity?”; “Join DISCOS overlay”; “SIP relay lookup”; “Insert new peer”; “Get the three peers provided in the response”; “Order the cache by popularity”; “Put the most popular in cache (drop less popular if full)”; “Response within a timeout?”; “Is it available for SIP/media relay service?”; “Success”.]
Figure 2. Operation of DISCOS when: a) a node joins the SIP domain; b) a node in the overlay receives an advertisement message; c) a node performs a SIP/media relay lookup.
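The advertisement handling of Fig. 2b and the cache policy described in the Protocol Overview can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: popularity is counted as the number of received advertisements that mention a peer, the “node with average popularity” is interpreted as the entry in the middle of the popularity ranking, and the names `PeerCache`, `on_advertisement`, `build_advertisement`, and `next_ttl` are hypothetical.

```python
class PeerCache:
    """Bounded cache that favors big hubs and peers of low popularity."""

    def __init__(self, capacity=10):
        self.capacity = capacity
        self.popularity = {}   # peer id -> advertisements that mentioned it

    def ranked(self):
        # Most popular first; ties are broken by peer id for determinism.
        return sorted(self.popularity, key=lambda p: (-self.popularity[p], p))

    def on_advertisement(self, advertised_peers):
        for peer in advertised_peers:
            if peer in self.popularity:
                self.popularity[peer] += 1       # known peer: more popular
                continue
            if len(self.popularity) >= self.capacity:
                order = self.ranked()
                # Assumption: evict the entry in the middle of the ranking.
                del self.popularity[order[len(order) // 2]]
            self.popularity[peer] = 1            # insert the new peer

    def build_advertisement(self, sender):
        # Sender, its two most popular peers, and its two least popular peers.
        order = self.ranked()
        most, least = order[:2], order[-2:]
        return [sender] + most + [p for p in least if p not in most]


def next_ttl(ttl):
    """Decrement the TTL on receipt; return None when forwarding must stop."""
    ttl -= 1
    return ttl if ttl > 0 else None


cache = PeerCache(capacity=3)
cache.on_advertisement(["a", "b", "c"])
cache.on_advertisement(["a", "b"])
cache.on_advertisement(["a"])
cache.on_advertisement(["d"])          # cache is full: the middle entry goes
advert = cache.build_advertisement("me")
print(sorted(cache.popularity), advert)
```

Evicting from the middle of the ranking keeps both ends of the popularity scale in the cache, which is the property the text asks for. With a TTL of 2, `next_ttl` lets the first recipient forward the message once and stops it at the second hop.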
strap servers are deployed for redundancy and load balancing purposes. Proper DNS configuration can enable a location-aware bootstrap service selection.

Protocol Overview
Whenever a UA joins the SIP domain, it must determine if it can become a connectivity peer, or if it is behind a middlebox. This is done by contacting a connectivity peer and exploiting its STUN functionalities [7]. The described bootstrap procedure is performed if it does not know an active peer. The flow chart related to the join procedure is shown in Fig. 2a.

If the UA can become a connectivity peer, it checks the number of addresses registered on each bootstrap server and if it is smaller than a fixed bound M, it adds itself to the list. Then, it sends an advertisement message to the known peers to announce itself. The UA is now part of the DISCOS overlay, and it starts receiving messages from other nodes, thus gradually filling its cache with new peers. A proper peer advertisement policy is adopted to implement preferential attachment (thus building and maintaining the scale-free topology) and to enable caches to be refreshed with lightly loaded peers (thus having potential nodes available for the service). In particular, advertisement messages include the sender node, the two most popular peers it knows (enabling preferential attachment), and the two least popular peers it knows (spreading the knowledge of lightly loaded peers).

Advertisement messages are periodically sent by peers to all nodes they have in their cache and contain a special time-
to-live (TTL) field that allows the message to cross N hops: as soon as the message is received, the TTL value is decremented and if it is a positive value, the recipient sends another message to all the nodes in its cache. Every time a peer receives an advertisement message, it updates its cache by increasing the popularity of nodes already present and by inserting the new ones. As previously described, it is important for a node to have both hubs and peers of low popularity in its cache. Thus, a proper cache management policy also is adopted if the cache is full: the node with average popularity is removed before the insertion, resulting in a cache that privileges big hubs and peers of low popularity. Figure 2b details the operations of a peer when it receives an advertisement message.

UAs with limited connectivity have a different behavior because essentially they exploit DISCOS features to find SIP relays (they choose a connectivity peer as relay for SIP messages as soon as they join the SIP domain; in addition, they select another when the current one disappears) and media relays (when they need one to establish a media session). A UA with limited connectivity performs these lookups by contacting the most popular peers in its list, which can accept or decline the request. If a peer declines, it includes in the answer the two least popular peers and the most popular peer it knows: the least popular peers are queried immediately (since they are supposed to be free enough to provide connectivity), whereas the most popular is inserted in the cache (because it can perform faster searches as it is probably a hub). If both queried peers refuse to provide the service, another node is picked from the cache, and the procedure is repeated. If all the nodes in the cache were queried without success, two different policies are applied, depending on the type of service the UAs with limited connectivity require: in the case of lookup for a SIP relay, the UA waits for a random time and then repeats the procedure; in the case of lookup for a media relay, the procedure is stopped, and the media session cannot be established. The relay lookup procedure is shown in Fig. 2c.

UAs with limited connectivity also receive ad hoc messages from their relays containing three highly popular peers that allow them first to fill, and then to update, their cache with new hubs. This enables them to direct searches toward hubs when they require a connectivity peer. Broken hubs (e.g., because of a network failure) are detected through a timeout: if a hub does not reply to a query, the UA can query one of the other hubs in its cache. If no peers are available, the UA again fetches the registered ones from the bootstrap server; however, this situation is unlikely to occur because UAs with limited connectivity periodically receive new hubs from their SIP relays.

This protocol could be integrated in SIP, as well as implemented separately. The former approach is more straightforward as it simply consists in defining new SIP header fields. The latter one is more efficient, especially concerning the message size. In fact, the human-readable nature of SIP messages would result in advertisement messages of about 800 bytes.

Security Issues
The deployment of a P2P architecture for providing connectivity service raises several security issues that are different than in centralized solutions. In DISCOS, like in many other distributed systems, the control of the consequences of malicious behavior of nodes can be more difficult than in the centralized counterpart. Much effort has been expended during past years in investigating these issues in the context of P2P-SIP overlays [8, 13, 14] that must deal with similar concerns as they replace centralized SIP proxies for user locations. Some solutions were proposed and can be seamlessly applied in DISCOS. For example, in [14], public key certificates are distributed among users to enable them to verify the origin and the integrity of messages. Analogously, certificates can be used in DISCOS to authenticate advertisement messages, so that they can be considered trusted. This limits the operation of malicious peers as they can be easily traced. This and other P2P-SIP derived security policies certainly require further improvement to better fit specific DISCOS requirements. However, we are confident that effective results can be obtained with minimal modifications because, as mentioned previously, the security issues that must be addressed are similar in the two environments. This additional effort is left for future work.

Overlay Simulation

Simulations Background
We developed a custom, event-driven simulator to evaluate the effectiveness of the proposed solution. In particular, we were interested in proving its scalability and validating its algorithms. Thus, we implemented a simulator supporting the following four operations: node arrival/departure, media session set up/teardown, SIP relay lookup (triggered when a node with limited connectivity joins the network or when its current SIP relay disappears), and media relay lookup (that occurs when a node requires a relay to perform a media session).

Simulations refer to a single SIP domain. Node arrivals and call occurrences are modeled using a Poisson process, whereas node lifetime and call length are extracted from real Skype traffic coming from/to the network of the university campus to approximate the behavior of real VoIP networks. With our parameters, the average number of nodes in the network depends on their arrival rate because of the effect of the Poisson arrivals model coupled with the lifetime distribution of Skype. For example, an arrival rate λ_N = 100 nodes/minute leads to a network consisting, on average, of 30,000 nodes, which is the standard size in our simulation and is a good trade-off between simulation length (some lasting several days on a Dual Xeon 3 GHz processor) and significance of results. To test our solution within different traffic load scenarios, three different rates are used for media session occurrences: 1.4 λ_N, 5 λ_N, and 20 λ_N sessions/minute. These values, coupled with the distribution of the Skype call duration, lead to 10 percent, 30 percent, and 98 percent of nodes simultaneously involved in a media session, respectively.

Statistics presented in [15] show that about 74 percent of hosts are behind a NAT. In addition, [1] shows that hole punching is successful in about 82 percent of the cases. To the best of our knowledge, no detailed information is available about firewall proliferation over the Internet. On the strength of these available data, we consider for simulation a network scenario where nodes have limited connectivity with probability P_LC = 0.74 and media sessions directed to these nodes require relaying with probability P_MR = 0.18. Whenever a node joins the SIP domain, two different actions can be performed at simulation level: if it is tagged as a node with limited connectivity (with probability P_LC), it triggers a SIP relay lookup; otherwise it joins the DISCOS overlay as a connectivity peer. Media sessions are possible between each pair of nodes (selected randomly). When a node behind a NAT is contacted, a media relay lookup is triggered by this node with probability P_MR.

The number of UAs with limited connectivity to which a peer can simultaneously provide SIP relay service is set to 10; advertisement messages have a TTL equal to 2; their sending
[Figure 3 plots not reproduced. Panels: (a) average clustering coefficient vs. network size (nodes), for the DISCOS overlay and a random graph; (b) fraction of nodes vs. in-degree, for DISCOS observations and a power law with c = 0.7, γ = 1.5; (c) contacted nodes vs. network size (nodes), for the DISCOS overlay and random spread and lookup; (d) failure probability vs. number of backup relays; (e) contacted nodes vs. number of allocated relays; (f) fraction of nodes vs. number of media flows per relay. Panels (d) to (f) show one curve each for 10%, 30%, and 98% of nodes involved in a call.]
Figure 3. Simulation results: a) average clustering coefficient evaluation; b) in-degree power law distribution; c) average number of contacted peers to find a SIP relay; d) media session failure probability vs. number of allocated backup relays; e) average number of peers contacted to allocate K relays; f) bandwidth consumption distribution.
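The simulation parameters quoted in the Simulations Background can be cross-checked with simple arithmetic. The sketch below assumes the overlay is in steady state, so that Little's law (average population = arrival rate × mean lifetime) applies; the resulting mean lifetime is an inference and is not stated in the article. The peer and slot counts are expected values only, and all variable names are illustrative.

```python
arrival_rate = 100            # nodes per minute (λ_N in the text)
average_population = 30_000   # nodes in steady state, from the text

# Little's law: population = arrival rate x mean lifetime.
mean_lifetime_min = average_population / arrival_rate

# Media session rates are 1.4, 5 and 20 times the node arrival rate.
session_rates = [round(f * arrival_rate) for f in (1.4, 5, 20)]

# P_MR is the complement of the 82 percent hole-punching success rate from [1].
p_media_relay = round(1 - 0.82, 2)

# With P_LC = 0.74, about 26 percent of nodes become connectivity peers.
connectivity_peers = average_population * (100 - 74) // 100
limited_nodes = average_population - connectivity_peers
sip_relay_slots = connectivity_peers * 10   # each peer serves up to 10 UAs

print(mean_lifetime_min, session_rates, p_media_relay)
print(connectivity_peers, limited_nodes, sip_relay_slots)
```

Under these assumptions the mean node lifetime comes out at 300 minutes, and on average about 7,800 connectivity peers offer 78,000 SIP relay slots to roughly 22,200 UAs with limited connectivity.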
interval is set to 60 minutes; and the cache of a peer is supposed to contain 10 entries. Furthermore, the number of peers registered in the bootstrap server (which is supposed to be unique and reachable by nodes) is set to 20. Each simulation runs long enough to exit the transient period; the results presented refer to the steady state.

Overlay Topology Evaluation
First, simulation aims at demonstrating that our protocol creates a scale-free network among connectivity peers. In particular, we consider the clustering coefficient and the in-degree of nodes [11]. The clustering coefficient of a node is defined as the number of links between its neighboring nodes divided by the number of links that could possibly exist between them. To be scale-free, an overlay must have an average clustering coefficient higher than the one of a random graph obtained in the same conditions, which is clearly shown in Fig. 3a. In detail, the average clustering coefficient for DISCOS decreases when the network size grows, asymptotically converging to a value that is about 20 times the clustering coefficient of a random graph. We also verified that at all network sizes tested, the coefficient remains almost constant in time.

Concerning the in-degree, the requirement to be met is that the distribution of node degree follows a power law P(k) = c k^{-γ}, where P(k) is the probability that a node has k connections, and c is a normalization factor. Figure 3b shows that the distribution of in-degree values obtained through simulation fits well a power law P(k) = c k^{-γ} with c = 0.7 and γ = 1.5. These tests validate our overlay construction model, showing that the resulting topology really evolves into a scale-free network.
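The two metrics used in this evaluation can be computed as sketched below. The sketch treats the overlay as an undirected graph stored as a dictionary of neighbor sets, which is an assumption made for illustration (cache entries in DISCOS are directed links), and the function names are hypothetical.

```python
from itertools import combinations

def clustering_coefficient(adjacency, node):
    """Links among the neighbors of `node` divided by the links that could exist."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0                      # no neighbor pair, so nothing to cluster
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return links / (k * (k - 1) / 2)

def average_clustering(adjacency):
    return sum(clustering_coefficient(adjacency, n) for n in adjacency) / len(adjacency)

def power_law(k, c=0.7, gamma=1.5):
    """P(k) = c * k^(-gamma), with the fitted values reported for Fig. 3b as defaults."""
    return c * k ** -gamma

# Toy graph: a triangle (0, 1, 2) plus a pendant node 3 attached to node 0.
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering_coefficient(graph, 0), average_clustering(graph), power_law(4))
```

In the toy graph, node 0 has three neighbors of which only one pair is linked, giving a coefficient of 1/3, and the fitted power law predicts that a fraction of about 0.0875 of the nodes has in-degree 4.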
To prove the effectiveness of the DISCOS topology, we compare our solution with a distributed system where the information is randomly spread, and the nodes to query during lookup procedures are randomly chosen among peers in the cache. Figure 3c depicts the average number of peers that must be contacted to reach an available SIP relay for both DISCOS and the randomized overlay. Although the advertisement rate and the TTL value remain the same, the figure shows that in DISCOS, the number of peers contacted is appreciably lower. Furthermore, the ratio between the performances obtained by the two policies increases with the network size, thus demonstrating the scalability properties of our solution.

These tests prove the effectiveness and the scalability of DISCOS. In particular, results show how the scale-free topology ensures overlay efficiency with a limited message rate (each peer sends an advertisement message every 60 minutes) with a small TTL (equal to 2) and a limited cache size (10 entries). We also evaluated the number of advertisement messages that connectivity peers must handle in our simulated SIP domain including 30,000 UAs: 99 percent of nodes process fewer than seven advertisement messages per minute and the remaining 1 percent process a number of messages that varies between eight and 48 messages per minute, thus resulting in a reduced per-node overhead. However, this confirms that hubs should be chosen carefully, with a preference for nodes with enough computational and bandwidth resources, for example, using the dynamic protocol proposed by Chawathe et al. for the Gia P2P network [16].

Media Sessions Relaying Performance
This section analyzes the overlay support for media sessions, in particular when hole punching fails and relaying is required. To prevent resource wasting, a media relay is typically chosen by a UA immediately preceding the establishment of a media session. Various types of media flows are considered, differing in the amount of consumed bandwidth. In particular, assuming b bit/s is the consumed bandwidth unit, five types of flows requiring nb (1 ≤ n ≤ 5) bit/s are defined. The flow type is randomly selected (with uniform distribution) when a new session starts. We also define B_i as the amount of bandwidth that peer i can offer for relaying media sessions. For the sake of simplicity, B_i is assumed to be the same for each connectivity peer and equal to 5b bit/s. However, in a real scenario, this value could vary according to node capabilities.

We start the evaluation of the DISCOS support for media sessions from the estimation of the call failure probability because it is the parameter that mainly affects the quality of service perceived by users. A session can fail because either an available relay cannot be found, or the relay is found but becomes unavailable during the session (e.g., because it disconnects from the network).

With respect to the first problem, we never observed such an event during simulation: a UA with limited connectivity was always able to find a media relay. This result suggests that with our assumptions about the number of media sessions requiring a relay, the probability for this event to occur in a DISCOS environment can be considered negligible. The second issue could be mitigated by implementing proper relay back-up policies. As shown in Fig. 3d, the media session can fail in about 0.6–0.65 percent of cases, but the selection of a single back-up relay (that handles the communication in case the first relay fails) appreciably reduces this probability, and further reductions are possible by increasing the number of relay nodes. The blocking probability remains low even in the unlikely case in which 98 percent of the users are involved in a call (i.e., almost all users are on the phone). The overhead deriving from the search of back-up relays is depicted in Fig. 3e, which plots the average number of peers that must be contacted to find K available media relays. For a reasonable number of simultaneous sessions, this value remains low. However, we set the number of back-up relay nodes to one, which is a reasonable trade-off between the probability of a session drop and the additional complexity that results when a UA must search for a back-up relay node before starting media sessions.

Finally, we analyzed the distribution of load among connectivity peers. In particular, Fig. 3f shows the distribution of the number of media flows simultaneously handled by media relays. It can be observed that although media flows have different bandwidth requirements, most relays simultaneously handle no more than one media session. Thus, a good load balancing among peers is guaranteed.

Conclusions
This article presents a distributed infrastructure, called DISCOS, that aims at providing connectivity service to hosts behind middleboxes. This solution extends current centralized approaches (and overcomes their scalability and robustness limitations) by integrating middlebox traversal functionalities into edge nodes. The article also presents the mechanisms that can be used to manage such infrastructure and exploit its services. The proposed infrastructure is based on an unstructured peer-to-peer paradigm and proved to be extremely effective in locating suitable relays and distributing media sessions evenly among the available connectivity peers. Results confirm that the overhead for managing the overlay is low, that each host is able to locate a suitable connectivity peer with a small number of messages (hence, in a very short time), and that the blocking probability of a new media call is negligible even for a very high load. Although our simulations cannot simulate a nationwide network (for processing/memory problems), we are confident that results can be extended to such an environment because the distributed infrastructure is based on the scale-free topology, which is the key to achieving these results, ensuring overlay scalability and robustness.

Future work aims to validate the proposed infrastructure in non-SIP environments and more exhaustively address security issues.

Acknowledgment
The authors would like to thank Marco Mellia who was instrumental in obtaining a proper characterization of Skype user agents.

References
[1] B. Ford, P. Srisuresh, and D. Kegel, “Peer-to-Peer Communication across Network Address Translators,” USENIX Annual Tech. Conf., Anaheim, CA, Apr. 2005.
[2] J. Rosenberg et al., “SIP: Session Initiation Protocol,” IETF RFC 3261, June 2002.
[3] C. Jennings and R. Mahy, Eds., “Managing Client Initiated Connections in SIP,” http://tools.ietf.org/html/draft-ietf-sip-outbound-11, Nov. 2007.
[4] J. Rosenberg, “Interactive Connectivity Establishment (ICE): A Protocol for NAT Traversal for Offer/Answer Protocols,” http://tools.ietf.org/html/draft-ietf-mmusic-ice-18, Mar. 2008.
[5] J. Rosenberg, R. Mahy, and P. Matthews, “Traversal Using Relays around NAT (TURN): Relay Extensions to Session Traversal Utilities for NAT (STUN),” http://www3.tools.ietf.org/html/draft-ietf-behave-turn-07, Feb. 2008.
[6] J. Rosenberg et al., “Session Traversal Utilities for NAT (STUN),” http://tools.ietf.org/html/draft-ietf-behave-rfc3489bis-15, Feb. 2008.
[7] D. MacDonald and B. Lowekamp, “NAT Behavior Discovery Using STUN,” http://www3.tools.ietf.org/html/draft-ietf-behave-nat-behavior-discovery-03, Feb. 2008.
[8] D. A. Bryan and B. B. Lowekamp, “Decentralizing SIP,” ACM Queue, vol. 5, no. 2, Mar. 2007.
[9] S. A. Baset and H. Schulzrinne, “An Analysis of the Skype Peer-to-Peer Internet Telephony Protocol,” IEEE INFOCOM ’06, Barcelona, Spain, Apr. 2006.
[10] P. Biondi and F. Desclaux, “Silver Needle in the Skype,” Black Hat Europe 2006, Amsterdam, The Netherlands, Mar. 2006.
[11] R. Albert and A.-L. Barabási, “Statistical Mechanics of Complex Networks,” Rev. Modern Physics, 74, 2002, pp. 47–97.
[12] L. A. Adamic et al., “Search in Power Law Networks,” Physical Rev., E 64, 2001.
[13] J. Seedorf, “Security Challenges for Peer-to-Peer SIP,” IEEE Network, vol. 20, no. 5, Sept. 2006.
[14] C. Jennings et al., “Resource Location and Discovery (RELOAD),” http://www.p2psip.org/drafts/draft-bryan-p2psip-reload-04.txt, June 2008.
[15] M. Casado and M. J. Freedman, “Peering through the Shroud: The Effect of Edge Opacity on IP-Based Client Identification,” USENIX/ACM Int’l. Symp. Networked Sys. Design and Implementation, Cambridge, MA, Apr. 2007.
[16] Y. Chawathe et al., “Making Gnutella-Like P2P Systems Scalable,” ACM SIGCOMM ’03, Karlsruhe, Germany, Aug. 2003.

GUIDO MARCHETTO (guido.marchetto@polito.it) received his Ph.D. in computer engineering in April 2008 and his laurea degree in telecommunications engineering in April 2004, both from Politecnico di Torino. He is a post-doctoral fellow in the Department of Control and Computer Engineering at the Politecnico di Torino. His research topics are packet scheduling and quality of service in packet-switched networks, peer-to-peer technologies, and voice over IP protocols. His interests include network protocols and network architectures.

FULVIO RISSO (fulvio.risso@polito.it) received his Ph.D. in computer and system engineering from Politecnico di Torino in 2000 with a dissertation on quality of service in packet-switched networks. He is an assistant professor in the Department of Control and Computer Engineering of Politecnico di Torino. His current research activity focuses on efficient packet processing, network analysis, network monitoring, and peer-to-peer overlays. He is the author of several papers on quality of service, packet processing, network monitoring, and IPv6.
Abstract
Users can be served by multiple network-enabled terminal devices, each of which
in turn can have multiple network interfaces. This multihoming at both the user and
device level presents new opportunities for mobility handling. Mobility can be handled
by utilizing devices, namely middleboxes, that can provide intermediary routing
or adaptation services. This article presents an approach to enabling this kind
of mobility handling using the concept of personal networks (PNs). PNs
consist of dynamic conglomerations of terminal and middlebox devices
tasked to facilitate the delivery of information to and from a single human user.
This concept creates the potential to view mobility handling as a path selection
problem because there may be multiple valid terminal device and middlebox con-
figurations that can successfully carry a given communication session. We present
details and an evaluation of our approach, based on an extension of the Host
Identity Protocol, which demonstrate its simplicity and effectiveness.
[Figure: a personal network whose devices are connected over GPRS, Bluetooth, Ethernet, and Wi-Fi.]
Figure 2. Service-path-based mobility (example): a) many candidate paths, one selected (bold line) using laptop as SP for display; b) mobile router appears and offers a better path, previous path replaced by new path; c) mobile router leaves coverage range, display is switched off, path readjusts, laptop now used as a terminal device; d) laptop battery depleted, triggering selection of different end-to-end path toward PDA.
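The sequence in Fig. 2 can be read as repeated path selection over whatever devices are currently reachable. The following is a minimal sketch of that idea, not the authors' selection algorithm: the `ServicePath` type, the device names, and the numeric `cost` values are all hypothetical and chosen only so that the four panels of the figure are reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServicePath:
    name: str
    devices: frozenset  # every device the path depends on
    cost: float         # hypothetical metric, lower is better


def select_path(paths, available):
    """Return the cheapest path whose devices are all available, or None."""
    usable = [p for p in paths if p.devices <= available]
    return min(usable, key=lambda p: p.cost, default=None)


# Hypothetical candidate paths loosely following panels (a) to (d) of Fig. 2.
PATHS = [
    ServicePath("display via laptop SP", frozenset({"laptop", "display"}), 3.0),
    ServicePath("display via mobile router", frozenset({"mobile_router", "display"}), 1.0),
    ServicePath("laptop as terminal", frozenset({"laptop"}), 5.0),
    ServicePath("PDA as terminal", frozenset({"pda"}), 8.0),
]

available = {"laptop", "display", "pda"}
events = [
    ("start", None),
    ("join", "mobile_router"),
    ("leave", "mobile_router"),
    ("leave", "display"),
    ("leave", "laptop"),
]
for event, device in events:
    if event == "join":
        available.add(device)
    elif event == "leave":
        available.discard(device)
    chosen = select_path(PATHS, available)
    print(event, device, "->", chosen.name if chosen else None)
```

Every mobility event simply triggers a fresh call to `select_path`, which is what allows device arrival, departure, and battery depletion to be handled uniformly.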
initiating the action. In the example, TDA is the initiator as shown in Fig. 3c.

Identity Delegation Toward Multiple SPs
Identity delegation assists service composition by allowing two hosts engaged in a communication session, TDA and TDB, to delegate their identity to the head and tail of the composed SP chain, SP1 and SP2. This enables an arbitrary number of intermediate SPs to be inserted in between the head and the tail transparently to TDA or TDB. Figure 4 follows Fig. 3 and depicts this usage case as a sequence of twelve consecutive steps.
Since application data streams can consist of several separable atomic components that can be routed independently, for example, audio and video, and because some intermediary SPs can split or join certain application data flows, it is possible to construct an end-to-end SP path that is composed of two or more converging subpaths. The benefits provided by splitting and joining media include the potential for selection of hybrid service paths that are more efficient than any available serial service path. In some cases, it may be desirable to construct service paths that do not completely converge, for example, to deliver the audio component of a media stream to a separate network interface or terminal device to the rest of the stream.
For the purpose of discussion, it is assumed that SPs participate in some common directory, the administration of which can be centralized or distributed. Service selection and discovery is not addressed in detail here but is discussed in the context of related work.

Design and Implementation
In current systems, IP addresses are the most common type of identifier used for end-to-end communication. However, IP addresses are strongly bound to topological location and thus not suitable for the purpose of delegation that is required to realize the scenarios described in the previous section. As a result, we base our design on the HIP, which uses identifiers that are decoupled from network topology. This section first provides some background on HIP and IPSec, a general description of the identity delegation approach compared with a naïve private key duplication-based approach, and then delves into the specifics of the prototype implementation.

Host Identity Protocol and IPSec
Schemes such as mobile IPv6 (MIPv6) [9] and the HIP [10] provide a static identifier, referred to as a home-address (HoA) in the former and a host identity tag (HIT) in the latter, which is separate from its IP or IPv6 address that can be routed. The approach presented here is based on HIP, although the general approach is applicable to any similar scheme.
HIP is an end-to-end communication protocol that introduces a thin layer of resolution between the network and
[Figure: insertion of service proxy SPA between TDA and TDB, in three panels (panel labels: decision to utilize service proxy, proxy signaling, data flow).]
transport layers, decoupling sockets from network addresses. Instead of binding to IPv6 addresses, applications bind to 128-bit HITs, a flat (non-hierarchical) cryptographic identifier generated by hashing a public key. Due to the decoupling between the network and transport layers, HIP enables applications on a mobile host to continue communication oblivious to changes in local network addresses. New HIP communication sessions are preceded by a challenge-response-based authentication process.
As HIP deals only with control signaling, standard IPSec is used to carry the actual data traffic. The implementation of HIP referred to in this article uses the recently proposed bound end-to-end tunnel (BEET) mode of IPSec operation that eliminates the requirement to retain the source and destination HIT as an encapsulated header in each transmitted packet [10]. The set up of a HIP connection between two hosts results in a pair of unidirectional BEET mode IPSec security associations (SAs) at each host. The security parameter index (SPI) for each SA is contained in the I2 and R2 base exchange packets and used by the hosts to determine the source and destination HIT. The mapping between IPSec SPI and source/destination HIT is performed by the BEET mode association, which simply replaces the network layer addresses with HITs after decryption.
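The two ideas above, an identifier derived by hashing a public key and an SPI-indexed table that restores HITs regardless of the outer addresses, can be illustrated with a short sketch. This is a simplification under stated assumptions: `derive_hit` merely truncates a SHA-256 digest to 128 bits and does not reproduce the exact HIT encoding used by HIP, and `BeetSaTable`, the key material, the SPI value, and the IP addresses are hypothetical names and values introduced for illustration.

```python
import hashlib
import ipaddress


def derive_hit(public_key: bytes) -> ipaddress.IPv6Address:
    """Return a flat 128-bit identifier by truncating SHA-256(public_key).

    Simplified stand-in: the real HIT encoding is not reproduced here.
    """
    digest = hashlib.sha256(public_key).digest()
    return ipaddress.IPv6Address(digest[:16])


class BeetSaTable:
    """Maps the SPI of an inbound SA to its (source HIT, destination HIT)."""

    def __init__(self):
        self._hits_by_spi = {}

    def add_sa(self, spi, src_hit, dst_hit):
        self._hits_by_spi[spi] = (src_hit, dst_hit)

    def hits_for_packet(self, spi, outer_src_ip, outer_dst_ip):
        # The outer IP addresses are deliberately ignored: after decryption
        # the identifiers presented to the transport layer come from the SA.
        return self._hits_by_spi[spi]


hit_a = derive_hit(b"hypothetical public key of TDA")
hit_b = derive_hit(b"hypothetical public key of TDB")

table = BeetSaTable()
table.add_sa(0x1001, hit_a, hit_b)

# The same SPI yields the same HIT pair before and after the sender moves.
before = table.hits_for_packet(0x1001, "192.0.2.10", "198.51.100.20")
after = table.hits_for_packet(0x1001, "203.0.113.77", "198.51.100.20")
print(before == after)
```

Because the transport layer only ever sees the HIT pair, a change in the outer source address is invisible to the sockets bound to those identifiers.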
[Figure: identity delegation toward multiple SPs, showing TDA, SPA1, SPA2, and TDB in three panels (panel labels: IPSec setup, data flow, arbitrary service proxy configuration now possible with intermediate proxies SPAx and SPAy).]
Figure 5. Proxy insertion by TDA. [Signaling diagram between TDA, SPA, and TDB: connect(IP?) and ack(IPSPA); setup of IPSec secured signaling; base exchange packets I1(IPSPA-IPTDB, HITTDA-HITTDB), R1, I2, and R2, carried between TDA and SPA as encapsulated payloads p[...]; IPSec security association details; IPSec secured data on both hops.]

HIP mobility handling comprises an authenticated location update procedure in which the mobile host delivers a signed location update packet to the correspondent host with details of the new network layer address. Our contribution is an extension to standard HIP that provides a means to delegate HITs between physical hosts on-the-fly in response to a mobility event.
Depending on local security policy, either the mobile host or the correspondent host may ask to re-key the connection in response to mobility. Re-keying also can be requested by either host after a certain time period has elapsed. Re-keying involves the deletion of existing IPSec SAs and the establishment of new ones with a newly generated session key. If re-keying is not required, existing SAs are deleted and re-established with the previous session key. This reconfiguration of IPSec SAs is transparent to the transport layer.
To mitigate the effect of the implementation-specific security policy on experimental results, a base exchange was substituted for an update procedure in the work presented here. A base exchange is, in fact, in most cases roughly equivalent to an update with the main difference being modified HIP header fields.

the identifier. This can be solved in two ways, either by duplicating the requisite private key on any host that requires it, or by forwarding location update signaling packets to be signed on demand. The second approach is advocated in this article because private keys should, in principle, remain private. To avoid the introduction of another acronym, the abbreviation HIT is used interchangeably with the term cryptographic identifier in the remainder of this article.
In the approach proposed in this article, hosts that wish to use a delegated HIT are required to forward location update signaling packets to the owner of the corresponding private key for signing. The owner of the private key can then, at its discretion, sign the messages and return them to the host that requested them, which can in turn forward them on to the corresponding host thereby verifying the claim to use the HIT. The main advantage of this approach is that it avoids the dissemination of private keys. This approach also allows temporary delegation of a HIT because the destination host can use the HIT only for the duration that the corresponding host does not request a re-keying procedure to be performed. An additional advantage is that because the location-update-signaling messages forwarded to the key-holder host for signatures also contain the HIT and IP addresses of the corresponding host, the key-owner host can keep track of the corresponding hosts with which communication sessions are being conducted using its identity. Should the key-owner wish to revoke the use of its HIT from a certain destination host, it need only perform a re-key directly with the corresponding host. The drawback of this approach is that if the key-owner host disappears, any further requests to sign location-update-signaling messages cannot be processed. This means that a destination host may be forced to terminate a communication session if the corresponding host initiates a re-key.
One potential philosophical ramification of the delegation approach (on HIP specifically) is that so called host identities no longer explicitly belong to a specific host but are capable of being moved around between physical hosts, contrary to the original intention of the designers of HIP. The proposed approach limits the architectural impact of this by ensuring that identities are delegated only temporarily and can never be used without the explicit consent of the actual entity that the host identity serves to identify.
Our proposal changes the notion of end-to-end security of HIP because even though communication is still encrypted (IPSec), all nodes explicitly included in the service path can read the payload. This is a desired functionality because we envisage that nodes included on the service path may be tasked with some application-layer processing such as content adaptation.

Figure 6. TCP sequence number vs. time plot: two service proxies.
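The sign-on-demand delegation described above can be sketched as follows. This is an illustration only and not the prototype's code: the `KeyOwner` class, its method names, and the message fields are hypothetical, and an HMAC over a secret held by the owner is used purely as a stand-in for a public-key signature so that the example runs with the standard library. In a real deployment the correspondent would verify with the owner's public key rather than by calling `check` on the owner. Withdrawing consent is modelled here by refusing further signatures, which is a simplification of the re-key based revocation in the text.

```python
import hashlib
import hmac
import os


def _encode(update: dict) -> bytes:
    """Deterministic byte encoding of a location update (illustrative)."""
    return repr(sorted(update.items())).encode()


class KeyOwner:
    """Host that owns the key behind a HIT and signs updates on request."""

    def __init__(self, hit):
        self.hit = hit
        self._key = os.urandom(32)  # stand-in for the private key; never shared
        self.consented = set()      # delegates currently allowed to use the HIT
        self.peers = {}             # delegate -> correspondent HITs seen so far

    def sign_update(self, delegate, update):
        """Sign a forwarded location update, or return None if refused."""
        if delegate not in self.consented or update["hit"] != self.hit:
            return None
        # The forwarded update names the correspondent, so the owner can
        # track which sessions are conducted under its identity.
        self.peers.setdefault(delegate, set()).add(update["peer_hit"])
        return hmac.new(self._key, _encode(update), hashlib.sha256).digest()

    def check(self, update, signature):
        """Stand-in for the correspondent's public-key verification."""
        expected = hmac.new(self._key, _encode(update), hashlib.sha256).digest()
        return signature is not None and hmac.compare_digest(expected, signature)


owner = KeyOwner(hit="HIT-TDA")
owner.consented.add("SP1")

# SP1 forwards the update it wants to send to the correspondent TDB.
update = {"hit": "HIT-TDA", "peer_hit": "HIT-TDB", "new_ip": "203.0.113.9"}
sig = owner.sign_update("SP1", update)
print(owner.check(update, sig), owner.peers)
```

The delegate never holds key material, so its ability to use the HIT lasts only as long as the owner keeps signing.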
extensions were evaluated on a Debian Linux system running kernel version 2.6.16. The remainder of this section provides an analysis of the signaling procedures specific to the implementation. Our description refers to a simple scenario: a mobile terminal device (TDA) communicating with a static correspondent terminal device (TDB). The analysis commences with a description of how TDA may delegate its identity to an intermediary SP.
Figure 5 depicts the signaling involved in the delegation process. It is assumed that there is a pre-existing trust relationship between terminal devices belonging to a single PN. The delegation process starts when TDA queries the SP for the IP address (IPSP) that it wants to use for the delegated identity. At the same time, TDA and the SP establish transport mode IPSec. This channel carries encapsulated HIP signaling traffic, as well as the IPSec security policy and association information used to establish the BEET mode IPSec association used for application data. The HIP signaling traffic between TDA and SP is sent as encapsulated payloads indicated in Fig. 5 by “p […].” The SP relays any HIP signaling traffic either to TDA or TDB without modification. The whole process of identity delegation and subsequent session redirection is transparent to applications running on TDB. The modular nature of our design means that the scheme can be implemented as an extension to an existing HIP-enabled network stack.
As mentioned previously, intermediary SP insertion also can be performed by TDB to construct chains of two or more composed SPs between TDA and TDB. The signaling involved in the insertion of the second SP is equivalent to the SP insertion by TDA shown in Fig. 5.

Experimental Evaluation
Evaluation of the identity delegation scheme was performed for the second usage scenario described previously. The results of this scenario are also applicable to the other scenarios because they utilize the same identity delegation mechanism. The intention of the experiments was twofold: first, to provide a general evaluation of HIP performance in a real system, and second, to show that the delegation approach does not result in any measurable performance drop compared to unmodified HIP. Initially, it was assumed that most of the hand-off latency overhead would be due to heavy CPU load caused by the cryptographic operations required to sign HIP signaling messages and establish IPSec sessions. As such, it was expected that the performance of both approaches would be equivalent, provided that the machines used to sign HIP messages and set up IPSec sessions were equal in terms of processing power. These assumptions were confirmed by the evaluation results presented below.
The experiment was performed to evaluate the scenario of inserting intermediary SPs, which, for example, can be a content adaptation SP between two devices engaged in a TCP communication session. The purpose of evaluating this scenario was to demonstrate that the TCP connection between the two devices remains unbroken and that the scheme does not cause any specific harm to the normal performance of higher-layer protocols. In reality, altering the end-to-end path in midsession may introduce some degradation in TCP performance if the new path is of lower quality than the old path; however, this issue is outside the scope of our proposal.
In these experiments, an initial communication session was established from TDA (600 MHz PIII) toward TDB (500 MHz Celeron). The evaluated scenario was the insertion of two 3-GHz Pentium 4 service proxies, SPA1 and SPA2, in serial between the initial TCP session end points, TDA and TDB. Figure 6 shows the resulting TCP sequence number vs. time plot.
The two large gaps (a) and (c) in the plot represent the respective times at which SPA1 and SPA2 were inserted in between TDA and TDB. From the plot it can be observed that the effects of the insertion of SPA2 are similar to those of the insertion of SPA1 in terms of latency and impact on TCP performance. However, the plot also demonstrates that insertion of multiple consecutive SPs does not result in any further drop in performance provided the SPs are powerful enough to handle the required IPSec sessions without CPU saturation. Some smaller gaps such as that indicated by (b) can be attributed to the CPU being utilized by the cryptographic operations required to set up a secure signaling channel prior to hand off.
It is important to note that if the capability to delegate or transfer identity were not available, then the session would have to be broken and restarted to insert and remove each intermediary proxy, causing the TCP sequence number to reset to the beginning each time.

Related Work
There are a number of previous and ongoing related works addressing inter-device mobility. On the other hand, there are fewer proposals that address the insertion of intermediary SPs as a mobility-handling technique.
The only proposal that has a similar functionality is Stream Control Transmission Protocol (SCTP). Like any other transport protocol, a node can be made to act as a proxy. In SCTP, when an end point (A) initiates a connection, the other end (B) can, with or without the knowledge of the initiator, open an association to another entity (C) and act as a proxy in between. Then B can either remove itself or make an association with C to receive the data from C. All this must be done before heartbeat signals are exchanged [12]. The HIP base specification provides no mechanism for inter-device mobility. However, [1] and [8] allude to the possibility of identity delegation using signed certificates. The approach proposed here provides a higher degree of transparency and control and is more responsive than delegation certificates.
Koponen, Gurtov, and Nikander provide a high-level discussion of the potential for HIP identity delegation with certificates [1]. References [2, 3] are related solutions that enable ongoing communication sessions to be moved between devices. In [2], Su creates a virtualized network interface that can be transferred between different devices and with it the associated communication sessions. It should be noted that none of these schemes conflict with HIP or with the scheme presented here; in fact, there is even potential for useful interoperation. A major difference of the delegation approach is that it focuses only on managing connectivity and can be implemented in such a way that it is transparent to at least one end of an end-to-end connection, if not both. There are also a number of related activities in the IETF associated with locator/ID split [13, 14]. Of this work, the network-based schemes such as Locator/Identifier Separation Protocol (LISP) do not consider the use of middleboxes. The others, especially mobility Internet key exchange (MOBIKE) and SHIM6, focus on device mobility and do not support the use of middleboxes as described in this article.

Conclusion
Auxiliary devices that can serve as dynamically configured middleboxes introduce potential for a new approach to mobility handling that makes use of multiple available network interfaces and terminal devices. Mobility handling in this case means adapting to the changing status of an individual terminal by delivering application data flows to the best available
terminal device(s) and utilizing the available service proxies (middleboxes) in the best possible way. This cannot be achieved using currently available technology.
This article addresses the problem by creating and exploiting PNs to provide enhanced mobility handling to mobile users. This article is focused on the specific problem of decoupling application data flows from specific devices by making use of multiple available network interfaces and terminal and service proxy devices.
We propose mechanisms to switch ongoing communication sessions between terminal devices and to transparently insert or remove intermediary service proxies, with the mobility management schemes at layers lower than the transport layer. The proposed identity delegation approach is based on the HIP and allows the identity creator to retain full control over the use of their identity. The approach enables the movement of communication sessions between terminal devices, as well as the transparent insertion and removal of middleboxes, service proxies, or other intermediaries able to perform routing or adaptation.

Future Work
Future work in support for movement of communication sessions between terminal devices may include the coupling of identity delegation with “checkpointing” and transfer of transport, session, and application layer state to allow full application sessions to be moved between devices. Another problem worthy of investigation for security reasons is how to enable independent verification of whether or not two terminal devices belong to the same PN.

References
[1] T. Koponen, A. Gurtov, and P. Nikander, “Application Mobility Using the Host Identity Protocol,” Proc. ICT ’05, Madeira, Portugal, May 2005.
[2] G. Su, MOVE: Mobility with Persistent Network Connections, Ph.D. diss., Columbia Univ., Oct. 2004.
[3] R. Baratto et al., “MobiDesk: Mobile Virtual Desktop Computing,” Proc. MobiCom, Philadelphia, PA, Sept. 2004.
[4] I. G. Niemegeers and S. M. Heemstra De Groot, “From Personal Area Networks to Personal Networks: User Oriented Approach,” Wireless Personal Commun., vol. 22, no. 2, 2002, pp. 175–86.
[5] S. Herborn, A Personal-Network Centric Approach to Mobility Aware Networking, Ph.D. diss., Univ. New South Wales, Mar. 2007.
[6] S. Ardon et al., “MARCH: A Distributed Content Adaptation Architecture,” Int’l. J. Commun. Sys., vol. 16, 2003, pp. 97–115.
[7] B. Knutsson and H. Lu, “Architecture and Performance of Server Directed Transcoding,” ACM Trans. Internet Technology, vol. 3, 2003, pp. 392–424.
[8] R. Moskowitz et al., “Host Identity Protocol,” Internet RFC 5201; http://www.ietf.org/rfc/rfc5201.txt
[9] D. Johnson, C. Perkins, and J. Arkko, “Mobility Support in IPv6,” IETF RFC 3775; http://www.ietf.org/rfc/rfc3775.txt
[10] P. Nikander and J. Melen, “A Bound End-to-End Tunnel (BEET) Mode for ESP,” Internet draft; http://tools.ietf.org/id/draft-nikander-esp-beet-mode-08.txt
[11] InfraHIP project; http://infrahip.hiit.fi/
[12] T. Aura, P. Nikander, and G. Camarillo, “Effects of Mobility and Multihoming on Transport-Protocol Security,” IEEE Symp. Security and Privacy, Berkeley, CA, May 2004.
[13] D. Meyer, “The Locator/ID Split, Its Implications for IP Architecture, and a Few Current Approaches,” Future of Routing Wksp., APRICOT ’07; http://www.1-4-5.net/dmm/talks/apricot2007/locid
[14] D. Lee, X. Fu, and D. Hogrefe, “A Review of Mobility Support Paradigms for the Internet,” IEEE Commun. Surveys & Tutorials, vol. 8, no. 1, 2006.

Biographies
ARUNA SENEVIRATNE (aruna.seneviratne@nicta.com.au) received his Ph.D. in electrical engineering from the University of Bath, United Kingdom, in 1982. He is director of the NICTA Australian Technology Park Laboratory. He has held academic appointments at the University of Bradford, United Kingdom, Curtin University, and the University of New South Wales. He has also held visiting appointments at the University of Pierre and Marie Curie, Paris, and INRIA, Nice. In addition, he has been a consultant to numerous organizations including Telstra, Vodafone, Inmarsat, and Ericsson.

STEPHEN HERBORN (stephen.herborn@nicta.com.au) completed his Ph.D. at the University of New South Wales under the supervision of Professor Aruna Seneviratne. He works for Accenture consulting. Between 2003 and 2008, he was a member of the Networking and Pervasive Computing (NPC) program at NICTA in Sydney, first as a student and then as a full-time researcher. While at NICTA, his research activities centered around personal area networking, mobile networking, and context-aware computing.
Abstract
Currently, many customer devices are being connected to home networks. For this reason, it is expected that device management (DM) capabilities will be a powerful instrument for the service provider to cope with high maintenance costs, security concerns, and management issues related to home networks. Through DM, the service provider could provide valuable services such as auto-provisioning, remote configuration, firmware and software updates, diagnostics, monitoring, scheduling, and fraud management. However, network address translators (NATs) that are widely deployed in the home network environment prevent DM operations from reaching user devices behind the NAT. In this article, we focus on NAT issues in the management of home network devices. Specifically, we discuss efforts relating to standardization and present our proposal to deploy DM services for VoIP and IPTV devices behind NATs. By slightly changing the behavior of Simple Network Management Protocol (SNMP) managers and agents and by defining additional management objects (MOs) to gather NAT binding information, we could solve the NAT traversal problem under a symmetric NAT. Moreover, we propose an enhanced method to search for the UDP hole binding time of the NAT box. For evaluation, we applied our method to 22 randomly selected VoIP devices out of 194 NATed hosts in the real broadband network and achieved a success ratio of 99 percent for exchanging SNMP request messages and a 26 percent enhancement in determining the UDP hole binding time.
[Figure: device management overview. DM services: auto-provisioning, remote diagnostics and control, service quality management, firmware and software management, status and fault monitoring, operation supporting, inventory management, statistics and report management. DM operations: get MOs, set MOs, event notification, add/replace/copy MOs, exec MO. Protocol labels: SNMP, TR-069, OMA-DM. Home network: notebook PC, IPTV STB, VoIP phone, and WiFi phone behind a NAT middlebox. Mobile network: mobile phone, notebook, WiFi phone, PDA. Northbound and southbound interfaces are marked.]
ple Network Management Protocol (SNMP)-based approach to control hosts under NATs, which employs a User Datagram Protocol (UDP) hole-punching technique with the correct timer estimation method. The remainder of this article is organized as follows. We provide an overview of DM protocols and standards and then discuss the open issues of the remote management of NATed devices. We describe our proposal using SNMP for device management, give the results of our experiment, and also make comparisons with other DM methods. Our conclusions and suggestions for future research are presented later.

Managing Devices Behind a NAT

Overview of Device Management Protocols
There are many device management protocols; the protocols we discuss here are presented in Table 1. These are standards-based protocols that are widely accepted around the world by many service and solution providers for device management.
Open Mobile Alliance (OMA) [3] for DM uses extensible markup language (XML) for data exchange, more specifically the subset defined by Open Mobile Alliance device synchronization (OMA DS). Open Mobile Alliance-device management (OMA DM) is designed to support Wireless Session Protocol (WSP), Wireless Application Protocol (WAP), Hypertext Transfer Protocol (HTTP), or OBject Exchange (OBEX) or similar transports as a transport layer protocol. The protocol specifies the exchange of packages during a session, with each package consisting of several messages and each message in turn being composed of one or more commands. The server initiates the commands, and the client is expected to execute the commands and return the result with a reply message.
Technical Report 069 (TR-069) [4] is also a device management protocol that is defined by a digital subscriber line (DSL) forum technical specification. This application layer protocol provides the remote management function for end-user devices. Based on a bidirectional Simple Object Access Protocol (SOAP)³/HTTP protocol, it enables communication between a device and a DMS. Typical applications of TR-069 are safe auto-configuration and the control of other customer premises equipment (CPE) management functions within the integrated framework.
³ SOAP stood for Simple Object Access Protocol, but this acronym was dropped in Version 1.2 of the standard because it was considered to be misleading.
The SNMP [5, 6] is popular in network management because it enables easy monitoring of the status of network-attached devices. A set of standards for network management and application-layer protocols, a database schema, and a set of data objects are defined in SNMP, with management data specified in the form of variables on the managed systems, which describe the system configuration information. These variables can then be queried and sometimes set by SNMP manager applications.

Open Issues of Remote Management of NATed Devices
As explained earlier, several protocols have been standardized to support device management. However, with the advent of many NATs in the home network environment, the NAT becomes an important element to consider. Therefore, we present open issues in the remote management of NATed devices.
A NAT translates between internal private IP addresses and external public ones. NATs, particularly network address port translation (NAPT), one of the most common NAT systems, deal with communication sessions, which are identified uniquely by the combination of source IP address, source port number, destination IP address, and destination port number.
When a NATed device in a private network sends packets to the external host, the NAT intercepts the packet and replaces the source private IP address and the port number with a public IP address and a port number. Subsequently, when the NAT receives an incoming packet addressed to that public IP address and port number, it replaces the destination address and port number with the corresponding entry stored in the translation table, forwarding the packet to the private network.
The first issue in the remote management of a NATed device is to find an efficient way to facilitate the successful exchange of remote management request/response messages through the NAT box. A DMS cannot provide management authorities with management functions for a device behind a NAT because the management operations are blocked by the NAT.
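The translation behavior described above, and the reason unsolicited management requests are blocked, can be illustrated with a toy NAPT table. This is a minimal sketch under stated assumptions: the `Napt` class, its sequential port allocation starting at 40000, and the example addresses are hypothetical, and the mapping is kept independent of the remote endpoint for brevity, which real NAPT devices do not necessarily do.

```python
class Napt:
    """Toy NAPT: bindings are created only by outgoing packets."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._next_port = first_port
        self._out = {}  # (private_ip, private_port) -> public_port
        self._in = {}   # public_port -> (private_ip, private_port)

    def outbound(self, src, dst):
        """Rewrite the source of an outgoing packet, creating a binding if needed."""
        if src not in self._out:
            port = self._next_port
            self._next_port += 1
            self._out[src] = port
            self._in[port] = src
        return (self.public_ip, self._out[src]), dst

    def inbound(self, src, dst):
        """Rewrite the destination of an incoming packet.

        Returns None when no binding exists, i.e. the packet is dropped.
        """
        if dst[0] != self.public_ip:
            return None
        private = self._in.get(dst[1])
        if private is None:
            return None
        return src, private


nat = Napt("203.0.113.5")
device = ("192.168.0.10", 161)
dms = ("198.51.100.7", 162)

# A request from the DMS before any outgoing traffic finds no binding.
print(nat.inbound(dms, ("203.0.113.5", 40000)))

# After the device sends a packet, the reverse direction is translated.
public_src, _ = nat.outbound(device, dms)
print(nat.inbound(dms, public_src))
```

The first `inbound` call is dropped because only outgoing traffic creates table entries, which is exactly why a DMS cannot initiate management operations toward a NATed device.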
On the other hand, a NAT maintains a table that maps the private addresses and the port numbers to the public port numbers and IP addresses. Thus, it is important to note that this “binding” information can be created only by outgoing traffic from the internal host. In addition, most NATs maintain an idle timer for each outgoing session and close the hole if no traffic is observed for the given time period. If we knew the default timer value of a NAT, we could minimize session management overhead. However, there is no way to know the default timer value without any information about the NAT itself, such as vendor or model. In other words, the issue that we focus on here is determining the NAT timer value for an unknown NAT. Therefore, a second important issue is to estimate the correct timer value for each NAT box at a minimal cost. Without knowledge of the appropriate timer values for each NAT, the DMS must repeatedly send unnecessary probe packets to each NAT to find the value in a large-scale network.
These two issues are not specific to DM but related to all applications under an unknown NAT. To provide DM services across a NAT, we look into efforts for standardizing NATed device management.

Efforts for the Standardization of NATed Device Management
When it comes to the issue of how to manage NATed devices, there exist similar discussions such as RFC 2962 [7] and the CALLHOME Birds of a Feather (BoF) draft [8]. First, RFC 2962 describes an SNMP application level gateway (ALG) for payload address translation, but this ALG has serious limitations, including its scalability and speed of deployment of new applications. Moreover, it requires an upgrade to existing NATs. A CALLHOME BoF was held at the 64th IETF meeting and suggested a connection model that reversed the client-server role when establishing a connection. However, its activity ended without a clear result. For these reasons, in this section we focus on the efforts of the de facto DM standardization body to manage NATed devices. We discuss and compare these with our approach in detail in a later section.

Technical Report-111: TR-111 [9] extends the mechanism defined in TR-069 for the remote management of devices and is incorporated in TR-069 ANNEX G. TR-111 enables a management system to access and manage devices connected to a local area network (LAN) through a NAT. Two mechanisms were suggested in TR-111. TR-111 Part 1 is defined for the situation in which both the NAT and the device are TR-069 managed by the same DMS. TR-111 Part 2 provides a mechanism to realize a remote connection request to a device behind a NAT, in the event that the NAT does not support TR-069. It allows a DMS to initiate a TR-069 session with a device that is operating behind a NAT. The simple traversal of UDP through NATs (STUN) protocol mechanism defined in RFC 3489 [10] is included as Part 2 of TR-111, in which a device uses STUN to determine whether or not the device is behind a NAT. Then, if the device is behind a NAT with a private allocated address, the device uses the procedures defined in STUN to discover the binding timeout. The device
Figure 2. The sequential message flow of SNMP device management on the NAT environment.
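The hole creation and binding discovery that Fig. 2 depicts can be approximated in a few lines. The sketch below is illustrative rather than the authors' implementation: `SymmetricNat`, `discover_binding`, the port allocation scheme, and all addresses are hypothetical, and SNMP encoding is omitted entirely. It assumes a symmetric NAT, modelled as one whose binding admits only the remote endpoint that the outgoing packet was sent to, which is why the manager's source port matters.

```python
DMC_PORT = 161  # device side: listens for requests, source port for traps
DMS_PORT = 162  # server side: listens for traps, source port for requests


class SymmetricNat:
    """Toy symmetric NAT: a binding is valid for one remote endpoint only."""

    def __init__(self, public_ip, first_port=50000):
        self.public_ip = public_ip
        self._next_port = first_port
        self._bindings = {}  # (private endpoint, remote endpoint) -> public port
        self._reverse = {}   # (public port, remote endpoint) -> private endpoint

    def outbound(self, src, dst):
        """Return the translated source endpoint seen by the remote host."""
        key = (src, dst)
        if key not in self._bindings:
            port = self._next_port
            self._next_port += 1
            self._bindings[key] = port
            self._reverse[(port, dst)] = src
        return (self.public_ip, self._bindings[key])

    def inbound(self, src, dst):
        """Return the private endpoint, or None if the packet is dropped."""
        if dst[0] != self.public_ip:
            return None
        return self._reverse.get((dst[1], src))


def discover_binding(trap_source, client_address):
    """DMS side: compare the reported ClientAddress with the packet source.

    Returns (private endpoint, public endpoint) for a NATed device, else None.
    """
    if trap_source[0] == client_address:
        return None
    return ((client_address, DMC_PORT), trap_source)


nat = SymmetricNat("203.0.113.5")
device = ("192.168.0.10", DMC_PORT)
dms = ("198.51.100.7", DMS_PORT)

# The trap from A:161 to C:162 opens the hole and arrives from B:p.
seen_source = nat.outbound(device, dms)
binding = discover_binding(seen_source, client_address=device[0])

# A request sent from port 162 passes; one from another port is dropped.
print(nat.inbound(dms, binding[1]))
print(nat.inbound((dms[0], 40000), binding[1]))
```

Fixing the manager's source port to 162 makes its requests match the remote endpoint recorded when the trap created the binding, so they pass even when the NAT filters by remote address and port.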
SNMP request message because the SNMP agent sends the SNMP trap message to the destination port of 162.

By using these concepts, we propose an SNMP-based remote management method for a device behind a NAT as follows. First, we define the behavior of the SNMP agent embedded in the device. An SNMP agent triggers a UDP hole by periodically sending SNMP trap messages and keep-alive messages to the DMS. We chose 180 seconds for the interval of keep-alive messages because it was the most frequently found value among the NATs in our experiments. We also fixed the source port of the SNMP trap message sent from the SNMP agent to UDP port 161. Second, we added a function of gathering the agent IP address and its source port number to the SNMP manager in the DMS. If the agent IP address is different from the source IP address, the SNMP manager decides that the device that has sent the SNMP message is located behind a NAT. To avoid the symmetric NAT problem, the SNMP manager must fix the source port to 162 when sending the SNMP request message.

Proposal of SNMP-Based DM over NAT

Figure 2 shows the message flow associated with the procedures of our proposed method to manage a NATed device by using the UDP hole punching scheme. In Fig. 2 the address/port pairs use the notation (Address:port). There are four steps in our mechanism, as follows:
• Precondition: The DMC uses port 161 as its listening port for receiving SNMP requests from the DMS and as its source port for sending SNMP trap messages. The DMS uses port 162 as its listening port for receiving SNMP trap messages sent from the DMC and as its source port for sending SNMP request messages to the DMC.
• Step 1: Creating a UDP hole: When the IP address (A) is assigned to the device by the NAT, the DMC (A:161) sends the SNMP trap message to the DMS (C:162), which includes the device address as a trap object and its value (ClientAddress=A), which provides the private IP address of the device. The NAT translates the IP:port pair (A:161) of the SNMP trap packet to (B:p), which are the IP address and the port number allocated by the NAT, randomly or sequentially. In other words, the NAT creates a UDP hole and a binding entry (A:161, B:p).
• Step 2: Binding discovery: The DMS determines that the device is located behind a NAT when the address (A) of the device extracted from the SNMP object ClientAddress differs from the source address of the SNMP trap packet. If a device is behind the NAT, the DMS extracts the binding information (A:161, B:p) from the SNMP trap message, whereby the IP:port pair (A:161) of the device is extracted from the SNMP message, and the IP:port pair (B:p) of the NAT is extracted from the received SNMP trap packet.
• Step 3: Keep punching the UDP hole: To maintain the UDP hole bound with entry (A:161, B:p) of the NAT, the DMC keeps sending SNMP trap messages to the DMS (C:162) at hole-timer intervals.
• Step 4: Sending SNMP request messages: When the DMS (C:162) wants to manage a NATed device, it sends the SNMP request message to manage the device through the hole (B:p). Then, the message can pass through the hole and reach the NATed device. The NAT translates (B:p) to (A:161) according to the binding table. The DMC receives this SNMP message on UDP port 161, and it sends the SNMP response message to the DMS (C:162). The process to deliver this SNMP response message with the result to the DMS is the same as that of the SNMP request message.

Heuristic to Estimate the UDP Hole Punching Timer Values

In general, UDP mapping timer values are not standardized so they could be different for each NAT vendor. For the remote management of devices behind a NAT from a public network, the DMS should make the user device send a UDP packet periodically before the UDP hole is closed. In other words, the device should punch the UDP hole periodically at the time interval configured by the DMS. Note that searching the UDP mapping time could cause a large amount of overload to the DMS in a large-scale network because the DMS must send many probe packets with the estimated timeout values for each NAT box.

As such, we propose a heuristic method to maintain the list of the top 10 UDP mapping times statistically obtained through experiments. Then, we applied the binary search algorithm to the list of the top 10 known timer values. That is, a DMS uses the binary search algorithm to find the UDP mapping time in the list and then to search it between two items. Table 2 summarizes four kinds of applicable search methods. The experimental results are explained in the next section.

Method: Linear search
  Description: Search UDP mapping time with a linearly increasing value of EV.
  Procedure: 1) EV = IV + TE. 2) Wait for EV and send SNMP command.
Method: Binary search
  Description: Use binary search method.
  Procedure: 1) EV = (MinVal + MaxVal)/2. 2) Wait until EV and send SNMP command. 3) If success, MinVal = EV and go to 1); if fail, MaxVal = EV and go to 1).
Method: TopN binary search
  Description: Limited types of NAT vendors are deployed in the real environment. Maintain TopN list of UDP hole time. Based on the TopN list, first perform binary search between entries.
  Procedure: 1) Binary search in the TopN list. 2) Binary search between TopN(i) and TopN(i+1) until the difference is in TE.
Table 2. Methods for searching for the UDP hole timer values of a NAT (EV: Expect Value, IV: Initial Value, TE: Tolerable Error, PEV: Previous Expect Value).

nation-wide high-speed broadband network, as shown in Fig. 3. Of 1177 manageable VoIP devices, 194 hosts are shown to be NATed in our network; thus, about 17 percent of end hosts are NATed. Based on these hosts, we randomly selected 22 devices and tested our proposed method. The reason why we chose only a small number of devices is that we had to carefully test the minimum number of devices so as not to affect customer service if we sent command messages repeatedly.

The SNMP manager was implemented based on University of California, Davis (UCD) SNMP version 4.2.6⁵ and can send and receive SNMP messages simultaneously using one port, UDP 162. Hence, we embedded the SNMP agent into the device with our proposed method.

Table 3 shows the results of different methods of searching for the UDP hole time. Our heuristic, based on the binary search method, showed the best performance in the experiments. Compared with the popular binary search method, our heuristic could reduce search times by 26 percent, as well as reduce the average number of probes by 0.6.

Table 4 shows the command success ratio of our proposed method. We could achieve a 99 percent success rate of SNMP-command penetration into the NATed devices. This result provides compelling evidence that it is possible to manage a device using private IP addresses, without any additional servers or equipment. It also shows that the top N binary search heuristic is useful for the efficient management of NATed hosts.

There might be an argument with our NAT traversal success rate of 99 percent when compared with the well-known result of 80 percent in [12]. First, we think that the small number of experimented devices (22 devices) could be contributing to the

⁵ The current release version of UCD SNMP is NET-SNMP 5.4.1; http://net-snmp.sourceforge.net

Method | Device | Test | Average number of probes | Average time (s)
Linear search | 22 | 196 | 25.6 | 4608
Table 3. Average time of searching UDP hole timer values for different NATs.
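The binary and TopN binary search procedures of Table 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the simulated NAT, the example TopN values, and the upper bound `max_val` are all hypothetical, and a probe is assumed to succeed only when the wait is shorter than the NAT's UDP hole timer. In a real deployment each probe would cost roughly EV seconds of waiting before the SNMP command is sent.

```python
from typing import Callable, List, Tuple

Probe = Callable[[float], bool]  # True if a command sent after `wait` seconds still gets through


def simulated_nat(timeout_s: float) -> Probe:
    """Hypothetical stand-in for a real NAT: the hole is open only while wait < timeout."""
    return lambda wait: wait < timeout_s


def binary_search(probe: Probe, min_val: float, max_val: float, te: float) -> Tuple[float, int]:
    """Binary search of Table 2: narrow [MinVal, MaxVal] until its width is within TE.

    min_val is assumed to be a wait that succeeds, max_val one that fails.
    Returns (largest wait known to succeed, number of probes sent).
    """
    probes = 0
    while max_val - min_val > te:
        ev = (min_val + max_val) / 2      # 1) EV = (MinVal + MaxVal)/2
        probes += 1                       # 2) wait until EV, then send the command
        if probe(ev):
            min_val = ev                  # 3) success: MinVal = EV
        else:
            max_val = ev                  #    failure: MaxVal = EV
    return min_val, probes


def top_n_search(probe: Probe, top_n: List[float], te: float, max_val: float) -> Tuple[float, int]:
    """TopN binary search: bracket the timer between two list entries, then refine."""
    values = sorted(top_n)
    lo_i, hi_i = -1, len(values)          # sentinels: index -1 means 0 s, len(values) means max_val
    probes = 0
    while hi_i - lo_i > 1:                # 1) binary search in the TopN list
        mid = (lo_i + hi_i) // 2
        probes += 1
        if probe(values[mid]):
            lo_i = mid
        else:
            hi_i = mid
    lo = values[lo_i] if lo_i >= 0 else 0.0
    hi = values[hi_i] if hi_i < len(values) else max_val
    estimate, extra = binary_search(probe, lo, hi, te)   # 2) refine between TopN(i) and TopN(i+1)
    return estimate, probes + extra
```

The design choice worth noting is that the list search only has to bracket the timer between two neighbouring entries; when the list reflects the timer values actually deployed, the refinement interval is much narrower than the full range, which is where any saving in probes would come from.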
Figure 3. Test environment of estimating UDP hole time at the commercial broadband network in Korea. (The DMS is located in KT's backbone network (KORNET); VoIP phones with DMCs attach over Ethernet, xDSL, and FTTH access; 194 hosts exist under NATs out of 1,177 VoIP phones.)
There are some issues in our system. One is the issue of scalability, in that our system must put up with thousands of keep-alive SNMP trap messages per minute from thousands of devices. As a result, we are now approaching a time-to-live (TTL)-based scheme to avoid heavy traffic that is not required to reach a DMS, in which we send periodic trap messages with TTL = n (n being the least count of TTL to punch the hole), which will be discussed in the future. Another issue is the security issue arising from SNMP v2. To address this issue, in the near future we are considering changing the DM protocol to one of the standards-based secure DM protocols, such as SNMP v3, OMA-DM, or TR-069. In future work, we are going to estimate and analyze real environment results in our large scale VoIP and IPTV network that will be widely deployed this year, reaching over a million devices.

Acknowledgment

This work was partly supported by the IT R&D program of MKE/IITA (2008-F-016-01, Collect, Analyze, and Share for Future Internet) and partly by the ITRC (Information Technology Research Center) support program of MKE/IITA (IITA-2008-C1090-0801-0016). The corresponding author is Youngseok Lee.

References

[1] M. Holdrege, “IP Network Address Translator (NAT) Terminology and Considerations,” IETF RFC 2663, Aug. 1999.
[2] S. Guha and P. Francis, “Characterization and Measurement of TCP Traversal through NATs and Firewalls,” Proc. Internet Measurement Conf., Berkeley, CA, Oct. 2005.
[3] OMA, “OMA Device Management V1.2 Approved Enabler,” Feb. 2007.
[4] DSL Forum, “CPE WAN Management Protocol v1.1,” Dec. 2007.
[5] J. Case et al., “Introduction and Applicability Statements for Internet Standard Management Framework,” IETF RFC 3410, Oct. 2000.
[6] W. Stallings, SNMP, SNMPv2, SNMPv3, and RMON 1 and 2, 3rd ed., Addison Wesley, 1998.
[7] D. Raz, J. Schoenwaelder, and B. Sugla, “An SNMP Application Level Gateway for Payload Address Translation,” IETF RFC 2962, Oct. 2000.
[8] E. Lear, “Simple Firewall Traversal Mechanisms and Their Pitfalls,” IETF draft, Oct. 2005.
[9] DSL Forum, “Technical Report 111, Applying TR-069 to Remote Management of Home Networking Devices,” Dec. 2005.
[10] J. Rosenberg et al., “STUN - Simple Traversal of User Datagram Protocol through Network Address Translators,” IETF RFC 3489, Mar. 2003.
[11] B. Ford and P. Srisuresh, “Peer-to-Peer Communication across Network Address Translators,” USENIX Annual Technical Conf., 2005.
[12] C. Jennings, “NAT Classification Test Results,” IEEE draft, July 2007.

Biographies

CHOONGUL PARK (lion@kt.com) received B.S. and M.S. degrees in computer engineering in 2001 from Pusan National University and in 2008 from Chungnam National University, Korea, respectively. Currently, he is a Ph.D. student at Chungnam National University. He joined KT Technology Laboratory in 2002 and started his research work on the Next Generation OSS project. Since 2005 he has been a member of the KT Device Management project and a senior researcher in the Department of Next Generation Network Research. His research interests include device management and traffic engineering in the next-generation Internet.

KITAE JEONG (kjeong@kt.com) received B.S. and M.S. degrees in 1983 and 1986 in electronic engineering from Kyungpook National University, and a Ph.D. from Tohoku University of Japan in 1996. He joined KT Laboratory in 1986, and is the leader of the Department of Next Generation Network Research. His research interests are in the fields of device management, next-generation network, and fiber to the home.

SUNGIL KIM (sikim@kt.com) received B.S. and M.S. degrees in 1992 and 1994 in computer engineering from Choongbuk National University. He joined KT Technology Laboratory in 1994, and is the leader of the KT Device Management project and delegate to the Broadband Convergence Network Standardization Group. His research interests are in the fields of device management and next-generation networks.

YOUNGSEOK LEE [SM] (lee@cnu.ac.kr) received B.S., M.S., and Ph.D. degrees in 1995, 1997, and 2002, respectively, all in computer engineering, from Seoul National University, Korea. He was a visiting scholar at Networks Lab at the University of California, Davis from October 2002 to July 2003. In July 2003 he joined the Department of Computer Engineering, Chungnam National University. His research interests include Internet traffic measurement and analysis, traffic engineering in next-generation Internet, wireless mesh networks, and wireless LAN.
Abstract
Multihomed subscribers are increasingly adopting intelligent route control solutions
to optimize the cost and end-to-end performance of the traffic routed among the
different links connecting their networks to the Internet. Until recently, IRC practices
were not considered adverse, but new studies show that in a competitive environ-
ment, they can lead to persistent traffic oscillations, causing significant performance
degradation rather than improvements. To cope with this, randomized IRC tech-
niques were proposed. However, the proliferation of IRC products raises concerns,
given that randomization becomes less effective as the number of interfering IRC
systems increases. In this article, we present a more scalable route control strategy
that can better support the foreseeable spread of IRC solutions. We show that by
blending randomization with adaptive filtering techniques, it is possible to drasti-
cally reduce the interference between competing route controllers, and this can be
achieved without penalizing the end-to-end traffic performance. In addition to the
potential improvements in terms of scalability and performance, the route control
strategy outlined here has various practical advantages. For instance, it does not
require any kind of protocol or coordination between the competing IRC middle-
boxes, and it can be adopted readily today because the only requirement is a soft-
ware upgrade of the available route controllers.
work conditions. using a configuration like the one shown at the top of Fig. 1 with several
• We show that a simple enhancement to randomized IRC border routers.
choose the best egress link for each target flow, depending on the outcome of these measurements. More specifically, the RCM is capable of taking rapid routing decisions for the target flows, often avoiding the effects of issues such as distant link/node failures² or performance degradation due to congestion.³

The third module of an IRC system, namely the RVM, typically supports a broad set of reporting options and provides online information about the average latency, jitter, bandwidth utilization, and packet loss experienced through the different providers, summaries of traffic usage, associated costs for each provider, and so on.

Overall, IRC offers an incremental approach, complementing some of the key deficiencies of the Interior Gateway Protocol/Border Gateway Protocol (IGP/BGP)-based route control model. It is worth emphasizing that the set of candidate routes to be probed by IRC boxes usually is determined by IGP/BGP; so, in contrast to overlay networks [8], IRC boxes never circumvent IGP/BGP routing protocols. The effectiveness of multihoming in combination with IRC is confirmed not only by studies like [8], but also by the increased trend in the deployment of these solutions.

In this article we deal with the algorithmic aspects of IRC systems, so hereafter we focus our attention on the RCM in Fig. 1; the functionality of the MMM and RVM modules essentially is orthogonal to the proposals made in this work.

Related Work

In [9], the authors simultaneously optimize the cost and performance for multihomed stub networks, by introducing a series of new IRC algorithms. The contributions of that work are fundamentally theoretical. For instance, the authors show that an intelligent route controller can improve its own performance without adversely affecting other controllers in a competitive environment, but the conclusions are drawn at traffic equilibria (traffic equilibrium is defined by the authors as a state in which no traffic can improve its latency by unilaterally changing its link assignment). However, after examining and modeling the key features of conventional IRC systems, it becomes clear that they do not seek this type of traffic equilibria. Indeed, more recent studies, such as [5], show that in practice, the performance penalties can be large, especially when the network utilization increases.

In light of this, and considering the current deployment trend of IRC solutions, it becomes necessary to explore alternative IRC strategies. These new route control strategies should always improve the performance and reliability of the target flows, or at least, they should drastically reduce the potential implications associated with frequent traffic relocations, such as persistent oscillations causing packet losses and increased packet delays [5].

Although most commercially available IRC solutions do not reveal in depth the technical details of their internal operation and route control decisions, the behavior of one particular controller is described in detail in [10]. That work also provides measurements that evaluate the effectiveness of different design decisions and load balancing algorithms. Akella et al. also provided rather detailed descriptions and experimental evaluations of multihoming in combination with IRC tools, as in [1, 8, 11]. These research publications, along with the documentation provided by vendors, allowed us to capture and model the key features of conventional IRC techniques. A similar approach was followed by the authors in [5]. For simplicity, and as in [5, 8, 10], we consider traffic performance as the only criterion to be optimized for the target flows.⁴

² The timescale required by IRC systems to detect and react to a distant link/node failure is very small compared to that of the general IGP/BGP routing system [2–4, 6].
³ This cannot be automatically detected and avoided with BGP [7].
⁴ Cost reductions are typically accomplished by aggregating traffic toward non-target destinations over the cheapest ISPs.

The General IRC Network Model

The general IRC network model is composed of a multihomed stub network S, a route controller C, the transit domains, and a set of target destinations {d} with cardinality |d| = D to be optimized by C. The source domain S has a set of egress links {e}, with |e| = E. For the sake of simplicity, we keep the notation in the granularity of destinations (d), but the model easily can be extended to consider various flows per target d.

To dynamically decide the best egress link for each target destination d, the MMM in C probes all the candidate paths through the egress links e of S. Then, the collected measurements are processed and abstracted into a performance function $P_e^{(d,t)}$ at time t, associated with the quality perceived for each of the available paths toward the target destinations d. Let N(d) denote the number of available paths to reach d. Because N(d) usually represents the number of candidate paths in the forwarding information base (FIB) of the BGP border routers of S, N(d) ≤ E ∀ d.

We assume that the better the end-to-end traffic performance perceived by C for a target destination d through egress link e, the lower the value of the performance function $P_e^{(d,t)}$.

In this framework, IRC strategies can be taxonomized into two categories, namely, reactive route control (RRC) and proactive route control (PRC). RRC practices switch a target flow from one egress link to another only when a maximum tolerable threshold (MTT) is met. The MTTs are application-specific and typically represent the maximum acceptable packet loss, the maximum tolerated packet delay, and so on, for a given application. Beyond any of these bounds, the performance perceived by the users of the application becomes unacceptable.

PRC strategies, on the other hand, switch traffic before any of the MTTs are met and in turn, can be taxonomized into two categories: those that can be called fully proactive (FP), and those that follow a controlled proactivity (CP) approach. FP IRC practices always switch to the best path. Therefore, the dynamic optimization problem addressed by a FP route controller is to:

Find the $\min\{P_e^{(d,t)}\}$ ∀ d, t and enforce the redirection of the corresponding traffic to the egress link found.

The alternative offered by CP is to keep the proactivity, but switch traffic only when the performance becomes degraded to some extent, typically represented by a relocation threshold ($R_{th}$). The dynamic optimization problem addressed by CP-based strategies can be formulated as follows.

Let $e_{best}$ denote the egress link utilized to reach d at time t, and let $e'$ be such that $P_{e'}^{(d,t)} = \min\{P_e^{(d,t)}\}$ for destination d at time t.⁵ A CP-based route controller would switch traffic to d from $e_{best}$ to $e'$ whenever $P_{e_{best}}^{(d,t)} - P_{e'}^{(d,t)} \ge R_{th}$, with $R_{th} > 0$.
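The difference between the fully proactive (FP) and controlled proactivity (CP) rules can be sketched in a few lines. This is only an illustration of the two decision rules as stated in the text; the function names and the dictionary representation of the performance function (egress link name mapped to its current value for one destination d at one time t, lower being better) are assumptions made here for the example.

```python
from typing import Dict


def best_link(perf: Dict[str, float]) -> str:
    """Return the egress link with the lowest performance value (lower is better)."""
    return min(perf, key=perf.get)


def fp_decision(perf: Dict[str, float]) -> str:
    """Fully proactive: always move the traffic to the currently best egress link."""
    return best_link(perf)


def cp_decision(perf: Dict[str, float], current: str, r_th: float) -> str:
    """Controlled proactivity: switch only if the gain reaches the relocation threshold.

    `current` plays the role of e_best (the link in use) and the candidate is e'.
    """
    if r_th <= 0:
        raise ValueError("the relocation threshold must be positive")
    candidate = best_link(perf)
    if perf[current] - perf[candidate] >= r_th:
        return candidate
    return current
```

For example, with values 10, 8, and 12 on three links and traffic currently on the first one, the FP rule moves to the second link, whereas the CP rule stays put for a threshold of 3 and moves only once the threshold is 2 or lower.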
Figure 2. Filtering process and interaction between the monitoring and measurement module (MMM) and the route control module (RCM) of a sociable route controller. The Randomized SRC Algorithm within the RCM is outlined in Algorithm 1. (Diagram labels: MMM, first filter (medians), second filter, compute ∆(d,t), randomized SRC algorithm, with signals $RTT_e^{(d,t)}$, $M_e^{(d,t)}$, $Q_e^{(d,t)}$, and $P_e^{(d,t)}$. Plot axes: RTT (ms) versus sampling instants (s), with the MTT marked.)
After extensive evaluations and analysis, we confirmed that PRC performs much better than RRC. The reason for this is that proactive approaches can anticipate network congestion situations, which, in the reactive case, typically demand several traffic relocations once congestion has already been reached. In addition, we found that in a competitive environment, CP-based route control strategies can outperform the FP ones. Therefore, our SRC algorithm (outlined in the following section) is supported by a CP-based route control strategy.

Sociable Route Control

In the SRC strategy that we conceive, each controller remains independent so the SRC boxes do not require any kind of coordination with one another, just as conventional IRC systems operate today. Moreover, our SRC strategy does not introduce changes in the way measurements are conducted and reported by conventional IRC systems, so both the MMM and the RVM in Fig. 1 remain unmodified. Our SRC strategy introduces changes only on the algorithmic aspects of the RCM.

High-Level Description of the SRC Strategy

For simplicity in the exposition, we focus on the optimization of a single application, namely, voice over IP (VoIP), and we describe the overall SRC process for the round-trip time (RTT) performance metric. For a comprehensive and formal analysis, the reader is referred to [12].

Our goal is that a controller C becomes capable of adaptively adjusting its proactivity, depending on the RTT conditions for each target destination d. To be precise, a sociable controller analyzes the evolution of the RTT, that is, $\{RTT_e^{(d,t)}\}$, and depending on its dynamics, the controller can restrain its traffic reassignments adaptively (i.e., its proactivity). To this end, the RCM processes the RTT samples gathered from the MMM using two filters in cascade (Fig. 2). The first filter corresponds to the median RTT, $M_e^{(d,t)}$, which is constantly computed through a sliding window. This approach is used widely in practice because the median represents a good estimator of the delay that the users' applications currently are experiencing in the network. These medians are precisely the input to the second filter, where the social nature of the route control algorithm covers two different facets:
• CP
• SRC

Controlled Proactivity

On the one hand, the proactivity of box C is controlled to avoid minor changes in the medians triggering traffic relocations at S. This prevents interfering too often with other route controllers. For this reason, our sociable controllers filter the medians.

The second filter in Fig. 2 works like an analog-to-digital (A/D) converter, with quantization step ∆, and its output is one of the levels of the converter $Q_e^{(d,t)}$. The right-hand side of Fig. 2 illustrates how the instantaneous samples of RTT are filtered to obtain the median $M_e^{(d,t)}$, and then, the latter is filtered to obtain $Q_e^{(d,t)}$.

As described earlier, IRC systems compare the quality of the active and alternative paths by means of a performance function $P_e^{(d,t)}$, which as shown in Fig. 2, is fed by $Q_e^{(d,t)}$. The controller C would switch traffic toward d only when the variations of $Q_e^{(d,t)}$ cause that $P_{e_{best}}^{(d,t)} - P_{e'}^{(d,t)} \ge R_{th}$. A more detailed description of the route selection process is shown in Algorithm 1. For simplicity, only the stationary operation of the algorithm is summarized. The randomized nature of Algorithm 1 is discussed later. The timer in Step 8 is also introduced later.

For the RCM described here, we simply used the outcome of the digital conversion as the performance function $P_e^{(d,t)}$, that is, the number of quantization steps in the quantification level $Q_e^{(d,t)}$. Similarly, $R_{th}$ represents the difference in the number of quantization steps that $P_e^{(d,t)}$ must reach to trigger a path switch.

Overall, the advantage of this filtering technique is that it produces the desired effect (i.e., controlled proactivity) because it prevents minor changes in the medians from triggering unnecessary traffic relocations at S.

Socialized Route Control

The second facet of the social behavior of the algorithm relates to the dynamics of the median RTTs; more precisely,

⁵ We notice that with CP, $e_{best}$ might be different from $e'$.
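The two-filter cascade described under Controlled Proactivity (a sliding-window median followed by an A/D-style quantizer) can be sketched as below. This is a simplified illustration, not the authors' Algorithm 1: the class name, the window length, the fixed quantization step, and the use of floor rounding are assumptions made for the example, whereas the text only states that the second filter outputs one of the converter levels and that this level is used as the performance function.

```python
import math
from collections import deque
from statistics import median
from typing import Tuple


class TwoStageFilter:
    """Sliding-window median (first filter) followed by a uniform quantizer (second filter)."""

    def __init__(self, window: int, delta_ms: float) -> None:
        if window < 1 or delta_ms <= 0:
            raise ValueError("window must be >= 1 and delta_ms must be positive")
        self.samples = deque(maxlen=window)   # keeps only the most recent RTT samples
        self.delta_ms = delta_ms              # quantization step (assumed fixed here)

    def update(self, rtt_ms: float) -> Tuple[float, int]:
        """Feed one RTT sample; return (median M, quantized level Q)."""
        self.samples.append(rtt_ms)
        m = median(self.samples)
        q = math.floor(m / self.delta_ms)     # number of quantization steps (assumed floor)
        return m, q
```

With a window of three samples and a 10 ms step, feeding 42, 44, 90, 43, and 47 ms keeps the quantized level at 4 throughout, so the isolated 90 ms spike would not move the performance function; only a sustained rise (for example 95 ms followed by 96 ms) changes the level.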
Figure 3. Number of path switches (top) and 〈RTTs〉 (bottom) for L = 0.450 (left), L = 0.675 (center), and L = 0.900 (right). (Horizontal axis: $R_{th}$ on a logarithmic scale from 1 to 100; vertical axes: path switches (top) and RTTs in ms (bottom); curves: SRC and randomized IRC, plus IGP/BGP routing in the bottom panels.)
Simulation Scenarios: We run the same simulations separately using three different scenarios:
• Default IGP/BGP routing, where BGP routers choose their best routes based on the shortest AS-path
• BGP combined with the SRC strategy at the 12 source domains
• BGP combined with randomized IRC systems at the 12 source domains

For a more comprehensive comparison between the different route control strategies, we performed the simulations for three different network loads. We considered the following load factors (L):
• L = 0.450, low load corresponding to an average occupancy of 45 percent of the egress links capacity
• L = 0.675, medium load corresponding to an average occupancy of 67.5 percent of the egress links capacity
• L = 0.900, high load corresponding to an average occupancy of 90 percent of the egress links capacity

Simulation Conditions: The simulation tests were conducted using traffic aggregates sent from the source domains to each target destination d. These traffic aggregates were composed of a variable number of multiplexed Pareto flows as a way to generate the traffic demands, as well as to control the network load during the tests. The flow arrivals were modeled according to a Poisson process and were independently and uniformly distributed during the simulation run time. This approach aims at generating sufficient traffic variability to support the assessment of the different route control strategies.

In addition, we used the following method to generate traffic demands for the remaining Internet traffic, usually referred to as background traffic. We started by randomly picking four nodes in the network. The first one chosen acts as the origin (O) node, and the remaining three nodes act as destinations (D) of the background traffic. We assigned one Pareto flow for each O-D pair. This process continues until all the nodes are assigned with three outgoing flows (including those in the multihomed stub domains and those in the ISPs). All background connections were active during the simulation run time.

Furthermore, the frequency and size of the probes sent by the route controllers were correlated with the outbound traffic being controlled, just as conventional route controllers do today [2–4].

Finally, we assume that the route controllers have pre-established performance bounds (i.e., the MTTs) for the traffic under control. For instance, the recommendation G.114 of the International Telecommunication Union-Telecommunication Standardization Sector (ITU-T) suggests a one-way-delay (OWD) bound of 150 milliseconds to maintain a high quality VoIP communication over the Internet. Thus, for VoIP traffic, the maximum RTT tolerated was chosen as twice this OWD bound, that is, 300 ms.

Objectives of the Performance Evaluation

Our evaluations have two main objectives.

Assess the Number of Path Switches: The first objective of the simulation study is to demonstrate that the sociable nature of our SRC strategy contributes to drastically reducing the potential interference between competing route controllers. To this end, we compared the number of path switches that occurred during the simulation run time for the 300 competing IRC flows for the SRC and randomized IRC scenarios. The number of path switches is obtained by adding the number of route changes that are required to meet the desired RTT bound for each target destination d.

It is worth emphasizing that in both the randomized IRC and SRC strategies, the route controllers operate independently and compete for the same network resources. This allows us to evaluate the overall impact on the traffic caused by the interference between several standalone route controllers running at different stub domains. Thus, when analyzing the results for the different route control strategies, it is important to keep in mind that we take into account all the competing route controllers present in the network.

Figure 4. Complementary cumulative distribution function (CCDF) of the RTTs for the 300 competing IRC flows, for $R_{th} = 1$, and for L = 0.450 (left), L = 0.675 (center), and L = 0.900 (right). (Horizontal axis: RTTs (ms); vertical axis: P(RTT >= x); curves: SRC, randomized IRC, and IGP/BGP routing.)

To contrast the number of path switches under fair conditions, we made the following decisions. First, both the randomized IRC and SRC controllers are endowed with the same (explicit) randomization technique [5, 13]. This approach avoids the appearance of persistent oscillations that might lead to a large number of path switches in the case of conventional IRC [5]. Second, both types of controllers follow a controlled proactivity approach. We have conducted the simulations modeling the same triggering condition $R_{th}$ for both of them. The main difference is that in the SRC case, the social adaptability of the controllers can result in the trigger being reached more often, or less often, depending on the variability of the RTTs on the network.

End-to-End Traffic Performance: The second objective of the simulation study is to demonstrate that the drastic reduction in the number of path switches obtained with our SRC strategy can be achieved without penalizing the end-to-end traffic performance. To this end, we compared the RTTs obtained for the 300 flows in the three different scenarios, namely, default IGP/BGP, SRC, and randomized IRC.

Main Results

The top of Fig. 3 illustrates the total number of path switches performed by both the randomized IRC and SRC strategies, in all the stub networks, and for the three different load factors: L = 0.450 (left), L = 0.675 (center), and L = 0.900 (right). The number of path switches is contrasted for different triggering conditions, that is, for different values of the threshold $R_{th}$ (shown on a logarithmic scale).

Several conclusions can be drawn from the results shown in Fig. 3. In the first place, the results confirm that SRC drastically reduces the number of path switches compared to a randomized IRC technique.⁶ An important result is that the reductions are significant for all the load factors assessed. For

The second observation is that the reductions in the number of path switches offered by the SRC strategy become more and more evident as the proactivity of the controllers increases, that is, for low values of $R_{th}$, which is precisely the region where IRC solutions operate today. It is worth recalling that these results were obtained when both route control strategies were complemented by the same randomized decisions. This confirms that in a competitive environment, SRC is much more effective than pure randomization in reducing the potential interference between route controllers.

On the other hand, our results show that when the route control strategies become less proactive, that is, for higher values of $R_{th}$, randomized IRC and SRC tend to behave comparatively the same, so SRC does not introduce any benefit over a randomized IRC technique.

To assess the effectiveness of SRC, it is mandatory to confirm that the reductions obtained in the number of path switches are not excessive, resulting in a negative impact on the end-to-end traffic performance. To this end, we first analyze the performance of randomized IRC and our SRC “globally,” that is, by averaging the RTTs obtained by “all” competing route controllers. This is shown at the bottom of Fig. 3 and in Fig. 4. The end-to-end performance obtained by “each” route controller individually is shown in Fig. 5.

The bottom of Fig. 3 reveals that, as expected, both SRC and randomized IRC perform much better than IGP/BGP for all values of L and $R_{th}$, and the improvements in the achieved performance become more evident as the network utilization increases. In particular, SRC is capable of improving the 〈RTTs〉⁷ by more than 40 percent for L = 0.675 and by more than 35 percent for L = 0.900 when compared with IGP/BGP. Moreover, the 〈RTTs〉 obtained by SRC and IRC are comparatively the same and particularly for L = 0.675, SRC not only drastically reduces the number of path switches, but also improves the end-to-end performance for almost all the triggering conditions assessed. It is worth emphasizing that a low value of $R_{th}$ together with a load factor of L = 0.675 reasonably reflects the conditions in which IRC currently operates in the Internet.

Our results also reveal an important aspect: by allowing
instance, when compared with randomized IRC, our SRC more path switches, some route controllers can improve
strategy contributes to reductions of up to: slightly their end-to-end performance, but such actions have
• 77 percent for Rth = 1 and 71 percent for Rth = 2 when L = no major effect on the overall 〈RTTs〉. Indeed, a certain num-
0.450 ber of path switches is always required, and this number of
• 75 percent for Rth = 1 and 74 percent for Rth = 2 when L = path switches is what actually ensures the average perfor-
0.675 mance observed in the RTTs at the bottom of the Fig. 3 (this
• 34 percent for Rth = 1 and 36 percent for Rth = 2 when L = becomes clear as the proactivity decreases).
0.900 By analyzing Fig. 3 as a whole, it becomes evident that the
6Clearly, no results are shown for the default IGP/BGP routing scenario 7As mentioned previously, this average is computed over the RTTs
here because BGP does not perform path switching actively. obtained by all competing route controllers in the network.
[Plot panels omitted. Each panel shows P(RTT>=x) on the vertical axis against RTTs (ms) on the horizontal axis.]
Figure 5. CCDFs for IGP/BGP routing (top), SRC (center), and randomized IRC (bottom), for L = 0.450 (left), L = 0.675 (center), and L = 0.900 (right).
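The panels of Fig. 5 plot a complementary cumulative distribution function, P(RTT >= x), and the discussion checks the RTTs against a 300 ms bound. As a minimal sketch of how such a curve and check could be computed from a list of RTT samples, consider the following. The function names and the sample values are hypothetical and chosen only for illustration; this is not the authors' simulation code.

```python
from bisect import bisect_left, bisect_right


def ccdf(samples, x):
    """Empirical P(RTT >= x) for a non-empty list of RTT samples in ms."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # bisect_left returns how many samples are strictly smaller than x.
    below = bisect_left(ordered, x)
    return (len(ordered) - below) / len(ordered)


def fraction_exceeding(samples, bound_ms=300.0):
    """Fraction of samples strictly above the bound (300 ms in the article)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # bisect_right returns how many samples are at or below bound_ms.
    at_or_below = bisect_right(ordered, bound_ms)
    return (len(ordered) - at_or_below) / len(ordered)


if __name__ == "__main__":
    # Hypothetical RTT samples in milliseconds, not taken from the article.
    hypothetical_rtts_ms = [40.0, 60.0, 60.0, 80.0, 350.0]
    for x in (50, 100, 300):
        print(x, ccdf(hypothetical_rtts_ms, x))
    print("fraction above 300 ms:", fraction_exceeding(hypothetical_rtts_ms))
```

A scenario satisfies the targeted bound for a given controller when `fraction_exceeding` returns 0 for that controller's samples. Sorting once and using binary search keeps each evaluation cheap when the curve is sampled at many values of x.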
selection of the best triggering condition actually depends on the load present in the network. For this particular case, the best trade-offs are Rth = 30 for L = 0.450, Rth = 10 for L = 0.675, and Rth = 7 for L = 0.900, which is a reasonable progression to lower values of Rth because the route controllers require less proactivity when the network utilization is low. The corollary is that the triggering condition should be adaptively adjusted as well, depending on the amount of traffic carried through the egress links of the domain. We plan to investigate this in the future.
Figure 4 compares the distribution of the RTTs obtained by IGP/BGP, SRC, and randomized IRC for the 300 competing IRC flows, for the three different load factors assessed, and for Rth = 1, which as mentioned above is in the range of operation of the IRC solutions presently deployed in the Internet. To facilitate the interpretation of the results, we use the complementary cumulative distribution function (CCDF).
An important observation is that under high egress link utilization, that is, L = 0.900, there is a fraction of 〈RTTs〉 for which the bound of 300 ms is exceeded in the case of IGP/BGP; whereas both SRC and the randomized IRC fulfill the targeted bound.
To complete the analysis, Fig. 5 provides a more granular picture than Fig. 4 because it shows the CCDFs of the RTTs obtained by each of the 12 competing route controllers. The figure shows the results for the three studied scenarios and for all the load factors assessed when Rth = 1. Our results show that the targeted bound of 300 ms is satisfied by both SRC and randomized IRC in all cases and for all controllers. IGP/BGP, however, shows a distribution of large delays given that the shortest AS-paths are not necessarily the best performing paths. Figure 5 also shows that when considering boxes individually, randomized IRC achieves slightly better end-to-end performance for some of them but at the price of a much larger number of path switches:
• ≈ 435 percent larger for L = 0.450
• ≈ 400 percent larger for L = 0.675
• ≈ 80 percent larger for L = 0.900 when Rth = 1.
Conclusion
In this article, we examined the strengths and weaknesses of randomized IRC techniques in a competitive environment. We proposed a way to blend randomization with a sociable route control (SRC) strategy, where by sociable, we mean a route control strategy that explicitly considers the potential