
September/October 2008, Vol. 22, No. 5

THE MAGAZINE OF GLOBAL INTERNETWORKING


www.comsoc.org

Implications and Control of Middleboxes in the Internet

A Publication of the IEEE Communications Society
in cooperation with the
IEEE Computer Society and the
Internet Society

Special Issue
Implications and Control of Middleboxes in the Internet
Guest Editors: Xiaoming Fu, Martin Stiemerling, and Henning Schulzrinne

8 A Retrospective View of Network Address Translation
Today, network address translators, or NATs, are everywhere. Their ubiquitous adoption was not promoted by design or planning but by the continued growth of the Internet.
Lixia Zhang

14 Behavior and Classification of NAT Devices and Implications for NAT Traversal
For a long time, traditional client-server communication was the predominant communication paradigm of the Internet. Network address translation devices emerged to help with the limited availability of IP addresses and were designed with the hypothesis of asymmetric connection establishment in mind. But with the growing success of peer-to-peer applications, this assumption is no longer true.
Andreas Müller, Georg Carle, and Andreas Klenk

20 Modeling Middleboxes
The authors present a simple middlebox model that succinctly describes how different middleboxes process packets and illustrate it by representing four common middleboxes.
Dilip Joseph and Ion Stoica

26 Network Address Translation for the Stream Control Transmission Protocol
The authors discuss the deficiencies of using existing NAT methods for SCTP and describe a new SCTP-specific NAT concept. This concept is analyzed in detail for several important network scenarios, including peer-to-peer, transport layer mobility, and multihoming.
Michael Tüxen, Irene Rüngeler, Randall Stewart, and Erwin P. Rathgeb

33 Distributed Connectivity Service for a SIP Infrastructure
The authors present a distributed connectivity service solution that integrates relay functionality directly in user nodes.
Luigi Ciminiera, Guido Marchetto, Fulvio Risso, and Livio Torrero

41 Dial “M” for Middlebox Managed Mobility
Users can be served by multiple network-enabled terminal devices, each of which in turn can have multiple network interfaces. This multihoming at both the user and device level presents new opportunities for mobility handling.
Stephen Herborn and Aruna Seneviratne

48 NAT Issues in the Remote Management of Home Network Devices
The authors focus on NAT issues in the management of home network devices. Specifically, they discuss efforts relating to standardization.
Choongul Park, Kitae Jeong, Sungil Kim, and Youngseok Lee

56 Improving the Performance of Route Control Middleboxes in a Competitive Environment
The authors show that by blending randomization with adaptive filtering techniques, it is possible to drastically reduce the interference between competing route controllers, and this can be achieved without penalizing the end-to-end traffic performance.
Marcelo Yannuzzi, Xavi Masip-Bruin, Eva Marin-Tordera, Jordi Domingo-Pascual, Alexandre Fonte, and Edmundo Monteiro

Editor’s Note 2
New Books & Multimedia 4
Guest Editorial 6
IEEE NETWORK ISSN 0890-8044 is published bimonthly by the Institute of Electrical and Electronics Engineers, Inc. Headquarters address: IEEE, 3 Park Avenue, 17th Floor, New York, NY 10016-
5997, USA; tel: +1-212-705-8900; e-mail: ieee.network@ieee.org. Responsibility for the contents rests upon authors of signed articles and not the IEEE or its members. Unless otherwise specified,
the IEEE neither endorses nor sanctions any positions or actions espoused in IEEE Network.
ANNUAL SUBSCRIPTION: $40 in addition to IEEE Communications Society or any other IEEE Society member dues. Non-member prices: $250. Single copy price $50.
EDITORIAL CORRESPONDENCE: Address to: Chatschik Bisdikian, Editor-in-Chief, IEEE Network, IEEE Communications Society, 3 Park Avenue, 17th Floor, New York, NY 10016-5997, USA; e-
mail:bisdik@us.ibm.com
COPYRIGHT AND REPRINT PERMISSIONS: Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of U.S. Copyright law for private use of patrons:
those articles that carry a code on the bottom of the first page provided the per copy fee indicated in the code is paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA
01923, USA. For other copying, reprint, or republication permission, write to Director, Publishing Services, at IEEE Headquarters. All rights reserved. Copyright ©2008 by the Institute of Electrical
and Electronics Engineers, Inc.
POSTMASTER: Send address changes to IEEE Network, IEEE, 445 Hoes Lane, Piscataway, NJ 08855-1331, USA. Printed in USA. Periodical-class postage paid at New York, NY and at additional
mailing offices. Bulk rate postage paid at Easton, PA permit #7. Canadian GST Reg# 40030962. Return undeliverable Canadian addresses to: Frontier, P.O. Box 1051, 1031 Helena Street, Fort Erie,
ON L2A 6C7.
SUBSCRIPTIONS, orders, address changes should be sent to IEEE Service Center, 445 Hoes Lane, Piscataway, NJ 08855-1331, USA. Tel. +1-732-981-0060.
ADVERTISING: Advertising is accepted at the discretion of the publisher. Address correspondence to IEEE Network, 3 Park Avenue, 17th Floor, New York, NY 10016-5997, USA.

IEEE Network • September/October 2008 1



THE MAGAZINE OF GLOBAL INTERNETWORKING

Director of Magazines
Thomas F. La Porta, Penn. State Univ., USA

Editor-in-Chief
Ioanis Nikolaidis, U. of Alberta, Canada

Associate Editor-in-Chief
Chatschik Bisdikian, IBM Research, USA

Senior Technical Editors
Thomas M. Chen, Swansea U., UK
Yi-Bing (Jason) Lin, National Chiao Tung Univ., Taiwan
Peter O’Reilly, Northeastern Univ., USA

Technical Editors
Kevin Almeroth, UCSB, USA
N. Asokan, Nokia Res. Ctr., Finland
Olivier Bonaventure, U. Catholique de Louvain, Belgium
Adrian Conway, Verizon, USA
Jon Crowcroft, U. of Cambridge, UK
Christos Douligeris, U. of Piraeus, Greece
Paolo Giacomazzi, Politecnico di Milano, Italy
David Greaves, U. of Cambridge, UK
Nikhil Jain, Qualcomm, USA
Admela Jukan, T. U. Braunschweig, Germany
Tim King, BTexact Tech., UK
Frank Magee, Consultant, USA
Ioanis Nikolaidis, U. of Alberta, Canada
Georgios I. Papadimitriou, Aristotle Univ., Greece
Mohammad Peyravian, IBM Corporation, USA
Kazem Sohraby, U. of Arkansas, USA
James Sterbenz, Univ. of Kansas, USA
Joe Touch, USC/ISI, USA
Vittorio Trecordi, CEFRIEL, Italy
Guoliang Xue, Arizona State Univ., USA
Raj Yavatkar, Intel, USA
Bulent Yener, Rensselaer Polytechnic Institute, USA

Feature Editors
Olivier Bonaventure, "Software Tools for Networking"
U. Catholique de Louvain, Belgium
Olivier Bonaventure, "New Books & Multimedia"
U. Catholique de Louvain, Belgium

IEEE Production Staff
Joseph Milizzo, Assistant Publisher
Eric Levine, Associate Publisher
Susan Lange, Digital Production Manager
Catherine Kemelmacher, Associate Editor
Jennifer Porcello, Publications Coordinator
Devika Mittra, Publications Assistant

2008 IEEE Communications Society Officers
Doug Zuckerman, President
Andrzej Jajszczyk, VP–Technical Activities
Mark Karol, VP–Conferences
Byeong Gi Lee, VP–Member Relations
Sergio Benedetto, VP–Publications
Nim Cheung, Past President
Stan Moyer, Treasurer
John M. Howell, Secretary

Board of Governors
The officers above plus Members-at-Large:
Class of 2008
Thomas M. Chen, Andrea Goldsmith
Khaled Ben-Letaief, Peter J. McLane
Class of 2009
Thomas LaPorta, Theodore Rappaport
Catherine Rosenberg, Gordon Stuber
Class of 2010
Fred Bauer, Victor Frost
Stefano Galli, Lajos Hanzo

2008 IEEE Officers
Lewis M. Terman, President
John R. Vig, President-Elect
Barry L. Shoop, Secretary
David G. Green, Treasurer
Leah H. Jamieson, Past President
Jeffry W. Raynes, Executive Director
Curtis A. Siller, Jr., Director, Division III

EDITOR’S NOTE

NATs and Frozen Veggies

Ioanis Nikolaidis

Dear readers, welcome to the September 2008 issue of IEEE Network. The sound of trucks, the heavy duty disposal bins on the curb, and the thud and bang of construction are all elements of a “quiet” summer, full of renovations, in my neighborhood. Resisting this Siren’s call is difficult even if I swore off any renovations for the rest of my life, given past experience. I naively thought that this time it wouldn’t be that bad. After all, this time it looks like a much smaller job than last time. Of course, I neglected a key conservation law: if a job is small, the additional delays for various reasons will expand it to be roughly equal to the total time of a “big” job (more professionally managed one might argue, and hence with much less slack). What I was not prepared to experience is the shift in attitudes caused by the widespread adoption of many “information” appliances in today’s household.

I should have spotted the shift when my contractor warned that he would need to turn off the power to our house, only to qualify it with “If that’s okay with your gear, right?” noticing that there were maybe a tad too many devices, computers, firewalls, servers, and bridges spread around the house. He was concerned that some might develop bad hiccups after the switch was turned off and on again. He had himself experienced some “unhealthy” side effects to his equipment under similar circumstances, so his concern was genuine. I thought for a moment of explaining the benefits of statelessness and how, I would hope, most of my gear could survive power being cut and restored later (no, I don’t have a UPS; I believe in luck). I decided not to expand on the topic, just agreeing that it was okay to cut the power to the house.

Things indeed went as planned, although it should have struck me as odd that he did not ask about other things that might be influenced by cutting the power. A few days later, while I was at work, the contractor stumbled on a dilemma. He had to run an industrial strength vacuum cleaner to pick up lots of debris. Having pulled down walls and removed several wall outlets left him with no choice but to run an extension cord to the nearest outlet he could find still standing. It happened that this was an outlet already fully populated by two cords, one connecting a refrigerator we keep in the basement, and one connecting a NAT/firewall box, a nearby server, and a cable modem. Without any hesitation, he removed the one least likely to create a hassle: the refrigerator!

In comparison to a NAT box, a refrigerator is low tech and almost stateless, if not for its volatile contents. His choice was reasonable. He was not expecting to keep it for more than an hour this way. But human nature conspired. The contractor forgot to plug in the refrigerator when he was done. The packets were running smoothly while our frozen veggies were thawing. To


make matters worse, and blame my own human nature here, I did not notice the “failure” until late in the evening (okay, so I do keep some beer there too). Had it been the firewall malfunctioning I would have spotted it in minutes. I spent a good part of the evening deciding what had to be thrown away and what to keep (luckily this was not a warm day) and laughing at our priorities: mine and the contractor’s.

The fact that everyday people think of consumer-grade networking and information appliances as possibly the most sensitive objects in a house reflects what they have learned from their own experience in the recent past. After all, a lost file can be a major blow, while a pound of rotten spinach is, well, compost. A handful of remarkable technologies made it into these everyday devices, and one that is still a topic of research, extension, and overall controversy is Network Address Translation (NAT). NAT is no longer just a way to establish a home user’s little kingdom of an Internet-connected private network (while guilt-free of hoarding IP addresses). NAT boxes are increasingly active participants as the “middle-point” of communication paths and this has led to the use of a new term, “middlebox,” to describe the particular class of technologies.

This special issue, entitled “Implications and Control of Middleboxes in the Internet,” provides a timely review of where we are in middlebox evolution and how they might further evolve. I would like to thank the guest editors, Xiaoming Fu, Martin Stiemerling, and Henning Schulzrinne, as well as the liaison editor of this issue, Jon Crowcroft, for their excellent work in putting this issue together. I would also like to welcome a new member to our editorial board: Dr. Admela Jukan. Dr. Jukan received her Ph.D. degree from Vienna University of Technology in Austria, and is currently a W3 Professor of Electrical and Computer Engineering at the Technical University Carolo-Wilhelmina of Brunswick (Braunschweig), Germany. Dr. Jukan served between 2002 and 2004 as Program Director in Computer and Networks System Research at the National Science Foundation (NSF), responsible for funding and coordinating US-wide university research and education activities in the area of network technologies and systems.

As always, your feedback regarding the direction and substance of the magazine is invaluable and always appreciated. Please contact me, by e-mail, at yannis@cs.ualberta.ca, to let me know what you think about the editorial comments, what type of content might be more interesting to you, and in what ways the magazine’s distinct character could be improved or further publicized.


NEW BOOKS AND MULTIMEDIA/EDITED BY OLIVIER BONAVENTURE

The New Books and Multimedia column contains brief reviews of new books in the computer communications field. Each review includes a highly abstracted description of the contents, relying on the publisher’s descriptive materials, minus advertising superlatives, and checked for accuracy against a copy of the book. The reviews also comment on the structure and the target audience of each book. Publishers wishing to have their books listed in this manner should contact Olivier Bonaventure by email.
Olivier Bonaventure
Université Catholique de Louvain, Belgium
bonaventure@ieee.org

LAN Switch Security: What Hackers Know About Your Switches
Eric Vyncke and Christopher Pagen, Cisco Press, 2008, ISBN-10: 1-58705-256-3, Softbound, 360 pages

Ethernet is now the default fixed local area network technology. Ethernet LANs are found in all enterprise environments, and in more and more home networks. Ethernet was designed in the 1970s when security was not a concern. Since then, Ethernet has evolved with the introduction of hubs and switches. Many network administrators are aware that hubs are a security concern since they broadcast Ethernet frames, and some of them assume that switches are more secure. Unfortunately, hackers have learned the limitations of Ethernet switches and have developed several tools that can be used to exploit them.

This book describes the current state of the art in securing Ethernet switches. The authors take a practical approach by using different types of Cisco switches and freely available tools to demonstrate the security problems and their solutions. Despite its focus on a single vendor, this book is an interesting reference for system administrators who are willing to better understand how to secure their Ethernet networks. This is particularly important in environments such as schools where uncontrolled laptops are often connected.

The first part discusses the basic security problems that affect Ethernet switches: the learning bridge process and the implications of the limited size of the MAC table on Ethernet switches. It also discusses configurations to mitigate these problems. Then the book analyzes several protocols and their security implications: the spanning tree protocol, the 802.1q VLANs, DHCP, IPv4 ARP, and IPv6 Neighbor Discovery, but also surprising electrical security issues with power over Ethernet. The second part focuses on techniques that can be used on switches to sustain denial-of-service attacks, from both forwarding and control plane viewpoints. The last part analyzes recent techniques that can be used to improve the security of Ethernet switches, such as 802.1x or 802.1AE and access control lists.

Principles of Protocol Design
Robin Sharp, Springer Verlag, 2008, ISBN: 978-3-540-77540-9, Hardbound, 402 pages

This book takes an unusual path to describe computer network protocols. While most standard networking texts mainly focus on a textual description of the different protocols and mechanisms, Robin Sharp starts from formal description techniques. More precisely, he chooses the Communicating Sequential Processes (CSP) notation proposed by Hoare. CSP is a process algebra that makes it possible to model the interactions among communicating processes. The book starts with a detailed description of CSP and then uses the CSP formalism to describe several mechanisms such as flow and error control, fault-tolerant broadcast, and two-phase commits. An advantage of using CSP is that the book contains proofs of several of the described mechanisms. However, as CSP does not contain complex data types, it is difficult to completely model complex protocols in detail. Surprisingly, the author did not consider more powerful formal description techniques that evolved from CSP such as LOTOS.

The second part of the book is more heterogeneous. Several security protocols are discussed, and the BAN logic is introduced. Then the author briefly discusses real protocols. The discussion considers both open system interconnection (OSI) protocols and Internet protocols. This part is less interesting than the first part, where the CSP models could be of interest to readers who are more interested in the application of formal description techniques to network protocols.

Patterns in Network Architecture: A Return to Fundamentals
John Day, Prentice Hall, 2008, ISBN-10: 0132252422, Hardbound, 464 pages

The architecture of today’s Internet was mainly designed together with the TCP and IP protocols in the 1970s and early 1980s. In recent years, researchers and funding organizations in America, Europe, and Asia have started to work on different alternative architectures for the Internet. Some consider an evolutionary approach where the Internet architecture would be incrementally modified in a backward compatible manner, while others believe a completely new architecture should be developed to take into account the requirements of today’s and tomorrow’s Internet.

John Day’s book is a must read for researchers interested in the evolution of the Internet architecture. The book is composed of two main parts. The first part is mainly a history of the evolution of computer network architectures in the 1970s and 1980s. John Day participated actively in this research on both the Internet side and the OSI side. He explains the reasons for some of the design choices and discusses alternatives that were considered but not selected. The discussion considers several of the key elements of a computer network architecture, including the protocol elements, layering, naming, and addressing.

The second part describes John Day’s vision of an alternative network architecture. For this, he starts by reconsidering network-based InterProcess Communication (IPC) and shows that a distributed IPC should be at the core of a computer network architecture. This discussion is interesting, but the author does not explain in detail how it could be realized in practice. The second part ends with two chapters on topological addressing influenced by Mike O’Dell’s GSE proposal, and a discussion of the impact of multicast and multihoming on the architecture.


GUEST EDITORIAL

Implications and Control of Middleboxes in the Internet

Xiaoming Fu Martin Stiemerling Henning Schulzrinne

Middleboxes in the Internet have been explored, sometimes quite controversially, in operations, standardization, and the research community for more than 10 years. In the past, many have expressed concerns that middleboxes contradict the Internet’s end-to-end principle, which is often understood to posit that “intelligence” is placed in end systems and network elements just forward packets. Middleboxes introduce functions beyond forwarding in the data path between a source and destination, as described, for example, in RFC 3234. RFC 3234 describes a wide range of middleboxes, from TCP performance enhancing proxies to transcoders.

On the other hand, middleboxes were introduced in the Internet for various reasons: NATs intend to decouple the internal IP addressing from the public address space while allowing multiple hosts to share a single public IP address, for the purpose of preserving the IP address space; firewalls are used by administrators to enforce policies on the data traffic at administrative borders with the intention of preventing their networks from being attacked or monitored; application level gateways (ALGs) are typically used to assist applications in their operations.

The implications of the emergence and popularity of middleboxes are complicated. With middleboxes it is difficult to even provide basic end-to-end connectivity for many applications. For example, Internet hosts behind NATs can only initiate a TCP connection with another host, but cannot accept a connection request. Unlike in the past, when the vast majority of applications followed the client-server design pattern, and most hosts behind NATs were clients anyway (e.g., your browser accessing a Web server), a variety of new applications today, such as voice-over-IP, gaming, and peer-to-peer file sharing, cause an enormous list of issues. Hosts behind NATs are not reachable from any other host anymore, which becomes particularly troublesome for VoIP and other peer-to-peer applications. Likewise, firewalls are usually statically configured to block certain TCP ports or do not understand non-TCP protocols, making it difficult to deploy new applications and protocols. This results in a number of issues to be considered in the design and development of new protocols and applications.

To mitigate the negative impacts of these issues, quite a number of techniques have been developed, which can be categorized as explicit control and implicit control of firewalls and NATs. For explicit control, an entity, either the end host or a proxy in the network, has a relationship with the middlebox and controls its behavior (e.g., the set of policies or filter rules loaded). Examples of explicit control are universal plug and play (UPnP), Internet Engineering Task Force (IETF) Middlebox Communications (MIDCOM), and IETF Next Steps in Signaling (NSIS). On the other hand, implicit control is the traditional way of traversing middleboxes. Implicit control does not have any control relationship with the middlebox, because end hosts, probably with the support of other end hosts, are using hole punching techniques to get a working middlebox traversal. Examples of implicit control are the IETF’s Session Traversal Utilities for NAT (STUN), Traversal Using Relays around NAT (TURN), and Interactive Connectivity Establishment (ICE). In addition, there have been some recent attempts to design or use certain types of middleboxes, such as various application proxies.

In this special issue we are pleased to introduce a series of state-of-the-art articles on this specific area. These articles cover the subject from a variety of perspectives, offering the readers an understanding of the issues and implications of various middleboxes in the Internet, including their control mechanisms. A total of eight articles, selected from 26 submissions based on a strict peer review process, cover a broad range in the field of implications and control of middleboxes in the Internet. While some articles present more general issues with middleboxes, understanding their behaviors and implications, others focus on new approaches to controlling and using middleboxes.

NATs, an unplanned reality, have posed complications to the Internet architecture and applications. The first article, “A Retrospective View of NAT” by Lixia Zhang, takes readers back to the early days of middleboxes. It gives a historic review of NATs and the lessons learned, including how they impeded standardization and deployment of IPv6, and an expected solution for addressing the Internet address depletion problem. Without a timely standardization of NAT, today there have been a number of different NAT implementations, and it is vital to understand their behaviors due to their nearly ubiquitous presence.

The second article, “Behavior and Classification of NAT Devices and Implications for NAT Traversal” by Andreas Müller, Andreas Klenk, and Georg Carle, provides a comprehensive overview of NAT behaviors and currently available


NAT traversal techniques. The article presents a new categorization approach based on an analytical abstraction of NAT traversal, which classifies NAT traversal services into four distinct types and deduces the corresponding NAT behaviors. This may help developers of new protocols and applications to determine applicable techniques for NAT traversal.

While the first two articles describe the history, behavior, and classification of NAT, the next article by Dilip Joseph and Ion Stoica, “Modeling Middleboxes,” proposes a formal and generic model for deducing middlebox functionalities and behaviors. Using this model, the article illustrates how different middleboxes process packets, and how four common middleboxes (firewall, NAT, and layer 4 and layer 7 load balancers) may be depicted. As such, the article provides an initial step for relevant designers, users, and researchers to understand and refine the behaviors and implications of various middleboxes.

Existing middleboxes mostly consider TCP and UDP in their implementations, and typically do not support other protocols, such as the Stream Control Transmission Protocol (SCTP). In the fourth article, Michael Tüxen et al. describe the extensions required to support NAT for SCTP. The analysis presented in this article may be useful as a general lesson in the near future, as several other protocols after SCTP, including DCCP, XCP, and HIP, use similar techniques such as multihoming, rehoming, and handshake cookies.

Applications using the Session Initiation Protocol (SIP) or a peer-to-peer way of operation (P2PSIP or just normal P2P applications) are among those that suffer most from the middlebox traversal issue. The fifth article, “Distributed Connectivity Service for a SIP Infrastructure” by Luigi Ciminiera et al., examines this issue and presents an alternative to the current STUN/TURN/ICE approach to middlebox traversal. The approach distributes the rendezvous and relay functions among SIP user agents, which discover their peers autonomously and maintain a P2P overlay to ensure connectivity across NATs and firewalls in a SIP infrastructure without relying on a centralized server.

The remaining three articles address new applications of middleboxes. The sixth article, “Dial M for Middlebox Managed Mobility” by Stephen Herborn and Aruna Seneviratne, describes a new usage type of middleboxes for mobility support via the concept of virtual private “personal networks.” Such a network is created and maintained by way of HIP combined with IPsec and supported by middlebox state, which may be interesting (at least to some extent) for the recent research efforts on network virtualization, as they use today’s technologies directly.

An increasing number of home users today are using NATs to connect their home IP devices with the Internet. Choongul Park et al. discuss this issue in their article “NAT Issues in the Remote Management of Home Network Devices.” By extending SNMP and using additional management objects (MOs) to gather NAT binding information, the authors attempt to address the NAT traversal problem under a symmetric NAT, based on their observations in Korea. While the success rate of NAT traversal could be a potential issue outside Korea, the article provides insight into what home networking standards may have to deal with.

Yet another type of middlebox function, intelligent route control (IRC) for multihomed sites and subscribers, has been recently identified as a key issue in efficient network operations. The final article, “Improving the Performance of Route Control Middleboxes in a Competitive Environment” by Marcelo Yannuzzi et al., addresses this issue and introduces an IRC approach for competitive environments, by blending randomization with adaptive filtering techniques.

We hope that these articles will help to clarify and explain the state-of-the-art advances on middlebox issues in the Internet, providing current visions of how the behaviors, implications, and control of middleboxes may be analyzed, encompassed, and utilized. In preparing this special issue, we wish to thank all the peer reviewers for their efforts in carefully reviewing the manuscripts to meet the tight deadlines. We are grateful to our liaison editor Jon Crowcroft for his constructive feedback, and Editor-in-Chief Ioanis Nikolaidis for his timely and critical suggestions.

Biographies

XIAOMING FU [M’02] (fu@cs.uni-goettingen.de) received his Ph.D. degree in computer science from Tsinghua University, Beijing, China, in 2000. After almost two years of postdoctoral work at Technical University Berlin, he joined the University of Göttingen as an assistant professor, leading a team working on networking research. Since April 2007 he has been a professor and head of the Computer Networks Group at the University of Göttingen. During 2003–2005 he also served as an expert on the ETSI Specialist Task Forces on Internet Protocol Testing; he was also a visiting scientist at the University of Cambridge and Columbia University. In the research fields of architectures, protocols, and applications for QoS, firewalls, p2p overlay, and mobile networking as well as related security issues, he (co-)authored more than 50 refereed papers as well as several RFCs/I-Ds. He has served as TPC member and session chair for several conferences, including IEEE INFOCOM, ICNP, ICDCS, GLOBECOM, and ICC. He was also founding chair of the ACM Workshop on Mobility in the Evolving Internet Architecture (MobiArch) and is TPC Co-Chair of IEEE GLOBECOM 2009 Next Generation Networking and Internet Symposium. He is currently a member of the editorial board of Computer Communications Journal (Elsevier).

MARTIN STIEMERLING [M’00] (stiemerling@cs.uni-goettingen.de) received his M.Sc. degree (Diploma) in electrical engineering with a focus on IP networking technologies from the Polytechnic University of Applied Sciences in Cologne in 2000. After that he joined the NEC Laboratories Europe, Heidelberg, Germany, where he is currently a senior researcher. His areas of research interest are Internet architecture, Internet signaling protocols, network management, and overlay/peer-to-peer systems. He has published several papers in these areas, and served as a TPC member of IEEE IPOM 2007. In the IETF he is active as working document editor in the MIDCOM, MMUSIC, and NSIS working groups, as well as in other IETF working groups and IRTF research groups. He is co-chair of the IETF Next Steps in Signaling (NSIS) working group, and secretary of the IP over DVB (IPDVB) working group, and a co-author of RFC 3816, RFC 3989, and RFC 4540, as well as RTSPng.

HENNING SCHULZRINNE [F’06] (hgs@cs.columbia.edu) received his Ph.D. from the University of Massachusetts in Amherst, Massachusetts. He was a member of technical staff at AT&T Bell Laboratories, Murray Hill, New Jersey, and an associate department head at GMD-Fokus (Berlin) before joining the Computer Science and Electrical Engineering Departments at Columbia University, New York. He is currently a professor and chair of the Department of Computer Science. He has been a member of the Board of Governors of the IEEE Communications Society and is vice chair of ACM SIGCOMM, former chair of the IEEE Communications Society Technical Committees on Computer Communications and the Internet, has been technical program chair of Global Internet, INFOCOM, NOSSDAV, and IPTCOMM, and was General Chair of ACM Multimedia 2004. He has also been a member of the Internet Architecture Board. Protocols co-developed by him, such as RTP, RTSP, and SIP, are now Internet standards, used by almost all Internet telephony and multimedia applications. His research interests include Internet multimedia systems, ubiquitous computing, mobile systems, quality of service, and performance evaluation.

IEEE Network • September/October 2008 7


A Retrospective View of
Network Address Translation
Lixia Zhang, University of California, Los Angeles

Abstract
Today, network address translators, or NATs, are everywhere. Their ubiquitous
adoption was not promoted by design or planning but by the continued growth of
the Internet, which places an ever-increasing demand not only on IP address space
but also on other functional requirements that network address translation is per-
ceived to facilitate. This article presents a personal perspective on the history of
NATs, their pros and cons in a retrospective light, and the lessons we can learn
from the NAT experience.

0890-8044/08/$25.00 © 2008 IEEE

A network address translator (NAT) commonly refers to a box that interconnects a local network to the public Internet, where the local network runs on a block of private IPv4 addresses as specified in RFC 1918 [1]. In the original design of the Internet architecture, each IP address was defined to be globally unique and globally reachable. In contrast, a private IPv4 address is meaningful only within the scope of the local network behind a NAT and, as such, the same private address block can be reused in multiple local networks, as long as those networks do not directly talk to each other. Instead, they communicate with each other and with the rest of the Internet through NAT boxes.

Like most unexpected successes, the ubiquitous adoption of NATs was not foreseen when the idea first emerged more than 15 years ago [2, 3]. Had anyone foreseen where NAT would be today, it is possible that NAT deployment might have followed a different path, one that was better planned and standardized. The set of Internet protocols that were developed over the past 15 years also might have evolved differently by taking into account the existence of NATs, and we might have seen less overall complexity in the Internet compared to what we have today.

Although the clock cannot be turned back, I believe it is a worthwhile exercise to revisit the history of network address translation to learn some useful lessons. It also can be worthwhile to assess, or reassess, the pros and cons of NATs, as well as to take a look at where we are today in our understanding of NATs and how best to proceed in the future.

It is worth pointing out that in recent years many efforts were devoted to the development and deployment of NAT traversal solutions, such as simple traversal of UDP through NAT (STUN) [4], traversal using relay NAT (TURN) [5], and Teredo [6], to name a few. These solutions remove obstacles introduced by NATs to enable an increasing number of new application deployments. However, as the title suggests, this article focuses on examining the lessons that we can learn from the NAT deployment experience; a comprehensive survey of NAT traversal solutions must be reserved for a separate article.

I also emphasize that this writing represents a personal view, and my recall of history is likely to be incomplete and to contain errors. My personal view on this subject has also changed over time, and it may continue to evolve, as we are all in a continuing process of understanding the fascinating and dynamically changing Internet.

How a NAT Works

As mentioned previously, IP addresses originally were designed to be globally unique and globally reachable. This property of the IP address is a fundamental building block in supporting the end-to-end architecture of the Internet. Until recently, almost all of the Internet protocol designs, especially those below the application layer, were based on the aforementioned IP address model. However, the explosive growth of the Internet during the 1990s not only signaled the danger of IP address space exhaustion, but also created an instant demand on IP addresses: suddenly, connecting large numbers of user networks and home computers demanded IP addresses instantly and in large quantities. Such demand could not possibly be met by going through the regular IP address allocation process. Network address translation came into play to meet this instant high demand, and NAT products were quickly developed to meet the market demand.

However, because NATs were not standardized before their wide deployment, a number of different NAT products exist today, each with somewhat different functionality and different technical details. Because this article is about the history of NAT deployment, and not an examination of how to traverse various different NAT boxes, I briefly describe a popular NAT implementation as an illustrative example. Interested readers can visit Wikipedia to find out more about existing types of NAT products.

A NAT box N has a public IP address for its interface connecting to the global Internet and a private address facing the internal network. N serves as the default router for all of the destinations that are outside the local NAT address block. When an internal host H sends an IP packet P to a
public IP destination address D located in the global Internet, the packet is routed to N. N translates the private source IP address in P’s header to N’s public IP address and adds an entry to its internal table that keeps track of the mapping between the internal host and the outgoing packet. This entry represents a piece of state, which enables subsequent packet exchanges between H and D. For example, when D sends a packet P’ in response to P, P’ arrives at N, and N can find the corresponding entry from its mapping table and replace the destination IP address (which is its own public IP address) with the real destination address H, so that P’ will be delivered to H. The mapping entry times out after a certain period of idleness that is typically set to a vendor-specific value. In the process of changing the IP address carried in the IP header of each passing packet, a NAT box also must recalculate the IP header checksum, as well as the checksum of the transport protocol if it is calculated based on the IP address, as is the case for Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) checksums.

From this brief description, it is easy to see the major benefit of a NAT: one can connect a large number of hosts to the global Internet by using a single public IP address. A number of other benefits of NATs also became clear over time, which I will discuss in more detail later.

At the same time, a number of drawbacks to NATs also can be identified immediately. First and foremost, the NAT changed the end-to-end communication model of the Internet architecture in a fundamental way: instead of allowing any host to talk directly to any other host on the Internet, the hosts behind a NAT must go through the NAT to reach others, and all communications through a NAT box must be initiated by an internal host to set up the mapping entries on the NAT. In addition, because ongoing data exchange depends on the mapping entry kept at the NAT box, the box represents a single point of failure: if the NAT box crashes, it could lose all the existing state, and the data exchange between all of the internal and external hosts must be restarted. This is in contrast to the original goal of IP of delivering packets to their destinations, as long as any physical connectivity exists between the source and destination hosts. Furthermore, because a NAT alters the IP addresses carried in a packet, all protocols that are dependent on IP addresses are affected. In certain cases, such as the TCP checksum, which includes IP addresses in the calculation, the NAT box can hide the address change by recalculating the TCP checksum when forwarding a packet. For some of the other protocols that make direct use of IP addresses, such as IPSec [7], the protocols can no longer operate on the end-to-end basis as originally designed; for some application protocols, for example, File Transfer Protocol (FTP) [8], that embed IP addresses in the application data, application-level gateways are required to handle the IP address rewrite. As discussed later, NAT also introduced other drawbacks that surfaced only recently.

A Recall of the History of NATs

I started my Ph.D. studies in the networking area at the Massachusetts Institute of Technology at the same time as RFC 791 [9], the Internet Protocol Specification, was published in September 1981. Thus I was fortunate to witness the fascinating unfolding of this new system called the Internet. During the next ten years, the Internet grew rapidly. RFC 1287 [2], Towards the Future Internet Architecture, was published in 1991 and was probably the first RFC that raised a concern about IP address space exhaustion in the foreseeable future.

RFC 1287 also discussed three possible directions to extend IP address space. The first one pointed to a direction similar to current NATs:

Replace the 32-bit field with a field of the same size but with a different meaning. Instead of being globally unique, it would be unique only within some smaller region. Gateways on the boundary would rewrite the address as the packet crossed the boundary.

RFC 1335 [3], published shortly after RFC 1287, provided a more elaborate description of the use of internal IP addresses (i.e., private IP addresses) as a solution to IP address exhaustion. The first article describing the NAT idea, “Extending the IP Internet through Address Reuse” [10], appeared in the January 1993 issue of ACM Computer Communication Review and was published a year later as RFC 1631 [11]. Although these RFCs can be considered forerunners in the development of NAT, as explained later, for various reasons the IETF did not take action to standardize NAT.

The invention of the Web further accelerated Internet growth in the early 1990s. The explosive growth underlined the urgency to take action toward solving both the routing scalability and the address shortage problems. The IETF took several follow-up steps, which eventually led to the launch of the IPng development effort. I believe that the expectation at the time was to develop a new IP within a few years, followed by a quick deployment. However, the actual deployment during the next ten years took a rather unexpected path.

The Planned Solution

As pointed out in RFC 1287, the continued growth of the Internet exposed strains on the original design of the Internet architecture, the two most urgent of which were routing system scalability and the exhaustion of IP address space. Because long-term solutions require a long lead time to develop and deploy, efforts began to develop both a short-term and a long-term solution to those problems.

Classless inter-domain routing, or CIDR, was proposed as a short-term solution. CIDR removed the class boundaries embedded in the IP address structure, thus enabling more efficient address allocation, which helped extend the lifetime of IP address space. CIDR also facilitated routing aggregation, which slowed down the growth of the routing table size. However, as stated in RFC 1481 [12], IAB Recommendation for an Intermediate Strategy to Address the Issue of Scaling: “This strategy (CIDR) presumes that a suitable long-term solution is being addressed within the Internet technical community.” Indeed, a number of new IETF working groups started in late 1992 and aimed at developing a new IP as a long-term solution; the Internet Engineering Steering Group (IESG) set up a new IPng area in 1993 to coordinate the efforts, and the IPng Working Group (later renamed to IPv6) was established in the fall of 1994 to develop a new version of IP [13].

CIDR was rolled out quickly, which effectively slowed the growth of the global Internet routing table. Because it is a quick fix, CIDR did not address emerging issues in routing scalability, in particular the issue of site multihoming. A multihomed site should be reachable through any of its multiple provider networks. In the existing routing architecture, this requirement translates to having the prefix, or prefixes, of the site listed in the global routing table, thereby rendering provider-based prefix aggregation ineffective. Interested readers are referred to [14] for a more detailed description of multihoming and its impact on routing scalability.

The new IP development effort, on the other hand, took much longer than anyone expected when the effort first
began. The IPv6 working group finally completed all of the protocol development effort in 2007, 13 years after its establishment. The IPv6 deployment also is slow in coming. Until recently, there were relatively few IPv6 trial deployments; there is no known commercial user site that uses IPv6 as the primary protocol for its Internet connectivity.

If one day someone writes an Internet protocol development history, it would be very interesting to look back and understand the major reasons for the slow development and adoption of IPv6. But even without doing any research, one could say with confidence that NATs played a major role in meeting the IP address requirement that arose out of the Internet growth and at least deferred the demand for a new IP to provide the much needed address space to enable the continued growth of the Internet.

The Unplanned Reality

Although largely unexpected, NATs have played a major role in facilitating the explosive growth of Internet access. Nowadays, it is common to see multiple computers, or even multiple LANs, in a single home. It would be unthinkable for every home to obtain an IP address block, however small it may be, from its network service provider. Instead, a common implementation for home networking is to install a NAT box that connects one home network or multiple home networks to a local provider. Similarly, most enterprise networks deploy NATs as well. It also is well known that countries with large populations, such as India and China, have most of their hosts behind NAT boxes; the same is true for countries that connected to the Internet only recently. Without NATs, the IPv4 address space would have been exhausted a long time ago.

For reasons discussed later, the IETF did not standardize NAT implementation or operations. However, despite the lack of standards, NATs were implemented by multiple vendors, and the deployment spread like wildfire. This is because NATs have several attractions, as we describe next.

Why NATs Succeeded

NATs started as a short-term solution while waiting for a new IP to be developed as the long-term solution. The first recognized NAT advantages were stated in RFC 1918 [1]:

With the described scheme many large enterprises will need only a relatively small block of addresses from the globally unique IP address space. The Internet at large benefits through conservation of globally unique address space, which will effectively lengthen the lifetime of the IP address space. The enterprises benefit from the increased flexibility provided by a relatively large private address space.

The last point deserves special emphasis. Indeed, anyone can use a large block of private IP addresses (up to 16 million, without asking for permission) and then connect to the rest of the Internet by using only a single public IP address. A big block of private IP addresses provides the much needed room for future growth. On the other hand, for most if not all user sites, it is often difficult to obtain an IP address block that is beyond their immediate requirements.

Today, NAT is believed to offer advantages well beyond the above. Essentially, the mapping table of a NAT provides one level of indirection between hosts behind the NAT and the global Internet. As the popular saying goes, “Any problem in computer science can be solved with another layer of indirection.” This one level of indirection means that one never need worry about renumbering the internal network when changing providers, other than renumbering the public IP address of the NAT box.

Similarly, a NAT box also makes multihoming easy. One NAT box can be connected to multiple providers and use one IP address from each provider. Not only does the NAT box shelter the connectivity to multiple ISPs from all the internal hosts, but it also does not require any of its providers to “punch a hole” in the routing announcement (i.e., make an ISP de-aggregate its address block). Such a hole punch would be required if the multihomed site takes an IP address block from one of its providers and asks the other providers to announce the prefix.

Furthermore, this one level of indirection also is perceived as one level of protection because external hosts cannot directly initiate communication with hosts behind a NAT, nor can they easily figure out the internal topology.

Besides all of the above, two additional factors also contributed greatly to the quick adoption of NATs. First, NATs can be unilaterally deployed by any end site without any coordination with anybody else. Second, the major gains from deploying a NAT were realized on day one, whereas its potential drawbacks were revealed only slowly and recently.

The Other Side of the NAT

A NAT prevents the hosts behind it from being reached by an external host and hence prevents them from acting as servers. However, in the early days of NAT deployment, many people believed that they would have no need to run servers behind a NAT. Thus, this architectural constraint was viewed as a security feature and believed to have little impact on users or network usage. As an example, the following four justifications for the use of private addresses are quoted directly from RFC 1335 [3].
• In most networks, the majority of the traffic is confined to its local area networks. This is due to the nature of networking applications and the bandwidth constraints on inter-network links.
• The number of machines that act as Internet servers, that is, run programs waiting to be called by machines in other networks, is often limited and certainly much smaller than the total number of machines.
• There are an increasingly large number of personal machines entering the Internet. The use of these machines is primarily limited to their local environment. They also can be used as clients such as ftp and telnet to access other machines.
• For security reasons, many large organizations, such as banks, government departments, military institutions, and some companies, allow only a very limited number of their machines to have access to the global Internet. The majority of their machines are purely for internal use.

As time goes on, however, the above reasoning has largely been proven wrong.

First, network bandwidth is no longer a fundamental constraint today. On the other hand, voice over IP (VoIP) has become a popular application over the past few years. VoIP changed the communication paradigm from client-server to a peer-to-peer model, meaning that any host may call any other host. Given the large number of Internet hosts that are behind NAT, several NAT traversal solutions have been developed to support VoIP. A number of other peer-to-peer applications, such as BitTorrent, also have become popular recently, and each must develop its own NAT traversal solutions.

In addition to the change of application patterns, a few other problems also arise due to the use of non-unique, private IP addresses with NATs. For instance, a number of business acquisitions and mergers have run into situations where two networks behind NATs were required to be interconnected, but unfortunately, they were running on the same private address block, resulting in address conflicts. Yet another problem emerged more recently. The largest allocated private address block is 10.0.0.0/8, commonly referred to as net-10. The business growth of some provider and enterprise networks is leading to, or already has resulted in, the exhaustion of net-10 addresses. An open question facing these networks is what to do next. One provider network migrated to IPv6; a number of others simply decided on their own to use another unallocated IP address block [15].

It is also a common misperception that a NAT box makes an effective firewall. This may be due partly to the fact that in places where NAT is deployed, the firewall function often is implemented in the NAT box. A NAT box alone, however, does not make an effective firewall, as evidenced by the fact that numerous home computers behind NAT boxes have been compromised and have been used as launch pads for spam or distributed denial of service (DDoS) attacks. Firewalls establish control policies on both incoming and outgoing packets to minimize the chances of internal computers being compromised or abused. Making a firewall serve as a NAT box does not make it more effective in fencing off malicious attacks; good control policies do.

Why the Opportunity of Standardizing NAT Was Missed

During the decade following the deployment of NATs, a big debate arose in the IETF community regarding whether NAT should, or should not, be deployed. Due to its use of private addresses, NAT moved away from the basic IP model of providing end-to-end reachability between hosts, thus representing a fundamental departure from the original Internet architecture. This debate went on for years. As late as 2000, messages posted to the IETF mailing list by individual members still argued that NAT was architecturally unsound and that the IETF should in no way endorse its use or development. Such a position was shared by many people during that time.

These days most people would accept the position that the IETF should have standardized NAT early on. How did we miss the opportunity? A simple answer could be that the crystal ball was cloudy. I believe that a little digging would reveal a better understanding of the factors that clouded our eyes at the time. As I see it from my personal viewpoint, the following factors played a major role.

First, the feasibility of designing and deploying a brand new IP was misjudged, as were the time and effort required for such an undertaking. Those who were opposed to standardizing NAT had hoped to develop a new IP in time to meet the needs of a growing Internet. Unfortunately, the calculation was way off. While the development of a new IP was taking its time, Internet growth did not wait. Network address translation was simply an inevitable consequence that was not clearly recognized at the time.

Second, the community faced a difficult question regarding how strictly one should stick to architectural principles, and what can be acceptable engineering trade-offs. Architectural principles are guidelines for problem solving; they help guide us toward developing better overall solutions. However, when the direct end-to-end reachability model was interpreted as an absolute rule, it ruled out network address translation as a feasible means to meet the instant high demand for IP addresses at the time. Furthermore, sticking to the architectural model in an absolute way also contributed to the one-sided view of the drawbacks of NATs, hence the lack of a full appreciation of the advantages of NATs as we discussed earlier, let alone any effort to develop a NAT-traversal solution that can minimize the impact of NATs on end-to-end reachability.

Yet another factor was that given that network address translation could be deployed unilaterally by a single party alone, there was not an apparent need for standardization. This seemingly valid reasoning missed an important fact: a NAT box does not stand alone; rather it interacts both directly with surrounding IP devices and indirectly with remote devices through IP packet handling. The need for standardizing network address translation behavior has since been well recognized, and a great effort has been devoted to developing NAT standards in recent years [16].

Unfortunately the early misjudgment on NAT already has cost us dearly. While the big debate went on through the late 1990s and early part of the first decade of this century, NAT deployment was widely rolled out, and the absence of a standard led to a number of different behaviors among various NAT products. A number of new Internet protocols also were developed or finalized during the same time period, such as IPSec, Session Announcement Protocol (SAP), and Session Initiation Protocol (SIP), to name a few. Their designs were based on the original model of IP architecture, wherein IP addresses are assumed to be globally unique and globally reachable. When those protocols became ready for deployment, they faced a world that was mismatched with their design. Not only were they required to solve the NAT traversal problem, but the solutions also were required to deal with a wide variety of NAT box behaviors.

Although NAT is accepted as a reality today, the lessons to learn from the past are yet to be clarified. One example is the recent debate over Class-E address block usage [17]. Class-E refers to the IP address block 240.0.0.0/4 that has been on reserve until now. As such, many existing router and host implementations block the use of Class-E addresses. Putting aside the issue of required router and host changes to enable Class-E usage, the fundamental debate has been about whether this Class-E address block should go into the public address allocation pool or into the collection of private address allocations. The latter would give those networks that face net-10 exhaustion a much bigger private address block to use. However, this gain is also one of the main arguments against it, as the size limitation of private addresses is considered a pressure to push those networks facing the limitation to migrate to IPv6, instead of staying with NAT. Such a desire sounds familiar; similar arguments were used against NAT standardization in the past. However, if the past is any indication of the future, we know that pressures do not dictate new protocol deployment; rather, economic feasibility does. This statement does not imply that migrating to IPv6 is economically infeasible. On the contrary, it is feasible, especially in the long run. New efforts are being organized both in protocol and tools development to smooth and ease the transition from IPv4 to IPv6 and in case studies and documentation to show clearly the short- and long-term gains from deploying IPv6.

Looking Back and Looking Forward

The IPv4 address space exhaustion predicted long ago is finally upon us today, yet the IPv6 deployment is barely visible on the horizon. What can and should be done now to enable the Internet to grow along the best path forward? I hope this review of NAT history helps shed some light on the answer.
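
For a sense of scale, the address blocks named in this article (net-10, Class-E, and the whole IPv4 space) can be compared with a few lines of Python. This is only a rough sketch using the standard ipaddress module; the dictionary keys and the two example site prefixes are illustrative assumptions, not taken from any deployment described here.

```python
import ipaddress

# Address blocks named in the article (labels are illustrative).
blocks = {
    "net-10": "10.0.0.0/8",        # largest RFC 1918 private block
    "Class-E": "240.0.0.0/4",      # reserved block under debate
    "all of IPv4": "0.0.0.0/0",
}

# Number of addresses in each block.
sizes = {name: ipaddress.ip_network(cidr).num_addresses
         for name, cidr in blocks.items()}

for name, size in sizes.items():
    share = size / sizes["all of IPv4"]
    print(f"{name}: {size:,} addresses ({share:.2%} of IPv4)")

# Hypothetical merger case: two sites that both numbered their hosts
# out of net-10 end up with overlapping prefixes.
site_a = ipaddress.ip_network("10.1.0.0/16")
site_b = ipaddress.ip_network("10.1.128.0/17")
conflict = site_a.overlaps(site_b)
print("address conflict:", conflict)
```

The net-10 figure corresponds to the "up to 16 million" private addresses mentioned earlier, and the overlap check mirrors the address conflicts that arise when two NATed networks on the same private block must be interconnected.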

First, we should recognize not only the fact that IPv4 net- local IPv6 unicast addresses (ULA) [20], another new type of
work address translation is widely deployed today, but also IP address. The debate over the exact meaning of ULA is still
recognize its perceived benefits to end users as we discussed going on.
in a previous section. We should have a full appraisal of the The original IP design clearly defined an IP address as
pros and cons of NAT boxes; the discussion in this article being globally unique and globally reachable and as identify-
merely serves as a starting point. ing an attachment point to the Internet. As the Internet con-
Second, it is likely that some forms of network address tinues to grow and evolve, recent years have witnessed an
translation boxes will be with us forever. Hopefully, a full almost universal deployment of middleboxes of various
appraisal of the pros and cons of network address translation types. NATs and firewalls are dominant among deployed
would help correct the view that all network address transla- middleboxes, though we also are seeing increasing numbers
tion approaches are a “bad thing” and must be avoided at all of SIP proxies and other proxies to enable peer-to-peer-
costs. Several years ago, an IPv4 to IPv6 transition scheme based applications. At the same time, proposals to change
called Network Address Translation-Protocol Translation (NAT-PT; see [18]) was developed but later classified to historical status,1 mainly due to the concerns that:
• NAT-PT works in much the same way as an IPv4 NAT box.
• NAT-PT does not handle all the transition cases.
However, in view of IPv4 NAT history, it seems worthwhile to revisit that decision. IPv4, together with IPv4 NAT, will be with us for years to come. NAT-PT seems to offer a unique value in bridging IPv4-only hosts and applications with IPv6-enabled hosts and networks. There also have been discussions of the desire to perform address translations between IPv6 networks as a means to achieve several goals, including insulating one’s internal network from the outside. This question of “Whither IPv6 NAT?” deserves further attention. Instead of repeating the mistakes with IPv4 NAT, the Internet would be better off with well-engineered standards and operational guidelines for traversing IPv4 and IPv6 NATs that aim at maximizing interoperability.

1 Historical status means that a protocol is considered obsolete and is thus removed from the Internet standard protocol set.

Furthermore, accepting the existence of network address translation in today’s architecture does not mean we simply take the existing NAT traversal solutions as given. Instead, we should fully explore the NAT traversal design space to steer the solution development toward restoring the end-to-end reachability model in the original Internet architecture. A new effort in this direction is the NAT traversal through tunneling (NATTT) project [19]. Contrary to most existing NAT traversal solutions that are server-based or protocol-specific, NATTT aims to restore end-to-end reachability among Internet hosts in the presence of NATs, by providing generic, incrementally deployable NAT-traversal support for all applications and protocols.

Last, but not least, I believe it is important to understand that successful network architectures can and should change over time. All new systems start small. Once successful, they grow larger, often by multiple orders of magnitude as is the case of the Internet. Such growth brings the system to an entirely new environment that the original designers may not have envisioned, together with a new set of requirements that must be met, hence the necessity for architectural adjustments.

To properly adjust a successful architecture, we must have a full understanding of the key building blocks of the architecture, as well as the potential impact of any changes to them. I believe the IP address is this kind of key building block that touches, directly or indirectly, all other major components in the Internet architecture. The impact of IPv4 NAT, which changed IP address semantics, provides ample evidence. During IPv6 development, much of the effort also involved a change in IP address semantics, such as the introduction of new concepts like that of the site-local address. The site-local address was later abolished and partially replaced by unique

the original IP address definition, or even redefine it entirely, continue to arise. What should be the definition, or definitions, of an IP address today, especially in the face of various middleboxes? I believe an overall examination of the role of the IP address in today’s changing architecture deserves special attention at this critical time in the growth of the Internet.

Acknowledgments
I sincerely thank Mirjam Kuhne and Wendy Rickard for their help with an earlier version of this article that was posted in the online IETF Journal of October 2007. I also thank the co-editors and reviewers of this special issue for their invaluable comments.

References
[1] Y. Rekhter et al., “Address Allocation for Private Internets,” RFC 1918, 1996.
[2] D. Clark et al., “Towards the Future Internet Architecture,” RFC 1287, 1991.
[3] Z. Wang and J. Crowcroft, “A Two-Tier Address Structure for the Internet: A Solution to the Problem of Address Space Exhaustion,” RFC 1335, 1992.
[4] J. Rosenberg et al., “STUN: Simple Traversal of User Datagram Protocol (UDP) through Network Address Translators (NATs),” RFC 3489, 2003.
[5] J. Rosenberg, R. Mahy, and P. Matthews, “Traversal Using Relays around NAT (TURN),” draft-ietf-behave-turn-08, 2008.
[6] C. Huitema, “Teredo: Tunneling IPv6 over UDP through Network Address Translations (NATs),” RFC 4380, 2006.
[7] S. Kent and R. Atkinson, “Security Architecture for the Internet Protocol,” RFC 2401, 1998.
[8] J. Postel and J. Reynolds, “File Transfer Protocol (FTP),” RFC 959, 1985.
[9] J. Postel, “Internet Protocol Specification,” RFC 791, 1981.
[10] P. Tsuchiya and T. Eng, “Extending the IP Internet through Address Reuse,” ACM SIGCOMM Computer Commun. Review, Sept. 1993.
[11] K. Egevang and P. Francis, “The IP Network Address Translator (NAT),” RFC 1631, 1994.
[12] C. Huitema, “IAB Recommendation for an Intermediate Strategy to Address the Issue of Scaling,” RFC 1481, 1993.
[13] R. M. Hinden, “IP Next Generation Overview,” http://playground.sun.com/ipv6/INET-IPng-Paper.html, 1995.
[14] L. Zhang, “An Overview of Multihoming and Open Issues in GSE,” IETF J., Sept. 2006.
[15] L. Vegoda, “Used but Unallocated: Potentially Awkward /8 Assignments,” Internet Protocol J., Sept. 2007.
[16] http://www.ietf.org/html.charters/behave-charter.html; IETF BEHAVE Working Group develops requirements documents and best current practices to enable NATs to function in a deterministic way, as well as advises on how to develop applications that discover and reliably function in environments with the presence of NATs.
[17] http://www.ietf.org/mail-archive/web/int-area/current/msg01299.html; see the message dated 12/5/07 with subject line “240/4” and all the follow-up.
[18] G. Tsirtsis and P. Srisuresh, “Network Address Translation-Protocol Translation (NAT-PT),” RFC 2766, 2000.
[19] E. Osterweil et al., “NAT Traversal through Tunneling (NATTT),” http://www.cs.arizona.edu/~bzhang/nat/
[20] R. M. Hinden and B. Haberman, “Unique Local IPv6 Unicast Addresses,” RFC 4193, 2005.

Biography
LIXIA ZHANG (lixia@cs.ucla.edu) received her Ph.D. in computer science from the Massachusetts Institute of Technology. She was a member of research staff at the Xerox Palo Alto Research Center before joining the faculty of the UCLA Computer Science Department in 1995. In the past she served as vice chair of ACM SIGCOMM and co-chair of the IEEE ComSoc Internet Technical Committee. She is currently serving on the Internet Architecture Board.

12 IEEE Network • September/October 2008


MUELLER LAYOUT 9/5/08 1:01 PM Page 14

Behavior and Classification of NAT Devices and Implications for NAT Traversal
Andreas Müller and Georg Carle, Technische Universität München
Andreas Klenk, Universität Tübingen

Abstract
For a long time, traditional client-server communication was the predominant com-
munication paradigm of the Internet. Network address translation devices emerged
to help with the limited availability of IP addresses and were designed with the
hypothesis of asymmetric connection establishment in mind. But with the growing
success of peer-to-peer applications, this assumption is no longer true. Consequently,
network address translation traversal became a field of intensive research and
standardization for enabling efficient operation of new services. This article
provides a comprehensive overview of NAT and introduces established NAT traversal
techniques. A new categorization of applications into four NAT traversal service
categories helps to determine applicable techniques for NAT traversal. The interac-
tive connectivity establishment framework is categorized, and a new framework is
introduced that addresses scenarios that are not supported by ICE. Current results
from a field test on NAT behavior and the success ratio of NAT traversal tech-
niques support the feasibility of this classification.

When the Internet Protocol (IP) was designed, the growth of the Internet to its current size was not imaginable. Therefore, it was reasonable to use a fixed 32-bit field to identify a host based on its IP address. This limited address range makes it impossible to assign globally unique IPv4 addresses to the growing number of networked devices. Furthermore, requesting an IP address for every newly added device results in an unacceptable administration overhead. The authors in [1] propose to assign a number of public IP addresses to a designated border router instead of configuring certain hosts with addresses that can be routed globally. The border router is then responsible for translating IP addresses between the private and the public domains, allowing as many simultaneous connections as public IP addresses were assigned. This allows a host within the local network to access the Internet even though it has a private IP address. This technique became known as network address translation (NAT). Because the translation of addresses breaks the end-to-end connectivity model of the IP, newly developed services following the peer-to-peer (P2P) paradigm such as file sharing, instant messaging, and voice over IP (VoIP) applications suffer from the existence of NAT. Thus, NAT traversal is an important problem today. And even in the future, after a possible success of IPv6, companies and home users still might deploy NAT devices to hide their topologies from Internet service providers (ISPs).

There are two possible approaches to the problem. One direction within the Internet Engineering Task Force (IETF) Behave Working Group [2] is to cope with existing NAT implementations and to establish standards for the detection of NAT behavior and for NAT traversal. On the other hand, the IETF also standardizes behavioral properties for NATs to work in conjunction with IETF protocols (e.g., Datagram Congestion Control Protocol [DCCP], Internet Control Message Protocol [ICMP], Stream Control Transmission Protocol [SCTP]). Enterprise class NATs are among the first to incorporate new features introduced through standardization. However, the large scale deployment of residential gateways with NAT functionality prohibits the change of NAT and requires the use of protocols that work with existing NATs. This is also the focus of this article, where we treat NATs as black boxes rather than trying to change them.

NAT Behavior

Today, a NAT device usually is used to share a single public IP address among a number of private end systems. The NAT maintains a table, listing all connections between the public and the private domains. For every connection attempt (e.g., a Transmission Control Protocol synchronize [TCP SYN] packet) coming from an internal host, the NAT creates a new entry in the list. In NAT terminology this entry is called a binding [3]. Each entry contains the source IP address and the source port. The NAT replaces the source IP address with its public IP address. The source port is replaced using one of the strategies explained later in this section.

Although the concept of NAT was published as early as 1994 [1], no common approach for NAT emerged. Current NAT implementations not only differ from vendor to vendor but also from model to model, which leads to compatibility

14 0890-8044/08/$25.00 © 2008 IEEE IEEE Network • September/October 2008



issues. If an application works with one particular NAT, this does not imply that it always works in a NATed environment. Therefore, it is very important to understand and classify existing NAT implementations in order to design applications that can work in combination with current NATs. The classification in this article is mainly derived from simple traversal of User Datagram Protocol (UDP) through NAT (STUN) [4], whereas the address binding and mapping behavior follows the terminology used in RFC 4787 [5]. This section covers only topics that are required for the understanding of this article. A detailed discussion and further information (including test results) is given in [6] (for TCP) and [5] (for UDP).

Binding covers “context based packet translation” [7], which describes the strategy the NAT uses to assign a public transport address (combination of IP address and port) to a new state in the NAT. Filtering, or packet discard, shows how the NAT handles (or discards) packets trying to use an existing mapping. Table 1 shows the different categories and their possible properties. Port binding describes the strategy a NAT uses for the assignment. With port preservation, the NAT assigns an external port to a new connection; it attempts to preserve the local port number if possible. Port overloading is problematic and rarely occurs. A new connection takes over the binding, and the old connection is dropped. Port multiplexing is a very common strategy where ports are demultiplexed based on the destination transport address. Incoming packets can now carry the same destination port and are distinguished by the source transport address.

Table 1. NAT behavior categories and possible NAT properties.

Classification        NAT property
Port binding          Port preservation
                      No port preservation
                      Port overloading
                      Port multiplexing
NAT binding           Endpoint-independent
                      Address- (port)-dependent
                      Connection-dependent
Endpoint filtering    Independent
                      Address restricted
                      Address and port restricted

NAT binding deals with the reuse of existing bindings. That is, if an internal host closes a connection and establishes a new one from the same source port, NAT binding describes the assignment strategy for the new connection. As shown in Table 1, the NAT binding is organized into three categories. With Endpoint Independent, the external port is only dependent on the source transport address of the connection. As long as a host establishes a connection from the same source IP address and port, the mapping does not change. With the Address (Port) Dependent strategy, the assignment depends on both the internal and the external transport address. As long as consecutive connections from the same source to the same destination are established, the mapping does not change. As soon as we use a different destination, the NAT changes the external port. With a Connection Dependent binding, the NAT assigns a new port to every connection. We distinguish between NATs that increase the new port number by a specific (and well predictable) delta and NATs that assign random port numbers to the new mappings.

Endpoint filtering describes how existing mappings can be used by external hosts and how a NAT handles incoming connection attempts that are not part of a response. Independent Filtering allows inbound connections independent of the source transport address of the packet. As long as the destination transport address of a packet matches an existing state, the packet is forwarded. With Address Restricted Filtering, the NAT forwards only packets coming from the same host (matching IP address) to which the initial packet was sent. Address and Port Restricted Filtering also compares the source port of the inbound packet in addition to address restricted filtering.

NAT Traversal Problem

To work properly, the NAT must have access to the protocol headers at layers 3 and 4 (in case of a network address port translation [NAPT]). Additionally, for every incoming packet, the NAT must already have a state listed in its table. Otherwise, it cannot find the related internal host to which the packet belongs. According to RFC 3027 [8], the NAT traversal problem can be separated into three categories, which are presented in this section. In addition to the three problems, we identified Unsupported Protocols as a new category.

The first problem occurs if a protocol uses Realm-Specific IP Addresses in its payload. This is the case if an application layer protocol such as the Session Initiation Protocol (SIP) places a transport address from the private realm in its payload to signal where it expects a response. Because regular NATs do not operate above layer 4, application layer protocols typically fail in such scenarios. A possible solution is the use of an application layer gateway (ALG) that extends the functionality of a NAT for specific protocols. However, an ALG supports only the application layer protocols that are specifically implemented and may fail when encryption is used.

The second category is P2P Applications. The traditional Internet consists of servers located in the public realm and clients that actively establish connections to these servers. This structure is well suited for NATs because for every connection attempt (e.g., a TCP SYN) coming from an internal client, the NAT can add a mapping to its table. But unlike client-server applications, a P2P connection can be initiated by any of the peers regardless of their location. However, if a peer in the private realm tries to act as a traditional server (e.g., listening for a connection on a socket), the NAT is unaware of incoming connections and drops all packets. A solution could be that the peer located in the private domain always establishes the connection. But what if two peers, both behind a NAT, want to establish a connection to each other? Even if the security policy would allow the connection, it cannot be established.

The third category is a combination of the first two. Bundled Session Applications, such as File Transfer Protocol (FTP) or SIP/Session Description Protocol (SDP), carry realm-specific IP addresses in their payload to establish an additional session. The first session is usually referred to as the control session, whereas the newly created session is called the data session. The problem here is not only the realm-specific IP addresses, but the fact that the data session often is established from the public Internet toward the private host, a direction the NAT does not permit (e.g., active FTP).

Unsupported Protocols are typically newly developed transport protocols such as the SCTP or the DCCP that cause problems with NATs even if an internal host initiates the connection establishment. This is because current NATs do not have built-in support for these protocols. The unsupported protocols also cover protocols that cannot work with NATs because their layer 3 or layer 4 header is not available for translation. This happens when using encryption protocols such as IPsec.
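The NAT binding and endpoint filtering strategies described in the NAT Behavior section can be sketched as a small model. This is only an illustrative sketch, not code from the article: the class and method names (`Nat`, `outbound`, `inbound`), the starting port 40000, and the fixed port delta of 1 are assumptions, and port binding (port preservation, overloading, multiplexing) is left out.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count

Addr = tuple[str, int]  # transport address: (IP address, port)


class Binding(Enum):
    ENDPOINT_INDEPENDENT = 1
    ADDRESS_PORT_DEPENDENT = 2
    CONNECTION_DEPENDENT = 3


class Filtering(Enum):
    INDEPENDENT = 1
    ADDRESS_RESTRICTED = 2
    ADDRESS_AND_PORT_RESTRICTED = 3


@dataclass
class Nat:
    """Hypothetical NAT model; names and port range are assumptions."""
    public_ip: str
    binding: Binding
    filtering: Filtering
    _next_port: count = field(default_factory=lambda: count(40000))
    _ports: dict = field(default_factory=dict)     # binding key -> external port
    _sessions: dict = field(default_factory=dict)  # external port -> (internal addr, remotes)

    def outbound(self, src: Addr, dst: Addr) -> Addr:
        """Translate an outgoing connection and return its public transport address."""
        if self.binding is Binding.ENDPOINT_INDEPENDENT:
            key = src              # mapping depends on the internal address only
        elif self.binding is Binding.ADDRESS_PORT_DEPENDENT:
            key = (src, dst)       # mapping depends on internal and external address
        else:
            key = None             # connection dependent: new port every time
        port = self._ports.get(key) if key is not None else None
        if port is None:
            port = next(self._next_port)  # predictable delta of 1 (assumption)
            if key is not None:
                self._ports[key] = port
            self._sessions[port] = (src, set())
        self._sessions[port][1].add(dst)
        return (self.public_ip, port)

    def inbound(self, src: Addr, dst_port: int):
        """Return the internal address a packet is forwarded to, or None if dropped."""
        session = self._sessions.get(dst_port)
        if session is None:
            return None            # no state in the table: packet is discarded
        internal, remotes = session
        if self.filtering is Filtering.INDEPENDENT:
            allowed = True
        elif self.filtering is Filtering.ADDRESS_RESTRICTED:
            allowed = any(ip == src[0] for ip, _ in remotes)
        else:
            allowed = src in remotes
        return internal if allowed else None
```

With endpoint-independent binding, two connections from the same internal transport address share one external port, which is what makes the mapping usable by third parties when filtering permits it. With address-dependent binding, a second destination yields a different external port, so a mapping learned via one server does not predict the mapping toward a peer.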


NAT Traversal Service Categories

Instead of classifying the NAT behavior (see classification in STUN [4]), we defined four NAT traversal service categories, each making different assumptions about the purpose of the connection establishment and the infrastructure that is available. Our categorization emphasizes that the applicability of many NAT traversal techniques depends on the support of a combination of requester, the responder, globally reachable infrastructure nodes, and the role of the application. On the one hand, server applications set up a socket and wait for connections (which also applies to P2P applications). On the other hand, client applications such as VoIP clients actively initiate a connection and wait for an answer on a different port (bundled session applications). Other applications work only across NATs if both ends participate in the connection establishment (unsupported protocols). Thus, we differentiate between supporting a service and supporting a client. In this article, the client is called the requester because it actively initiates a connection.

The behavior of the NAT is important because it allows or prohibits certain NAT traversal techniques within one service category. If only one end implements NAT traversal support (e.g., by running a stand-alone framework or by built-in NAT traversal functionality), NAT traversal techniques that rely on a collaboration of both ends (e.g., ICE) are not applicable.

Our first category, requester side NAT traversal (RNT), covers scenarios where only the requester side supports NAT traversal (e.g., the application or the NAT itself). RNT helps applications that actively participate in the connection establishment and still suffer from the existence of NATs. Typical examples are applications that have problems with realm-specific IP addresses in their payload. This applies to protocols using in-band signaling on the application layer, which is related to bundled session applications with asymmetric connection establishment (e.g., VoIP using SIP/SDP).

The second category, global service provisioning (GSP), assumes that the host providing the service implements NAT traversal support, helping to make a service globally accessible. This is done by creating and maintaining a NAT mapping that then accepts multiple connections from previously unknown clients (Fig. 1). This is the main difference from RNT, which only creates a NAT mapping for one particular session (e.g., one call in the case of VoIP).

Figure 1. NAT traversal service categories for applications: a) RNT; b) GSP; c) SPPS; d) SSP.

The last two categories assume support at both ends, the service and the requester. On the one side, NAT traversal is required to make a service behind a NAT globally accessible, whereas on the other side, the support at the requester allows the use of sophisticated techniques through coordinated action. Thus, service provisioning using pre-signaling (SPPS) extends the GSP category by the assumption that both hosts have interoperable frameworks (e.g., ICE [9]; NAT, URIs, Tunnels, SIP, and STUNT [NUTSS] [10]; NATBlaster [11]; or NatTrav [12]) running. This allows a selection from all available NAT traversal solutions, which leads to a high success rate of NAT traversal. In Fig. 1, the two hosts use a rendezvous point to agree on a NAT traversal technique. After creating the mapping in step 2, the service is accessible by any host, depending on the selected NAT traversal technique and the filtering strategy of the NAT. SPPS supports all types of services where a one-to-one connection is sufficient and pre-signaling is available.

The last category, secure service provisioning (SSP), is an extension of SPPS and addresses scenarios that require authorization of the remote party before initiating the NAT traversal process. The channel established in this way must be accessible only by the authorized remote party. This requires additional functionality that enforces this policy and only allows authorized users to access the service. The policy enforcement can be done at the NAT itself, at a data relay, or at a firewall.

Table 2 depicts all four service categories with popular NAT traversal techniques and shows the implications for automated NAT traversal and required signaling. First we distinguish between the service and the requester. “Support at the service” means, for example, that a framework must be deployed at the same host providing the service. The same applies to the requester. “RP” means that a rendezvous point is required for relaying data back and forth. “Signaling messages” means that some sort of signaling protocol is used for NAT traversal. Again, we differentiate between signaling at the service and signaling at the requester. A rendezvous point for signaling messages is required in case of pre-signaling. Finally, “stream independent” describes the requirement for consecutive connections. For example, a port forwarding entry must be created only once, whereas hole punching [13] requires sending a new hole punching packet for every new stream (with restricted filtering).

Table 2 shows the main differences of our service categories. RNT deals with bundled session applications that wait on a port after initiating a session (e.g., via a SIP INVITE). GSP requires only support of the service and aims to make a service globally reachable for multiple clients. SPPS and SSP combine these categories and require support at both ends. The requester initiates pre-signaling to exchange information about a global end point. The service then creates a mapping in the NAT that can be used by the client.

Table 2. Service categories and their implications for automated NAT traversal; RP denotes rendezvous point.

Columns: Service category; NAT traversal technique; Requires support at (Service, Requester, RP, NAT); Signaling messages (Service, Requester, RP, STUN); Stream-independent.
[The X marks are listed per row as extracted; their column positions are not recoverable.]

RNT
  NAT with ALG                                 X
  UPnP (for bundled session applications)      X X X X
GSP
  UPnP (port forwarding)                       X X X X
  Hole punching (independent filtering)        X X X X
  Open data relay (e.g., RSIP)                 X X X X X
SPPS
  Hole punching (independent binding)          X X X X X X
  UPnP                                         X X X X X X
  Closed/open data relay (e.g., TURN, Skype)   X X X X X X
  Tunneling (e.g., over UDP)                   X X X X X
SSP
  Hole punching (restricted filtering)         X X X X X X
  NSIS NATFW NSLP                              X X X X X
  Closed data relay (e.g., TURN)               X X X X X X
  Tunneling (e.g., over secure channel)        X X X X X

Applicability of NAT Traversal Techniques for NAT Traversal Service Categories

There are many different techniques for solving the NAT traversal problem in specific scenarios, but none of them provides a solution that works well with all NATs, applications, and network topologies. Another article explains many of the available protocols for NAT traversal [14] in general. This section describes the applicability of existing techniques from the application's point of view.

RNT is required for protocols using in-band signaling (bundled session applications). Therefore, one common approach is to integrate RNT into these applications (e.g., the VoIP client), to establish port bindings on the fly. One possibility is the integration of a universal plug and play (UPnP) client. Another option is to use ALGs that are integrated in the NAT, interpreting in-band signaling and establishing mappings accordingly. ALGs are not a general solution because the NAT must implement the required logic for each protocol, and end-to-end security prohibits the interpretation of the signaling by the NAT.

GSP depends on NAT traversal techniques that allow unrestricted access to a public end point. A control protocol can be used to directly establish a port forwarding entry in the mapping tables of the NAT, for instance, with UPnP [15]. Port forwarding entries created by UPnP are easy to maintain and work independently from NAT behavior. However, UPnP only works if the NAT is in the local network on the path to the other end point. Thus, nested NATs are not allowed, and path changes break the connectivity.

Hole punching is an alternative if UPnP is not applicable and works for NATs with an independent filtering strategy. The mapping must be refreshed periodically, for instance, by sending keep-alive packets. For NATs other than full-cone, hole-punching for GSP cannot be used because the source port of the request is unknown in advance.

SPPS makes no assumption about the accessibility of a created mapping, thus all possible techniques are applicable. Different from GSP, hole-punching for SPPS works as long as port prediction is possible. For NATs implementing restricted filtering, pre-signaling helps to create the appropriate mapping because the five-tuple of the connection is exchanged. Pre-signaling also enables the establishment of a UDP tunnel, allowing the encapsulation of unsupported protocols. SPPS also can use UPnP to establish port forwarding entries for one session.

SSP is an extension to SPPS that allows only authorized hosts to allocate and to use a mapping. Protocols that authorize requests and assume control over the middlebox, such as middlebox communication (MIDCOM) [16] or the NAT/Firewall Next Steps in Signaling (NSIS) Signaling Layer Protocol [17], qualify for SSP. The advantage of NSIS is that it can discover and configure multiple middleboxes along the data path, thus supporting complex scenarios with nested NATs and multipath routing. However, if one NAT on the path does not support the protocol, NSIS fails. Using NSIS and MIDCOM for SSP requires restrictive rules that allow only authorized clients to use the mapping, for instance, by opening pinholes for IP five-tuples. UPnP is not useful for SSP because it forwards inbound packets without considering the source transport address. Hole punching can be used with SSP only if the NAT implements a restricted filtering strategy. All cases discussed previously rely on additional measures to prohibit IP spoofing. The use of secure tunnels impedes IP spoofing and allows secure NAT traversal, even for unsupported protocols (e.g., IPsec, SCTP, DCCP). SSP also can be achieved by using Traversal Using Relays around NAT (TURN) with authentication, authorization, and secure communication (e.g., via transport layer security [TLS]).

ICE [9] is under standardization by the IETF and strives to combine several techniques into a framework flexible enough to work with all network topologies. Because ICE requires both peers to have an ICE implementation running, it can be seen as a technique for SPPS or SSP, depending on the accessibility and the security policies of the public endpoint.
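The applicability rules in this section can be condensed into a small lookup, shown here as a rough sketch rather than the authors' implementation. The names `NatProfile`, `service_category`, and `candidate_techniques` are hypothetical, the technique labels are informal, and requiring port prediction for hole punching under SSP is an assumption carried over from the SPPS rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NatProfile:
    """Hypothetical summary of what is known about the NAT on the path."""
    upnp: bool                   # NAT accepts UPnP port forwarding requests
    independent_filtering: bool  # inbound packets are accepted from any source
    port_predictable: bool       # external port of a new mapping can be predicted
    alg: bool = False            # NAT has an ALG for the application protocol
    nsis: bool = False           # every NAT on the path supports NSIS


def service_category(service_support: bool, requester_support: bool,
                     authorization_required: bool = False) -> str:
    """Pick the service category from which ends implement NAT traversal support."""
    if service_support and requester_support:
        return "SSP" if authorization_required else "SPPS"
    if service_support:
        return "GSP"
    if requester_support:
        return "RNT"
    raise ValueError("at least one end must support NAT traversal")


def candidate_techniques(category: str, nat: NatProfile) -> list[str]:
    """List techniques that the text describes as applicable for a category."""
    techniques: list[str] = []
    if category == "RNT":
        if nat.upnp:
            techniques.append("UPnP client in the application")
        if nat.alg:
            techniques.append("ALG in the NAT")
    elif category == "GSP":
        if nat.upnp:
            techniques.append("UPnP port forwarding")
        if nat.independent_filtering:      # full cone behavior is required
            techniques.append("hole punching")
    elif category == "SPPS":
        if nat.upnp:
            techniques.append("UPnP port forwarding")
        if nat.port_predictable:
            techniques.append("hole punching")
        techniques += ["UDP tunnel", "data relay"]
    elif category == "SSP":
        # UPnP is excluded: it forwards packets regardless of their source.
        if not nat.independent_filtering and nat.port_predictable:
            techniques.append("hole punching")
        if nat.nsis:
            techniques.append("NSIS signaling")
        techniques += ["secure tunnel", "authenticated data relay"]
    else:
        raise ValueError(f"unknown service category: {category}")
    return techniques
```

For a NAT with UPnP and restricted filtering, `candidate_techniques("GSP", ...)` offers only UPnP port forwarding, whereas the same NAT under SSP offers hole punching and the relay and tunnel options but never UPnP.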


The same is true for solutions such as TURN [18]. TURN is a promising candidate for SPPS, because it provides a relay with a public transport address allowing the exchange of data packets between a TURN client and a public host.

Why Unilateral Solutions Exist

Despite the great flexibility of SPPS and SSP, both categories involve a number of assumptions that are not always satisfied. The most important one is the requirement for both ends (and sometimes also the infrastructure), to support compatible versions of the NAT traversal framework. It remains to be seen if the future will bring a sufficiently big deployment of one framework on which to rely for arbitrary applications. The chances are better within homogeneous problem domains, like telecommunication, where such frameworks can be integrated with the applications and be distributed in large numbers. For instance, the adoption of ICE is occurring mainly within the VoIP/SIP community and focusing on VoIP specific use cases. These drawbacks are the reason why RNT and GSP as unilateral solutions for the NAT traversal problems exist. It is easier to enhance an infrastructure under one responsibility than to rely on a solution that requires a global deployment. However, unilateral solutions are limited to the middleboxes in the given domain. They fail to provide solutions to scenarios with nested NATs and depend on the network topology.

Coalescing Unilateral and Cooperative Approaches for NAT Traversal

When investigating existing NAT traversal techniques, we determined that none of them can be used in all scenarios. For example, UPnP only supports globally accessible end points, whereas ICE requires both hosts to run the framework. In [19], we proposed a new framework that aims toward providing an advanced NAT traversal service (ANTS) supporting all four service categories. The concept of ANTS is based on the idea of reusing previously obtained knowledge about the topology of the network and the capability of the NAT. A small component of ANTS, the NAT tester, is responsible for gathering this information and will be presented (together with some test results) in the next section.

If a user decides that a particular application should be reachable from the public Internet, he registers it at a session manager that keeps track of all applications requesting NAT traversal support. With the session manager, ANTS can provide GSP and RNT directly. Whenever an application is added and associated with GSP or RNT, the session manager calls the NAT traversal logic and asks to allocate an appropriate mapping in the NAT. This also requires ANTS to have sufficient knowledge about the applicability of the integrated techniques regarding the service categories. For example, UPnP cannot be used for SSP because it violates the idea of an endpoint that is accessible only by authenticated hosts.

Figure 2 shows a decision tree that ANTS uses to establish a mapping in the NAT. First, we distinguish between requester initiated NAT traversal on the one hand and the access to a service on the other hand. Then, we must know which ends actually implement ANTS. If both hosts have the framework running, pre-signaling is possible, which leads to a wide choice of techniques depending on the security considerations of the mapping. If only one end supports ANTS, only techniques belonging to GSP or RNT are applicable.

Figure 2. Decision tree for ANTS. [Nodes: NAT traversal request; requester initiated or access to service; support at client, support at service, or support at both ends; secure or insecure endpoint. Leaves: SSP, SPPS, RNT, GSP.]

Table 3. Results of the field test: success rates of NAT traversal techniques depending on service categories.

S. cat.   Prot.   Condition                         Suc. rate
RNT       UDP     (UPnP or HP-UDP)                  90.27%
          TCP     (UPnP or HP-TCP)                  77.84%
GSP       UDP     (Full Cone and HP-UDP)            27.03%
          TCP     (Full Cone and HP-TCP)            17.30%
          UDP     (UPnP or (Full Cone and HP-UDP))  50.27%
          TCP     (UPnP or (Full Cone and HP-TCP))  44.32%
SPPS      UDP     (HP-UDP)                          88.65%
          TCP     (HP-TCP)                          71.35%
          TCP     (HP-TCP or HP-UDP)                94.59%
          UDP     (UPnP or HP-UDP)                  90.27%
          TCP     (UPnP or HP-TCP)                  77.84%
          TCP     (UPnP or HP-TCP or HP-UDP)        95.14%
SSP       UDP     (Restricted NAT and HP-UDP)       48.65%
          TCP     (Restricted NAT and HP-TCP)       38.38%

Despite some unsolved issues such as the question of how to connect legacy applications to ANTS (e.g., by using a library or a traversal of UDP through NAT [TUN]-based approach), the idea of a knowledge-based framework seems


to be the right answer. Thus once implemented, ANTS can help many existing services by integrating several techniques and making its choice based on knowledge about the NAT and the requirements of the application.

Field Test on NAT Traversal

To prove that existing techniques can be adapted to our service categories, we implemented a NAT tester that acts as a cornerstone for our new framework. This section presents the results of a field test investigating 185 NATs in the wild. For a detailed description including all results, see our Web site: http://nettest.net.in.tum.de.

The first test queries a public STUN server to determine the type of the NAT. Afterward, the NAT tester performs the following connection tests and tries to establish a connection to the host behind the NAT: UPnP, hole punching, and connecting to a data relay (each for both protocols, UDP and TCP) (Table 3).

We then adapted the test results to our work and evaluated the success rates of the individual techniques regarding our defined service categories. Table 3 shows the categories and the conditions that must be met according to the considerations made previously. For example, GSP requires the use of UPnP or hole punching support in combination with a full-cone NAT to make a service globally accessible. Therefore, 50.27 percent of our tested NATs supported a direct connection for UDP and category GSP (44.32 percent for TCP). In all other cases (the remaining percentages), an external relay must be used to provide GSP.

For SPPS, which makes no security assumptions, we divided our results into two categories. First we determined the success rates without considering UPnP. With 88.65 percent of all NATs, we were able to establish a direct connection to the host behind the NAT (71.35 percent for TCP). This rate increased slightly (for TCP to 77.84 percent) when UPnP was an option. The highest success rate for TCP NAT traversal (95.14 percent) was discovered when we also allowed the tunneling of TCP packets through UDP.

SSP allows only authorized hosts to create and to use a mapping. Therefore, a suitable technique for SSP is hole punching in combination with a NAT implementing a restricted filtering strategy. This was supported by 48.65 percent for UDP and 38.38 percent for TCP.

The success rate for RNT depends on the effort that is made for the specific protocol. For example, if we assume that we can inspect each signaling packet on the application layer thoroughly, we could apply the SPPS results to RNT. If we modified the packets only so that the internal port is reachable by any client, the success rate of GSP would apply to RNT. Finally, we did not measure the effect of NATs with integrated ALGs in this field test.

Conclusion

With the increasing popularity of P2P communication, the NAT traversal problem has become more urgent than ever. Existing solutions have the drawback of supporting only certain types of NATs and cannot be viewed as a general solution to the problem. When analyzing the NAT traversal problem more thoroughly, we discovered that the question of who supports the NAT traversal framework determines which NAT traversal techniques are applicable. Therefore, we identified four NAT traversal service categories that differentiate between support by service, client, and infrastructure and listed applicable NAT traversal techniques for each category. Our findings from a field test showed that there are a number of prospective NAT traversal techniques that enable connectivity for each NAT traversal service category. We emphasized how to build upon this categorization to develop a knowledge-based NAT traversal framework. Future frameworks that aspire to support the typical connectivity scenarios of current applications should support all four service categories.

References

[1] K. Egevang and P. Francis, “The IP Network Address Translator (NAT),” IETF RFC 1631, May 1994.
[2] IETF, “Behavior Engineering for Hindrance Avoidance (behave);” http://www.ietf.org
[3] P. Srisuresh and M. Holdrege, “IP Network Address Translator (NAT) Terminology and Considerations,” IETF RFC 2663, Aug. 1999.
[4] J. Rosenberg et al., “STUN: Simple Traversal of User Datagram Protocol (UDP) through Network Address Translators (NATs),” IETF RFC 3489, Mar. 2003.
[5] E. F. Audet and C. Jennings, “NAT Behavioral Requirements for Unicast UDP,” IETF RFC 4787, Jan. 2007.
[6] S. Guha and P. Francis, “Characterization and Measurement of TCP Traversal through NATs and Firewalls,” Proc. ACM Internet Measurement Conf., Berkeley, CA, Oct. 2005.
[7] G. Huston, “Anatomy: A Look Inside Network Address Translators,” The Internet Protocol J., vol. 7, 2004, pp. 2–32.
[8] M. Holdrege and P. Srisuresh, “Protocol Complications with the IP Network Address Translator,” IETF RFC 3027, Jan. 2001.
[9] J. Rosenberg, “Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols,” IETF Internet draft, work in progress, Oct. 2007.
[10] P. Francis, S. Guha, and Y. Takeda, “NUTSS: A SIP-based Approach to UDP and TCP Network Connectivity,” Cornell Univ., Panasonic Commun., tech. rep., 2004.
[11] A. Biggadike et al., “NATBLASTER: Establishing TCP Connections between Hosts behind NATs,” ACM SIGCOMM Asia Wksp., Beijing, China, 2005.
[12] J. Eppinger, “TCP Connections for P2P Applications: A Software Approach to Solving the NAT Problem,” Carnegie Mellon Univ., Pittsburgh, PA, tech. rep., 2005.
[13] B. Ford, P. Srisuresh, and D. Kegel, “Peer-to-Peer Communication across Network Address Translation,” MIT, tech. rep., 2005.
[14] H. Khlifi, J. Gregoire, and J. Phillips, “VoIP and NAT/Firewalls: Issues, Traversal Techniques, and a Real-World Solution,” IEEE Commun. Mag., July 2006.
[15] U. Forum, “Internet Gateway Device (IGD) Standardized Device Control Protocol,” Nov. 2001.
[16] P. Srisuresh et al., “Middlebox Communication Architecture and Framework,” IETF RFC 3303, Aug. 2002.
[17] M. Stiemerling et al., “NAT/Firewall NSIS Signaling Layer Protocol (NSLP),” IETF Internet draft, Feb. 2008.
[18] J. Rosenberg, R. Mahy, and P. Matthews, “Traversal Using Relays around NAT (TURN),” IETF Internet draft, work in progress, June 2008.
[19] A. Müller, A. Klenk, and G. Carle, “On the Applicability of Knowledge-Based NAT-Traversal for Future Home Networks,” Proc. IFIP Networking 2008, Springer, Singapore, May 2008.

Biographies

ANDREAS MÜLLER (mueller@net.in.tum.de) received his diploma degree in computer science from the University of Tübingen, Germany in 2007. Currently, he is a research assistant and Ph.D. candidate at the Network Architecture and Services Department at the Technical University of Munich. His research interests include middleboxes, P2P systems, and autonomic networking.

ANDREAS KLENK (klenk@informatik.uni-tuebingen.de) earned his diploma degree in computer science from Ulm University, Germany, in 2003. He is a Ph.D. candidate and research assistant at the University of Tübingen and works with Professor Carle. He contributes to European research projects in the telecommunication field. His research interests include negotiation and security in autonomic systems.

GEORG CARLE (carle@net.in.tum.de) received a M.Sc. degree from Brunel University London in 1989, a diploma degree in electrical engineering from the University of Stuttgart in 1992, and a doctoral degree from the faculty of computer science, University of Karlsruhe in 1996. He is a full professor in computer science at the Technical University of Munich, where he is chair of the Department of Network Architecture and Services. Among the focal interests of his research are Internet technology and mobile communication in combination with security.


Modeling Middleboxes
Dilip Joseph and Ion Stoica, University of California at Berkeley

Abstract
The lack of a concise and standard language to describe diverse middlebox func-
tionality and deployment configurations adversely affects current middlebox deploy-
ment, as well as middlebox-related research. To alleviate this problem, we present
a simple middlebox model that succinctly describes how different middleboxes pro-
cess packets and illustrate it by representing four common middleboxes. We set up
a pilot online repository of middlebox models and prototyped model inference and
validation tools.

Middleboxes, like firewalls, NATs, load balancers, and intrusion-prevention boxes have become an integral part of networks today. There is great diversity in how these middleboxes process and transform packets, and in how they are configured and deployed. For example, a firewall is commonly connected inline on the physical network path and transparently forwards packets unmodified or drops them. A load balancer, on the other hand, rewrites packet headers and contents and often requires packets to be explicitly IP addressed and forwarded to it.

There is currently no standard way to succinctly describe the complexity and diversity of middlebox packet processing and deployment mechanisms. Middlebox taxonomies like RFC 3234 [1] provide only a high-level classification of middleboxes. Details about middlebox operations and deployment configurations often are buried in different middlebox and vendor-specific configuration manuals or simply are not documented clearly. Efforts like the Unified Firewall Model [2] and BEHAVE [3] provide models to describe the operations of specific middleboxes like firewalls and NATs.

The lack of a concise and standard language to describe different middleboxes adversely affects current middlebox deployment, as well as hinders middlebox-related research. Correctly deploying and configuring a middlebox is a challenging task by itself. Without a clear understanding of how different middleboxes process packets and interact with the network and with other middleboxes, network planning, verification of operational correctness, and troubleshooting become even more complicated. In our own research experience of designing and implementing the policy-aware switching layer [4] (a new mechanism to overhaul the ad hoc manner in which middleboxes are deployed in data centers today), the non-availability of clear information about how some middleboxes process packets led to initial design decisions that were wrong and that later manifested as hard-to-debug errors while testing.

In this article, we present a general model to clearly and succinctly describe the functionality of a middlebox and deployment configurations. Through sets of pre-conditions and processing rules, the model describes the types of packets expected by a middlebox and how it transforms them. Later, we provide more details of our model and illustrate it by representing four common middleboxes.

The middlebox model provides a standard language to concisely describe different middleboxes. We are building an online repository of middlebox models at http://www.middlebox.org, which we envision as filled with models of various commonly used middleboxes. To ease model construction, we prototyped a tool that infers hints about the operations of a particular middlebox through black box testing. We also prototyped a tool that validates the operations of a middlebox against its model and thus helps detect unexpected behavior. We discuss these and other applications of our model later.

The Model

RFC 3234 [1] defines a middlebox as “an intermediary device performing functions other than the normal, standard functions of an IP router on the datagram path between a source host and destination host.” We refine this high-level definition of a middlebox to construct a simple model that describes various aspects of middlebox functionality and operations. A middlebox in our model consists of zones, input pre-conditions, state databases, processing rules, auxiliary traffic, and the interest and state fields deduced from the processing rules. In this section, we describe and illustrate our model using four common middleboxes: firewall, NAT, layer-4 load balancer, and SSL-offload capable layer-7 load balancer. Table 1 describes the notations used.

Interfaces and Zones

Packets enter and exit a middlebox through one or more of its physical network interfaces. Each physical interface belongs to one or more logical network zones. A zone represents a packet entry and exit point from the perspective of middlebox functionality. A middlebox processes packets differently based on their ingress and egress zones.

For example, the firewall shown in Fig. 1a has two physical interfaces, one belonging to the red zone that represents the insecure external network, and the other belonging to the green zone representing the secure internal network. Packets entering through the red zone are more stringently checked than those entering through the green zone. Similarly, the NAT in Fig. 1b has two different physical network interfaces, one belonging to the internal network (zone int) and the other belonging to the external network (zone ext). The source IP and port number are rewritten for packets received at zone int, whereas the destination IP and port number are rewritten for packets received at zone ext. Figure 1c shows a load balancer with a single physical network interface that belongs to two different zones: zone inet representing the

20 0890-8044/08/$25.00 © 2008 IEEE IEEE Network • September/October 2008



Notation             Meaning
∧                    Logical AND operation
!                    Logical NOT operation
sm                   Source MAC (layer 2) address
dm                   Destination MAC (layer 2) address
si                   Source IP (layer 3) address
di                   Destination IP (layer 3) address
sp                   Source TCP/UDP (layer 4) port
dp                   Destination TCP/UDP (layer 4) port
p                    Packet
[hd]                 Packet with header h and payload d
5tpl                 Packet 5-tuple: si, di, sp, dp, proto
Xrev                 Swaps any source-destination IP, MAC, or port number pairs in X
Z(A, p)              true if packet p arrived at or departed zone A
I(P, p)              Input precondition; true if packet p matches pattern P
C(p)                 Condition specific to middlebox functionality
newflow?(p)          true if packet p indicates a new flow, e.g., TCP SYN
set(A, key → val)    Stores the specified key-value pair in zone A’s state database
S : get?(A, key)     Returns true and assigns val to S if key → val is present in zone A’s state database

Table 1. Notations used in this article.
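The notation in Table 1 maps naturally onto a few small data structures. The Python sketch below is one possible reading, not part of the article's tooling: the class names, the `syn` flag used to approximate newflow?, and the tuple returned by `get` are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Header:
    """Packet header h with the fields named in Table 1."""
    sm: str            # source MAC
    dm: str            # destination MAC
    si: str            # source IP
    di: str            # destination IP
    sp: int            # source port
    dp: int            # destination port
    proto: str = "tcp"
    syn: bool = False  # assumption: a TCP SYN marks a new flow

    @property
    def five_tuple(self):
        """5tpl: si, di, sp, dp, proto."""
        return (self.si, self.di, self.sp, self.dp, self.proto)

    def rev(self):
        """Xrev: swap every source-destination pair (MAC, IP, port)."""
        return replace(self, sm=self.dm, dm=self.sm,
                       si=self.di, di=self.si,
                       sp=self.dp, dp=self.sp)


def newflow(h):
    """newflow?(p): true if the packet indicates a new flow (here: TCP SYN)."""
    return h.proto == "tcp" and h.syn


class StateDB:
    """Per-zone key-value state, mirroring set(A, key -> val) and get?(A, key)."""

    def __init__(self):
        self._zones = {}

    def set(self, zone, key, val):
        self._zones.setdefault(zone, {})[key] = val

    def get(self, zone, key):
        """Return (True, val) if the key is present in the zone, else (False, None)."""
        db = self._zones.get(zone, {})
        if key in db:
            return True, db[key]
        return False, None


h = Header(sm="aa", dm="bb", si="10.0.0.1", di="1.2.3.4", sp=1234, dp=80, syn=True)
db = StateDB()
db.set("int", (h.si, h.sp), 5000)
print(h.rev().five_tuple, newflow(h), db.get("int", (h.si, h.sp)))
```

Returning a `(found, value)` pair from `get` keeps the two effects of `S : get?(A, key)` visible: the Boolean result and the binding of S.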

Internet and zone srvr representing the Web server farm. The load balancer spreads out packets received at zone inet to Web server instances in zone srvr.

We assume that the mapping between interfaces and zones is pre-determined by the middlebox vendor or configured during middlebox initialization. Frames reaching an interface belonging to multiple zones are distinguished by their virtual local area network (VLAN) tags, IP addresses, and/or transport port numbers.

Input Preconditions

Input preconditions specify the types of packets that are accepted by a middlebox for processing. For example, a transparent firewall processes all packets received by it, whereas a load balancer in a single-legged configuration processes a packet arriving at its inet zone only if the packet is explicitly addressed to it at layers 2, 3, and 4. Similarly, a NAT processes all packets received at its int zone, but requires those received at its ext zone to be addressed to it at layers 2 and 3.

Input preconditions are represented using a clause of the form I(P, p), which is true if the headers and contents of packet p match the pattern P. For example, the firewall has the input precondition I(<*>, p), and the load balancer has I(<dm = MACLB, di = IPLB, dp = 80>, p) for its inet zone, where MACLB and IPLB are the layer-2 and layer-3 addresses of the load balancer. Although I(<*>, p) is a tautology, we still explicitly specify it in the firewall model to enhance model clarity.

State Database

Most middleboxes maintain state associated with the flows and sessions they process. Our model represents state using key-value pairs stored in zone-independent or zone-specific state databases. Processing rules (described next) record the state using the set primitive and query state using the get? primitive.

Accurately tracking state removal is hard, unless explicitly specified by the del primitive in a processing rule. Although state expiration timeouts can be specified as part of the set primitive, inaccuracies in timeout values or in their fine-grained measurement can cause discrepancies between the model-predicted behavior of a middlebox and its actual operations. The model only predicts the behavior of a middlebox, so the predicted behavior may differ from its actual operations. As we illustrate in the next section, we use special processing rules to flag such possible discrepancies.

Processing Rules

Processing rules model the core functionality of a middlebox. A processing rule specifies the action taken by a middlebox when a particular condition becomes true. For example, the processing of an incoming packet is represented by a rule of the general form:

Z(A, p) ∧ I(P, p) ∧ C(p) ⇒ Z(B, T(p)) ∧ state ops

The above rule indicates that a packet p reaching zone A of the middlebox is transformed to T(p) and emitted out through zone B, if it satisfies the input precondition I(P, p) and a middlebox-specific condition C(p). In addition, the middlebox may update state associated with the TCP flow or application session to which the packet belongs. We now present concrete examples of processing rules for common middleboxes.

Firewall: First, consider a simple stateless layer-4 firewall that either drops a packet received on its red zone or relays it unmodified to the green zone. This behavior can be represented using the following two rules:

Z(red, p) ∧ I(<*>, p) ∧ Caccept(p) ⇒ Z(green, p)
Z(red, p) ∧ I(<*>, p) ∧ Cdrop(p) ⇒ DROP(p)

Since I(<*>, p) is a tautology, whether a packet is dropped or accepted by the firewall is solely determined by the Caccept and Cdrop clauses that represent the filtering functionality of the firewall. Common filtering rules can be represented easily using the appropriate Boolean expressions (e.g., Caccept(p) : p.dp = 80 || p.si = 128.34.45.6). For more complex filtering rules, we leverage external middlebox-specific


models like the Unified Firewall Model [2] to construct the appropriate C clauses. Rules for packets in the green → red direction are similar.

[Figure 1 panel labels: (a) red zone, insecure external network; green zone, secure internal network. (b) External zone, external network; NAT; internal zone, internal network. (c) Internet zone; server zone; Internet; switch; Web servers.]
Figure 1. Zones of different middleboxes: a) firewall; b) NAT; and c) load balancer in single-legged configuration.

NAT: Next, consider another very common middlebox, a NAT. Unlike the firewall in the previous example, a NAT rewrites packet headers and maintains per-flow state. We first describe the processing rules (rule box 1) for a full cone NAT and then, with minor modifications, change it to represent a symmetric NAT.

Rule (i) describes how a full cone NAT processes a packet [hd] with a previously unseen [si, sp] pair received at its int zone. It allocates a new port number using a standard mechanism like random or sequential selection, or using a custom mechanism beyond the scope of our general model. It stores [si, sp] → newport and newport → [si, sp] in the state databases of zone int and zone ext, respectively. It rewrites the packet header h by applying the source NAT (SNATfwd) transformation function: the source medium access control (MAC) and IP addresses are replaced with the publicly visible addresses of the NAT, the source port with the newly allocated port number, and the destination MAC with the next hop IP gateway of the NAT. The packet with the rewritten header and unmodified payload is then emitted out through the ext zone. Rule (ii) specifies that the NAT emits a packet with a previously seen [si, sp] pair through zone ext, after applying SNATfwd with the port number recorded in rule (i). Rule (iii) describes how the NAT processes a packet reaching the ext zone. It retrieves the newport → [si, sp] state recorded in rule (i) using the destination port number of the packet, applies the reverse source NAT transformation function (SNATrev), and then emits the modified packet through zone int. Rule (iv) and Rule (v) flag discrepancies resulting from the inaccuracy of the model in tracking state expiration. The NAT may drop a packet arriving at its int or ext zone because the state associated with the packet expired without the knowledge of the model.

Unlike a full cone NAT, a symmetric NAT allocates a separate port for each [si, sp, di, dp] tuple seen at its int zone, rather than for each [si, sp] pair. Thus, for a symmetric NAT, the zone int state set in rule (i) and retrieved in rules (ii) and (iv) is keyed by [h.si, h.sp, h.di, h.dp] rather than by just [h.si, h.sp]. A symmetric NAT is also more restrictive than a full cone NAT. It relays a packet with header [IPs, IPNAT, PORTs, PORTd] from the ext zone only if it had earlier received a packet destined to IPs : PORTs at the int zone and had rewritten its source port to PORTd. This restrictive behavior is captured by keying the zone ext state set in rule (i) and retrieved in rules (iii) and (v) with [h.di, h.dp, newport] rather than with just newport. Other NAT types, like restricted cone and port restricted cone, can be easily represented with similar minor modifications.

Layer-4 Load Balancer: Next, we present a layer-4 load balancer, which unlike the NAT in the previous example, rewrites the destination IP address of a packet to that of an available Web server (rule box 2).

Rule (i) describes how the load balancer processes the first packet of a new flow received at its inet zone. The load balancer dynamically selects a Web server instance Wi for the flow and records it in the state database of the inet zone. It rewrites the destination IP and MAC addresses of the packet to Wi using the destination NAT (DNATfwd) transformation function and then emits it out through the srvr zone. It also records this flow in the state database of the srvr zone, keyed by the five-tuple of the packet expected there in the reverse flow direction. Rule (ii) specifies that subsequent packets of the flow simply will be emitted out after rewriting the destination IP and MAC addresses to those of the recorded Web server instance. Rule (iii) describes how the load balancer processes a packet received from a Web server. It verifies the existence of flow state for the packet and then emits it out through the inet zone after applying the reverse DNAT transformation, that is, rewriting the source IP and MAC addresses to those of the load balancer and the destination MAC to the next hop IP gateway.

Although the Web server instance selection mechanism is beyond the scope of our general model, the load balancer model easily can be augmented with primitives to represent common selection mechanisms like least loaded and round robin. In the previous example, we assumed that the load balancer was set as the default IP gateway at each Web server. Other load balancer deployment configurations (e.g., direct server return or source NAT) can be represented with minor modifications.

Layer-7 Load Balancer: We now present our most complex example, a layer-7 SSL offload-capable load balancer. This example illustrates how our model describes a middlebox whose processing spans both packet headers and contents and is not restricted to one-to-one packet transformations. The layer-7 load balancer is the end point of the TCP connection from a client (the CL connection). Because accurately modeling TCP is very hard, we abstract it using a black box TCP state machine tcpCL and buffer the data received from the client in a byte queue DCL. The I clauses are similar to those in the layer-4 load balancer and hence not repeated in rule box 3.

Rule (i) specifies that the load balancer creates tcpCL and DCL and records them along with the packet header on receiving the first packet of a new flow from a client at the inet zone. Rule (ii) specifies how the TCP state and data queue of the CL connection are updated as the packets of an existing flow arrive from the client. Rule (iii), triggered when tcpCL has data or acknowledgments to send, specifies that packets from the load balancer to the client will have header hrevCL (with appropriate sequence numbers filled in by tcpCL) and payload read from the DLS queue, if it was already created by the firing of rule (iv). Rule (iv), triggered when the data collected in DCL is sufficient to parse the HTTP request URL and/or cookies, specifies that the load balancer selects a Web server instance Wi and opens a TCP connection to it, that is,


creates tcpLS and DLS. It also installs a pointer to the state indexed by the DNATed header hLS in the state database of the srvr zone. Rule (v) shows how this state is retrieved, and its tcpLS and DLS are updated, on receipt of a packet from a Web server. Rule (vi) specifies the header and payload of packets sent by the load balancer to a Web server instance: hLS and data read from DCL.

The rules listed above represent a plain layer-7 load balancer. By replacing the + and read data queue operations with +ssl and readssl operations that perform SSL encryption and decryption on the data, we can represent an SSL offload-capable load balancer without disturbing other rules. Similar to the TCP black box, we abstract out the details of the SSL protocol.

(i)    Z(int, [hd]) ∧ I(<*>, [hd]) ∧ !S : get?(int, [h.si, h.sp])
         ⇒ Z(ext, [SNATfwd(h, newport)d]) ∧ set(int, [h.si, h.sp] → newport) ∧ set(ext, newport → [h.si, h.sp])
       SNATfwd([sm, dm, si, di, sp, dp], PORT) = [MACNAT, MACgw, IPNAT, di, PORT, dp]
(ii)   Z(int, [hd]) ∧ I(<*>, [hd]) ∧ S : get?(int, [h.si, h.sp])
         ⇒ Z(ext, [SNATfwd(h, S)d])
(iii)  Z(ext, [hd]) ∧ I(<di=IPNAT, dm=MACNAT>, [hd]) ∧ S : get?(ext, h.dp)
         ⇒ Z(int, [SNATrev(h, S.si, S.sp)d])
       SNATrev([sm, dm, si, di, sp, dp], IP, PORT) = [MACNAT, MACIP, si, IP, sp, PORT]
(iv)   Z(int, [hd]) ∧ I(<*>, [hd]) ∧ S : get?(int, [h.si, h.sp])
         ⇒ DROP([hd]) ∧ WARN(inconsistent state)
(v)    Z(ext, [hd]) ∧ I(<di=IPNAT, dm=MACNAT>, [hd]) ∧ S : get?(ext, h.dp)
         ⇒ DROP([hd]) ∧ WARN(inconsistent state)

Rule box 1.

Auxiliary Traffic

In addition to its core functionality of transforming and forwarding packets, a middlebox can generate additional traffic, either independently or when triggered by a received packet. For example, a load balancer periodically checks the liveness of its target servers by making TCP connections to each server. It also can send an Address Resolution Protocol (ARP) request for the layer-2 address of the Web server assigned to a received packet. Such packets generated by middleboxes and their responses, which support middlebox functionality, are referred to as auxiliary traffic in our model.

Auxiliary traffic is represented using processing rules, as well. For example, the auxiliary traffic associated with the load balancer can be represented in rule box 4.

The PROBE function returns a set of packets to check the liveness of server Wi. In the simple case, these are just the TCP hand-shake packets with the appropriate sm, dm, si, di, sp, and dp.

Interest and State Fields

The interest fields of a middlebox identify the packet fields of interest, that is, the fields it reads or modifies. The state fields identify the subset of the interest fields used by the middlebox in storing and retrieving state. Although these fields can be deduced from the processing rules, they are explicitly presented in the model because they can highlight succinctly unexpected aspects of middlebox processing.

Utility of a Middlebox Model

A middlebox model is useful only if it can easily represent many real-world middleboxes and has practical applications. In this section, we first describe how we constructed the models described in the previous section and then discuss the applications of our model in planning and troubleshooting existing middlebox deployments and in guiding the development of new network architectures.

Model Instances

The models for the firewall, NAT, and layer-4 and layer-7 load balancers illustrated in the previous section were constructed by analyzing generic middlebox descriptions and taxonomies (like RFC 3234 [1]), consulting middlebox-specific manuals, and observing the working of the following real-world middleboxes:
• Linux Netfilter/iptables software firewall
• Netgear home NAT
• BalanceNg layer-4 software load balancer
• HAProxy layer-7 load balancer VMware appliance

We prototyped a black box testing-based model-inference tool to aid middlebox model construction. The tool infers hints about the operations of a middlebox by carefully sending different kinds of packets on one zone and observing the packets emerging from other zones, as illustrated in Fig. 2. The following are some of the inferences generated by it:
• The firewall does not modify packets; all packets sent by the tool emerge unmodified or are dropped.
• The load balancers only process packets addressed to them at layers 2, 3, and 4.
• The layer-4 load balancer rewrites the destination IP and MAC addresses of packets in the inet → srvr direction and the source addresses in the reverse direction. This inference was made by pairing and analyzing packets with identical payloads seen at the two zones of the load balancer. By using a relaxed payload similarity metric, the header rewriting rules for even the layer-7 load balancer were partially inferred.
• The layer-4 load balancer caches source MAC addresses of packets processed by it in the inet → srvr direction and uses them in packets in the reverse direction. This inference was made by correlating rewritten packet header fields with values seen in earlier packets.

Our inference tool is quite basic and serves only as an aid for model construction. It is not fully automated; for example, it requires the IP address and TCP port of the load balancer as input to avoid an exhaustive IP address search for packets

(i)    Z(inet, [hd]) ∧ I(<dm=MACLB, di=IPLB, dp=80>, [hd]) ∧ newflow?([hd])
         ⇒ Z(srvr, [DNATfwd(h, Wi)d]) ∧ set(inet, h.5tpl → Wi) ∧ set(srvr, DNATfwd(h, Wi)rev.5tpl → true)
       DNATfwd([sm, dm, si, di, sp, dp], W) = [SM, MACW, si, IPW, sp, dp]
(ii)   Z(inet, [hd]) ∧ I(<dm=MACLB, di=IPLB, dp=80>, [hd]) ∧ !newflow?([hd]) ∧ S : get?(inet, h.5tpl)
         ⇒ Z(srvr, [DNATfwd(h, S)d])
(iii)  Z(srvr, [hd]) ∧ I(<sm=MACWi, si=IPWi, sp=80>, [hd]) ∧ S : get?(srvr, h.5tpl)
         ⇒ Z(inet, [DNATrev(h)d])
       DNATrev([sm, dm, si, di, sp, dp]) = [MACLB, MACgw, IPLB, di, sp, dp]

Rule box 2.
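Rules (i) to (iii) of rule box 2 can be read as a small state machine over the inet and srvr state databases. The Python sketch below is a minimal illustration of that reading, not the authors' tool. The class and method names are hypothetical, round-robin selection is an assumption (the article leaves server selection outside the model), and the source MAC is left unchanged in the forward direction because the rule box does not make that field clear.

```python
from dataclasses import dataclass, replace
from itertools import cycle


@dataclass(frozen=True)
class Header:
    sm: str
    dm: str
    si: str
    di: str
    sp: int
    dp: int
    proto: str = "tcp"
    syn: bool = False  # assumption: TCP SYN marks a new flow

    def five_tuple(self):
        return (self.si, self.di, self.sp, self.dp, self.proto)

    def rev(self):
        return replace(self, sm=self.dm, dm=self.sm, si=self.di,
                       di=self.si, sp=self.dp, dp=self.sp)


@dataclass(frozen=True)
class Server:
    mac: str
    ip: str


class Layer4LoadBalancer:
    def __init__(self, mac, ip, gw_mac, servers):
        self.mac, self.ip, self.gw_mac = mac, ip, gw_mac
        self._pick = cycle(servers)          # round robin (assumed policy)
        self.state = {"inet": {}, "srvr": {}}

    def dnat_fwd(self, h, w):
        # DNATfwd: destination MAC and IP become those of server w.
        return replace(h, dm=w.mac, di=w.ip)

    def dnat_rev(self, h):
        # DNATrev = [MACLB, MACgw, IPLB, di, sp, dp]
        return replace(h, sm=self.mac, dm=self.gw_mac, si=self.ip)

    def from_inet(self, h):
        # Input precondition I(<dm=MACLB, di=IPLB, dp=80>, [hd]).
        if (h.dm, h.di, h.dp) != (self.mac, self.ip, 80):
            return None
        key = h.five_tuple()
        if h.syn and key not in self.state["inet"]:      # rule (i)
            w = next(self._pick)
            self.state["inet"][key] = w
            out = self.dnat_fwd(h, w)
            self.state["srvr"][out.rev().five_tuple()] = True
            return out
        w = self.state["inet"].get(key)                  # rule (ii)
        return self.dnat_fwd(h, w) if w is not None else None

    def from_srvr(self, h):
        # Rule (iii), simplified: checks sp=80 and the recorded flow state only.
        if h.sp != 80 or h.five_tuple() not in self.state["srvr"]:
            return None
        return self.dnat_rev(h)


lb = Layer4LoadBalancer("lb-mac", "10.0.0.1", "gw-mac",
                        [Server("w1-mac", "192.168.0.2"), Server("w2-mac", "192.168.0.3")])
syn = Header(sm="gw-mac", dm="lb-mac", si="1.2.3.4", di="10.0.0.1", sp=5555, dp=80, syn=True)
out = lb.from_inet(syn)
reply = Header(sm="w1-mac", dm="lb-mac", si="192.168.0.2", di="1.2.3.4", sp=80, dp=5555)
back = lb.from_srvr(reply)
print(out.di, back.si)
```

Storing the reverse five-tuple in the srvr database at flow setup means the return path needs only one lookup, which is the design the rule box expresses with `DNATfwd(h, Wi)rev.5tpl`.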


(i) Z(inet, [hd]) ∧ I(...) ∧ newflow?([hd])  ⇒  set(inet, h.5tpl – [tcpCL = TCP.new, DCL = Data.new, hCL = h])
(ii) Z(inet, [hd]) ∧ I(...) ∧ !newflow?([hd]) ∧ S : get?(inet, h.5tpl)  ⇒  S.tcpCL.rev(h) ∧ S.DCL+d
(iii) S.tcpCL.ready?  ⇒  Z(inet, S.tcpCL.send(S.hrevCL, S.DLS.read))
(iv) S.DCL.url?  ⇒  S.hLS = DNATfwd(S.hCL, Wi) ∧ set(srvr, S.hrevLS.5tpl – S) ∧ S.DLS = Data.new ∧ S.tcpLS = TCP.new
(v) Z(srvr, [hd]) ∧ I(...) ∧ S : get?(srvr, h.5tpl)  ⇒  S.tcpLS.recv(h) ∧ S.DLS+d
(vi) S.tcpLS.ready?  ⇒  Z(srvr, S.tcpLS.send(S.hLS, S.DCL.read))
Rule box 3.

PERIODIC  ⇒  Z(srvr, PROBE(IPWi))
Z(inet, [hd]) ∧ S : get?(inet, h.5tpl) ∧ !S’ : get?(-, IPS)  ⇒  Z(srvr, ARPREQ(IPS))
Z(srvr, ARPRPLY(IP, MAC))  ⇒  set(-, IP – MAC)
Rule box 4.

accepted by it. The inferred packet header transformation rules and state fields may not be 100 percent accurate and thus only serve to guide further analysis. For middleboxes like SSL offload boxes that completely transform packet payloads, the tool cannot infer the processing rules.
We believe that completely inferring middlebox models through black box testing alone is impossible. If the source code for a middlebox implementation were available, we hypothesize that automatic white box software test-generation tools like directed automated random testing (DART) [5] could be adapted to infer middlebox model parameters. Automatically parsing middlebox configuration manuals to extract models is another open research direction.
We envision an online repository containing models of common middleboxes. We set up a pilot version of such a repository at http://www.middlebox.org with the models described in this article. We hope that middlebox manufacturers and network administrators who use middleboxes will contribute additional models to the repository.
We also prototyped a model validation tool that analyzes traffic traces collected from the different zones of a middlebox and verifies whether its operations are consistent with its model downloaded from the repository. Apart from flagging errors and incompleteness in the models themselves, the validation tool can be used to detect unexpected middlebox behavior, as we describe next.

Network Planning and Troubleshooting
The middlebox model clearly describes how various middleboxes under different configurations interact with the network and with each other in a standard and concise format. This information aids in planning new middlebox deployments and in monitoring and troubleshooting existing ones.
The input preconditions of a middlebox specify the types of packets expected by it and thus help a network architect plan the network topology and middlebox placement required to deliver the correct packets to it. The input preconditions and processing rules together help in analyzing the feasibility of placing different middleboxes in sequence. For example, because the right-hand sides of the firewall processing rules do not interfere with the conditions on the left-hand sides of the load balancer processing rules, the firewall can be placed in front of the load balancer with little scrutiny. However, placing the load balancer before the firewall requires more careful analysis as the destination address rewriting indicated by the processing rules of the load balancer may interfere with the Caccept and Cdrop clauses of the firewall.
The middlebox processing rules specify the packets flowing in different parts of a network. This information can be used to statically analyze and detect problems with a middlebox deployment before actual network rollout. It also aids in troubleshooting existing middlebox deployments and enhances automated traffic monitoring and anomaly detection. For example, the model validation tool helped us detect unexpected NAT behavior in the home network of one of the authors. The author’s home NAT was not rewriting the source port numbers of the packets sent by internal hosts. The tool automatically flagged this behavior as a violation of rules (i) and (ii) of our NAT model. We expected the multi-interface home NAT to use source port translation to support simultaneous TCP connections to the same destination from the same source port on multiple internal hosts. On further investigation, the failure of such simultaneous TCP connections confirmed the anomaly. Although a small example, this experience indicates that our middlebox model holds practical utility in detecting unexpected middlebox behavior.

Guide Networking Research
Our middlebox model provides networking researchers with clear and concise descriptions of how various middleboxes operate. Such information is very useful for researchers, as well as companies involved in developing new network architectures, especially those that deal with middleboxes [6]. Not only does it provide hints to make a new architecture compatible with existing middleboxes, but it also helps identify middleboxes that cannot be supported.
In retrospect, the availability of a middlebox model would have greatly benefited our research on designing the policy-aware switching layer (PLayer) [4], alluded to earlier. The PLayer consists of enhanced layer-2 switches (pswitches) that explicitly forward packets to the middleboxes specified by a network administrator. In our original (erroneous) design, pswitches rewrote the source MAC addresses of packets processed by a transparent firewall to a unique dummy MAC address to mark packets that had already been processed by the firewall. Contrary to our expectation that the load balancer would use ARP, it cached the dummy source MAC addresses of packets in the forward flow direction and used them to address packets in the reverse direction. Such packets never reached their intended destinations. The presence of the source MAC address in the interest and state fields of the load balancer would have helped us debug this problem more quickly. Moreover, it would have warned us against rewriting the source MAC address in our original design, thus avoiding a time-consuming redesign.

Limitations
The model presented in this article is only a first step toward modeling middleboxes. Its three main limitations are:
• The inability to describe highly-specific middlebox operations in detail
• The lack of formal coverage proofs
• The complexity of model specification
The goal of building a general middlebox model that can describe a wide variety of middleboxes precludes our model from representing functionality that is very specific to a particular middlebox. We can extend our model easily using middlebox-specific models like the Unified Firewall Model as described earlier, although at the expense of reducing model simplicity and conciseness. The desire for simplicity and conciseness also prevents our model from capturing accurate timing and causality between the triggering of different processing rules.
On the other hand, our model may not be general enough to describe all possible current and future middleboxes. Although we represented many common middleboxes in our model and are not aware of any existing middleboxes that cannot be represented, we are unable to formally prove that our model covers all possible middleboxes.
The model for a particular middlebox consists of a small number (typically < 10) of processing rules. However, constructing the model itself is a non-trivial task even with support from our model inference and validation tools. We expect models to be constructed by experts and shared through an online model repository, thus making them easily available to all, without requiring widespread model construction skills.

Figure 2. Middlebox model inference tool analyzing a load balancer. [Diagram labels: Internet zone, Server zone, control packet sending, observe, model inference tool.]

Related Work
The middlebox model described in this article is placed at an intermediate level in between related work on very general network communications models and very specific middlebox models.
An axiomatic basis for communication [7] presents a general network communications model that axiomatically formulates packet forwarding, naming, and addressing. This article presents a model tailored to represent middlebox functionality and operations. The processing rules and state database in our model are similar to the forwarding primitives and local switching table in [7]. As part of future work, we plan to investigate the integration of the two models and thus combine the practical benefits of our middlebox model (e.g., middlebox model inference and validation tools, model repository) and the theoretical benefits of the general communications model (e.g., formal validation of packet forwarding correctness through chains of middleboxes).
Predicate routing [8] attempts to unify security and routing by declaratively specifying network state as a set of Boolean expressions dictating the packets that can appear on various links connecting together end nodes and routers. This approach can be extended to represent a subset of our middlebox model. For example, Boolean expressions on the ports and links (as defined by predicate routing) of a middlebox can specify the input preconditions of our model and indirectly hint at the processing rules and transformation functions. From a different perspective, middlebox models from our repository can aid the definition of the Boolean expressions in a network implementing predicate routing.
Reference [9] uses statistical rule mining to automatically group together commonly occurring flows and learn the underlying communication rules in a network. Our work has a narrower and more detailed focus on how middleboxes operate. Reference [10] uses detailed measurement techniques to evaluate the performance and reliability of production middlebox deployments. We plan to investigate how the techniques described in these papers can enhance our model inference and validation tools.
RFC 3234 [1] presents a taxonomy of middleboxes. Our model goes well beyond a taxonomy and describes middlebox packet processing in more detail using a concise and standard language. In addition, our model can naturally induce a more fine-grained taxonomy on middleboxes (e.g., “middleboxes that rewrite the destination IP and port number” versus “middleboxes operating at the transport layer”). Our model does not currently consider the middlebox failover modes and functional versus optimizing roles identified by RFC 3234.
The Unified Firewall Model [2] and IETF BEHAVE [3] working group characterize the functionality and behavior of specific middleboxes (firewalls and NATs in this case). Guided by these efforts, we construct a general model that applies to a wide range of middleboxes and enables us to compare different middleboxes and study their interactions. Furthermore, these specific models can be plugged into our general model and alleviate the limitations of model generality.

Conclusion
In this article, we presented a simple middlebox model and illustrated how various commonly used middleboxes can be described by it. The model guides middlebox-related research and aids middlebox deployments. Our work is only an initial step in this direction and calls for the support of the middlebox research and user communities to further refine the model and to contribute model instances for the many different kinds of middleboxes that exist today.

References
[1] “Middleboxes: Taxonomy and Issues,” RFC 3234.
[2] G. J. Nalepa, “A Unified Firewall Model for Web Security,” Advances in Intelligent Web Mastering.
[3] “Behavior Engineering for Hindrance Avoidance”; http://www.ietf.org/html.charters/behave-charter.html
[4] D. Joseph, A. Tavakoli, and I. Stoica, “A Policy-Aware Switching Layer for Data Centers,” Proc. SIGCOMM, 2008.
[5] P. Godefroid, N. Klarlund, and K. Sen, “DART: Directed Automated Random Testing,” Proc. PLDI, 2005.
[6] M. Walfish et al., “Middleboxes No Longer Considered Harmful,” Proc. OSDI, 2004.
[7] M. Karsten et al., “An Axiomatic Basis for Communication,” Proc. SIGCOMM, 2007.
[8] T. Roscoe et al., “Predicate Routing: Enabling Controlled Networking,” SIGCOMM Comp. Commun. Rev., vol. 33, no. 1, 2003.
[9] S. Kandula, R. Chandra, and D. Katabi, “What’s Going On? Learning Communication Rules in Edge Networks,” Proc. SIGCOMM, 2008.
[10] M. Allman, “On the Performance of Middleboxes,” Proc. IMC, 2003.

Biographies
DILIP JOSEPH (dilip@cs.berkeley.edu) received his B.Tech. degree in computer science from the Indian Institute of Technology, Madras, in 2004 and his M.S. degree in computer science from the University of California at Berkeley in 2006. He is currently a Ph.D. candidate at the University of California at Berkeley. His research interests include data center networking, middleboxes, and new Internet architectures.

ION STOICA (istoica@cs.berkeley.edu) received his Ph.D. from Carnegie Mellon University in 2000. He is an associate professor in the EECS Department at the University of California at Berkeley, where he does research on peer-to-peer network technologies in the Internet, resource management, and network architectures. He is the recipient of the 2007 Rising Star Award, a Sloan Foundation Fellowship (2003), a Presidential Early Career Award for Scientists and Engineers (PECASE) (2002), and the ACM doctoral dissertation award (2001). In 2006 he co-founded Conviva, a startup company to commercialize peer-to-peer technology for video distribution.


Network Address Translation for the Stream Control Transmission Protocol
Michael Tüxen and Irene Rüngeler, Münster University of Applied Sciences
Randall Stewart, The Resource Group
Erwin P. Rathgeb, University of Duisburg-Essen

Abstract
Network address translation (NAT) is widely deployed in the Internet and supports the Transmission Control Protocol and the User Datagram Protocol as transport layer protocols. Although it is part of the kernels of all recent Linux distributions, FreeBSD 7, and the Solaris 10 operating system, the Stream Control Transmission Protocol (SCTP), the new Internet Engineering Task Force transport protocol, is not yet supported on most NAT middleboxes. This article discusses the deficiencies of using existing NAT methods for SCTP and describes a new SCTP-specific NAT concept. This concept is analyzed in detail for several important network scenarios, including peer-to-peer, transport layer mobility, and multihoming.

Network address translation (NAT) is a common method for separating private networks from global networks by translating private Internet Protocol (IP) addresses to global IP addresses. Often there is only one global IP address available for multiple hosts inside the private network. In this case, the transport layer port number also is modified, and the method is called network address and port number translation (NAPT). NAT and NAPT have been in use for the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP) for a long time, but the Stream Control Transmission Protocol (SCTP), as a fairly new transport protocol, is not supported yet. Applying this method also to SCTP does not work for multihomed associations.
Currently, NAT implementations that support SCTP in a way similar to TCP or UDP are being developed first. Although this works well for single-homed SCTP associations, it does not work for multihomed SCTP associations. This makes these solutions inapplicable to typical SCTP applications that require multihoming. However, in these cases, some vendors and operators also want to use NAT middleboxes for various reasons. Therefore, it is important to have NAT middleboxes that not only support SCTP in a limited way, but with all features, especially multihoming.
In [1] and [2], the authors of this article describe an approach to integrate SCTP in network address translators for single-homed client-server communication. This article extends this method in a way that also works in the case of multihomed and peer-to-peer scenarios. Additionally, it covers the case of transport layer mobility or routing changes in the network. These additions also will be provided to the Internet Engineering Task Force (IETF) for standardization.
The structure of this article is as follows: first, we provide an introduction to SCTP, emphasizing the features that are relevant for this article. We discuss generic NAT and NAPT methods for traffic based on TCP or UDP and for traffic that is not based on these protocols. Their applicability for SCTP is analyzed. An SCTP-specific method for NAT middleboxes that overcomes the deficiencies of the generic methods is described. Several examples are given explaining in detail how the SCTP-specific NAT method works for different scenarios including single-homed and multihomed client-server scenarios, peer-to-peer scenarios, and transport-layer mobility scenarios. Then, conclusions are presented.

Introduction to the Stream Control Transmission Protocol
SCTP is currently specified in [5]. It was standardized by the IETF as the generic transport protocol for signaling transport in IP-based telephone signaling networks.
SCTP is a connection-oriented protocol providing reliable transport of user messages. It supports IPv4 and IPv6 as a network layer. A connection between two SCTP end points is called an SCTP association or just an association.
One of the major design goals was network fault tolerance, and therefore, each SCTP end point can use multiple IP addresses within each association but only one port number. Each IP address of the peer can be used as the destination address of a packet. Currently, this multihoming support is used only for redundancy, but ongoing research is analyzing the possibility of also using it for load sharing.
SCTP is already part of all recent Linux distributions, the Solaris 10 operating system, and the FreeBSD 7.0 release. It is deployed in the signaling networks of telephony network operators and is used in IP-based signaling for universal mobile telecommunication system (UMTS) networks. Other applications using SCTP include the IP Flow Information Export (IPFIX) protocol, Diameter, and the Reliable Server Pooling (RSerPool) protocol suite. It should be noted that SCTP was the first transport protocol specified by the IETF in 2000 and

26 0890-8044/08/$25.00 © 2008 IEEE IEEE Network • September/October 2008



deployed in commercial networks after the introduction of UDP and TCP in the 1980s. Four years later, a modification of UDP with limited checksum coverage, UDP-Lite, was standardized and is used in Third Generation Partnership Project (3GPP) networks. In 2006, the IETF standardized the Datagram Congestion Control Protocol (DCCP). Currently, neither UDP-Lite nor DCCP is available on major operating systems.
An SCTP packet consists of a common header followed by a number of chunks. The common header contains source and destination port numbers similar to TCP or UDP headers, a 32-bit verification tag, and a CRC32C checksum. The checksum covers only the SCTP packet and does not take any kind of pseudo header into account. Each chunk consists of a type field, eight flags, a length field, and type-specific data. Furthermore, it is padded at the end to be 32-bit aligned.
The basic association setup procedure is based on a four-way handshake and follows the client-server principle. It is shown on the left-hand side of Fig. 1. The first SCTP message is sent from the client to the server. It contains exactly one chunk, the initiation (INIT) chunk. The INIT chunk contains a 32-bit random number, the initiate tag, and the list of IP addresses used by the client. The server responds with an SCTP message, which also contains just one chunk, the initiation acknowledgment (INIT-ACK) chunk. It also contains a 32-bit random initiate tag and the list of addresses of the server. If the client or server is single-homed, the list of addresses in the INIT or INIT-ACK chunk should be empty. After the server has sent the INIT-ACK chunk, it does not hold any state regarding the association. Instead, it puts all information in a state cookie, which itself is put into the INIT-ACK chunk. On reception of the INIT-ACK chunk, the client sends the state cookie in a COOKIE-ECHO chunk to the server. On reception of the COOKIE-ECHO chunk, the server responds with a COOKIE-ACK chunk, and the association is established. Other chunks might be bundled with the COOKIE-ECHO or COOKIE-ACK chunk in the third and fourth message.
The verification tag in the common header is always the initiate tag sent by the peer in the INIT or INIT-ACK message during the association setup. This is used to protect associations against blind attacks. Only the common header of the packet containing the INIT chunk has the verification tag 0. It is important to note that most SCTP implementations use the verification tag for looking up the association when a packet is received. Some implementations even ensure that the verification tags are unique across all associations currently known.

Figure 1. Examples of the SCTP association setup. [Three message sequence charts between Host A and Host B. Left: four-way handshake (INIT, INIT-ACK, COOKIE-ECHO, COOKIE-ACK). Middle: association restart. Right: collision case.]

Both end points can start the four-way handshake at about the same time, and the SCTP setup procedure ensures that exactly one association is established. This is called a collision case. An example message flow is shown on the right-hand side in Fig. 1.
It is also possible that one side starts the association procedure while the peer is still in the established state. This might happen, for example, if one side reboots without tearing down the association and then starts the association setup procedure. The four-way handshake succeeds, and for the server side, the association restarts. One example is shown in the middle of Fig. 1; detailed descriptions of the handling of all the possible cases are given in [9].
If an SCTP end point must terminate an association immediately, it can send a packet containing an ABORT chunk. This chunk also is sent in response to almost all packets for which no association can be looked up. On reception of an ABORT chunk, the association is terminated. Error conditions can be signaled by sending an ERROR chunk. ABORT and ERROR chunks can include the causes of the error in order to provide more detailed information. In addition to the base protocol, several extensions also were standardized and implemented. The SCTP extension that is crucial for this article is the ability to add or delete IP addresses dynamically during the lifetime of an SCTP association. This is specified in [6]. If an SCTP end point wants to add or delete an IP address, it sends an address configuration change (ASCONF) chunk that contains the address to be added or deleted and an address that can be used to look up the association, the so-called lookup address. When the peer has processed an ASCONF chunk, it sends back an address configuration acknowledgment (ASCONF-ACK) chunk. There is a special rule that if the address to be added is the wildcard address (0.0.0.0 for IPv4 or ::0 for IPv6), the source address of the packet containing the ASCONF chunk is added. If the address to be deleted is the wildcard address, all addresses except for the source address of the packet containing the ASCONF chunk are deleted.

Applicability of Generic Methods for NAT or NAT Traversal

UDP or TCP-like Network Address and Port Number Translation
NAT in its original meaning is realized by changing the (private) IP address of the client to a global address of the NAT middlebox and keeping this correlation in a table (Fig. 2). Thus, the server addresses its packets to this global address; they reach the NAT, which substitutes the destination address with the address of the client. This is a feasible method, as long as the source ports of the clients connecting to the same server are different. The source port numbers are chosen dynamically from operating system dependent ranges. Some operating systems use the port numbers between 49152 and 65535. Because many clients can be located behind the same NAT middlebox, and these clients might access a very popular server at about the same time, the chance that two clients get the same port is non-negligible.
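
To give a feeling for how quickly source ports collide, the following Python sketch evaluates the birthday-problem probability for the port range 49152 to 65535 mentioned above. It assumes that each client picks its source port independently and uniformly from that range, which is a simplification and not a claim about any particular operating system.

```python
import math


def collision_probability(clients: int, space: int) -> float:
    """Probability that at least two of `clients` independent, uniform
    choices out of `space` possible values coincide (birthday problem)."""
    if clients > space:
        return 1.0
    # Sum the logarithms of (1 - i/space) to get log P(all choices distinct).
    log_distinct = sum(math.log1p(-i / space) for i in range(clients))
    return -math.expm1(log_distinct)


EPHEMERAL_PORTS = 65535 - 49152 + 1  # size of the range quoted in the text

for n in (10, 50, 100, 200):
    print(n, "clients:", round(collision_probability(n, EPHEMERAL_PORTS), 3))
```

Under these assumptions, already a few dozen clients contacting the same server at about the same time produce a collision probability of several percent.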
SCTP supports not only the client-server model for association setup, but also the more general peer-to-peer model.

Therefore, TCP or UDP sessions usually are translated by changing the private IP address and additionally, the private


port number to a global IP address and port number in the TCP or UDP header, respectively. This method is called NAPT. Thereby, the NAT middlebox chooses the port numbers from a pool and makes sure that no two connections to the same server obtain the same port numbers.
As the transport layer checksum of the TCP and UDP packets covers the transport header that includes the port numbers, it must be modified according to the port number change. However, the checksum used for TCP or UDP has the property that the change of the checksum can be computed from the change of the port numbers alone. So this can be done very efficiently by a simple set of additions and subtractions.

Figure 2. Using basic NAT. [Clients 10.1.0.1:52001, 10.1.0.2:52002, and 10.1.0.3:52003 reach the server 100.4.5.1:8080 across the Internet through a NAT with the global address 120.10.2.1. NAT table: 120.10.2.1:52001 => 100.4.5.1:8080; 120.10.2.1:52002 => 100.4.5.1:8080; 120.10.2.1:52003 => 100.4.5.1:8080.]

UDP-Based Tunneling
Currently, most NAT middleboxes support only protocols running on top of TCP or UDP. A standard technique for all other protocols is to encapsulate these packets into UDP instead of IP. Because both UDP and IP provide an unreliable packet delivery service, this is feasible. This also works for SCTP, as described in [3], and is currently implemented in the SCTP kernel extension for Mac OS X.
It should be noted that NAT middleboxes on different paths are not synchronized, and therefore, the UDP port number might be different on different paths.
One drawback of using UDP encapsulation is that Internet Control Message Protocol (ICMP) messages might not contain enough information to be processed by the SCTP layer. Another drawback is that the simple peer-to-peer solution described in the sections about peer-to-peer communication and multihoming with a rendezvous server does not work because the UDP port numbers might be changed by NAT middleboxes.
Tunneling SCTP over UDP must handle the same problems as any other UDP-based communication for NAT traversal. However, this is the only possibility for SCTP-based communication through a NAT middlebox without modifying it to add SCTP support.
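
The property noted for the TCP and UDP checksum, namely that a port rewrite needs only a few additions and subtractions, can be illustrated with the incremental update of RFC 1624. The following Python sketch is only an illustration: the list of 16-bit words stands in for the checksummed data, and the port values are made up.

```python
def ones_add(a: int, b: int) -> int:
    """16-bit one's complement addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def internet_checksum(words) -> int:
    """Full checksum over a sequence of 16-bit words (used as the reference)."""
    total = 0
    for w in words:
        total = ones_add(total, w)
    return ~total & 0xFFFF


def update_checksum(old_csum: int, old_word: int, new_word: int) -> int:
    """Incremental update, RFC 1624 Eqn. 3: HC' = ~(~HC + ~m + m')."""
    total = ones_add(~old_csum & 0xFFFF, ~old_word & 0xFFFF)
    total = ones_add(total, new_word)
    return ~total & 0xFFFF


# Made-up data; index 0 plays the role of the source port.
words = [52001, 8080, 0x1234, 0xABCD, 0x0042]
csum = internet_checksum(words)

new_port = 40001  # port picked by the NAT (hypothetical value)
fast = update_checksum(csum, words[0], new_port)

words[0] = new_port
assert fast == internet_checksum(words)  # same result as a full recomputation
```

The update touches only the old checksum and the old and new port values, independent of the packet length.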
It should be noted that the behavior of NAT middleboxes varies dramatically because there were no standards describing how to build them. The Behavior Engineering for Hindrance Avoidance (BEHAVE) working group of the IETF develops best current practice (BCP) documents giving requirements for NAT middlebox behavior and protocols to help applications to run over networks with NAT middleboxes.
Considering only single-homed SCTP clients and servers, it is also possible to use this NAPT concept for SCTP because it has the same port number concept as TCP and UDP. However, the transport layer checksum used by SCTP is different from the one used by UDP and TCP. This checksum does not allow the checksum change to be computed from the port number change alone. Therefore, the NAT middlebox must compute the new SCTP checksum again, based on the complete SCTP packet. This requires a substantial amount of computing power that might be reduced when the computation is performed directly by hardware.
For multihomed SCTP clients and servers, reusing the techniques from TCP and UDP becomes much harder. As we mentioned earlier, hosts can be multihomed, which means that they can simultaneously use multiple network addresses and thus can be attached to multiple networks. Therefore, the traffic of one SCTP association, in general, passes through different NAT middleboxes on different paths. Because each SCTP end point can use only one SCTP port number on all paths, the NAT middleboxes cannot change the port number independently. To apply the existing NAT concept, the NAT middleboxes involved would have to synchronize the port numbers to assign a common number for the association. This is very hard to achieve.

An SCTP-Specific Variant of NAT
In the NAPT method described previously, the NAT middlebox controls the 16-bit source port number of outgoing TCP connections to distinguish multiple TCP connections of all clients behind the NAT middlebox to the same server. The basic idea for the SCTP-specific method is instead to use the combination of the source port number and the verification tag. For single-homed hosts, this method is described in [2].
If NAT middleboxes use the verification tags together with the addresses and the port numbers to identify an association, the probability that two hosts end up with the same combination decreases to a tolerable level.

A Simple Association Setup
The main task of a NAT middlebox is to substitute the source address of each packet with the public address used by the NAT middlebox and to keep the corresponding IP addresses in a table. First, we consider an association setup between a single-homed client and a single-homed server. Neither the INIT nor the INIT-ACK chunk contains any IP addresses. This leads to a scheme as described in Fig. 3.
In the first message of the handshake, the verification tag in the common header must be set to 0, but the initiate tag (initTag) in the INIT chunk holds a 32-bit random number that is supposed to be the verification tag (VTag) of the incoming packets. Hence, at the beginning of the handshake, only one verification tag is known. The NAT middlebox keeps track of this information and takes the local private address (Local-Address) and the officially registered destination IP address (Global-Address) from the IP header of the SCTP packet and saves them in the NAT table (Fig. 3). The local source port (Local-Port) and the destination port (Global-Port) are obtained the same way.
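
As a minimal sketch of how a NAT middlebox could read these values from a packet carrying an INIT chunk, the following Python code unpacks the 12-byte common header and the start of the INIT chunk. The byte layout follows the SCTP specification (RFC 4960); the function names and the dictionary used for the table entry are assumptions made for this illustration, and the example values are those of Fig. 3.

```python
import struct

INIT = 1  # chunk type value of the INIT chunk (RFC 4960)


def parse_init(sctp_packet: bytes):
    """Return (src_port, dst_port, init_tag) of a packet whose first chunk is INIT."""
    # Common header: source port, destination port, verification tag, checksum.
    src_port, dst_port, vtag, _checksum = struct.unpack_from("!HHII", sctp_packet, 0)
    # Chunk header: type, flags, length; the initiate tag follows directly.
    chunk_type, _flags, _length = struct.unpack_from("!BBH", sctp_packet, 12)
    if chunk_type != INIT:
        raise ValueError("first chunk is not INIT")
    if vtag != 0:
        raise ValueError("a packet carrying INIT must have verification tag 0")
    (init_tag,) = struct.unpack_from("!I", sctp_packet, 16)
    return src_port, dst_port, init_tag


def nat_entry_from_init(ip_src: str, ip_dst: str, sctp_packet: bytes) -> dict:
    """Build a table entry (hypothetical representation) from an outgoing INIT."""
    src_port, dst_port, init_tag = parse_init(sctp_packet)
    return {
        "Local-Address": ip_src, "Global-Address": ip_dst,
        "Local-Port": src_port, "Global-Port": dst_port,
        "Local-VTag": init_tag, "Global-VTag": None,  # not known yet
    }


# Hand-built INIT packet with the values of Fig. 3 (checksum left at 0 here).
packet = struct.pack("!HHII", 52001, 8080, 0, 0) + struct.pack(
    "!BBHIIHHI", INIT, 0, 20, 12345, 65536, 1, 1, 1)
print(nat_entry_from_init("10.1.0.1", "100.4.5.1", packet))
```

The sketch does not verify the CRC32C checksum and ignores any further chunks or parameters in the packet.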
Based on this discussion, it seems desirable to use a NAT The initiate tag of the INIT chunk, which the client has
mechanism for SCTP that does not require a change to the chosen for its communication, is also extracted from the INIT
SCTP header at all and hence to the port numbers, which chunk header and saved as Local-VTag. The Global-VTag
avoids synchronization among NAT middleboxes and the that eventually will be chosen by the communication partner
recomputation of the SCTP checksum. is not known yet. Before forwarding the packet, the NAT mid-


Client 10.1.0.1:52001          NAT 120.10.2.1          Server 100.4.5.1:8080

Between client and NAT                          Between NAT and server
INIT: 10.1.0.1:52001 => 100.4.5.1:8080          INIT: 120.10.2.1:52001 => 100.4.5.1:8080
  Vtag=0, initTag=12345                           Vtag=0, initTag=12345
INIT-ACK: 100.4.5.1:8080 => 10.1.0.1:52001      INIT-ACK: 100.4.5.1:8080 => 120.10.2.1:52001
  Vtag=12345, initTag=45678                       Vtag=12345, initTag=45678

Chunk type  Local-Address  Global-Address  Local-Port  Global-Port  Local-VTag  Global-VTag
INIT        10.1.0.1       100.4.5.1       52001       8080         12345       -
INIT-ACK    10.1.0.1       100.4.5.1       52001       8080         12345       45678

COOKIE-ECHO: 10.1.0.1:52001 => 100.4.5.1:8080   COOKIE-ECHO: 120.10.2.1:52001 => 100.4.5.1:8080
  Vtag=45678                                      Vtag=45678
COOKIE-ACK: 100.4.5.1:8080 => 10.1.0.1:52001    COOKIE-ACK: 100.4.5.1:8080 => 120.10.2.1:52001
  Vtag=12345                                      Vtag=12345

Figure 3. Four-way handshake for the SCTP association setup with NAT table.
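
The exchange in Fig. 3 can be replayed with a small table sketch. The Python code below is an illustration under simplifying assumptions (a single NAT address, no timeouts, no multihoming); the class and method names are hypothetical. Incoming packets are matched on Global-Address, Global-Port, Local-Port, and Local-VTag, so that two clients using the same local port toward the same server remain distinguishable.

```python
class SctpNat:
    """Toy NAT table keyed on addresses, ports, and verification tags."""

    def __init__(self, nat_global_address: str):
        self.nat_global_address = nat_global_address
        self.table = []  # entries use the column names of Fig. 3

    def outbound_init(self, src_ip, src_port, dst_ip, dst_port, init_tag):
        """Record an outgoing INIT and return the rewritten source address."""
        self.table.append({
            "Local-Address": src_ip, "Global-Address": dst_ip,
            "Local-Port": src_port, "Global-Port": dst_port,
            "Local-VTag": init_tag, "Global-VTag": None,
        })
        return self.nat_global_address

    def inbound(self, src_ip, src_port, dst_port, vtag, init_tag=None):
        """Look up an incoming packet; return the Local-Address or None.

        For an INIT-ACK, pass its initiate tag so the Global-VTag is stored.
        """
        for entry in self.table:
            key = (entry["Global-Address"], entry["Global-Port"],
                   entry["Local-Port"], entry["Local-VTag"])
            if key == (src_ip, src_port, dst_port, vtag):
                if init_tag is not None:
                    entry["Global-VTag"] = init_tag
                return entry["Local-Address"]
        return None


nat = SctpNat("120.10.2.1")
nat.outbound_init("10.1.0.1", 52001, "100.4.5.1", 8080, init_tag=12345)   # INIT
print(nat.inbound("100.4.5.1", 8080, 52001, vtag=12345, init_tag=45678))  # INIT-ACK
print(nat.inbound("100.4.5.1", 8080, 52001, vtag=12345))                  # COOKIE-ACK
print(nat.table)
```

A second client behind the same NAT that also uses local port 52001 would be recorded with its own Local-VTag and therefore found by a separate table entry.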

dlebox exchanges the source address of the IP header with the address, an entry to the NAT table is made for that address.
NAT address (Nat-Global-Address) and sends the packet Because both verification tags must be added, a parameter
toward the other end point. must be included in the ASCONF chunk that contains the
The other SCTP end point receiving the packet containing verification tag that is not present in the common header.
the INIT chunk answers the request with a message contain-
ing the INIT-ACK chunk. This message is addressed to the Behavior of the SCTP End Points
NAT-Global-Address and the Local-Port. Its verification tag Because multiple clients behind the NAT middlebox might
in the common header must be identical to the initiate tag of choose the same local port when connecting to the same serv-
the INIT chunk, whereas the initiate tag of the INIT-ACK er, the restart procedure would result in a loss of an SCTP
chunk will be used as the verification tag for all packets that association. Therefore, the INIT chunk sent by the clients
are sent by the initiating end point (client 10.1.0.1 in the fig- should contain a parameter indicating that the server should
ure) of the association. For an incoming INIT-ACK chunk, not follow the restart procedure. Instead it should use the ver-
the NAT middlebox searches the table entries for the corre- ification tag to distinguish between the associations. This is
sponding combination of Local-Port, Global-Address, Global- what most SCTP implementations already do.
Port, and the Local-VTag and adds the Global-VTag. Thus, Furthermore, the SCTP end points must not include non-
after the reception of the INIT-ACK chunk, both verification global addresses in the INIT or INIT-ACK chunk.
tags are known. Now the NAT middlebox sets the destination If an SCTP end point is multihomed and has non-global
address to the Local-Address found in the table entry and addresses, it should set up the association single-homed and
delivers the packet. To complete the handshake, a packet with then add the other addresses after the association has been
a COOKIE-ECHO chunk is sent that is acknowledged with a established by sending an SCTP packet containing an
message containing a COOKIE-ACK chunk. ASCONF chunk for each address. To add such an address,
the ASCONF should contain only the wildcard address and
NAT Table the parameter providing the required verification tag. The
The NAT table consists of several entries. Each entry is a source address of the packet containing the ASCONF chunk
tuple consisting of: will be added to the association.
1) Local-Address To remove an address, an ASCONF chunk is sent with the
2) Global-Address wildcard address. Then, all addresses except the source
3) Local-Port address of the packet containing the ASCONF chunk are
4) Global-Port deleted from the association.
5) Local-VTag
6) Global-VTag Communication between the NAT Middleboxes and
In addition to the procedure to modify the table given in the the SCTP End Points
next subsection, a timer must be used to remove entries that
have not been used for a certain amount of time. This time If a NAT middlebox receives an INIT chunk that would result
should be long enough such that the SCTP path supervision in adding an entry to the NAT table that conflicts with an
procedure prevents the table entries from timing out. already existing entry, it should not insert this entry and may
send an ABORT chunk back to the SCTP end point. In the
Modifications to the NAT Table ABORT chunk, an M-bit should be set that indicates that it
The basic procedure for handling INIT and INIT-ACK chunks has been generated by a middlebox. This happens if two dif-
was described previously. If the INIT or INIT-ACK chunk ferent clients choose the same local port number and initiate
contains a list of addresses, then for each address in the list, tag and try to connect to the same server. On reception of
an entry is added to the table. such an ABORT chunk, the end point can try to choose a dif-
If an ASCONF chunk is received to add the wildcard ferent initiate tag and try setting up the association again.
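The INIT and INIT-ACK handling and the conflict rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `SctpNat`, its method names, and the `("ABORT", {...})` style return values are hypothetical and do not come from the article or from any SCTP implementation, and the idle timer for table entries is left out. The lookup key follows the text: Global-Address, Global-Port, Local-Port, and Local-VTag.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Entry:
    local_addr: str
    global_addr: str
    local_port: int
    global_port: int
    local_vtag: int
    global_vtag: Optional[int] = None  # unknown until the INIT-ACK passes


# Key as seen from the global side, where the Local-Address is not visible:
# (Global-Address, Global-Port, Local-Port, Local-VTag)
Key = Tuple[str, int, int, int]


class SctpNat:
    """Hypothetical NAT table model; names are assumptions for illustration."""

    def __init__(self, nat_global_addr: str) -> None:
        self.nat_global_addr = nat_global_addr
        self.table: Dict[Key, Entry] = {}

    def on_init(self, src: str, sport: int, dst: str, dport: int, init_tag: int):
        """Outgoing INIT: create an entry, or report a conflict."""
        key = (dst, dport, sport, init_tag)
        existing = self.table.get(key)
        if existing is not None and existing.local_addr != src:
            # Another client already uses this port/tag toward this server.
            return ("ABORT", {"m_bit": True})
        if existing is None:
            self.table[key] = Entry(src, dst, sport, dport, init_tag)
        # Only the source address is rewritten; ports stay untouched.
        return ("FORWARD", {"src": self.nat_global_addr, "dst": dst})

    def on_init_ack(self, src: str, sport: int, dport: int, vtag: int, init_tag: int):
        """Incoming INIT-ACK: complete the entry and restore the Local-Address."""
        entry = self.table.get((src, sport, dport, vtag))
        if entry is None:
            return ("DROP", {})  # assumption: unknown INIT-ACKs are discarded
        entry.global_vtag = init_tag
        return ("FORWARD", {"src": src, "dst": entry.local_addr})


# Values taken from Fig. 3.
nat = SctpNat("120.10.2.1")
nat.on_init("10.1.0.1", 52001, "100.4.5.1", 8080, 12345)
nat.on_init_ack("100.4.5.1", 8080, 52001, 12345, 45678)
```

A second client (say 10.1.0.2) that picks the same local port and initiate tag toward the same server would get the `"ABORT"` result and could retry with a different initiate tag.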


Figure 4. Building the NAT table for the single-homed client with a multihomed server.

[Diagram: client 10.1.0.1:52001 behind the NAT, connected through the Internet via Router 1 and Router 2 to the server at 100.4.5.1:8080 and 100.5.5.1:8080.]

Chunk type     Local-Address  Global-Address  Local-Port  Global-Port  Local-VTag  Global-VTag
INIT+INIT-ACK  10.1.0.1       100.4.5.1       52001       8080         12345       45678
INIT-ACK       10.1.0.1       100.5.5.1       52001       8080         12345       45678
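The two rows in Fig. 4 follow from the rule that every address listed in an INIT-ACK chunk gets its own table entry. A minimal Python sketch of that expansion is given below; the `Entry` tuple and the function `expand_init_ack` are hypothetical names chosen for this illustration, not part of the article or of any SCTP stack.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    local_addr: str
    global_addr: str
    local_port: int
    global_port: int
    local_vtag: int
    global_vtag: Optional[int]


def expand_init_ack(entry: Entry, init_tag: int, listed_addrs: List[str]) -> List[Entry]:
    """Complete the entry with the Global-VTag and clone it per listed address."""
    completed = entry._replace(global_vtag=init_tag)
    result = [completed]
    for addr in listed_addrs:
        # Skip addresses that already have an entry (e.g. the packet's source).
        if all(e.global_addr != addr for e in result):
            result.append(completed._replace(global_addr=addr))
    return result


# Values taken from Fig. 4: one association, two server addresses.
pending = Entry("10.1.0.1", "100.4.5.1", 52001, 8080, 12345, None)
rows = expand_init_ack(pending, 45678, ["100.4.5.1", "100.5.5.1"])
```

All resulting rows share the same ports and verification tags, which is why they still describe a single association.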

If the NAT middlebox receives an SCTP packet that cannot be processed because there is no entry in the NAT table, the NAT middlebox should discard the packet and can send back an ERROR chunk. An M-bit must be set to indicate that the chunk is generated by a middlebox, and an error cause should indicate that the NAT middlebox does not have the required information to process the packet. On reception of such an ERROR chunk, the end point should use an ASCONF chunk to provide the required information to the NAT middlebox.

New SCTP Protocol Elements
Clients require a new parameter to be included in the INIT chunk to indicate that they will use the procedures described in this article. This parameter also is included in the INIT-ACK chunk to indicate that the receiver also supports it. Another new parameter is required that can contain a verification tag and is included in an ASCONF chunk.

Both the ERROR chunk and the ABORT chunk must have an M-bit indicating that the packet containing the chunk is generated by a middlebox instead of the peer.

Two additional error causes are introduced, one to be included in the ERROR chunk to indicate that the NAT middlebox misses some state, and one to be included in the ABORT chunk to indicate a conflict in the NAT table.

Examples
This section provides a detailed discussion of several network scenarios involving NAT middleboxes. The proposed NAT mechanisms were verified in all these scenarios using an SCTP simulation in the INET framework for the OMNeT++ simulation kernel described in [10].

Furthermore, a group of the Center for Advanced Internet Architecture at Swinburne University is implementing this method for the FreeBSD operating system. This project, SCTP over NAT Adaptation (SONATA), is being implemented in cooperation with two of the authors and is based on [2].

Single-Homed Client to Multihomed Server
In the case of a single-homed client and a multihomed server, the server announces all its global addresses in address parameters included in the INIT-ACK chunk (Fig. 4). The packet crosses the NAT middlebox, which updates its entries for the association. When the client receives the chunk, it adds those addresses to its list of destination addresses. As a result, there will be a separate entry for each server address although there is only one association.

Adding New NAT Middleboxes
After setting up an association, data can be exchanged between client and server. The packets are routed through the Internet. It must be emphasized that the routes are not stable and can change during the lifetime of an association, in particular if the association has a long life span as expected for major SCTP application scenarios. Therefore, a new NAT middlebox could become involved that has no knowledge of the properties of the association as shown in Fig. 5.

Passing through a new NAT middlebox also means that the server receives a packet with a new source address, which appears as if the client has an additional IP address.

In Fig. 5 the upper route shows the path where the association was set up initially. After the route was changed, the packets travel on the lower route. An example for the address/port combination for both routes is shown below the server.

If the new NAT middlebox receives the first packet from the client, it sends back a packet containing an ERROR chunk indicating that it lacks the required NAT table entry. Therefore, upon receipt of the ERROR chunk, the client sends an ASCONF chunk on the new path with the required information. The new NAT middlebox can add a complete entry to its table upon receipt of this message.

This message can pass through the NAT middlebox and can be acknowledged by the server with an ASCONF-ACK message. Afterward the communication can proceed as usual.

Figure 5. After a route change a new NAT middlebox appears.

[Diagram: client 10.1.0.1:52001 reaches the server 100.4.5.1:8080 through the Internet, initially via the NAT (120.10.2.1) and, after the route change, via the new NAT (140.1.1.1). Packets arriving at the server: 120.10.2.1:52001=>100.4.5.1:8080 and 140.1.1.1:52001=>100.4.5.1:8080. Message sequence on the new path: DATA from the client; ERROR back to the client (Cause: NAT state missing); ASCONF from the client (Vtag: 12345); ASCONF-ACK from the server.]

Client Using Transport Layer Mobility
SCTP with its functionality of dynamic address configuration is well suited to be employed in an environment with host mobility. Whereas all other parameters remain the same, the moving client will receive a new address. This not only results in a new source address for the packet but also in a changing route, such that eventually another NAT middlebox must be traversed, which again, initially has no knowledge of the association. As the situation is similar to the one described in the last subsection, we suggest that the same actions are taken. For more information on transport layer mobility, see [7].

Peer-to-Peer Communication
A greater challenge is the communication between two peers, that is, two hosts that both use private IP addresses (peer-to-peer communication). A detailed description for UDP and TCP is given in [8]. The two peers require an agent to help them find their communication partner. This agent usually is called a rendezvous server.

Figure 6. Peer-to-peer communication with rendezvous server.

[Diagram labels: rendezvous server, router, NAT 1, NAT 2, Peer 1, Peer 2, Peer 3, Peer 4; addresses shown: 100.1.3.1, 100.1.3.254, 100.1.1.254, 100.1.1.253, 100.1.2.253, 100.1.2.254, 10.1.1.1, 10.1.2.1, 10.1.3.1, 10.1.4.1.]

In Fig. 6 the corresponding network setup is shown. The communication process in this case consists of two phases. First, associations are initialized between the peers and the rendezvous server; after retrieving the required information from the rendezvous server, the peers can communicate with each other independently of the server. After both peers retrieve the required information, the actual communication between the peers can start. As there is no server, both hosts must be able to act as client and server. Thus, both start an association. If the message containing the INIT chunk of Peer 1 reaches the NAT middlebox, NAT 2, before the message of Peer 2 could arrive, it will be discarded. The retransmission of the INIT chunk will arrive if in the meantime, Peer 2 has punched a hole by triggering the NAT middlebox to set up a table entry. The best results can be achieved if the associations are started at the same time. From the perspective of SCTP, the simultaneous sending of INIT chunks also is not a normal situation because the INIT chunk is not followed directly by an INIT-ACK chunk but by another INIT chunk. The SCTP collision handling procedure ensures that exactly one association between the peers is established.

Multihomed Client and Server
The client sends an INIT chunk without a list of addresses to the server, which responds with an INIT-ACK chunk including a list of all addresses of the server. As shown in Fig. 7, this initial handshake uses the path via NAT 1.

Figure 7. Multihoming through NAT middleboxes.

[Diagram: client and server connected via NAT 1 and NAT 2. Message sequence: 1 INIT, 2 INIT-ACK, 3 COOKIE-ECHO, 4 COOKIE-ACK, 5 ASCONF (ADD-IP), 6 ASCONF-ACK.]

After the association is established, the client adds its second address by sending an ASCONF chunk. If the packet containing this chunk is sent via the path containing NAT 2, both NAT middleboxes have the required state. If this packet is sent on the path via NAT 1, any packet sent from the client on the path via NAT 2 results in an ERROR chunk being sent back, and this triggers the sending of an ASCONF chunk.
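The ERROR and ASCONF exchange that the new-NAT, mobility, and multihoming scenarios above rely on can be sketched as follows. This is a simplified model under stated assumptions: `NewNat`, `client_on_error`, the dictionary-shaped replies, and the cause label `MISSING_STATE` are hypothetical names for illustration only. It assumes that the common header of a client packet carries the Global-VTag and that the new ASCONF parameter carries the Local-VTag, as the values in Fig. 3 and Fig. 5 suggest.

```python
from typing import Dict, Optional, Tuple

MISSING_STATE = "NAT state missing"  # hypothetical label for the new error cause

# (Local-Address, Local-Port, Global-Address, Global-Port)
FlowKey = Tuple[str, int, str, int]


class NewNat:
    """A NAT middlebox that joins the path after the association was set up."""

    def __init__(self) -> None:
        # Maps a flow to (Local-VTag, Global-VTag).
        self.table: Dict[FlowKey, Tuple[int, int]] = {}

    def on_packet_from_client(self, src: str, sport: int, dst: str, dport: int,
                              header_vtag: int,
                              asconf_vtag: Optional[int] = None) -> dict:
        key = (src, sport, dst, dport)
        if key not in self.table:
            if asconf_vtag is None:
                # No state and no way to build it: discard and ask the client.
                return {"action": "discard", "reply": "ERROR",
                        "m_bit": True, "cause": MISSING_STATE}
            # The ASCONF supplies the tag missing from the common header,
            # so a complete entry can be created.
            self.table[key] = (asconf_vtag, header_vtag)
        return {"action": "forward"}


def client_on_error(reply: dict, local_vtag: int) -> Optional[dict]:
    """Client reaction: answer a middlebox ERROR with an ASCONF chunk."""
    if (reply.get("reply") == "ERROR" and reply.get("m_bit")
            and reply.get("cause") == MISSING_STATE):
        return {"chunk": "ASCONF", "vtag_param": local_vtag}
    return None


# Values taken from Fig. 3 and Fig. 5.
nat = NewNat()
first = nat.on_packet_from_client("10.1.0.1", 52001, "100.4.5.1", 8080, 45678)
asconf = client_on_error(first, local_vtag=12345)
second = nat.on_packet_from_client("10.1.0.1", 52001, "100.4.5.1", 8080, 45678,
                                   asconf_vtag=asconf["vtag_param"])
```

The sketch keys the table on addresses and ports only and does not validate tags on existing entries; a real middlebox would also need the idle timer described in the NAT Table subsection.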


The ASCONF chunk provides the required information to the NAT middlebox, NAT 2.

Multihomed Transport Layer Mobility
Previously, we discussed the procedure for a case when a client moves and hence changes its source address and the corresponding NAT middlebox as well. During the transition from one cell to another in a host mobility scenario, there is likely to be a zone where both cells are active, and thus, two addresses can be in use. Adding the new address results in a temporarily multihomed client. We propose to handle this situation in a way similar to the case explained in the last section. The new address is added by the sending of a message containing an ASCONF chunk. But as the old address is completely replaced by the new one as soon as the previous cell is left, another parameter must be added that indicates that the primary path should be set to the new address. This causes the server to send the next packets to the new address.

Multihoming with Rendezvous Server
The final step in increasing the complexity of the NAT scenario is the communication between two multihomed peers that are behind different NAT middleboxes.

Just like in the single-homed case, the rendezvous server must gather the peer information to fill its table. This time the table must be enlarged by the additional addresses. The peers first set up an association with the rendezvous server. Using this server the peers can obtain each other’s addresses and port numbers.

At this point, the peers must set up an association via initialization collision to provide a path by using hole punching. To also use the second path, the NAT middleboxes on the way must obtain the required information. By sending messages containing ASCONF chunks almost simultaneously, the NAT middleboxes are notified to allow packets arriving from the opposite direction to pass through. Unfortunately, the mechanism described earlier to request information by sending a message containing an ERROR chunk does not work when coming from the global side of the network because only the host behind the NAT middlebox can provide the data to fill the NAT table. So when the message containing an ASCONF chunk arrives at the opposite NAT middlebox before a hole is punched, the packet is discarded, but its retransmission might be successful. After both NAT tables receive the appropriate entries, the secondary paths also can be used.

Conclusion
In this article, we proposed a comprehensive solution for the support of SCTP in NAT middleboxes. We motivated the necessity for a specific NAT concept with NAPT functionality, where the verification tags provided by SCTP are used to distinguish between associations. The NAT middleboxes can request information from the SCTP end points and give hints to improve the overall procedure.

Furthermore, several scenarios were analyzed to explain the manipulation of the NAT table in single-homed, multihomed, and mobility environments. The peer-to-peer communication with a preregistration was taken into account as well.

Generalizing the SCTP-specific variant of NAT, the following is important. For supporting a transport protocol with multipath support, a connection identifier makes connection tracking possible without a requirement to rely on the port numbers. This avoids the requirement of changing the port numbers and possibly synchronizing them between different NAT middleboxes. A feature of dynamic address reconfiguration can be used to avoid having IP addresses in the transport layer, which is problematic for the processing in NAT middleboxes. For peer-to-peer communications, it is helpful if the transport layer supports simultaneous connection setups. Finally, it might be preferable to use simple algorithms involving random numbers with a small chance of collision instead of more complex deterministic algorithms without collision.

The solution presented in this article will be included in a future version of our Internet drafts to be considered for standardization in the BEHAVE working group of the IETF.

References
[1] Q. Xie et al., “SCTP NAT Traversal Considerations,” draft-xie-behave-sctp-nat-cons-03.txt (work in progress), Nov. 2007.
[2] R. Stewart and M. Tüxen, “Stream Control Transmission Protocol (SCTP) Network Address Translation,” draft-stewart-behave-sctpnat-03.txt (work in progress), Nov. 2007.
[3] M. Tüxen and R. Stewart, “UDP Encapsulation of SCTP Packets,” draft-tuexen-sctp-udp-encaps-02.txt (work in progress), Nov. 2007.
[4] P. Srisuresh and M. Holdrege, “IP Network Address Translator (NAT) Terminology and Considerations,” RFC 2663, Aug. 1999.
[5] R. Stewart, “Stream Control Transmission Protocol,” RFC 4960, Sept. 2007.
[6] R. Stewart et al., “Stream Control Transmission Protocol (SCTP) Dynamic Address Reconfiguration,” RFC 5061, Sept. 2007.
[7] M. Riegel and M. Tüxen, “Mobile SCTP Transport Layer Mobility Management for the Internet,” Proc. SoftCOM 2002, Int’l. Conf. Software, Telecommunications and Computer Networks, Split, Croatia, 2002, pp. 305–09.
[8] B. Ford and P. Srisuresh, “Peer-to-Peer Communication across Network Address Translators,” USENIX Annual Technical Conf., Anaheim, CA, Apr. 2005.
[9] R. Stewart and Q. Xie, Stream Control Transmission Protocol (SCTP): A Reference Guide, Addison-Wesley, Oct. 2001.
[10] I. Rüngeler, M. Tüxen, and E. Rathgeb, “Integration of SCTP in the OMNeT++ Simulation Environment,” Int’l. Developers Wksp. OMNeT++ (OMNeT++ 2008), Mar. 2008.

Biographies
ERWIN P. RATHGEB (erwin.rathgeb@iem.uni-due.de) received his Dipl.-Ing. and Ph.D. degrees in electrical engineering from the University of Stuttgart, Germany, in 1985 and 1991, respectively. He has been a full professor at the University Duisburg-Essen since 1999 and holds the Alfried Krupp von Bohlen und Halbach Chair for Computer Networking Technology at the Institute for Experimental Mathematics. From 1991 to 1998 he held various positions at Bellcore, Bosch Telekom, and Siemens. His current research interests include concepts and protocols for next-generation Internets with a focus on network security. He is a member of IFIP, GI, and ITG, where he is chairman of the expert group on network security.

IRENE RÜNGELER (i.ruengeler@fh-muenster.de) received her diplomas in computer science and economics at the University of Hagen in 1992 and 2000, respectively. She joined the Münster University of Applied Sciences in 2002, where she works as a research staff member. Her research interests include innovative transport protocols, especially SCTP, and their performance analysis, signaling transport over IP-based networks, and fault-tolerant systems.

RANDALL STEWART (randall.stewart@trgworld.com) works for TRG Holdings as chief development officer. His current duties include integrating software solutions for call center applications using both SCTP and RSerPool. Previously, he was a distinguished engineer at Cisco Systems. He also has worked for Motorola, NYNEX S&T, Nortel, and AT&T Communications. Throughout his career he has focused on operating system development, fault tolerance, and call-control signaling protocols. He is also a FreeBSD committer with responsibility for the SCTP reference implementation within FreeBSD.

MICHAEL TÜXEN (tuexen@fh-muenster.de) studied mathematics at the University of Göttingen and received a Dipl.Math. degree in 1993 and a Dr.rer.nat. degree in 1996. He has been a professor in the Department of Electrical Engineering and Computer Science of Münster University of Applied Sciences since 2003. In 1997 he joined the Systems Engineering group of ICN WN CS of Siemens AG in Munich. His research interests include innovative transport protocols, especially SCTP, IP-based networks, and highly available systems. At the IETF, he participates in the Signaling Transport, Reliable Server Pooling, and Transport Area Working Groups.


Distributed Connectivity Service for a SIP Infrastructure
Luigi Ciminiera, Guido Marchetto, Fulvio Risso, and Livio Torrero, Politecnico di Torino

Abstract
Because of the constant reduction of available public network addresses and the necessity to secure networks, middleboxes such as network address translators and firewalls have become quite common. Because they are designed around the client-server paradigm, they break connectivity when protocols based on different paradigms are used (e.g., VoIP or P2P applications). Centralized solutions for middlebox traversal are not an optimal choice because they introduce bottlenecks and single points of failure. To overcome these issues, this article presents a distributed connectivity service solution that integrates relay functionality directly in user nodes. Although the article focuses on applications using the Session Initiation Protocol, the proposed solution is general and can be extended to other application scenarios.

Although end-to-end direct connectivity was a must in the early days of the Internet, currently, increasing numbers of hosts are connected through middleboxes such as network address translators (NATs) that enable the reuse of private addresses and/or firewalls, which are used to secure corporate networks and internal resources. These devices work seamlessly in case of client-server applications (although the client must reside in the “protected” part of the network), but they limit the end-to-end connectivity of the applications that use different paradigms, such as voice over IP (VoIP) and peer-to-peer (P2P). In particular, middleboxes prevent nodes behind them from being contacted directly from external nodes. For example, an internal host might not have a problem starting a data transfer to an external host, but the reverse (e.g., an incoming VoIP call) may be impossible. Thus, proper strategies for middlebox traversal are required to enable the seamless communication between hosts, no matter where they are located. Among the known strategies, hole punching and relaying [1] represent the ones that are used most frequently.

The common idea is to make the middlebox function as if the internal host begins the communication. The middlebox then creates a temporary channel with the remote host, thus allowing the delivery of external packets. In particular, the hole punching forces each internal host to maintain a persistent connection with an external rendezvous server located on the public Internet. This creates a type of “hole” that can be used by an external host to contact the internal host directly. If hole punching fails, for example, if hosts are behind symmetric NATs, the relaying represents the last chance: internal hosts maintain a persistent connection with an external node (the relay server), which operates as a forwarder, that is, it receives all packets directed to the internal host and redirects them to it. This solution requires that the internal host advertises the IP address of the relay server as one of its addresses, and that it instructs the relay server with the proper forwarding rules.

This article focuses on the problem of middlebox traversal for applications using the Session Initiation Protocol (SIP) [2], which is among the protocols that suffer most from middlebox limitations. Two solutions were defined in this context. SIP messages directed to the destination user agent (UA) are delivered with a relay-based approach that exploits an intermediate public SIP proxy [3]. For media flows, the Interactive Connectivity Establishment (ICE) [4] protocol was proposed. ICE is an integrated solution defined to discover NAT bindings and to execute the hole punching for media streams. In addition, ICE also supports media relaying based on the Traversal Using Relay around NAT (TURN) [5] protocol. Both the hole-punching mechanism of ICE and TURN rely on simple traversal of UDP through NAT (STUN) [6], a client-server protocol consisting of two messages, Binding Request and Binding Response. These messages are sufficient for implementing the hole-punching procedure [1], whereas TURN must extend the STUN protocol to establish communication channels with relays, called TURN servers. STUN also can be used to implement a middlebox behavior discovery service [7] that can be used by internal hosts to determine the type of NAT/firewall they are behind.

Current middlebox traversal solutions rely on centralized servers that provide rendezvous and relay capabilities. However, the centralized server is a single point of failure: if the server fails, all UAs behind middleboxes become unreachable. Furthermore, a centralized solution cannot scale to an IP-based telecommunication provider with millions of customers, in which servers may be required to handle a huge amount of traffic (both SIP signaling messages and media datagrams), thus requiring a large amount of computational resources and bandwidth. The server acting as relay for SIP (i.e., the SIP proxy) also must handle the traffic generated by keep-alive messages that UAs behind the middlebox periodically send to it. Keep-alive messages are required to maintain the communication channel with the server and thus to guarantee that these UAs can always be reached. This could result in a high overhead. For example, according to the NAT binding timeout reported in [3], in a SIP domain including 1.5 million UAs

IEEE Network • September/October 2008 0890-8044/08/$25.00 © 2008 IEEE 33



with limited connectivity, the central server must handle about 50,000 keep-alive messages per second (that is, one keep-alive from each UA roughly every 30 s).

This article proposes a distributed architecture, referred to as DIStributed COnnectivity Service (DISCOS), for ensuring connectivity across NATs and firewalls in a SIP infrastructure. This solution overcomes the limitations of the current centralized solution by creating a gossip-based P2P network and integrating the previously described rendezvous and relay functionalities in the UAs. Each globally reachable UA with enough resources can provide such services to UAs with limited connectivity. A major emphasis is given to the overlay design, as it is a key point for ensuring a fast “service lookup” (i.e., to find a peer that still has enough resources for offering the connectivity service), which is instrumental for providing an adequate quality of service to the users. In particular, we show how a scale-free topology can fit this requirement, and we propose an overlay construction model that can be used to build such topology.

DISCOS is somewhat orthogonal to P2P-SIP [8], although both are based on P2P technologies. In fact, P2P-SIP is a solution mainly for distributed lookup, whereas DISCOS offers a solution for middlebox traversal.

The idea of distributing such functionalities among end systems is also one of the characteristics of Skype, a well-known VoIP application. However, Skype uses secret and proprietary protocols that cannot be studied and evaluated by third parties, therefore limiting the ability to understand exactly how these problems are solved. For example, in the Skype analysis presented in [9] and [10], the authors could give only partial explanations about its NAT and firewall traversal mechanisms. Their experiments pointed out that nodes with enough resources can become super nodes and provide support for NAT and firewall traversal. In particular, they offer relay functionalities and probably run a sort of STUN server that other nodes use to discover the presence (and to determine the type) of NAT and firewall in front of them. Therefore, it is clear that a node behind NAT must connect to a super node to be part of the Skype network, but no information could be provided about the super node discovery and selection policies. Also, the super node overlay topology is almost completely unknown. Thus, there is no way to evaluate the effectiveness of these solutions. On the other hand, here we propose a distributed architecture for middlebox traversal whose scalability and robustness are discussed and evaluated. In addition, the solution was engineered and validated by simulation on a SIP infrastructure, but the solution is more general, and it can be seen as a mechanism to cope with middlebox traversal, thus opening the path to a wider adoption.

Operating Principles

Distributed Connectivity Service
DISCOS extends current centralized NAT and firewall traversal solutions by distributing rendezvous and relay functionalities among UAs. Relaying and hole-punching service for media flows is implemented by integrating a STUN/TURN server in each UA. The TURN server also is used to support relaying SIP messages. However, DISCOS can be modified easily to offer the relaying of SIP messages by integrating SIP proxy functionalities in each UA, leading to a distributed implementation of [3].

A UA with enough resources (e.g., a public network address, a wideband Internet connection, and free CPU cycles) becomes what we define as a connectivity peer and starts to offer a connectivity service. In particular, connectivity peers can act as both SIP relay (leveraged by UAs with limited connectivity for receiving SIP messages) and media relay. Connectivity peers also can offer support to the hole-punching procedure for media session establishment, thus operating as a distributed rendezvous server. In addition, connectivity peers also provide support for middlebox behavior discovery [7]. UAs with limited connectivity can locate and attach to an available peer whenever they require one of these services.

Connectivity peers are organized in a P2P overlay, and their knowledge is spread through proper advertisement messages, thus building an unstructured gossip-based network. Structured networks, characterized by additional overhead due to the maintenance of the structure, are not considered because their excellent lookup properties are not required. In fact, DISCOS uses the overlay to find only the first available connectivity peer and not for locating a precise resource.

Note that because DISCOS distributes existing middlebox traversal functionalities among peers, it is also totally compatible with current middleboxes and their traversal techniques. This enables a smooth deployment of the proposed solution.

Overlay Topology
In order to enable DISCOS to locate an available peer for UAs with limited connectivity in the shortest time possible, peers should have a deep knowledge of the network: the greater the number of known peers, the higher the probability of finding an available peer in a short time, especially if known peers are lightly loaded. In gossip-based networks, the spread of information is based on flooding, thus the overlay topology has a deep impact on the network efficiency. For instance, the greater the average path length between nodes, the higher the depth of the flooding (hence the load on the network) that is required for an adequate spread of the information. Thus, an overlay topology that ensures a small average path length is required. However, this is not sufficient for enabling peers to know a large set of suitable connectivity peers from which to choose when a UA asks for the connectivity service. In fact, nodes maintain a cache that should be kept small to reduce the overhead required to manage all the entries. This limits the number of peers known at each instant. The limited cache size can be compensated by frequently refreshing its contents so that the set of known peers changes frequently, resulting in a sort of round robin among peers: different connectivity peers can always be provided to UAs that request the service at different instants, thus increasing the opportunity for a queried connectivity peer to suggest available ones when it cannot provide the service itself. Frequent cache refresh also is useful for ensuring that nodes store up-to-date information about existing peers. Such a policy can be efficiently adopted if the overlay results in a scale-free network [11], an interesting topology that ensures small average path length and features scalability and robustness. In a scale-free network, few nodes (referred to in the following as hubs) have a high degree, whereas the others have a low one. The degree of a node is the sum of all its incoming (i.e., the in-degree) and outgoing (i.e., the out-degree) links. In the DISCOS overlay, the out-degree of a node is limited by the cache size whereas the in-degree is the number of other peers that have that node in their cache. Thus, nodes can be considered hubs when they are in the cache of several peers, that is, when they are highly popular. Hubs frequently receive advertisement messages from a large set of different nodes, so they frequently update their cache. In particular, if advertisement messages contain nodes that are low in popularity, hubs can discover peers that, being low in popularity, are lightly loaded with high probability. The key is to make searches through hubs because they potentially know a large variety of lightly loaded peers. Thus, the proposed solution essentially exploits, and


Figure 1. DISCOS overlay topology.
[Diagram: connectivity peers (CP) linked in an overlay, with highly popular connectivity peers (hubs) marked. A joining connectivity peer with no entries in its cache queries the bootstrap service for some hubs. A UA behind a NAT (UA A) queries a node, possibly a hub, for service.]
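
The degree definitions used for the overlay in Fig. 1 (out-degree bounded by the cache size, in-degree equal to the number of peers that cache a node) can be illustrated with a minimal sketch. The `caches` mapping, the function names, and the `min_in_degree` threshold are hypothetical; the article does not fix a numeric threshold for calling a node a hub.

```python
from collections import Counter


def degrees(caches):
    """Return {node: (in_degree, out_degree)} for an overlay.

    `caches` maps each node to the peers it currently holds in its cache
    (an assumed representation, chosen only for this illustration).
    """
    in_deg = Counter()
    for owner, cached in caches.items():
        for peer in set(cached):
            if peer != owner:
                in_deg[peer] += 1  # `owner` has `peer` in its cache
    nodes = set(caches) | set(in_deg)
    return {
        n: (in_deg[n], len(set(caches.get(n, ())) - {n}))
        for n in nodes
    }


def hubs(caches, min_in_degree):
    """Nodes cached by at least `min_in_degree` other peers (assumed threshold)."""
    return sorted(n for n, (i, _) in degrees(caches).items() if i >= min_in_degree)


# Toy overlay: H is cached by A, B and C, so it is the most popular node.
caches = {"A": ["H", "B"], "B": ["H"], "C": ["H", "A"], "H": ["C"]}
print(degrees(caches)["H"])
print(hubs(caches, 3))
```

In this toy overlay only H reaches an in-degree of 3, which is the sense in which a hub is a node that is "in the cache of several peers".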

generalizes to the case of a single resource provided by many nodes, the results achieved by Adamic et al. [12] about random walk searches in unstructured P2P overlays. They demonstrated that searches in scale-free networks are extremely scalable (their cost grows sublinearly with the size of the network), also proving that searches toward hubs perform better than random searches because hubs have pointers to a larger number of resources. In DISCOS, the benefit of searching through hubs comes from the high frequency with which pointers to connectivity peers change in their cache. These properties are obtained at the expense of a non-uniform distribution of the number of messages handled by nodes: the higher the popularity of a node, the larger the number of advertisement messages received. However, a proper hub selection policy and a reasonable advertisement rate could mitigate the effects of this disparity. These aspects are analyzed in more detail in the following section.
The Barabási-Albert [11] model was proposed to create scale-free graphs. In this model, a few nodes are immediately available and when a new node arrives, it connects to one of the existing nodes with a probability that is proportional to the degree of popularity of such a node (preferential attachment); in other words, the model assumes a global knowledge of nodes and their degree, which is clearly inapplicable in a real network scenario. A first step to implement such a model in our overlay is to make M peers available to other nodes through a bootstrap service. When a node joins the overlay for the first time, it queries the bootstrap service for a subset of these M registered nodes. However, preferential attachment is not possible with the mechanism described so far because all incoming peers:
• Can learn only the nodes provided by the bootstrap service
• Cannot compute the popularity of a node
An adequate spread of the network knowledge can address the first issue, but there is no way to enable a node to learn the in-degree (i.e., the precise metric of node popularity) of the others. In our case, the popularity is computed autonomously by each node through a simple approximated metric based on the number of received advertisement messages that contain such a node. In our approximated model, preferential attachment is implemented by forcing peers to evaluate the popularity of nodes through the previously mentioned mechanism and then to include some of the most popular peers in the advertisement messages they send. This allows nodes to insert highly popular peers (hubs) in their cache, thus building and maintaining the scale-free topology.
In summary, new nodes use the peers known through the bootstrap service as “bootstrap” nodes; then they learn the most popular ones through the received advertisement messages and start to perform preferential attachment. Furthermore, incoming nodes that already know peers discovered during their previous visits can avoid the bootstrap procedure by attaching directly to them. The resulting topology is shown in Fig. 1.
It is worth noting that different bootstrap services can be used to create disjoint overlays because joining peers that fetch nodes from different bootstrap services start to exchange advertisement messages with different connectivity peers. This enables the possibility of deploying different DISCOS overlays in different geographical areas of a SIP domain. If a location-aware bootstrap service selection policy is adopted, users can find a connectivity peer that is close to them, thus preserving the user-relay latency achieved by current centralized solutions, where different servers can be used at different locations.
The implementation of the bootstrap service is highly customizable. A possible solution consists in deploying M static peers and preconfiguring their addresses on each UA. A more flexible approach (considered in the following) consists in deploying multiple bootstrap servers reachable through appropriate Domain Name System service (DNS SRV) location entries configured in the DNS. Each bootstrap server stores information about M connectivity peers that spontaneously register themselves when they join the overlay. Multiple boot-


Figure 2. Operation of DISCOS when: a) a node joins the SIP domain; b) a node in the overlay receives an advertisement message; c) a node performs a SIP/media relay lookup.
[Flow charts; box labels recovered from the original figure:
a) Is the cache empty? (yes: fetch peers from bootstrap service); contact one using STUN; response within a timeout?; perform other STUN tests with the contacted peer; has node limited connectivity? (no: join DISCOS overlay; yes: SIP relay lookup).
b) Are there peers in the advertisement? (no: stop); extract one; is the peer already in cache? (yes: increase peer popularity); is the cache full? (yes: drop peer in average position); insert new peer.
c) Is the cache empty? (yes: fetch peers from bootstrap service); are there peers not yet visited? (no: stop, or timeout if SIP relay lookup); contact the most popular not yet visited; response within a timeout?; is it available for SIP/media relay service? (yes: success); get the three peers provided in the response; order the cache by popularity; put the most popular in cache (drop less popular if full); contact the two least popular included in the response; response within a timeout?; is it available for SIP/media relay service? (yes: success).]
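
The advertisement handling of Fig. 2b, together with the advertisement content described in the Protocol Overview (sender, two most popular peers, two least popular peers), can be sketched as follows. This is only an illustration under stated assumptions: the cache is modeled as a dictionary from peer to a popularity counter, a newly inserted peer starts at popularity 1, and "drop peer in average position" is read as removing the middle entry of the popularity ranking. The function names are hypothetical.

```python
CACHE_SIZE = 10  # cache size used in the article's simulations


def handle_advertisement(cache, advertised, cache_size=CACHE_SIZE):
    """Update `cache` (peer -> popularity counter) in place, following Fig. 2b."""
    for peer in advertised:
        if peer in cache:
            cache[peer] += 1  # already known: increase its popularity
            continue
        if len(cache) >= cache_size:
            # Cache full: drop the peer in the middle of the popularity
            # ranking, keeping both hubs and lightly loaded peers.
            ranked = sorted(cache, key=cache.get)
            del cache[ranked[len(ranked) // 2]]
        cache[peer] = 1  # assumed initial popularity for a new peer


def build_advertisement(sender, cache):
    """Sender plus its two most popular and two least popular known peers."""
    ranked = sorted(cache, key=cache.get, reverse=True)
    message = [sender]
    for peer in ranked[:2] + ranked[-2:]:
        if peer not in message:  # small caches may yield duplicates
            message.append(peer)
    return message


cache = {"a": 5, "b": 3, "c": 1}
handle_advertisement(cache, ["a", "d"], cache_size=3)
print(cache)
print(build_advertisement("me", {"a": 6, "c": 1, "d": 1, "e": 4, "f": 2}))
```

In the first call, peer "b" sits in the middle of the ranking and is the one evicted when "d" arrives, so the cache keeps its most popular and its least popular entries.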

strap servers are deployed for redundancy and load balancing purposes. Proper DNS configuration can enable a location-aware bootstrap service selection.

Protocol Overview
Whenever a UA joins the SIP domain, it must determine if it can become a connectivity peer, or if it is behind a middlebox. This is done by contacting a connectivity peer and exploiting its STUN functionalities [7]. The described bootstrap procedure is performed if it does not know an active peer. The flow chart related to the join procedure is shown in Fig. 2a.
If the UA can become a connectivity peer, it checks the number of addresses registered on each bootstrap server and if it is smaller than a fixed bound M, it adds itself to the list. Then, it sends an advertisement message to the known peers to announce itself. The UA is now part of the DISCOS overlay, and it starts receiving messages from other nodes, thus gradually filling its cache with new peers. A proper peer advertisement policy is adopted to implement preferential attachment (thus building and maintaining the scale-free topology) and to enable caches to be refreshed with lightly loaded peers (thus having potential nodes available for the service). In particular, advertisement messages include the sender node, the two most popular peers it knows (enabling preferential attachment), and the two least popular peers it knows (spreading the knowledge of lightly loaded peers).
Advertisement messages are periodically sent by peers to all nodes they have in their cache and contain a special time-


to-live (TTL) field that allows the message to cross N hops: as soon as the message is received, the TTL value is decremented and if it is a positive value, the recipient sends another message to all the nodes in its cache. Every time a peer receives an advertisement message, it updates its cache by increasing the popularity of nodes already present and by inserting the new ones. As previously described, it is important for a node to have both hubs and peers of low popularity in its cache. Thus, a proper cache management policy is also adopted if the cache is full: the node with average popularity is removed before the insertion, resulting in a cache that favors big hubs and peers of low popularity. Figure 2b details the operations of a peer when it receives an advertisement message.
UAs with limited connectivity have a different behavior because essentially they exploit DISCOS features to find SIP relays (they choose a connectivity peer as relay for SIP messages as soon as they join the SIP domain; in addition, they select another when the current one disappears) and media relays (when they need one to establish a media session). A UA with limited connectivity performs these lookups by contacting the most popular peers in its list, which can accept or decline the request. If a peer declines, it includes in its answer the two least popular peers and the most popular peer it knows: the least popular peers are queried immediately (since they are supposed to be free enough to provide connectivity), whereas the most popular is inserted in the cache (because it can perform faster searches as it is probably a hub). If both queried peers refuse to provide the service, another node is picked from the cache, and the procedure is repeated. If all the nodes in the cache were queried without success, two different policies are applied, depending on the type of service the UA with limited connectivity requires: in the case of lookup for a SIP relay, the UA waits for a random time and then repeats the procedure; in the case of lookup for a media relay, the procedure is stopped, and the media session cannot be established. The relay lookup procedure is shown in Fig. 2c.
UAs with limited connectivity also receive ad hoc messages from their relays containing three highly popular peers that allow them first to fill, and then to update, their cache with new hubs. This enables them to direct searches toward hubs when they require a connectivity peer. Broken hubs (e.g., because of a network failure) are detected through a timeout: if a hub does not reply to a query, the UA can query one of the other hubs in its cache. If no peers are available, the UA again fetches the registered ones from the bootstrap server; however, this situation is unlikely to occur because UAs with limited connectivity periodically receive new hubs from their SIP relays.
This protocol could be integrated in SIP, as well as implemented separately. The former approach is more straightforward as it simply consists in defining new SIP header fields. The latter one is more efficient, especially concerning the message size. In fact, the human-readable nature of SIP messages would result in advertisement messages of about 800 bytes.

Security Issues
The deployment of a P2P architecture for providing connectivity service raises several security issues that are different from those in centralized solutions. In DISCOS, like in many other distributed systems, the control of the consequences of malicious behavior of nodes can be more difficult than in the centralized counterpart. Much effort has been expended during past years in investigating these issues in the context of P2P-SIP overlays [8, 13, 14] that must deal with similar concerns as they replace centralized SIP proxies for user locations. Some solutions were proposed and can be seamlessly applied in DISCOS. For example, in [14], public key certificates are distributed among users to enable them to verify the origin and the integrity of messages. Analogously, certificates can be used in DISCOS to authenticate advertisement messages, so that they can be considered trusted. This limits the operation of malicious peers as they can be easily traced. This and other P2P-SIP derived security policies certainly require further improvement to better fit specific DISCOS requirements. However, we are confident that effective results can be obtained with minimal modifications because, as mentioned previously, the security issues that must be addressed are similar in the two environments. This additional effort is left for future work.

Overlay Simulation

Simulations Background
We developed a custom, event-driven simulator to evaluate the effectiveness of the proposed solution. In particular, we were interested in proving its scalability and validating its algorithms. Thus, we implemented a simulator supporting the following four operations: node arrival/departure, media session setup/teardown, SIP relay lookup (triggered when a node with limited connectivity joins the network or when its current SIP relay disappears), and media relay lookup (that occurs when a node requires a relay to perform a media session).
Simulations refer to a single SIP domain. Node arrivals and call occurrences are modeled using a Poisson process, whereas node lifetime and call length are extracted from real Skype traffic coming from/to the network of the university campus to approximate the behavior of real VoIP networks. With our parameters, the average number of nodes in the network depends on their arrival rate because of the effect of the Poisson arrivals model coupled with the lifetime distribution of Skype. For example, an arrival rate λ_N = 100 nodes/minute leads to a network consisting, on average, of 30,000 nodes, which is the standard size in our simulation and is a good trade-off between simulation length (some lasting several days on a Dual Xeon 3 GHz processor) and significance of results. To test our solution within different traffic load scenarios, three different rates are used for media session occurrences: 1.4 λ_N, 5 λ_N, and 20 λ_N sessions/minute. These values, coupled with the distribution of the Skype call duration, lead to 10 percent, 30 percent, and 98 percent of nodes simultaneously involved in a media session, respectively.
Statistics presented in [15] show that about 74 percent of hosts are behind a NAT. In addition, [1] shows that hole punching is successful in about 82 percent of the cases. To the best of our knowledge, no detailed information is available about firewall proliferation over the Internet. On the strength of these available data, we consider for simulation a network scenario where nodes have limited connectivity with probability P_LC = 0.74 and media sessions directed to these nodes require relaying with probability P_MR = 0.18. Whenever a node joins the SIP domain, two different actions can be performed at simulation level: if it is tagged as a node with limited connectivity (with probability P_LC), it triggers a SIP relay lookup; otherwise it joins the DISCOS overlay as a connectivity peer. Media sessions are possible between each pair of nodes (selected randomly). When a node behind a NAT is contacted, a media relay lookup is triggered by this node with probability P_MR.
The number of UAs with limited connectivity to which a peer can simultaneously provide SIP relay service is set to 10; advertisement messages have a TTL equal to 2; their sending


Figure 3. Simulation results: a) average clustering coefficient evaluation; b) in-degree power law distribution; c) average number of contacted peers to find a SIP relay; d) media session failure probability vs. number of allocated backup relays; e) average number of peers contacted to allocate K relays; f) bandwidth consumption distribution.
[Plots; axes and series recovered from the original figure:
a) average clustering coefficient vs. network size (nodes); series: DISCOS overlay, random graph.
b) fraction of nodes vs. in-degree (logarithmic axes); series: DISCOS observations, power law with c = 0.7, γ = 1.5.
c) contacted nodes vs. network size (nodes); series: DISCOS overlay, random spread and lookup.
d) failure probability vs. number of backup relays; series: 10%, 30%, and 98% of nodes involved in a call.
e) contacted nodes vs. number of allocated relays; same three series.
f) fraction of nodes vs. number of media flows per relay; same three series.]
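
As a rough cross-check of the simulation parameters quoted in the Simulations Background text, the short sketch below applies Little's law (a standard queueing result that the article does not invoke explicitly) to the arrival rate and network size, derives P_MR from the hole-punching success rate, and bounds the advertisement fan-out implied by the TTL and cache size. The function names are illustrative, and the implied mean node lifetime is a derived figure, not one stated in the article.

```python
def implied_mean_lifetime(avg_population, arrival_rate):
    """Little's law: average population = arrival rate x mean time in system."""
    return avg_population / arrival_rate


def relay_probability(hole_punch_success):
    """Sessions need a relay when hole punching fails."""
    return round(1.0 - hole_punch_success, 2)


def max_messages_per_advertisement(cache_size, ttl):
    """Upper bound on messages caused by one advertisement round.

    The sender reaches `cache_size` peers; each recipient decrements the
    TTL and, while it stays positive, forwards to its own full cache.
    Assumes full caches and no duplicate suppression.
    """
    return sum(cache_size ** hop for hop in range(1, ttl + 1))


# Values quoted in the article: 100 nodes/minute, 30,000 nodes on average,
# 82 percent hole-punching success, cache of 10 entries, TTL of 2.
print(implied_mean_lifetime(30_000, 100))      # minutes
print(relay_probability(0.82))
print(max_messages_per_advertisement(10, 2))
```

Under these assumptions the quoted arrival rate and network size correspond to a mean node lifetime of 300 minutes, and one advertisement with TTL 2 triggers at most 10 + 100 = 110 messages.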

interval is set to 60 minutes; and the cache of a peer is supposed to contain 10 entries. Furthermore, the number of peers registered in the bootstrap server (which is supposed to be unique and reachable by nodes) is set to 20. Each simulation lasts long enough to exit from the transient period; the presented results refer to the steady state.

Overlay Topology Evaluation
First, simulation aims at demonstrating that our protocol creates a scale-free network among connectivity peers. In particular, we consider the clustering coefficient and the in-degree of nodes [11]. The clustering coefficient of a node is defined as the number of links between its neighboring nodes divided by the number of links that could possibly exist between them. To be scale-free, an overlay must have an average clustering coefficient higher than the one of a random graph obtained in the same conditions, which is clearly shown in Fig. 3a. In detail, the average clustering coefficient for DISCOS decreases when the network size grows, asymptotically converging to a value that is about 20 times the clustering coefficient of a random graph. We also verified that, at all the network sizes tested, the coefficient remains almost constant in time.
Concerning the in-degree, the requirement to be met is that the distribution of node degree follows a power law P(k) = c k^{-γ}, where P(k) is the probability that a node has k connections, and c is a normalization factor. Figure 3b shows that the distribution of in-degree values obtained through simulation fits well a power law P(k) = c k^{-γ} with c = 0.7 and γ = 1.5. These tests validate our overlay construction model, showing that the resulting topology really evolves into a scale-free network.
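
The two metrics used in this evaluation can be written down compactly. The sketch below computes the clustering coefficient as defined above and evaluates the fitted power law with c = 0.7 and γ = 1.5. It assumes, for simplicity, that overlay links are treated as undirected and that the graph is given as a symmetric adjacency mapping; the article does not say how link direction is handled, and the toy graph is purely illustrative.

```python
def clustering_coefficient(adj, node):
    """Links among the node's neighbors divided by the links that could exist."""
    neighbors = adj[node] - {node}
    k = len(neighbors)
    if k < 2:
        return 0.0  # no pair of neighbors, so no possible link among them
    links = sum(1 for u in neighbors for v in neighbors if u < v and v in adj[u])
    return links / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean clustering coefficient over all nodes of the graph."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)


def power_law(k, c=0.7, gamma=1.5):
    """P(k) = c * k^(-gamma), with the values fitted in Fig. 3b as defaults."""
    return c * k ** -gamma


# Toy undirected graph: a triangle a-b-c plus a pendant node d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(clustering_coefficient(adj, "a"))
print(average_clustering(adj))
print(power_law(4))
```

With the fitted parameters, the expected fraction of nodes with in-degree 4 is 0.7 × 4^(-1.5) = 0.0875, which is the kind of value plotted against the simulation observations in Fig. 3b.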


To prove the effectiveness of the DISCOS topology, we compare our solution with a distributed system where the information is randomly spread, and the nodes to query during lookup procedures are randomly chosen among peers in the cache. Figure 3c depicts the average number of peers that must be contacted to reach an available SIP relay for both DISCOS and the randomized overlay. Although the advertisement rate and the TTL value remain the same, the figure shows that in DISCOS, the number of peers contacted is appreciably lower. Furthermore, the ratio between the performances obtained by the two policies increases with the network size, thus demonstrating the scalability properties of our solution.
These tests prove the effectiveness and the scalability of DISCOS. In particular, results show how the scale-free topology ensures overlay efficiency with a limited message rate (each peer sends an advertisement message every 60 minutes) with a small TTL (equal to 2) and a limited cache size (10 entries). We also evaluated the number of advertisement messages that connectivity peers must handle in our simulated SIP domain including 30,000 UAs: 99 percent of nodes process fewer than seven advertisement messages per minute and the remaining 1 percent process a number of messages that varies between eight and 48 messages per minute, thus resulting in a reduced per-node overhead. However, this confirms that hubs should be chosen carefully, with a preference for nodes with enough computational and bandwidth resources, for example, using the dynamic protocol proposed by Chawathe et al. for the Gia P2P network [16].

Media Sessions Relaying Performance
This section analyzes the overlay support for media sessions, in particular when hole punching fails and relaying is required. To prevent resource wasting, a media relay is typically chosen by a UA immediately preceding the establishment of a media session. Various types of media flows are considered, differing in the amount of consumed bandwidth. In particular, assuming b bit/s is the consumed bandwidth unit, five types of flows requiring nb (1 ≤ n ≤ 5) bit/s are defined. The flow type is randomly selected (with uniform distribution) when a new session starts. We also define B_i as the amount of bandwidth that peer i can offer for relaying media sessions. For the sake of simplicity, B_i is assumed to be the same for each connectivity peer and equal to 5b bit/s. However, in a real scenario, this value could vary according to node capabilities.
We start the evaluation of the DISCOS support for media sessions from the estimation of the call failure probability because it is the parameter that mainly affects the quality of service perceived by users. A session can fail because either an available relay cannot be found, or the relay is found but becomes unavailable during the session (e.g., because it disconnects from the network).
With respect to the first problem, we never observed such an event during simulation: a UA with limited connectivity was always able to find a media relay. This result suggests that with our assumptions about the number of media sessions requiring a relay, the probability for this event to occur in a DISCOS environment can be considered negligible. The second issue could be mitigated by implementing proper relay back-up policies. As shown in Fig. 3d, the media session can fail in about 0.6–0.65 percent of cases, but the selection of a single back-up relay (that handles the communication in case the first relay fails) appreciably reduces this probability, and further reductions are possible by increasing the number of relay nodes. The blocking probability remains low even in the unlikely case in which 98 percent of the users are involved in a call (i.e., almost all users are on the phone). The overhead deriving from the search of back-up relays is depicted in Fig. 3e, which plots the average number of peers that must be contacted to find K available media relays. For a reasonable number of simultaneous sessions, this value remains low. However, we set the number of back-up relay nodes to one, which is a reasonable trade-off between the probability of a session drop and the additional complexity that results when a UA must search for a back-up relay node before starting media sessions.
Finally, we analyzed the distribution of load among connectivity peers. In particular, Fig. 3f shows the distribution of the number of media flows simultaneously handled by media relays. It can be observed that although media flows have different bandwidth requirements, most relays simultaneously handle no more than one media session. Thus, a good load balancing among peers is guaranteed.

Conclusions
This article presents a distributed infrastructure, called DISCOS, that aims at providing connectivity service to hosts behind middleboxes. This solution extends current centralized approaches (and overcomes their scalability and robustness limitations) by integrating middlebox traversal functionalities into edge nodes. The article also presents the mechanisms that can be used to manage such infrastructure and exploit its services. The proposed infrastructure is based on an unstructured peer-to-peer paradigm and proved to be extremely effective in locating suitable relays and distributing media sessions evenly among the available connectivity peers. Results confirm that the overhead for managing the overlay is low, that each host is able to locate a suitable connectivity peer with a small number of messages (hence, in a very short time), and that the blocking probability of a new media call is negligible even for a very high load. Although our simulations cannot simulate a nationwide network (because of processing and memory limits), we are confident that results can be extended to such an environment because the distributed infrastructure is based on the scale-free topology, which is the key to achieving these results and ensuring overlay scalability and robustness.
Future work aims to validate the proposed infrastructure in non-SIP environments and more exhaustively address security issues.

Acknowledgment
The authors would like to thank Marco Mellia, who was instrumental in obtaining a proper characterization of Skype user agents.

References
[1] B. Ford, P. Srisuresh, and D. Kegel, “Peer-to-Peer Communication across Network Address Translators,” USENIX Annual Tech. Conf., Anaheim, CA, Apr. 2005.
[2] J. Rosenberg et al., “SIP: Session Initiation Protocol,” IETF RFC 3261, June 2002.
[3] C. Jennings and R. Mahy, Eds., “Managing Client Initiated Connections in SIP,” http://tools.ietf.org/html/draft-ietf-sip-outbound-11, Nov. 2007.
[4] J. Rosenberg, “Interactive Connectivity Establishment (ICE): A Protocol for NAT Traversal for Offer/Answer Protocols,” http://tools.ietf.org/html/draft-ietf-mmusic-ice-18, Mar. 2008.
[5] J. Rosenberg, R. Mahy, and P. Matthews, “Traversal Using Relays around NAT (TURN): Relay Extensions to Session Traversal Utilities for NAT (STUN),” http://www3.tools.ietf.org/html/draft-ietf-behave-turn-07, Feb. 2008.
[6] J. Rosenberg et al., “Session Traversal Utilities for NAT (STUN),” http://tools.ietf.org/html/draft-ietf-behave-rfc3489bis-15, Feb. 2008.
[7] D. MacDonald and B. Lowekamp, “NAT Behavior Discovery Using STUN,” http://www3.tools.ietf.org/html/draft-ietf-behave-nat-behavior-discovery-03, Feb. 2008.
[8] D. A. Bryan and B. B. Lowekamp, “Decentralizing SIP,” ACM Queue, vol. 5, no. 2, Mar. 2007.
[9] S. A. Baset and H. Schulzrinne, “An Analysis of the Skype Peer-to-Peer Internet Telephony Protocol,” IEEE INFOCOM ’06, Barcelona, Spain, Apr. 2006.


[10] P. Biondi and F. Desclaux, “Silver Needle in the Skype,” Black Hat Europe 2006, Amsterdam, The Netherlands, Mar. 2006.
[11] R. Albert and A.-L. Barabási, “Statistical Mechanics of Complex Networks,” Rev. Modern Physics, 74, 2002, pp. 47–97.
[12] L. A. Adamic et al., “Search in Power Law Networks,” Physical Rev., E 64, 2001.
[13] J. Seedorf, “Security Challenges for Peer-to-Peer SIP,” IEEE Network, vol. 20, no. 5, Sept. 2006.
[14] C. Jennings et al., “Resource Location and Discovery (RELOAD),” http://www.p2psip.org/drafts/draft-bryan-p2psip-reload-04.txt, June 2008.
[15] M. Casado and M. J. Freedman, “Peering through the Shroud: The Effect of Edge Opacity on IP-Based Client Identification,” USENIX/ACM Int’l. Symp. Networked Sys. Design and Implementation, Cambridge, MA, Apr. 2007.
[16] Y. Chawathe et al., “Making Gnutella-Like P2P Systems Scalable,” ACM SIGCOMM ’03, Karlsruhe, Germany, Aug. 2003.

Biographies
LUIGI CIMINIERA (luigi.ciminiera@polito.it) [M] is a professor of computer engineering in the Dipartimento di Automatica e Informatica at Politecnico di Torino, Italy. His research interests include grids and peer-to-peer networks, distributed software systems, and computer arithmetic. He is a co-author of two international books and more than 100 contributions published in technical journals and conference proceedings.

GUIDO MARCHETTO (guido.marchetto@polito.it) received his Ph.D. in computer engineering in April 2008 and his laurea degree in telecommunications engineering in April 2004, both from Politecnico di Torino. He is a post-doctoral fellow in the Department of Control and Computer Engineering at the Politecnico di Torino. His research topics are packet scheduling and quality of service in packet-switched networks, peer-to-peer technologies, and voice over IP protocols. His interests include network protocols and network architectures.

FULVIO RISSO (fulvio.risso@polito.it) received his Ph.D. in computer and system engineering from Politecnico di Torino in 2000 with a dissertation on quality of service in packet-switched networks. He is an assistant professor in the Department of Control and Computer Engineering of Politecnico di Torino. His current research activity focuses on efficient packet processing, network analysis, network monitoring, and peer-to-peer overlays. He is the author of several papers on quality of service, packet processing, network monitoring, and IPv6.

LIVIO TORRERO (livio.torrero@polito.it) is a Ph.D. student in computer and system engineering in the Department of Control and Computer Engineering at Politecnico di Torino. He received his laurea degree in computer engineering from Politecnico di Torino in November 2004. His research topics include voice over IP protocols, IPv6, and peer-to-peer technologies and their NAT/firewall related issues.


Dial “M” for Middlebox Managed Mobility


Stephen Herborn and Aruna Seneviratne, NICTA

Abstract
Users can be served by multiple network-enabled terminal devices, each of which
in turn can have multiple network interfaces. This multihoming at both the user and
device level presents new opportunities for mobility handling. Mobility can be han-
dled by utilizing devices, namely, middleboxes that can provide intermediary rout-
ing or adaptation services. This article presents an approach to enabling this kind
of mobility handling using the concept of personal networks (PNs). PNs consist
of dynamic conglomerations of terminal and middlebox devices
tasked to facilitate the delivery of information to and from a single human user.
This concept creates the potential to view mobility handling as a path selection
problem because there may be multiple valid terminal device and middlebox con-
figurations that can successfully carry a given communication session. We present
details and an evaluation of our approach, based on an extension of the Host
Identity Protocol, which demonstrate its simplicity and effectiveness.

A major trend evident in mobile communication systems today is the use of multiple network-enabled terminal devices to deliver information to a single user. These devices can be equipped with multiple network interfaces, of which some, all, or none may be satisfactorily operational at a given point in time. This trend toward multiple devices, which we refer to as “user multihoming,” combined with what is commonly termed “device multihoming,” introduces the potential for multiple end-to-end paths between two mobile communicating parties. These alternate paths can be facilitated with the support of specialized intermediaries, called middleboxes or service proxies, that either establish or maintain a communication session. For the purposes of our discussion here, the common terms middlebox and intermediary and our own term, service proxy (SP), are synonymous.

Although some aspects of user multihoming will diminish with the next generation of devices that provide integrated functionality, the availability of different paths may be useful for different purposes or at different times. Therefore, multihoming can be exploited to maximize the service offerings to mobile users, for example, to take advantage of a more cost-effective network interface available on another device by re-routing ongoing communication sessions through that device.

One of the primary requirements for exploitation of this network/path diversity is the ability to seamlessly switch communication sessions between different networks/paths and potentially add or remove intermediaries in the process. Although there have been numerous proposals for network/path switching, such as [1–3], they do not take advantage of intermediaries, namely, middlebox devices. One major advantage of being able to utilize intermediaries is that they may be able to perform intensive mobility handling operations such as content adaptation or time-shifting at the edges of the core networks, thus saving last-hop bandwidth and terminal device processor/power capacity. Additionally, intermediaries can be used to provide indirect connectivity to the core network at times when no direct connectivity is possible.

This article analyzes the requirements for exploiting the opportunities that arise as a result of user and device multihoming in mobile communications systems. It then proposes a solution that enables the use of intermediaries to redirect or transform data flows so that they can be received by the best available terminal device, or combination thereof, at any given time.

The article presents a vision for mobility management that captures the potential for mobility support using middleboxes and then describes an approach to solving one of the fundamental requirements to realize this vision. The following sections elaborate on the details of a proposed scheme, concluding with a summary and review of future work.

Overview

The conceptual foundation on which this article is based is that users are served by a PN comprising loose, dynamic conglomerations of many devices focused on serving a particular user. PNs, as depicted in Fig. 1, encompass personal area networks (PANs) [4] and may incorporate other terminal and non-terminal devices that belong to public infrastructure or devices at distant locations in the network.

In a PN, terminal devices (TDs) terminate an end-to-end connection on behalf of a user to either display or store the information received via the connection. In Fig. 1 the mobile phone and laptop computer, as well as the large display, all represent potential terminal devices. The non-terminal entities (middleboxes) act as intermediary relay points for an end-to-end connection. These entities intercept information in transit from one end of an end-to-end connection, possibly process it, and then forward it toward the other end of the connection. The processing provides either high-level adaptation, filtration, and transformation of application data, or low-level connectivity provision. Figure 1 presents four examples of non-terminal entities. The mobile phone and laptop computer are able to serve as bridges between various access technologies. The third and fourth examples of service proxy devices are a remotely located application-aware general packet radio service (GPRS) gateway and a mobile router.
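
As a rough illustration of the personal network described above, the following Python sketch models devices that may act as terminal devices, as non-terminal entities, or as both, and enumerates candidate paths toward the user. The class names, the interface labels, and the rule that a relay must share an operational link type with the terminal device are assumptions made for this illustration; they are not part of the proposed scheme.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    can_terminate: bool  # may act as a terminal device (TD)
    can_relay: bool      # may act as a non-terminal entity (service proxy)
    # Interface type -> whether it is currently operational (hypothetical labels).
    interfaces: dict = field(default_factory=dict)

    def operational(self) -> set:
        return {itf for itf, up in self.interfaces.items() if up}


@dataclass
class PersonalNetwork:
    owner: str
    devices: list = field(default_factory=list)

    def candidate_paths(self, core_access: set) -> list:
        """List direct paths and one-relay paths that end at a terminal device.

        core_access names the interface types assumed to reach the core network.
        """
        paths = []
        for td in (d for d in self.devices if d.can_terminate):
            if td.operational() & core_access:
                paths.append((td.name,))  # direct connectivity
            for sp in (d for d in self.devices if d.can_relay and d is not td):
                reaches_core = sp.operational() & core_access
                shares_link = sp.operational() & td.operational()
                if reaches_core and shares_link:
                    paths.append((sp.name, td.name))  # indirect connectivity
        return paths


phone = Device("phone", True, True, {"gprs": True, "bluetooth": True})
laptop = Device("laptop", True, True,
                {"wifi": False, "bluetooth": True, "ethernet": True})
display = Device("display", True, False, {"ethernet": True})
pn = PersonalNetwork("user", [phone, laptop, display])

paths = pn.candidate_paths({"gprs", "wifi"})
print(paths)
```

In this example the laptop's Wi-Fi interface is down, so the laptop is reachable only through the phone and the display has no usable path. Bringing the Wi-Fi interface up would add further candidates, including one that uses the laptop as a relay for the display.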

IEEE Network • September/October 2008 0890-8044/08/$25.00 © 2008 IEEE 41



Figure 1. A personal network. (Figure labels: public Internet, personal area network, local area network; links: GPRS, Bluetooth, Ethernet, Wi-Fi.)

Managing Personal Networks

One of the primary management functions for PNs is the selection of service paths. Consider a single unicast communication session between two sets of terminal devices belonging to PN A and PN B in a content distribution application. Assume that certain parts of the requested content are available in multiple and different forms on different servers; for example, advertisements can be served from a different location than the rest of the content. Non-terminal entities can be used if required and also can be composed to form an aggregated end-to-end service. The result is a number of valid end-to-end paths that can be used to carry the communication session, as shown in Fig. 2. To facilitate this, at least one candidate path from one set of terminal devices to the other must be discovered, selected, and configured. In the simplest case, the best path is one directly between two terminal devices. In other cases, it may involve some non-terminal intermediary. To realize this, it is necessary to develop mechanisms for:
• Discovering and constructing end-to-end paths
• Dynamically configuring and utilizing these paths

The first requires a means to discover and select appropriate intermediaries that are distributed throughout a network and compose an end-to-end path. Such mechanisms are beyond the scope of this article but are addressed in numerous existing works including [5]. The second requires a means to switch ongoing communication sessions between terminal devices and to transparently insert intermediaries into, or remove them from, the end-to-end path. This section focuses on the second issue.

For the second issue there have been numerous attempts to develop mechanisms to dynamically discover, construct, and configure end-to-end paths, for example, [6] and [7]. However, the deployment of these schemes is predicated on development and large-scale deployment of proprietary system components. The contribution this article makes is to show that it is possible to achieve the desired functionality through minor modification to a pre-existing protocol. Specifically, we show that it is possible to extend the mobility and multihoming capability present in the Internet Engineering Task Force (IETF) Host Identity Protocol (HIP) [8].

The proposal extends the IETF HIP with the capability for movement of communication sessions between terminal devices, as well as the transparent insertion and removal of intermediaries (middleboxes), while retaining ultimate control at the terminal devices on either side of an end-to-end connection through the use of a central functional building block we call “identity delegation.” This is explained in further detail in the following section.

Figure 2. Service-path-based mobility (example): a) many candidate paths, one selected (bold line) using laptop as SP for display; b) mobile router appears and offers a better path, previous path replaced by new path; c) mobile router leaves coverage range, display is switched off, path readjusts, laptop now used as a terminal device; d) laptop battery depleted, triggering selection of different end-to-end path toward PDA.

Identity Delegation

The overwhelming majority of user-level applications that require network connectivity do so by creating a socket bound to some local interface identifier (i.e., IP address) and possibly statically connected to a peer identifier. End-to-end connections based on sockets cannot cope well with changes in identifiers (i.e., IP addresses) that occur as a result of mobility. This can be solved at the middleware level by providing applications with “bind-able” identifiers that can be assigned on a per-flow basis and delegated between physical devices. This enables applications on both ends of a communication session to remain oblivious to mobility management activities performed at the operating system level or by mobility handling middleware. To transfer sessions, first there must be assurance that devices within a given PN are able and willing to accept any incoming data stream that is redirected to them. This is not the case for a number of reasons. First, no device blindly accepts an unsolicited, incoming data stream. Second, the redirected stream may be forwarded toward a port on the new device that is already occupied by another application. Finally, even if the device can accept the redirected stream, it is not aware of the application to which the data should be delivered. The use of cryptographic identifiers provides the means to solve at least the first issue by decoupling network locators from identifiers and by providing strong authentication of the identities used to send, receive, and forward data. Port conflicts can be solved with stateful connection management similar to the approach used by network address translators (NATs). What is missing is a way to delegate identifiers between devices, and a solution for this problem is proposed here and described in detail later.

The capacity for identity delegation makes possible a number of interesting application scenarios. The scenarios considered here are identity delegation toward a single intermediate SP and toward an arbitrary number of intermediate SPs. Figure 3 and Fig. 4 illustrate how this could be achieved with the proposed scheme using HIP and IPSec. Background on HIP and its relationship to IPSec is provided in [8].

Identity Delegation Toward Terminal Devices

A clear potential application for host identity delegation is to enable single intermediary hosts to be dynamically inserted into or removed from an end-to-end session. This occurs transparently to the transport and higher layers, so it does not break Transmission Control Protocol (TCP) connections. Figure 3 provides a generic step-by-step illustration of the process of inserting an intermediary, SPA, in between two terminal devices, TDB and TDA. This process starts with TDA or TDB initiating the action. In the example, TDA is the initiator as shown in Fig. 3c.

Figure 3. Insertion of a single host.

Identity Delegation Toward Multiple SPs

Identity delegation assists service composition by allowing two hosts engaged in a communication session, TDA and TDB, to delegate their identity to the head and tail of the composed SP chain, SP1 and SP2. This enables an arbitrary number of intermediate SPs to be inserted in between the head and the tail transparently to TDA or TDB. Figure 4 follows Fig. 3 and depicts this usage case as a sequence of twelve consecutive steps.

Since application data streams can consist of several separable atomic components that can be routed independently, for example, audio and video, and because some intermediary SPs can split or join certain application data flows, it is possible to construct an end-to-end SP path that is composed of two or more converging subpaths. The benefits provided by splitting and joining media include the potential for selection of hybrid service paths that are more efficient than any available serial service path. In some cases, it may be desirable to construct service paths that do not completely converge, for example, to deliver the audio component of a media stream to a network interface or terminal device separate from the one that receives the rest of the stream.

For the purpose of discussion, it is assumed that SPs participate in some common directory, the administration of which can be centralized or distributed. Service selection and discovery is not addressed in detail here but is discussed in the context of related work.

Design and Implementation

In current systems, IP addresses are the most common type of identifier used for end-to-end communication. However, IP addresses are strongly bound to topological location and thus not suitable for the purpose of delegation that is required to realize the scenarios described in the previous section. As a result, we base our design on the HIP, which uses identifiers that are decoupled from network topology. This section first provides some background on HIP and IPSec, then gives a general description of the identity delegation approach compared with a naïve private key duplication-based approach, and finally delves into the specifics of the prototype implementation.

Host Identity Protocol and IPSec

Schemes such as mobile IPv6 (MIPv6) [9] and the HIP [8] provide a static identifier, referred to as a home address (HoA) in the former and a host identity tag (HIT) in the latter, which is separate from the routable IP or IPv6 address of the host. The approach presented here is based on HIP, although the general approach is applicable to any similar scheme.

HIP is an end-to-end communication protocol that introduces a thin layer of resolution between the network and transport layers, decoupling sockets from network addresses. Instead of binding to IPv6 addresses, applications bind to 128-bit HITs, which are flat (non-hierarchical) cryptographic identifiers generated by hashing a public key. Due to the decoupling between the network and transport layers, HIP enables applications on a mobile host to continue communication oblivious to changes in local network addresses. New HIP communication sessions are preceded by a challenge-response-based authentication process.

As HIP deals only with control signaling, standard IPSec is used to carry the actual data traffic. The implementation of HIP referred to in this article uses the recently proposed bound end-to-end tunnel (BEET) mode of IPSec operation that eliminates the requirement to retain the source and destination HIT as an encapsulated header in each transmitted packet [10]. The setup of a HIP connection between two hosts results in a pair of unidirectional BEET mode IPSec security associations (SAs) at each host. The security parameter index (SPI) for each SA is contained in the I2 and R2 base exchange packets and used by the hosts to determine the source and destination HIT. The mapping between IPSec SPI and source/destination HIT is performed by the BEET mode association, which simply replaces the network layer addresses with HITs after decryption.
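
The identifier handling described above can be illustrated with a short Python sketch. It derives a 128-bit flat identifier by hashing a public key and keeps a table from SPI to HIT pair, so that inbound packets are handed to the upper layers with HITs in place of network layer addresses. The function and class names are hypothetical, truncated SHA-256 merely stands in for the actual HIT construction defined in the HIP specification, and decryption is omitted.

```python
import hashlib


def derive_hit(public_key: bytes) -> bytes:
    """Return a 128-bit flat identifier obtained by hashing a public key.

    Assumption: truncated SHA-256 is used only for illustration; the HIP
    specification defines the actual HIT format.
    """
    return hashlib.sha256(public_key).digest()[:16]


class BeetAssociations:
    """Toy table of inbound security associations keyed by SPI."""

    def __init__(self):
        self._by_spi = {}

    def add(self, spi: int, src_hit: bytes, dst_hit: bytes) -> None:
        self._by_spi[spi] = (src_hit, dst_hit)

    def inbound(self, spi: int, outer_src_ip: str, outer_dst_ip: str,
                payload: bytes):
        """Swap the outer IP addresses for the HIT pair bound to this SPI.

        The outer addresses are ignored for the lookup, which is why a change
        of locator does not disturb the identifiers seen by upper layers.
        """
        src_hit, dst_hit = self._by_spi[spi]
        return src_hit, dst_hit, payload


hit_a = derive_hit(b"public key of host A (placeholder bytes)")
hit_b = derive_hit(b"public key of host B (placeholder bytes)")

table = BeetAssociations()
table.add(spi=0x1001, src_hit=hit_a, dst_hit=hit_b)

# Same SPI, different outer source address (documentation-range addresses).
before_move = table.inbound(0x1001, "192.0.2.10", "198.51.100.7", b"data")
after_move = table.inbound(0x1001, "203.0.113.5", "198.51.100.7", b"data")
print(before_move == after_move)
```

Because the lookup is keyed on the SPI alone, the packet received after the sender changes its address resolves to the same HIT pair, which is the property that keeps sockets bound to HITs unaffected by mobility.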

Figure 4. Insertion of arbitrary number of intermediaries.
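
The head-and-tail delegation depicted in Fig. 4 can be pictured with a small Python sketch. The two terminal devices delegate their identities to the first and last service proxy, after which further proxies can be placed between them while the HIT pair seen by the transport layer stays fixed. The class and method names are hypothetical, and no HIP or IPSec signaling is modeled.

```python
class ServicePath:
    """Toy model of a session path between TDA and TDB."""

    def __init__(self, hit_a: str, hit_b: str):
        self.hit_a = hit_a
        self.hit_b = hit_b
        self.head = None    # SP holding TDA's delegated identity
        self.tail = None    # SP holding TDB's delegated identity
        self.middle = []    # SPs placed between head and tail

    def delegate_head(self, sp: str) -> None:
        self.head = sp

    def delegate_tail(self, sp: str) -> None:
        self.tail = sp

    def insert_middle(self, sp: str) -> None:
        # Intermediate SPs are only meaningful once both ends have delegated.
        if self.head is None or self.tail is None:
            raise ValueError("head and tail must be delegated first")
        self.middle.append(sp)

    def hops(self) -> list:
        chain = [sp for sp in (self.head, *self.middle, self.tail)
                 if sp is not None]
        return ["TDA", *chain, "TDB"]

    def transport_view(self) -> tuple:
        # The identifier pair to which sockets stay bound, whatever the hops.
        return (self.hit_a, self.hit_b)


path = ServicePath("HIT-A", "HIT-B")
view_before = path.transport_view()

path.delegate_head("SPA1")
path.delegate_tail("SPA2")
path.insert_middle("SPAx")
path.insert_middle("SPAy")

print(path.hops())
print(path.transport_view() == view_before)
```

The list of hops grows as proxies are added, but the identifier pair returned by `transport_view` never changes, which mirrors the claim that insertion is transparent to TDA and TDB.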


Figure 5. Proxy insertion by TDA. (Message sequence among TDA, SPA, and TDB: connect and ack(IPSPA) between TDA and SPA; setup of IPSec-secured signaling; HIP base exchange packets I1, R1, I2, and R2 relayed by SPA and carried as encapsulated payloads p[...] on the TDA side; IPSec security association details; IPSec-secured data on both segments.)

HIP mobility handling comprises an authenticated location update procedure in which the mobile host delivers a signed location update packet to the correspondent host with details of the new network layer address. Our contribution is an extension to standard HIP that provides a means to delegate HITs between physical hosts on-the-fly in response to a mobility event.

Depending on local security policy, either the mobile host or the correspondent host may ask to re-key the connection in response to mobility. Re-keying also can be requested by either host after a certain time period has elapsed. Re-keying involves the deletion of existing IPSec SAs and the establishment of new ones with a newly generated session key. If re-keying is not required, existing SAs are deleted and re-established with the previous session key. This reconfiguration of IPSec SAs is transparent to the transport layer.

To mitigate the effect of the implementation-specific security policy on experimental results, a base exchange was substituted for an update procedure in the work presented here. A base exchange is, in fact, in most cases roughly equivalent to an update, with the main difference being modified HIP header fields.

Transferring Cryptographic Identities: Duplication versus Delegation

The use of cryptographic identifiers, as in HIP, decouples the identifier used by applications and transport layer sockets from the locator used for routing. A natural consequence of decoupling is that it is possible to transfer the identity between different physical devices, which is what we want to achieve to be able to insert intermediate middleboxes into an ongoing communication session. However, for a host to be able to prove that it is authorized to use a certain identifier, it must present messages signed with the private key corresponding to the identifier. This can be solved in two ways, either by duplicating the requisite private key on any host that requires it, or by forwarding location update signaling packets to be signed on demand. The second approach is advocated in this article because private keys should, in principle, remain private. To avoid the introduction of another acronym, the abbreviation HIT is used interchangeably with the term cryptographic identifier in the remainder of this article.

In the approach proposed in this article, hosts that wish to use a delegated HIT are required to forward location update signaling packets to the owner of the corresponding private key for signing. The owner of the private key can then, at its discretion, sign the messages and return them to the host that requested them, which can in turn forward them on to the correspondent host, thereby verifying the claim to use the HIT. The main advantage of this approach is that it avoids the dissemination of private keys. This approach also allows temporary delegation of a HIT because the destination host can use the HIT only for as long as the correspondent host does not request a re-keying procedure. An additional advantage is that because the location update signaling messages forwarded to the key-owner host for signatures also contain the HIT and IP addresses of the correspondent host, the key-owner host can keep track of the correspondent hosts with which communication sessions are being conducted using its identity. Should the key-owner wish to revoke the use of its HIT from a certain destination host, it need only perform a re-key directly with the correspondent host. The drawback of this approach is that if the key-owner host disappears, any further requests to sign location update signaling messages cannot be processed. This means that a destination host may be forced to terminate a communication session if the correspondent host initiates a re-key.

One potential philosophical ramification of the delegation approach (on HIP specifically) is that so-called host identities no longer explicitly belong to a specific host but are capable of being moved around between physical hosts, contrary to the original intention of the designers of HIP. The proposed approach limits the architectural impact of this by ensuring that identities are delegated only temporarily and can never be used without the explicit consent of the actual entity that the host identity serves to identify.

Our proposal changes the notion of end-to-end security of HIP because even though communication is still encrypted (IPSec), all nodes explicitly included in the service path can read the payload. This is a desired functionality because we envisage that nodes included on the service path may be tasked with some application-layer processing such as content adaptation.

Implementation Details

We implemented the identity delegation approach described in this article using publicly available code from the Infrastructure for HIP project (InfraHIP) [11] as a base. Our extensions were evaluated on a Debian Linux system running kernel version 2.6.16. The remainder of this section provides an analysis of the signaling procedures specific to the implementation. Our description refers to a simple scenario: a mobile terminal device (TDA) communicating with a static correspondent terminal device (TDB). The analysis commences with a description of how TDA may delegate its identity to an intermediary SP.

Figure 5 depicts the signaling involved in the delegation process. It is assumed that there is a pre-existing trust relationship between terminal devices belonging to a single PN. The delegation process starts when TDA queries the SP for the IP address (IPSP) that it wants to use for the delegated identity. At the same time, TDA and the SP establish a transport mode IPSec channel. This channel carries encapsulated HIP signaling traffic, as well as the IPSec security policy and association information used to establish the BEET mode IPSec association used for application data. The HIP signaling traffic between TDA and SP is sent as encapsulated payloads indicated in Fig. 5 by “p […].” The SP relays any HIP signaling traffic either to TDA or TDB without modification. The whole process of identity delegation and subsequent session redirection is transparent to applications running on TDB. The modular nature of our design means that the scheme can be implemented as an extension to an existing HIP-enabled network stack.

As mentioned previously, intermediary SP insertion also can be performed by TDB to construct chains of two or more composed SPs between TDA and TDB. The signaling involved in the insertion of the second SP is equivalent to the SP insertion by TDA shown in Fig. 5.

Experimental Evaluation

Evaluation of the identity delegation scheme was performed for the second usage scenario described previously. The results of this scenario are also applicable to the other scenarios because they utilize the same identity delegation mechanism. The intention of the experiments was twofold: first, to provide a general evaluation of HIP performance in a real system, and second, to show that the delegation approach does not result in any measurable performance drop compared to unmodified HIP. Initially, it was assumed that most of the hand-off latency overhead would be due to heavy CPU load caused by the cryptographic operations required to sign HIP signaling messages and establish IPSec sessions. As such, it was expected that the performance of both approaches would be equivalent, provided that the machines used to sign HIP messages and set up IPSec sessions were equal in terms of processing power. These assumptions were confirmed by the evaluation results presented below.

The experiment was performed to evaluate the scenario of inserting intermediary SPs, which, for example, can be a content adaptation SP between two devices engaged in a TCP communication session. The purpose of evaluating this scenario was to demonstrate that the TCP connection between the two devices remains unbroken and that the scheme does not cause any specific harm to the normal performance of higher-layer protocols. In reality, altering the end-to-end path in midsession may introduce some degradation in TCP performance if the new path is of lower quality than the old path; however, this issue is outside the scope of our proposal.

In these experiments, an initial communication session was established from TDA (600 MHz PIII) toward TDB (500 MHz Celeron). The evaluated scenario was the insertion of two 3-GHz Pentium 4 service proxies, SPA1 and SPA2, in series between the initial TCP session end points, TDA and TDB. Figure 6 shows the resulting TCP sequence number vs. time plot.

The two large gaps (a) and (c) in the plot represent the respective times at which SPA1 and SPA2 were inserted in between TDA and TDB. From the plot it can be observed that the effects of the insertion of SPA2 are similar to those of the insertion of SPA1 in terms of latency and impact on TCP performance. However, the plot also demonstrates that insertion of multiple consecutive SPs does not result in any further drop in performance provided the SPs are powerful enough to handle the required IPSec sessions without CPU saturation. Some smaller gaps such as that indicated by (b) can be attributed to the CPU being utilized by the cryptographic operations required to set up a secure signaling channel prior to hand-off.

Figure 6. TCP sequence number vs. time plot: two service proxies. (Axes: TCP sequence number against time in seconds; gaps marked (a), (b), and (c).)

It is important to note that if the capability to delegate or transfer identity were not available, then the session would have to be broken and restarted to insert and remove each intermediary proxy, causing the TCP sequence number to reset to the beginning each time.

Related Work

There are a number of previous and ongoing related works addressing inter-device mobility. On the other hand, there are fewer proposals that address the insertion of intermediary SPs as a mobility-handling technique.

The only proposal that has a similar functionality is Stream Control Transmission Protocol (SCTP). As with any other transport protocol, a node can be made to act as a proxy. In SCTP, when an end point (A) initiates a connection, the other end (B) can, with or without the knowledge of the initiator, open an association to another entity (C) and act as a proxy in between. Then B can either remove itself or make an association with C to receive the data from C. All this must be done before heartbeat signals are exchanged [12]. The HIP base specification provides no mechanism for inter-device mobility. However, [1] and [8] allude to the possibility of identity delegation using signed certificates. The approach proposed here provides a higher degree of transparency and control and is more responsive than delegation certificates.

Koponen, Gurtov, and Nikander provide a high-level discussion of the potential for HIP identity delegation with certificates [1]. References [2, 3] are related solutions that enable ongoing communication sessions to be moved between devices. In [2], Su creates a virtualized network interface that can be transferred between different devices and with it the associated communication sessions. It should be noted that none of these schemes conflict with HIP or with the scheme presented here; in fact, there is even potential for useful interoperation. A major difference of the delegation approach is that it focuses only on managing connectivity and can be implemented in such a way that it is transparent to at least one end of an end-to-end connection, if not both. There are also a number of related activities in the IETF associated with locator/ID split [13, 14]. Of this work, the network-based schemes such as Locator/Identifier Separation Protocol (LISP) do not consider the use of middleboxes. The others, especially mobility Internet key exchange (MOBIKE) and SHIM6, focus on device mobility and do not support the use of middleboxes as described in this article.

Conclusion

Auxiliary devices that can serve as dynamically configured middleboxes introduce potential for a new approach to mobility handling that makes use of multiple available network interfaces and terminal devices. Mobility handling in this case means adapting to the changing status of an individual terminal by delivering application data flows to the best available terminal device(s) and utilizing the available service proxies (middleboxes) in the best possible way. This cannot be achieved using currently available technology.

This article addresses the problem by creating and exploiting PNs to provide enhanced mobility handling to mobile users. This article is focused on the specific problem of decoupling application data flows from specific devices by making use of multiple available network interfaces and terminal and service proxy devices.

We propose mechanisms to switch ongoing communication sessions between terminal devices and to transparently insert or remove intermediary service proxies, with the mobility management schemes operating at layers lower than the transport layer. The proposed identity delegation approach is based on the HIP and allows the identity creator to retain full control over the use of their identity. The approach enables the movement of communication sessions between terminal devices, as well as the transparent insertion and removal of middleboxes, service proxies, or other intermediaries able to perform routing or adaptation.

Future Work

Future work in support for movement of communication sessions between terminal devices may include the coupling of identity delegation with “checkpointing” and transfer of transport, session, and application layer state to allow full application sessions to be moved between devices. Another problem worthy of investigation for security reasons is how to enable independent verification of whether or not two terminal devices belong to the same PN.

References

[1] T. Koponen, A. Gurtov, and P. Nikander, “Application Mobility Using the Host Identity Protocol,” Proc. ICT ’05, Madeira, Portugal, May 2005.
[2] G. Su, MOVE: Mobility with Persistent Network Connections, Ph.D. diss., Columbia Univ., Oct. 2004.
[3] R. Baratto et al., “MobiDesk: Mobile Virtual Desktop Computing,” Proc. MobiCom, Philadelphia, PA, Sept. 2004.
[4] I. G. Niemegeers and S. M. Heemstra De Groot, “From Personal Area Networks to Personal Networks: User Oriented Approach,” Wireless Personal Commun., vol. 22, no. 2, 2002, pp. 175–86.
[5] S. Herborn, A Personal-Network Centric Approach to Mobility Aware Networking, Ph.D. diss., Univ. New South Wales, Mar. 2007.
[6] S. Ardon et al., “MARCH: A Distributed Content Adaptation Architecture,” Int’l. J. Commun. Sys., vol. 16, 2003, pp. 97–115.
[7] B. Knutsson and H. Lu, “Architecture and Performance of Server Directed Transcoding,” ACM Trans. Internet Technology, vol. 3, 2003, pp. 392–424.
[8] R. Moskowitz et al., “Host Identity Protocol,” Internet RFC 5201; http://www.ietf.org/rfc/rfc5201.txt
[9] D. Johnson, C. Perkins, and J. Arkko, “Mobility Support in IPv6,” IETF RFC 3775; http://www.ietf.org/rfc/rfc3775.txt
[10] P. Nikander and J. Melen, “A Bound End-to-End Tunnel (BEET) Mode for ESP,” Internet draft; http://tools.ietf.org/id/draft-nikander-esp-beet-mode-08.txt
[11] InfraHIP project; http://infrahip.hiit.fi/
[12] T. Aura, P. Nikander, and G. Camarillo, “Effects of Mobility and Multihoming on Transport-Protocol Security,” IEEE Symp. Security and Privacy, Berkeley, CA, May 2004.
[13] D. Meyer, “The Locator/ID Split, Its Implications for IP Architecture, and a Few Current Approaches,” Future of Routing Wksp., APRICOT ’07; http://www.1-4-5.net/dmm/talks/apricot2007/locid
[14] D. Lee, X. Fu, and D. Hogrefe, “A Review of Mobility Support Paradigms for the Internet,” IEEE Commun. Surveys & Tutorials, vol. 8, no. 1, 2006.

Biographies

ARUNA SENEVIRATNE (aruna.seneviratne@nicta.com.au) received his Ph.D. in electrical engineering from the University of Bath, United Kingdom, in 1982. He is director of the NICTA Australian Technology Park Laboratory. He has held academic appointments at the University of Bradford, United Kingdom, Curtin University, and the University of New South Wales. He has also held visiting appointments at the University of Pierre and Marie Curie, Paris, and INRIA, Nice. In addition, he has been a consultant to numerous organizations including Telstra, Vodafone, Inmarsat, and Ericsson.

STEPHEN HERBORN (stephen.herborn@nicta.com.au) completed his Ph.D. at the University of New South Wales under the supervision of Professor Aruna Seneviratne. He works for Accenture consulting. Between 2003 and 2008, he was a member of the Networking and Pervasive Computing (NPC) program at NICTA in Sydney, first as a student and then as a full-time researcher. While at NICTA, his research activities centered around personal area networking, mobile networking, and context-aware computing.

IEEE Network • September/October 2008 47



NAT Issues in the Remote Management of Home Network Devices
Choongul Park, Kitae Jeong, and Sungil Kim, KT Technology Lab
Youngseok Lee, Chungnam National University

Abstract
Currently, many customer devices are being connected to home networks. For this reason, it is expected that device management (DM) capabilities will be a powerful instrument for the service provider to cope with high maintenance costs, security concerns, and management issues related to home networks. Through DM, the service provider could provide valuable services such as auto-provisioning, remote configuration, firmware and software updates, diagnostics, monitoring, scheduling, and fraud management. However, network address translators (NATs) that are widely deployed in the home network environment prevent DM operations from reaching user devices behind the NAT. In this article, we focus on NAT issues in the management of home network devices. Specifically, we discuss efforts relating to standardization and present our proposal to deploy DM services for VoIP and IPTV devices behind NATs. By slightly changing the behavior of Simple Network Management Protocol (SNMP) managers and agents and by defining additional management objects (MOs) to gather NAT binding information, we could solve the NAT traversal problem under a symmetric NAT. Moreover, we propose an enhanced method to search for the UDP hole binding time of the NAT box. For evaluation, we applied our method to 22 randomly selected VoIP devices out of 194 NATed hosts in the real broadband network and achieved a success ratio of 99 percent for exchanging SNMP request messages and a 26 percent enhancement in determining the UDP hole binding time.

In the broadband network, the service provider must communicate with customer devices located at the end of the last mile for administrative purposes. As user devices become more diverse and complex, the software that controls them will become more complex as well. Thus, for the development of effective device management (DM) software, it is important to deal with the various customer problems that could be centered on the device, such as firmware updates, software misbehaviors, or configuration errors.

The costs relating to deployment, customer care, operation, and management in a large-scale network could be significantly reduced through the DM services described in Fig. 1. The importance of the remote device management function will be emphasized further as the number of broadband subscribers carrying network-attached devices increases dramatically. In Korea, the number of subscribers to high-speed broadband Internet services is over 14 million.¹ Therefore, it is assumed that many network address translators (NATs) are deployed in home networks. According to recent statistics² from Korea Telecom (KT), which is the largest Internet service provider (ISP) in Korea, approximately 20 percent of customer devices are located behind a NAT middlebox.

¹ This was announced by the Korea Information Promotion Committee in the domestic information trend (vol. 7, no. 8) in March 2008.
² The percentage of NAT penetration was produced by our device management system, named U-CEMS; official figures for July 2007, reported by the Digital Times news (www.dt.co.kr) on July 30, 2007, were lower than ours by 5 percent.

A NAT [1] allows several computers to share a single public IP address. Private IP addresses are assigned to hosts behind a NAT, which means that communication between these hosts and nodes on the public Internet passes through the NAT, which maintains port and address translation information. Managing a device that is behind a NAT is one of the most urgent problems for service providers who are attempting to provide DM services to their customers. As shown in Fig. 1, a device management system (DMS) communicates with a device management client (DMC) to receive information from the remote devices and control them. In a NATed environment, a DMS, like other Internet applications, cannot avoid the NAT traversal problem. Namely, if a customer is using a NAT, the device behind the NAT cannot be controlled by the DMS.

The NAT traversal problem has been studied a great deal in order to support hosts using voice over IP (VoIP) and peer-to-peer (P2P) applications behind a NAT [2]. However, to the best of our knowledge, this issue has not been considered with the aim of controlling NATed hosts through DM, although some efforts for standardization have begun recently.

In this article, we aim to identify the issues and challenges relating to NATs when using DM to manage home network devices that are behind a NAT. In addition, we present a Simple Network Management Protocol (SNMP)-based approach to control hosts under NATs, which employs a User Datagram Protocol (UDP) hole-punching technique with the correct timer estimation method. The remainder of this article is organized as follows. We provide an overview of DM protocols and standards and then discuss the open issues of the remote management of NATed devices. We describe our proposal using SNMP as device management, give the results of our experiment, and also make comparisons with other DM methods. Our conclusions and suggestions for future research are presented later.

0890-8044/08/$25.00 © 2008 IEEE

Figure 1. Management of customer devices. [Diagram: management authorities reach the DM server (DMS) over a northbound interface. The DMS offers DM services (auto-provisioning, remote diagnostics and control, service quality management, firmware and software management, status and fault monitoring, operation supporting, inventory management, statistics and report management). Over a southbound interface it issues DM operations (get MOs, set MOs, event notification, add/replace/copy MOs, exec MO) through SNMP, TR-069, and OMA-DM to DM clients (DMCs): a notebook PC, IPTV STB, VoIP phone, and WiFi phone behind a NAT middlebox in the home network, and a mobile phone, notebook, WiFi phone, and PDA in the mobile network.]

Managing Devices Behind a NAT

Overview of Device Management Protocols
There are many device management protocols; the protocols we discuss here are presented in Table 1. These are standards-based protocols that are widely accepted around the world by many service and solution providers for device management.

Open Mobile Alliance Device Management (OMA DM) [3] uses extensible markup language (XML) for data exchange, more specifically the subset defined by Open Mobile Alliance Data Synchronization (OMA DS). OMA DM is designed to run over Wireless Session Protocol (WSP), Wireless Application Protocol (WAP), Hypertext Transfer Protocol (HTTP), OBject EXchange (OBEX), or similar transports. The protocol specifies the exchange of packages during a session, with each package consisting of several messages and each message in turn being composed of one or more commands. The server initiates the commands, and the client is expected to execute the commands and return the result with a reply message.

Technical Report 069 (TR-069) [4] is also a device management protocol, defined in a technical specification of the Digital Subscriber Line (DSL) Forum. This application layer protocol provides the remote management function for end-user devices. Based on a bidirectional Simple Object Access Protocol (SOAP)³/HTTP protocol, it enables communication between a device and a DMS. Typical applications of TR-069 are safe auto-configuration and the control of other customer premises equipment (CPE) management functions within the integrated framework.

³ SOAP stood for Simple Object Access Protocol, but this acronym was dropped in Version 1.2 of the standard because it was considered to be misleading.

The SNMP [5, 6] is popular in network management because it enables easy monitoring of the status of network-attached devices. SNMP defines a set of standards for network management, including an application-layer protocol, a database schema, and a set of data objects, with management data specified in the form of variables on the managed systems, which describe the system configuration information. These variables can then be queried and sometimes set by SNMP manager applications.

Open Issues of Remote Management of NATed Devices
As explained earlier, several protocols have been standardized to support device management. However, with the advent of many NATs in the home network environment, the NAT becomes an important factor to consider. Therefore, we present open issues in the remote management of NATed devices.

A NAT translates between internal private IP addresses and external public ones. NATs, particularly network address port translation (NAPT), one of the most common NAT systems, deal with communication sessions, which are identified uniquely by the combination of source IP address, source port number, destination IP address, and destination port number. When a NATed device in a private network sends a packet to an external host, the NAT intercepts the packet and replaces the private source IP address and port number with a public IP address and port number. Subsequently, when the NAT receives an incoming packet addressed to that public IP address and port number, it replaces the destination address and port number with the corresponding entry stored in the translation table and forwards the packet to the private network.

The first issue in the remote management of a NATed device is to find an efficient way to facilitate the successful exchange of remote management request/response messages through the NAT box. A DMS cannot provide management authorities with management functions for a device behind a NAT because the management operations are blocked by the NAT.
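The translation behavior described above can be illustrated with a minimal sketch. It is not taken from any NAT implementation: the class name `Napt`, the port pool starting at 40000, and the example addresses are assumptions made for illustration, and the sketch models only the address and port mapping (no idle timers and no filtering by remote endpoint).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


@dataclass
class Napt:
    """Toy NAPT: one public address, one public port per private endpoint."""
    public_ip: str
    next_port: int = 40000  # hypothetical start of the public port pool
    out_map: Dict[Endpoint, Endpoint] = field(default_factory=dict)  # private -> public
    in_map: Dict[Endpoint, Endpoint] = field(default_factory=dict)   # public -> private

    def outbound(self, src: Endpoint, dst: Endpoint) -> Tuple[Endpoint, Endpoint]:
        """Rewrite the private source endpoint; create the binding on first use."""
        if src not in self.out_map:
            public = (self.public_ip, self.next_port)
            self.next_port += 1
            self.out_map[src] = public
            self.in_map[public] = src
        return self.out_map[src], dst

    def inbound(self, src: Endpoint, dst: Endpoint) -> Optional[Tuple[Endpoint, Endpoint]]:
        """Rewrite the public destination endpoint; drop (None) without a binding."""
        private = self.in_map.get(dst)
        if private is None:
            return None
        return src, private


if __name__ == "__main__":
    nat = Napt(public_ip="203.0.113.1")
    dms = ("198.51.100.7", 162)
    device = ("192.168.0.10", 161)
    # A request sent before any outgoing traffic finds no binding and is dropped.
    print(nat.inbound(dms, ("203.0.113.1", 40000)))
    # Outgoing traffic from the device creates the binding.
    print(nat.outbound(device, dms))
    # The same request now reaches the device.
    print(nat.inbound(dms, ("203.0.113.1", 40000)))
```

The first `inbound` call mirrors the first issue named above: a management request that arrives before the device has sent any traffic has no table entry to match and never reaches the private network.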


| | OMA-DM | TR-069 | SNMP |
|---|---|---|---|
| Organization | OMA¹ | DSL Forum² | IETF³ |
| Target devices | Mobile devices (mobile phones, PDAs, palm top computers, etc.) | Fixed devices (residential gateways, VoIP phones, IPTV STB, etc.) | Network elements (computers, routers, switches, terminal servers, VoIP phones) |
| Typical uses | Provisioning; configuration of device; software management; FOTA (Firmware Over the Air); fault management | Auto configuration; dynamic service activation; firmware management; status and performance control | Fault management; configuration management; account management; performance management; security management |
| Data model | OMA-DS⁴ | XML | ASN.1⁵ |
| Transport protocol | TCP | TCP | UDP |
| Operations | Add, Get, Replace, Exec, Copy, Event | GetRPCMethods, Get/Set Parameter (Values, Names, Attributes), (Add/Delete)Object, Download, Inform, etc. | Get, GetNext, GetBulk, Trap, Set |
| Current specification | OMA Device Management V1.2 Approved Enabler (April 2006) | CPE WAN Management Protocol v1.1 (Dec. 2007) | RFC 3411 (Dec. 2002), RFC 3418 (Dec. 2002) |

¹ Open Mobile Alliance; http://www.openmobilealliance.org
² Digital Subscriber Line Forum; http://www.dslforum.org
³ Internet Engineering Task Force (IETF); http://www.ietf.org
⁴ Open Mobile Alliance Data Synchronization; the former name of OMA-DS is Synchronization Markup Language (SyncML).
⁵ Abstract Syntax Notation One (ASN.1): a standard and flexible notation that describes data structures for representing, encoding, transmitting, and decoding data.

Table 1. Overview of three device management standards.

On the other hand, a NAT maintains a table that maps the private addresses and port numbers to the public port numbers and IP addresses. Thus, it is important to note that this “binding” information can be created only by outgoing traffic from the internal host. In addition, most NATs maintain an idle timer for outgoing sessions and close the hole if no traffic is observed for the given time period. If we knew the default timer value of a NAT, we could minimize session management overhead. However, there is no way to know the default timer value without any information about the NAT itself, such as vendor or model. In other words, the issue that we focus on here is determining the NAT timer value for an unknown NAT. Therefore, a second important issue is to estimate the correct timer value for each NAT box at a minimal cost. Without knowledge of the appropriate timer value for each NAT, the DMS must repeatedly send probe packets to each NAT in a large-scale network to find it.

These two issues are not specific to DM but are related to all applications under an unknown NAT. To provide DM services across a NAT, we look into efforts for standardizing NATed device management.

Efforts for the Standardization of NATed Device Management
When it comes to the issue of how to manage NATed devices, there exist similar discussions such as RFC 2962 [7] and the CALLHOME Birds of a Feather (BoF) draft [8]. First, RFC 2962 describes an SNMP application level gateway (ALG) for payload address translation, but this ALG has serious limitations, including its scalability and speed of deployment of new applications. Moreover, it requires an upgrade to existing NATs. A CALLHOME BoF was held at the 64th IETF meeting and suggested a connection model that reversed the client-server role when establishing a connection. However, its activity ended without a clear result. For these reasons, in this section we focus on the efforts of the de facto DM standardization body to manage NATed devices. We discuss and compare these with our approach in detail in a later section.

Technical Report-111: TR-111 [9] extends the mechanism defined in TR-069 for the remote management of devices and is incorporated in TR-069 ANNEX G. TR-111 enables a management system to access and manage devices connected to a local area network (LAN) through a NAT. Two mechanisms were suggested in TR-111. TR-111 Part 1 is defined for the situation in which both the NAT and the device are TR-069 managed by the same DMS. TR-111 Part 2 provides a mechanism to realize a remote connection request to a device behind a NAT, in the event that the NAT does not support TR-069. It allows a DMS to initiate a TR-069 session with a device that is operating behind a NAT. The simple traversal of UDP through NATs (STUN) protocol mechanism defined in RFC 3489 [10] is included as Part 2 of TR-111, in which a device uses STUN to determine whether or not the device is behind a NAT. Then, if the device is behind a NAT with a private allocated address, the device uses the procedures defined in STUN to discover the binding timeout. The device


sends periodic STUN-binding requests at a sufficient frequency to maintain the NAT binding, on which it listens for UDP connection requests. The STUN-based mechanism requires a large amount of bandwidth but covers a wide range of usages, including VoIP service deployed behind an unmanaged or unfriendly NAT, or home networks with multiple NATs. Two alternative mechanisms based on Dynamic Host Configuration Protocol (DHCP) or universal plug and play (UPnP) have been proposed and discussed recently, as follows:
• DHCP-based TR-111: DHCP is a well-known protocol used by networked devices to obtain the parameters necessary for Internet connectivity. A device reports its connection request URL to the NAT via DHCP option 60. The NAT in turn creates a proxy URL to use this URL for the communication back to the device. Then, the device can communicate the proxy URL as its connection request URL to the DMS. The NAT forwards packets on the proxy URL to the device connection request URL.
• UPnP-based TR-111: UPnP is a set of protocols that allow devices in the home network to be connected seamlessly. A device uses UPnP to discover the NAT, learn its public IP addresses, and open a forwarding port. After a port is opened, the device can register for notification of changes to the wide area network (WAN) IP address and communicate the connection request URL, with the public IP address of the NAT and the forwarding port, to the auto-configuration server (ACS).
However, these two alternatives also are limited, in that the NAT must support the DHCP option mechanism or the UPnP protocol.

OMA CDM: There is standardization work being performed in the area that involves the discussion of converged DM (CDM) issues, such as the configuration and management of devices that support one or more bearer technologies for services. This is expected to standardize urgent management issues for devices within a consumer’s network, including the assessment of device management when a device is located behind a NAT. However, the CDM standardization issue, which will be a part of the OMA-DM v2.0 work item document (WID), is still in its infancy.

Lessons from Standardization Efforts: Note that two approaches exist to provide DM services over NATs. One is to make NATs manageable like ALGs or friendly to device management protocols like proxies. Thus, the NAT box could relay the operation of a DMS to the device. The other is to adopt common NAT solutions like STUN and to make DMCs independent of NAT traversal mechanisms.

The first solution is not easily applied in the real environment because most currently deployed NATs do not support SNMP ALG, the DHCP option, or UPnP. Thus, the deployment cost becomes expensive. The second solution using STUN, a well-known NAT traversal mechanism, also has scalability and cost issues in that it requires additional dedicated servers and clients.

Therefore, we propose an SNMP-based device management scheme exploiting the UDP hole punching technique [11] that could easily be implemented and deployed in the current network. The comparison of those issues is discussed more specifically later.

Our Proposal: Using SNMP as Device Management
More than 700,000 IPTV set-top boxes and VoIP devices had been distributed to high-speed broadband customers in the KT network as of the end of 2007.⁴ Consequently, those customer devices must be managed by the integrated device management system. To manage those devices, we are adopting the popular SNMP version 2c as a DM protocol. There are several reasons for this choice. First, SNMP is a well-known protocol both for service providers and device vendors, which means that we can meet time-to-market goals by rapidly implementing our requirements. In addition, vendors want to adopt a lightweight DM client to avoid a cost problem for devices in terms of required resources. However, we are considering a move to SNMPv3, TR-069, or OMA-DM because of the issue of security, which could be a big challenge in the future.

⁴ This was announced in the pressroom on the MegaTV portal; http://mymegatv.com/pressroom/pressroomList03.asp.

Accordingly, in this article we propose an SNMP-based DM method for NATed hosts over unmanageable NATs. The UDP mechanism using SNMP traps that we propose in this article is easier to implement and deploy than a TCP mechanism to manage NATed devices because a TCP mechanism could result in substantial system overhead by holding a large number of sessions initiated by hundreds of thousands of devices.

Our Challenges in Avoiding the NAT Traversal Problem
As stated previously, to solve the NAT traversal problem, we designed a connection request mechanism for the NATed device, which enables a DMS to exchange SNMP messages through an unknown NAT.

Based on the well-known NAT traversal mechanism of UDP hole punching, we slightly modified the behaviors of the SNMP manager and the agent and defined additional MOs so that we could effectively solve the NAT traversal issues in a manner that avoids the problems of cost and scalability. Moreover, we developed an enhanced method to determine the UDP hole binding time for the NAT box and applied it to VoIP devices. In experiments, 99 percent of exchanges of SNMP request messages were successful, and the search time for the default UDP hole binding timer value was reduced by 26 percent.

Most NATs hold an idle timer for a UDP session and close the hole if no traffic is observed for the given time period. Hence, we can reach a device behind a NAT by using the UDP hole punching scheme, in which the SNMP agent sends keep-alive trap messages to the DMS periodically. This enables the DMS to recognize the private/public address and port binding information. Because the SNMP agent at the device usually uses UDP port 161 for the SNMP request message, the binding entry for UDP port 161 must exist in the binding table at the NAT. Generally, any port could be allocated to the source port for sending the SNMP trap message. If an SNMP agent sends the trap message using the fixed UDP port of 161, we can ensure that the binding entry will be maintained in the binding table at the NAT.

On the other hand, there is another problem in managing devices with a private IP address, known as a symmetric NAT problem, whereby a public IP:port can reach a private IP:port only if the traffic was initiated from the private network toward that public IP:port. To solve this problem, a DMS uses UDP port 162 to send the


SNMP request message because the SNMP agent sends the SNMP trap message to the destination port of 162.

By using these concepts, we propose an SNMP-based remote management method for a device behind a NAT as follows. First, we define the behavior of the SNMP agent embedded in the device. An SNMP agent creates a UDP hole by periodically sending SNMP trap messages as keep-alive messages to the DMS. We chose 180 seconds for the interval of keep-alive messages because it was the most frequently observed NAT timer value in our experiments. We also fixed the source port of the SNMP trap message sent from the SNMP agent to UDP port 161. Second, we added to the SNMP manager in the DMS a function that gathers the agent IP address and its source port number. If the agent IP address is different from the source IP address, the SNMP manager decides that the device that has sent the SNMP message is located behind a NAT. To avoid the symmetric NAT problem, the SNMP manager must fix the source port to 162 when sending the SNMP request message.

Proposal of SNMP-Based DM over NAT
Figure 2 shows the message flow associated with the procedures of our proposed method to manage a NATed device by using the UDP hole punching scheme.

Figure 2. The sequential message flow of SNMP device management on the NAT environment. [Message sequence among the DMC (A:161, private IP address A), the NAT middlebox (B:p, public IP address B, dynamic port p), and the DMS (C:162, public IP address C). Step 1, creating the UDP hole: the SNMP trap with trap object ClientAddress=A leaves the DMC as A:161 to C:162 and leaves the NAT as B:p to C:162. Step 2, binding discovery: the DMS extracts A from the trap and (B:p) from the packet. Step 3, keep punching the UDP hole: the SNMP trap is repeated at the hole-timer interval. Step 4, sending the SNMP request message: the request C:162 to B:p passes through the UDP hole as C:162 to A:161; the SNMP response A:161 to C:162 leaves the NAT as B:p to C:162.]

In Fig. 2 the address/port pairs use the notation (Address:port). There are four steps in our mechanism, as follows:
• Precondition: The DMC uses port 161 as its listening port for receiving SNMP requests from the DMS and as its source port for sending SNMP trap messages. The DMS uses port 162 as its listening port for receiving SNMP trap messages sent from the DMC and as its source port for sending SNMP request messages to the DMC.
• Step 1: Creating a UDP hole: When the IP address (A) is assigned to the device by the NAT, the DMC (A:161) sends the SNMP trap message to the DMS (C:162). The trap includes the device address as a trap object (ClientAddress=A), which provides the private IP address of the device. The NAT translates the IP:port pair (A:161) of the SNMP trap packet to (B:p), which are the IP address and the port number allocated by the NAT, randomly or sequentially. In other words, the NAT creates a UDP hole and a binding entry (A:161, B:p).
• Step 2: Binding discovery: The DMS determines that the device is located behind a NAT when it finds that the address (A) of the device extracted from the SNMP object ClientAddress differs from the source address of the SNMP trap packet. If a device is behind the NAT, the DMS extracts the binding information (A:161, B:p) from the SNMP trap message, whereby the IP:port (A:161) of the device is extracted from the SNMP message, and the IP:port pair (B:p) of the NAT is extracted from the received SNMP trap packet.


| Method | Concept | Algorithm |
|---|---|---|
| Linear search | Search UDP mapping time with a linearly increasing value of EV | 1) EV = IV + TE; 2) Wait for EV and send SNMP command |
| Slow start search | Similar method as TCP congestion control; search UDP mapping time by increasing EV exponentially, but after failure perform a linear search increase | 1) EV = IV + PEV*2; 2) Wait until EV and send SNMP command; 3) If success go to 1); if fail do linear search |
| Binary search | Use binary search method | 1) EV = (MinVal + MaxVal)/2; 2) Wait until EV and send SNMP command; 3) If success, MinVal = EV and go to 1); if fail, MaxVal = EV and go to 1) |
| TopN binary search | Limited types of NAT vendors are deployed in the real environment; maintain TopN list of UDP hole time; based on the TopN list, first perform binary search between entries | 1) Binary search in the TopN list; 2) Binary search between TopN(i) and TopN(i+1) until the difference is in TE |

Table 2. Methods for searching for the UDP hole timer values of a NAT (EV: Expect Value, IV: Initial Value, TE: Tolerable Error, PEV: Previous Expect Value).
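The binary and TopN binary search rows of Table 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `probe` callback is a hypothetical stand-in for "wait EV seconds, send an SNMP command, and check whether a response comes back", and the `floor` and `ceiling` bounds used when the timer falls outside the TopN list are also assumptions, since the article does not say how that case is handled.

```python
from typing import Callable, Sequence, Tuple

Probe = Callable[[float], bool]  # True if the hole is still open after `ev` idle seconds


def binary_search(probe: Probe, min_val: float, max_val: float, te: float) -> Tuple[float, int]:
    """Halve [min_val, max_val] until its width is within the tolerable error te.

    Assumes a wait of min_val succeeds and a wait of max_val fails.
    Returns (longest wait known to succeed, number of probes sent).
    """
    probes = 0
    while max_val - min_val > te:
        ev = (min_val + max_val) / 2
        probes += 1
        if probe(ev):
            min_val = ev  # hole still open: the timer is longer than ev
        else:
            max_val = ev  # hole closed: the timer is at most ev
    return min_val, probes


def topn_binary_search(probe: Probe, top_n: Sequence[float], te: float,
                       floor: float = 0.0, ceiling: float = 3600.0) -> Tuple[float, int]:
    """Stage 1: binary search over the sorted TopN list for adjacent entries
    TopN(i) (succeeds) and TopN(i+1) (fails). Stage 2: binary search between them."""
    values = sorted(top_n)
    lo, hi = -1, len(values)  # sentinels: index -1 means floor, len(values) means ceiling
    probes = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        probes += 1
        if probe(values[mid]):
            lo = mid
        else:
            hi = mid
    min_val = values[lo] if lo >= 0 else floor
    max_val = values[hi] if hi < len(values) else ceiling
    best, extra = binary_search(probe, min_val, max_val, te)
    return best, probes + extra


if __name__ == "__main__":
    nat_timeout = 180.0  # hidden timer of a simulated NAT (assumed value)
    simulated_probe = lambda ev: ev < nat_timeout
    candidates = [30, 60, 90, 120, 180, 300, 600]  # hypothetical TopN list
    print(topn_binary_search(simulated_probe, candidates, te=5.0))
    print(binary_search(simulated_probe, 0.0, 3600.0, te=5.0))
```

In a real deployment each probe costs a wait of EV seconds, so the total search time is dominated by the sum of the waits rather than by the number of probes alone; the sketch counts probes only.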

• Step 3: Keep punching the UDP hole: To maintain the UDP hole bound with entry (A:161, B:p) of the NAT, the DMC keeps sending SNMP trap messages to the DMS (C:162) at hole-timer intervals.
• Step 4: Sending SNMP request messages: When the DMS (C:162) wants to manage a NATed device, it sends the SNMP request message to manage the device through the hole (B:p). Then, the message can pass through the hole and reach the NATed device. The NAT translates (B:p) to (A:161) according to the binding table. The DMC receives this SNMP message on UDP port 161, and it sends the SNMP response message to the DMS (C:162). The process to deliver this SNMP response message with the result to the DMS is the same as that of the SNMP request message.

Heuristic to Estimate the UDP Hole Punching Timer Values
In general, UDP mapping timer values are not standardized, so they could be different for each NAT vendor. For the remote management of devices behind a NAT from a public network, the DMS should make the user device send a UDP packet periodically before the UDP hole is closed. In other words, the device should punch the UDP hole periodically at the time interval configured by the DMS. Note that searching for the UDP mapping time could impose a heavy load on the DMS in a large-scale network because the DMS must send many probe packets with the estimated timeout values for each NAT box.

As such, we propose a heuristic method that maintains the list of the top 10 UDP mapping times statistically obtained through experiments. Then, we apply the binary search algorithm to the list of the top 10 known timer values. That is, a DMS uses the binary search algorithm to find the UDP mapping time in the list and then searches between the two adjacent items. Table 2 summarizes four kinds of applicable search methods. The experimental results are explained in the next section.

Experimental Results
To evaluate the proposed method, we implemented an SNMP manager and defined the client behavior. With the implementation, we tested our method in a nation-wide high-speed broadband network, as shown in Fig. 3. Of 1177 manageable VoIP devices, 194 hosts are shown to be NATed in our network; thus, on average 17 percent of end hosts are NATed. From these hosts, we randomly selected 22 devices and tested our proposed method. The reason why we chose only a small number of devices is that we had to carefully test the minimum number of devices so as not to affect customer service if we sent command messages repeatedly.

The SNMP manager was implemented based on University of California, Davis (UCD) SNMP version 4.2.6,⁵ and can send and receive SNMP messages simultaneously using one port, UDP 162. We also embedded the SNMP agent with our proposed method into the device.

⁵ The current release version of UCD SNMP is NET-SNMP 5.4.1; http://net-snmp.sourceforge.net

Table 3 shows the results of different methods of searching for the UDP hole time. Our heuristic, based on the binary search method, showed the best performance in the experiments. Compared with the popular binary search method, our heuristic could reduce search times by 26 percent, as well as reduce the average number of probes by 0.6.

| Method | Device | Test | Average number of probes | Average time (s) |
|---|---|---|---|---|
| Linear search | 22 | 196 | 25.6 | 4608 |
| Slow start search | 22 | 275 | 19.2 | 2984 |
| Binary search | 22 | 488 | 2.9 | 470 |
| TopN binary search | 22 | 541 | 2.3 | 348 |

Table 3. Average time of searching UDP hole timer values for different NATs.

Table 4 shows the command success ratio of our proposed method. We could achieve a 99 percent success rate of SNMP-command penetration into the NATed devices. This result provides compelling evidence that it is possible to manage a device using private IP addresses, without any additional servers or equipment. It also shows that the top N binary search heuristic is useful for the efficient management of NATed hosts.

Our NAT traversal success rate of 99 percent might be questioned when compared with the well-known result of 80 percent in [12]. First, we think that the small number of tested devices (22 devices) could be contributing to the


high success ratio of passing through NATs, compared to the experiment in [12] with 40 different kinds of NATs. We could not perform experiments with a large number of subscribers because testing must not affect ongoing service. It also is possible that the failure rate of 1 percent is due to multiple NATs or abnormal NATs.

| Hole punching method | Searching hole timer | Device | Test | Success | Success ratio (%) |
|---|---|---|---|---|---|
| No hole punching | - | 22 | NA | NA | 0 |
| Hole punching | Fixed timer of 180 s | 22 | 2160 | 1636 | 75 |
| Hole punching | TopN binary search | 19 | 1221 | 1207 | 99 |

Table 4. SNMP command success ratio for NATed devices.

Comparison of DM Approaches under NAT
In this section, we present a brief comparative discussion of various approaches to realizing remote device management functions under NATs, as summarized in Table 5.

Using STUN servers requires an additional server, as well as client modules to support the mechanism, thus demanding more overhead in the aspects of implementation complexity, scalability, fault tolerance, security, deployment cost, command response time, system load, and compatibility. DHCP and UPnP are available only in environments where NATs are compatible with TR-111. Moreover, as mentioned before, CDM still has no standardization result. However, our proposal based on SNMP shows advantages when compared with other methods.

Conclusion and Future Work
As the number of NATs deployed in the broadband network grows, more and more IP devices will be hidden behind a NAT. Therefore, it is necessary for a DMS to find a connection request mechanism for NATed devices so that it can exchange messages through an unknown NAT. When we apply the known NAT traversal solutions to the real environment, we encounter new challenges, such as expensive maintenance costs and symmetric NAT problems. The problem of expensive maintenance costs is related to the additional servers that must be deployed, or the NATs that must be upgraded to support a remote connection request mechanism. The problem of symmetric NATs is that the NAT traversal mechanism must work under all different kinds of NATs, including symmetric NATs.

| | TR-111 STUN | TR-111 DHCP | TR-111 UPnP | CDM | Our proposal |
|---|---|---|---|---|---|
| Implementation complexity | l | ™ | ™ | NA | ™ |
| Scalability | º | ™ | ™ | NA | º |
| Fault tolerance | l | ™ | ™ | NA | ™ |
| Security | ™ | º | º | NA | º |
| Deployment cost | l | º | º | NA | ™ |
| Command response time | l | º | º | NA | ™ |
| System load | l | ™ | ™ | NA | º |
| Compatibility | º | l | l | NA | ™ |

–Implementation complexity: Our proposal employs a NAT traversal mechanism by slightly changing SNMP-based DM software. However, STUN requires an additional dedicated server and a client in addition to the DM software.
–Scalability: The periodic hole punching method increases the number of in-flight packets in proportion to the number of devices and may therefore cause a scalability problem, as with STUN. However, our proposal uses fewer and smaller-sized packets than STUN.
–Fault tolerance: Our proposal has fewer points of failure affecting the overall availability of the system, whereas TR-111 with STUN needs an additional STUN server and a client for NAT traversal.
–Security: Our proposal may be affected by source address spoofing attacks In this article, we presented a simple overview
like DHCP and UPnP. of early standardization efforts and have pro-
–Deployment cost: Except slightly changing the SNMP trap mechanism, our posed an effective remote SNMP connection
proposal does not require an additional deployment cost. On the other side, request mechanism for NATed devices using the
for TR-111, STUN needs dedicated clients and servers, and DHCP or UPnP UDP hole punching method. By slightly modify-
functions should be deployed on all NATs. ing the behaviors of the SNMP manager and the
–Command response time: Our proposal could reduce the command response agent, and by defining additional management
time by estimating the UDP hole timer values correctly. However, STUN might objects to gather NAT binding information, we
experience a long command response time because of additional intermediate solved the cost problem and symmetric NAT
nodes like STUN servers and clients. issue. In addition, we proposed an enhanced
–System load: Our proposal will result in less server load to maintain the NAT method to efficiently determine the binding time
traversal mechanism using simple UDP packets. However, STUN uses a little
of the UDP holes of the NAT box. For the exper-
complex mechanism exchanging many UDP and TCP packets for NAT traversal.
imental evaluation, we applied our method to 22
–Compatibility: Our proposal is compatible with all NAT environments includ-
VoIP devices behind NATs in the real environ-
ing symmetric NAT. However, STUN is the independent standard in addition
to device management and needs additional considerations for the compati-
ment and achieved a success ratio of 99 percent
bility with the legacy system. in exchanging SNMP request messages and a 26
percent enhancement in determining the UDP
n Table 5. Comparisons of device management methods (™ : good, º : average, hole binding time. Even though the proposed
l : bad). DM protocol is to be changed to SNMP v3 in the
future, we believe that this would necessitate only
a slight change in our scheme.


[Figure 3: network diagram. Backbone: DMS located in KT's backbone network (KORNET). Access network: Ethernet, xDSL, and FTTH. Home network: VoIP phones behind NATs; 194 hosts exist under NATs out of 1,177 VoIP phones.]
Figure 3. Test environment of estimating UDP hole time at the commercial broadband network in Korea.

There are some issues in our system. One is the issue of scalability, in that our system must put up with thousands of keep-alive SNMP trap messages per minute from thousands of devices. As a result, we are now working toward a time-to-live (TTL)-based scheme to avoid heavy traffic that is not required to reach a DMS, in which we send periodic trap messages with TTL = n (n being the least count of TTL to punch the hole), which will be discussed in the future. Another issue is the security issue arising from SNMP v2. To address this issue, in the near future we are considering changing the DM protocol to one of the standards-based secure DM protocols, such as SNMP v3, OMA-DM, or TR-069. In future work, we are going to estimate and analyze real environment results in our large scale VoIP and IPTV network that will be widely deployed this year, reaching over a million devices.

Acknowledgment
This work was partly supported by the IT R&D program of MKE/IITA (2008-F-016-01, Collect, Analyze, and Share for Future Internet) and partly by the ITRC (Information Technology Research Center) support program of MKE/IITA (IITA-2008-C1090-0801-0016). The corresponding author is Youngseok Lee.

References
[1] M. Holdrege, “IP Network Address Translator (NAT) Terminology and Considerations,” IETF RFC 2663, Aug. 1999.
[2] S. Guha and P. Francis, “Characterization and Measurement of TCP Traversal through NATs and Firewalls,” Proc. Internet Measurement Conf., Berkeley, CA, Oct. 2005.
[3] OMA, “OMA Device Management V1.2 Approved Enabler,” Feb. 2007.
[4] DSL Forum, “CPE WAN Management Protocol v1.1,” Dec. 2007.
[5] J. Case et al., “Introduction and Applicability Statements for Internet Standard Management Framework,” IETF RFC 3410, Oct. 2000.
[6] W. Stallings, SNMP, SNMPv2, SNMPv3, and RMON 1 and 2, 3rd ed., Addison Wesley, 1998.
[7] D. Raz, J. Schoenwaelder, and B. Sugla, “An SNMP Application Level Gateway for Payload Address Translation,” IETF RFC 2962, Oct. 2000.
[8] E. Lear, “Simple Firewall Traversal Mechanisms and Their Pitfalls,” IETF draft, Oct. 2005.
[9] DSL Forum, “Technical Report 111, Applying TR-069 to Remote Management of Home Networking Devices,” Dec. 2005.
[10] J. Rosenberg et al., “STUN - Simple Traversal of User Datagram Protocol through Network Address Translators,” IETF RFC 3489, Mar. 2003.
[11] B. Ford and P. Srisuresh, “Peer-to-Peer Communication across Network Address Translators,” USENIX Annual Technical Conf., 2005.
[12] C. Jennings, “NAT Classification Test Results,” IETF draft, July 2007.

Biographies
CHOONGUL PARK (lion@kt.com) received B.S. and M.S. degrees in computer engineering in 2001 from Pusan National University and in 2008 from Chungnam National University, Korea, respectively. Currently, he is a Ph.D. student at Chungnam National University. He joined KT Technology Laboratory in 2002 and started his research work on the Next Generation OSS project. Since 2005 he has been a member of the KT Device Management project and a senior researcher in the Department of Next Generation Network Research. His research interests include device management and traffic engineering in the next-generation Internet.

KITAE JEONG (kjeong@kt.com) received B.S. and M.S. degrees in 1983 and 1986 in electronic engineering from Kyungpook National University, and a Ph.D. from Tohoku University of Japan in 1996. He joined KT Laboratory in 1986, and is the leader of the Department of Next Generation Network Research. His research interests are in the fields of device management, next-generation network, and fiber to the home.

SUNGIL KIM (sikim@kt.com) received B.S. and M.S. degrees in 1992 and 1994 in computer engineering from Choongbuk National University. He joined KT Technology Laboratory in 1994, and is the leader of the KT Device Management project and delegate to the Broadband Convergence Network Standardization Group. His research interests are in the fields of device management and next-generation networks.

YOUNGSEOK LEE [SM] (lee@cnu.ac.kr) received B.S., M.S., and Ph.D. degrees in 1995, 1997, and 2002, respectively, all in computer engineering, from Seoul National University, Korea. He was a visiting scholar at Networks Lab at the University of California, Davis from October 2002 to July 2003. In July 2003 he joined the Department of Computer Engineering, Chungnam National University. His research interests include Internet traffic measurement and analysis, traffic engineering in next-generation Internet, wireless mesh networks, and wireless LAN.


Improving the Performance of Route Control Middleboxes in a Competitive Environment
Marcelo Yannuzzi, Xavi Masip-Bruin, Eva Marin-Tordera, Jordi Domingo-Pascual,
Technical University of Catalonia
Alexandre Fonte, Polytechnic Institute of Castelo Branco
Edmundo Monteiro, University of Coimbra

Abstract
Multihomed subscribers are increasingly adopting intelligent route control solutions
to optimize the cost and end-to-end performance of the traffic routed among the
different links connecting their networks to the Internet. Until recently, IRC practices
were not considered adverse, but new studies show that in a competitive environ-
ment, they can lead to persistent traffic oscillations, causing significant performance
degradation rather than improvements. To cope with this, randomized IRC tech-
niques were proposed. However, the proliferation of IRC products raises concerns,
given that randomization becomes less effective as the number of interfering IRC
systems increases. In this article, we present a more scalable route control strategy
that can better support the foreseeable spread of IRC solutions. We show that by
blending randomization with adaptive filtering techniques, it is possible to drasti-
cally reduce the interference between competing route controllers, and this can be
achieved without penalizing the end-to-end traffic performance. In addition to the
potential improvements in terms of scalability and performance, the route control
strategy outlined here has various practical advantages. For instance, it does not
require any kind of protocol or coordination between the competing IRC middle-
boxes, and it can be adopted readily today because the only requirement is a soft-
ware upgrade of the available route controllers.

This work was partially funded by the European Commission through CONTENT under contract FP6-0384239.
0890-8044/08/$25.00 © 2008 IEEE

Today, the vast majority of the communications on the Internet are between nodes located in non-transit (i.e., stub) networks. Stub networks are primarily composed of medium and large enterprise customers, universities, public administrations, content service providers (CSPs), and small Internet service providers (ISPs). These networks exploit a widespread practice called multihoming, which consists of using multiple external links to connect to different transit providers. By increasing their connectivity to the Internet, stub networks potentially can obtain several benefits, especially in terms of resilience, cost, and traffic performance [1]. These are described as potential benefits because multihoming per se cannot improve resilience, cost, or traffic performance. Accordingly, multihomed stub networks require additional mechanisms to achieve these improvements. In particular, when an automatic mechanism actively optimizes the cost and end-to-end performance of the traffic routed among different links connecting a multihomed stub network to the Internet, it is referred to as intelligent route control (IRC).

During the last few years, IRC has attracted significant interest in both the research and the commercial fields. Several vendors are developing and offering IRC solutions [2–4] that increasingly are being adopted by multihomed stub networks. Most available IRC solutions follow the same principle, that is, they dynamically shift part of the egress traffic of a multihomed subscriber from one of its ISPs to another, using measurement-driven path switching techniques. IRC systems operate in relatively short timescales (even reaching switching frequencies on the order of a few seconds), allowing IRC users to balance cost and performance criteria according to the priority and requirements of their applications.

Despite these strengths, IRC practices have one major weakness, that is, they try to achieve a set of local objectives individually without considering the effects of their decisions on the performance of the network. Recently, it was discovered that in a competitive environment, IRC systems actually can cause significant performance degradation rather than improvement. In [5], the authors show that persistent oscillations can occur when independent controllers become synchronized due to a considerable overlap in their measurement time windows. To avoid synchronization issues, the authors propose randomized IRC strategies and empirically show that the oscillations disappear after introducing a random component in the route control decision.

It is important to note that although randomization offers a straightforward mechanism to mitigate the oscillations, it cannot guarantee global stability. This issue raises concerns given the proliferation of IRC products because as the number of interfering IRC systems increases, randomization becomes
less effective, and hence, the more likely it is that the oscillations reappear. In light of this, it is necessary to explore more scalable route control strategies that can safely support the foreseeable spread of IRC solutions.

In principle, two research approaches can be taken. On the one hand, the research community could formally study the stability properties of IRC practices and provide guidelines on how to design IRC systems with guaranteed stability. Unfortunately, several challenging stages must be completed properly before a formal study of stability can be conducted. For instance, accurate measurements are required to understand comprehensively the actions of the closed-source IRC systems deployed today (e.g., [2–4]) and thereby model the stochastic distribution of path switches in a competitive IRC environment. Only after characterizing the distribution of path switches is it possible to formally study the stability aspects of competitive IRC.

In the absence of such characterization, the practical alternative is to find ways to drastically reduce the potential interference between competing route controllers without penalizing the end-to-end traffic performance. This is precisely the challenge addressed in this work. This article makes the following contributions:
• We show that although randomization offers a straightforward way to mitigate the oscillations, it leads to a large number of unnecessary path switches.
• We report some of our recent results on the development of strategies blending randomization with a lightweight and more “sociable” route-control algorithm. The term sociable route control (SRC) refers here to a route control strategy that explicitly considers the potential implications of its decisions in the performance of the network and can adaptively restrain its intrinsic selfishness depending on the network conditions.
• We show that a simple enhancement to randomized IRC systems, such as endowing them with an SRC algorithm supported by adaptive filtering techniques, is enough to drastically reduce the number of path switches, and most importantly, this can be accomplished without penalizing the end-to-end traffic performance. Extensive simulations show that with SRC, it is possible to reduce the overall number of path switches by approximately 40 to 80 percent on average (depending on the load on the network) and still obtain better end-to-end traffic performance than with randomized IRC techniques in a competitive environment.

The rest of the article is structured as follows. First, we present the basics of IRC. Then, we overview the most relevant related work. Next, we analyze some general aspects of different IRC strategies and describe the SRC approach together with some of our main results. We conclude with directions for future research in the area of IRC.

The Basics of IRC
A typical IRC scenario with two different configurations is shown in Fig. 1. The IRC box at the top of Fig. 1 is connected by a span port off a router or switch, so although the egress traffic is controlled by the box, it is never forwarded through it. The IRC box in the multihomed network at the bottom of Fig. 1 is placed along the data path, so traffic always is forwarded through it. Typically, the former configuration offers a more scalable solution than the latter, in the sense that it is able to control and optimize a larger number of traffic flows.

[Figure 1 labels: two IRC boxes, each with MMM, RCM, and RVM; “Enforce routing decision”; providers ISP11, ISP12, ..., ISP1n and ISP21, ISP22, ..., ISP2m.]
Figure 1. The IRC model. IRC systems are composed of three modules: the monitoring and measurement module (MMM), the route control module (RCM), and a reporting and viewer module (RVM).

Conceptually, an IRC system is composed of the following three modules (Fig. 1):
• Monitoring and measurement module (MMM)
• Route control module (RCM)
• Reporting and viewer module (RVM)

The existing IRC systems can control a moderately large number of flows¹ toward a set of target destination networks. These target destinations can be configured manually or discovered by means of passive measurements performed by the MMM. By using passive measurements, the MMM can rank the destinations according to the amount of traffic sourced from the local network and subsequently optimize the performance for the traffic toward the D destinations at the top of the rank. The MMM also uses passive measurements to monitor the target flows in real time and analyze packet losses, latency, and retransmissions, among others, as indicators of conformance or degradation of the expected traffic performance. To assist the RCM in the dynamic selection of the best egress link to reach each target destination, the MMM probes all the candidate paths using both Internet Control Message Protocol (ICMP) and Transmission Control Protocol (TCP) probes.

¹ Typically this is on the order of several hundreds and even thousands, using a configuration like the one shown at the top of Fig. 1 with several border routers.

The set of active and passive measurements collected by the MMM enables IRC systems to concurrently assess the quality of the active and the alternative paths toward the target destinations. The role of the RCM is to dynamically
choose the best egress link for each target flow, depending on the outcome of these measurements. More specifically, the RCM is capable of taking rapid routing decisions for the target flows, often avoiding the effects of issues such as distant link/node failures² or performance degradation due to congestion.³

² The timescale required by IRC systems to detect and react to a distant link/node failure is very small compared to that of the general IGP/BGP routing system [2–4, 6].
³ This cannot be automatically detected and avoided with BGP [7].

The third module of an IRC system, namely the RVM, typically supports a broad set of reporting options and provides online information about the average latency, jitter, bandwidth utilization, and packet loss experienced through the different providers, summaries of traffic usage, associated costs for each provider, and so on.

Overall, IRC offers an incremental approach, complementing some of the key deficiencies of the Interior Gateway Protocol/Border Gateway Protocol (IGP/BGP)-based route control model. It is worth emphasizing that the set of candidate routes to be probed by IRC boxes usually is determined by IGP/BGP; so, contrary to overlay networks [8], IRC boxes never circumvent IGP/BGP routing protocols. The effectiveness of multihoming in combination with IRC is confirmed not only by studies like [8], but also by the increased trend in the deployment of these solutions.

In this article we deal with the algorithmic aspects of IRC systems, so hereafter we focus our attention on the RCM in Fig. 1; the functionality of the MMM and RVM modules essentially is orthogonal to the proposals made in this work.

Related Work
In [9], the authors simultaneously optimize the cost and performance for multihomed stub networks, by introducing a series of new IRC algorithms. The contributions of that work are fundamentally theoretical. For instance, the authors show that an intelligent route controller can improve its own performance without adversely affecting other controllers in a competitive environment, but the conclusions are drawn at traffic equilibria (traffic equilibrium is defined by the authors as a state in which no traffic can improve its latency by unilaterally changing its link assignment). However, after examining and modeling the key features of conventional IRC systems, it becomes clear that they do not seek this type of traffic equilibria. Indeed, more recent studies, such as [5], show that in practice, the performance penalties can be large, especially when the network utilization increases.

In light of this, and considering the current deployment trend of IRC solutions, it becomes necessary to explore alternative IRC strategies. These new route control strategies should always improve the performance and reliability of the target flows, or at least, they should drastically reduce the potential implications associated with frequent traffic relocations, such as persistent oscillations causing packet losses and increased packet delays [5].

Although most commercially available IRC solutions do not reveal in depth the technical details of their internal operation and route control decisions, the behavior of one particular controller is described in detail in [10]. That work also provides measurements that evaluate the effectiveness of different design decisions and load balancing algorithms. Akella et al. also provided rather detailed descriptions and experimental evaluations of multihoming in combination with IRC tools, as in [1, 8, 11]. These research publications, along with the documentation provided by vendors, allowed us to capture and model the key features of conventional IRC techniques. A similar approach was followed by the authors in [5]. For simplicity, and as in [5, 8, 10], we consider traffic performance as the only criterion to be optimized for the target flows.⁴

⁴ Cost reductions are typically accomplished by aggregating traffic toward non-target destinations over the cheapest ISPs.

The General IRC Network Model
The general IRC network model is composed of a multihomed stub network $S$, a route controller $C$, the transit domains, and a set of target destinations $\{d\}$ with cardinality $|d| = D$ to be optimized by $C$. The source domain $S$ has a set of egress links $\{e\}$, with $|e| = E$. For the sake of simplicity, we keep the notation in the granularity of destinations ($d$), but the model easily can be extended to consider various flows per target $d$.

To dynamically decide the best egress link for each target destination $d$, the MMM in $C$ probes all the candidate paths through the egress links $e$ of $S$. Then, the collected measurements are processed and abstracted into a performance function $P_e^{(d,t)}$ at time $t$, associated with the quality perceived for each of the available paths toward the target destinations $d$. Let $N(d)$ denote the number of available paths to reach $d$. Because $N(d)$ usually represents the number of candidate paths in the forwarding information base (FIB) of the BGP border routers of $S$, $N(d) \le E \;\forall\, d$.

We assume that the better the end-to-end traffic performance perceived by $C$ for a target destination $d$ through egress link $e$, the lower the value of the performance function $P_e^{(d,t)}$.

In this framework, IRC strategies can be taxonomized into two categories, namely, reactive route control (RRC) and proactive route control (PRC). RRC practices switch a target flow from one egress link to another only when a maximum tolerable threshold (MTT) is met. The MTTs are application-specific and typically represent the maximum acceptable packet loss, the maximum tolerated packet delay, and so on, for a given application. Beyond any of these bounds, the performance perceived by the users of the application becomes unacceptable.

PRC strategies, on the other hand, switch traffic before any of the MTTs are met and in turn, can be taxonomized into two categories: those that can be called fully proactive (FP), and those that follow a controlled proactivity (CP) approach. FP IRC practices always switch to the best path. Therefore, the dynamic optimization problem addressed by an FP route controller is to:

Find the $\min\{P_e^{(d,t)}\}$ $\forall\, d, t$ and enforce the redirection of the corresponding traffic to the egress link found.

The alternative offered by CP is to keep the proactivity, but switch traffic as soon as the performance becomes degraded to some extent, typically represented by a relocation threshold ($R_{th}$). The dynamic optimization problem addressed by CP-based strategies can be formulated as follows.

Let $e_{best}$ denote the egress link utilized to reach $d$ at time $t$, and let $e'$ be such that $P_{e'}^{(d,t)} = \min\{P_e^{(d,t)}\}$ for destination $d$ at time $t$.⁵ A CP-based route controller would switch traffic to $d$ from $e_{best}$ to $e'$ whenever $P_{e_{best}}^{(d,t)} - P_{e'}^{(d,t)} \ge R_{th}$, with $R_{th} > 0$.
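The FP and CP relocation rules stated above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the link labels, and the sample values of the performance function are hypothetical, and only the two decision rules (FP picks the minimum P; CP relocates when the gain over the current link reaches R_th) are taken from the text.

```python
from typing import Dict


def fp_select(perf: Dict[str, float]) -> str:
    """Fully proactive (FP): always return the egress link with the lowest P."""
    return min(perf, key=lambda link: perf[link])


def cp_select(perf: Dict[str, float], e_best: str, r_th: float) -> str:
    """Controlled proactivity (CP): leave e_best only if the gain reaches r_th."""
    if r_th <= 0:
        raise ValueError("R_th must be positive")
    e_prime = fp_select(perf)  # candidate link e' with the minimum P
    if perf[e_best] - perf[e_prime] >= r_th:
        return e_prime
    return e_best


# Hypothetical values of P for one destination d at one instant t
# (lower is better, as assumed in the network model).
perf = {"isp1": 42.0, "isp2": 40.0, "isp3": 55.0}

print(fp_select(perf))               # isp2: lowest P, regardless of the gain
print(cp_select(perf, "isp1", 5.0))  # isp1: a gain of 2 is below R_th = 5
print(cp_select(perf, "isp3", 5.0))  # isp2: a gain of 15 reaches R_th = 5
```

With FP, a controller currently on isp1 would relocate for a gain of only 2 units; with CP and R_th = 5 it stays put, which is the kind of relocation the threshold is meant to suppress.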

58 IEEE Network • September/October 2008


YANNUZZI LAYOUT 9/9/08 12:35 PM Page 59

After extensive evaluations and analysis, we confirmed that PRC performs much better than RRC. The reason for this is that proactive approaches can anticipate network congestion situations, whereas the reactive case typically demands several traffic relocations once congestion has already been reached. In addition, we found that in a competitive environment, CP-based route control strategies can outperform the FP ones. Therefore, our SRC algorithm (outlined in the following section) is supported by a CP-based route control strategy.

Sociable Route Control
In the SRC strategy that we conceive, each controller remains independent, so the SRC boxes do not require any kind of coordination with one another, just as conventional IRC systems operate today. Moreover, our SRC strategy does not introduce changes in the way measurements are conducted and reported by conventional IRC systems, so both the MMM and the RVM in Fig. 1 remain unmodified. Our SRC strategy introduces changes only on the algorithmic aspects of the RCM.

High-Level Description of the SRC Strategy
For simplicity in the exposition, we focus on the optimization of a single application, namely, voice over IP (VoIP), and we describe the overall SRC process for the round-trip time (RTT) performance metric. For a comprehensive and formal analysis, the reader is referred to [12].

Our goal is that a controller $C$ becomes capable of adaptively adjusting its proactivity, depending on the RTT conditions for each target destination $d$. To be precise, a sociable controller analyzes the evolution of the RTT, that is, $\{RTT_e^{(d,t)}\}$, and depending on its dynamics, the controller can restrain its traffic reassignments adaptively (i.e., its proactivity). To this end, the RCM processes the RTT samples gathered from the MMM using two filters in cascade (Fig. 2). The first filter corresponds to the median RTT, $M_e^{(d,t)}$, which is constantly computed through a sliding window. This approach is used widely in practice because the median represents a good estimator of the delay that the users’ applications currently are experiencing in the network. These medians are precisely the input to the second filter, where the social nature of the route control algorithm covers two different facets:
• CP
• SRC

[Figure 2: block diagram. The MMM supplies the samples $RTT_e^{(d,t)}$ to the first filter, which outputs the medians $M_e^{(d,t)}$; the second filter outputs $Q_e^{(d,t)}$; the RCM computes $\Delta^{(d,t)}$ and runs the randomized SRC algorithm on $P_e^{(d,t)}$. An accompanying plot shows RTT (ms) against sampling instants (s), with the MTT marked.]
Figure 2. Filtering process and interaction between the monitoring and measurement module (MMM) and the route control module (RCM) of a sociable route controller. The Randomized SRC Algorithm within the RCM is outlined in Algorithm 1.

Controlled Proactivity
On the one hand, the proactivity of box $C$ is controlled to avoid minor changes in the medians triggering traffic relocations at $S$. This prevents interfering too often with other route controllers. For this reason, our sociable controllers filter the medians.

The second filter in Fig. 2 works like an analog-to-digital (A/D) converter, with quantization step $\Delta$, and its output is one of the levels of the converter, $Q_e^{(d,t)}$. The right-hand side of Fig. 2 illustrates how the instantaneous samples of RTT are filtered to obtain the median $M_e^{(d,t)}$, and then, the latter is filtered to obtain $Q_e^{(d,t)}$.

As described earlier, IRC systems compare the quality of the active and alternative paths by means of a performance function $P_e^{(d,t)}$, which as shown in Fig. 2, is fed by $Q_e^{(d,t)}$. The controller $C$ would switch traffic toward $d$ only when the variations of $Q_e^{(d,t)}$ cause that $P_{e_{best}}^{(d,t)} - P_e^{(d,t)} \ge R_{th}$. A more detailed description of the route selection process is shown in Algorithm 1. For simplicity, only the stationary operation of the algorithm is summarized. The randomized nature of Algorithm 1 is discussed later. The timer in Step 7 is also introduced later.

For the RCM described here, we simply used the outcome of the digital conversion as the performance function $P_e^{(d,t)}$, that is, the number of quantization steps in the quantization level $Q_e^{(d,t)}$. Similarly, $R_{th}$ represents the difference in the number of quantization steps that $P_e^{(d,t)}$ must reach to trigger a path switch.

Overall, the advantage of this filtering technique is that it produces the desired effect (i.e., controlled proactivity) because it prevents minor changes in the medians from triggering unnecessary traffic relocations at $S$.
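The two-filter cascade and the step-based relocation test described in this subsection can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the class and function names, the window length, the fixed 10 ms quantization step, the floor-based quantizer, and the RTT samples are all hypothetical choices made here; the text only specifies a sliding-window median, an A/D-like second filter whose level count serves as P, and a threshold R_th expressed in quantization steps.

```python
from collections import deque
from statistics import median


class PathFilter:
    """Cascade of a sliding-window median and an A/D-like quantizer for one path."""

    def __init__(self, window: int, step_ms: float) -> None:
        self.samples = deque(maxlen=window)  # sliding window of RTT samples
        self.step_ms = step_ms               # quantization step (fixed here)

    def update(self, rtt_ms: float) -> int:
        """Feed one RTT sample and return P, the number of quantization steps."""
        self.samples.append(rtt_ms)
        m = median(self.samples)             # first filter: median RTT
        return int(m // self.step_ms)        # second filter: quantized level


def should_switch(p_best: int, p_alt: int, r_th: int) -> bool:
    """Relocate only when the active path is worse by at least r_th steps."""
    return p_best - p_alt >= r_th


active = PathFilter(window=3, step_ms=10.0)   # path through e_best
backup = PathFilter(window=3, step_ms=10.0)   # one alternative path
R_TH = 2                                      # threshold in quantization steps

decisions = []
# Hypothetical (active, backup) RTT samples in milliseconds.
for rtt_active, rtt_backup in [(48, 44), (52, 46), (51, 45), (75, 44), (80, 46)]:
    p_best = active.update(rtt_active)
    p_alt = backup.update(rtt_backup)
    decisions.append(should_switch(p_best, p_alt, R_TH))

print(decisions)  # [False, False, False, False, True]
```

In this run the isolated 75 ms sample does not move the median of the active path, and the small fluctuations around 50 ms stay within one quantization step of the backup path; only the second consecutive high sample lifts the median enough to reach the two-step threshold.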
⁵ We notice that with CP, $e_{best}$ might be different from $e'$.

Algorithm 1. Randomized SRC algorithm.
Input:  $d$: a target destination of network $S$
        $\{e\}$: set of egress links of network $S$
        $P_e^{(d,t)}$: performance function to reach $d$ through $e$ at time $t$
Output: $e_{best}$: the best egress link to reach target destination $d$
 1: Wait for changes in $P_{e_{best}}^{(d,t)}$
 2: if $P_{e_{best}}^{(d,t)} - P_e^{(d,t)} < R_{th} \;\forall\, e \ne e_{best}$ then go to Step 1
 3: /* Egress link selection process for d */
 4: Choose $e'$ as $P_{e'}^{(d,t)} = \min\{P_e^{(d,t)}\}$
 5: Estimate the performance after switching the traffic
 6: if $P_{e_{best}}^{(d,t)} - P_{e'}^{(d,t)\,\mathrm{Estimate}} \ge R_{th}$ then
 7:     Wait until $T_H = 0$ /* Hysteresis Switching Timer */
 8:     Switch traffic toward $d$ from $e_{best}$ to $e'$
 9:     $e_{best} \leftarrow e'$
10:     $P_{e_{best}}^{(d,t)} \leftarrow P_{e'}^{(d,t)}$
11: end if
12: /* End of egress link selection process for d */
13: Go to Step 1

On the other hand, multiple performance functions $P_e^{(d,t)}$ can be used (e.g., one for each metric), and the selection of the best path for each target destination can be performed by sequentially comparing the performance functions $P_e^{(d,t)}$ and tie-breaking similarly to the BGP tie-breaking rules [7]. With this approach, the order in which the performance functions are compared can be tuned on an application basis. For example, a controller might select the path with the maximum AB, and if there is more than one path with the same AB, choose the one with the lowest RTT.

In either case, adaptive filtering techniques are required to prevent rapid variations in the performance metrics considered.

Randomization
Randomization is present in Algorithm 1 in two different ways: implicitly and explicitly. On the one hand, the route control decisions in Algorithm 1 are inherently stochastic for a number of reasons, for example, due to its adaptive features over time, the fact that different controllers might have configured different thresholds $R_{th}$, and others. On the other hand, we explicitly use a hysteresis switching timer $T_H$ that we introduced in a previous work [13] and that guarantees a random hysteresis period after each traffic relocation. More precisely, traffic toward a given destination $d$ cannot be relocated until the random and decreasing timer $T_H = 0$. A similar approach was used in [5] for one of the randomized algorithms presented there.

Socialized Route Control
The second facet of the social behavior of the algorithm relates to the dynamics of the median RTTs; more precisely, with how rapid the variations are in the median values that are typically computed by IRC systems using a sliding window. The motivation for this is that when the median values start to show rather quick variations, the algorithm must react so as to
avoid a large number of traffic reassignments in a short
timescale. Such RTT dynamics typically occur when several
route controllers compete for the same resources, leading to
Performance Evaluation
situations where their traffic reassignments interfere with each The performance of our SRC strategy is compared against
other. To cope with this problem, we turn the second filter in that obtained with:
Fig. 2 into an adaptive filter. This filter is endowed with an • Randomized IRC
adaptive quantization step ∆(d,t) for each target destination d • Default IGP/BGP routing
that is automatically adjusted by the algorithm according to
the evolution of the median RTTs. If the RTT conditions are Evaluation Methodology and Simulation Set Up
smooth, the quantization step is small, and more proactivity is The simulation tests were performed using the event-driven
allowed by the controller C . However, if the RTT conditions simulator J-Sim [14]. All the functionalities of the route con-
could lead to instability, the quantization step ∆(d,t) automati- trollers were developed on top of the IGP/BGP implementa-
cally increases, so the number of changes in the values of tions available in this platform.
Qe(d,t) is diminished or even stopped until the network condi-
tions become smooth again. This has the effect of desynchro- Network Topology — The network topology was built using
nizing only the competing route controllers. Therefore, the the Boston University Representative Internet Topology
filtering technique outlined here allows a controller C to gEnerator (BRITE) [15]. The topology was generated using
“sociably” decide whether to switch traffic to an alternative the Waxman model with (α, β) set to (0.15, 0.2) [16], and it
egress link or not, in the sense that the degree of proactivity was composed of 100 domains with a ratio of domains to
of C is constantly adjusted by the adaptive nature of the sec- inter-domain links of 1:3. This simulated network aims at
ond filter. representing a set of ISPs that can provide connectivity and
For the sake of simplicity, we focused here on the optimiza- reachability to customers operating stub networks. We
tion of a single performance metric (the RTT), but the con- assume that all ISPs operate points of presence (PoPs)
cept of SRC is general and can be extended to consider other through which the stub networks are connected. We consid-
metrics, such as available bandwidth, packet losses, and jitter. ered 12 uniformly distributed stub networks across the
When multiple metrics are used, two straightforward domain-level topology as the traffic sources toward the set
approaches can be followed. of target destinations. These source networks are connected
On the one hand, a combination of two or more metrics to the routers located at the PoPs of three different ISPs.
can be used in the same performance function P e(d,t) . For We considered triple-homed stub networks given that signif-
instance, [12] introduces a more general performance function icant performance improvements are not expected from
based on a non-linear combination of the quantification level higher degrees of multihoming [1]. For the stub networks
Qe(d,t) and the available bandwidth (AB) in the egress links of containing target destinations, we considered 25 uniformly
the source network. This, in turn, can be extended to consider distributed destinations across the domain-level topology.
the AB along the entire path to a target destination d, using This offers an emulation of 12 × 25 = 300 IRC flows com-
available bandwidth estimation techniques like the one peting for the same network resources during the simulation
described in [5]. With this approach, the weights of the differ- run time.
ent metrics combined in Pe(d,t) can be tuned on an application Furthermore, given that IRC solutions operate in short
basis, for example, to prioritize the role of the AB over the timescales, we assumed that the domain-level topology
RTTs (or vice versa) depending on the application type. remains invariant during the simulation run time.
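As a rough illustration of the Waxman model mentioned for the topology generation, the sketch below draws random node positions and links each pair with a probability that decays with distance. It assumes the commonly quoted form P(u,v) = α·exp(−d/(βL)), where d is the distance between the two nodes and L the largest distance in the graph; some tools swap the roles of α and β. The function name `waxman_edges` and the unit-square placement are illustrative assumptions, and the sketch does not model the connectivity or link-count controls that a generator such as BRITE applies.

```python
import math
import random


def waxman_edges(n, alpha=0.15, beta=0.2, seed=0):
    """Return (points, edges) for a Waxman-style random graph on the unit square."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    # L: the largest distance between any two generated nodes.
    max_dist = max(
        math.dist(points[u], points[v])
        for u in range(n)
        for v in range(u + 1, n)
    )
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            d = math.dist(points[u], points[v])
            # Short links are likelier than long ones; alpha scales overall density.
            if rng.random() < alpha * math.exp(-d / (beta * max_dist)):
                edges.append((u, v))
    return points, edges


points, edges = waxman_edges(100)
print(len(edges), "links generated for", len(points), "domains")
```

With the small β used here, long links are strongly penalized, so most generated links join nearby domains.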


[Figure 3 plots not reproduced. Top row: number of path switches versus Rth (logarithmic axis, 1 to 100) for SRC and randomized IRC. Bottom row: RTTs (ms) versus Rth for SRC, randomized IRC, and IGP/BGP routing.]

Figure 3. Number of path switches (top) and 〈RTTs〉 (bottom) for L = 0.450 (left), L = 0.675 (center), and L = 0.900 (right).
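To make the role of the threshold Rth on the horizontal axis of Fig. 3 concrete, the following minimal sketch walks through one pass of steps 2 to 11 of Algorithm 1 for a single destination. It assumes that lower performance values are better (for example, filtered RTTs in ms). The function name `select_egress`, the `estimate` callback, and the `timer_expired` flag are hypothetical stand-ins: the article does not specify how the post-switch performance is estimated, and the blocking wait on the hysteresis timer TH is simplified here to "defer the switch if the timer has not expired".

```python
def select_egress(perf, e_best, r_th, estimate, timer_expired=True):
    """One pass of steps 2-11 of Algorithm 1 for one destination d.

    perf          -- dict: egress link -> current performance value (lower is better)
    e_best        -- egress link currently carrying the traffic
    r_th          -- triggering threshold Rth
    estimate      -- callable: link -> predicted performance after the switch (assumed)
    timer_expired -- True once the hysteresis timer TH has reached zero
    """
    # Step 2: do nothing if no alternative improves on e_best by at least Rth.
    if all(perf[e_best] - p < r_th for e, p in perf.items() if e != e_best):
        return e_best
    # Step 4: the candidate e' is the link with the minimum performance value.
    e_new = min(perf, key=perf.get)
    # Steps 5-6: the estimated gain after switching must still reach Rth.
    if perf[e_best] - estimate(e_new) >= r_th:
        # Step 7 (simplified): switch only once the hysteresis timer has expired.
        if timer_expired:
            return e_new  # Steps 8-10: traffic moves and e_best is updated.
    return e_best


rtts = {"e1": 80.0, "e2": 74.0, "e3": 79.5}
no_change = lambda e: rtts[e]  # assume the switch does not alter the RTT
print(select_egress(rtts, "e1", r_th=10, estimate=no_change))  # e1
print(select_egress(rtts, "e1", r_th=5, estimate=no_change))   # e2
```

A small Rth lets a 6 ms gain trigger a relocation, while a larger Rth suppresses it, which is the proactivity trade-off that the figure explores.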

Simulation Scenarios — We ran the same simulations separately using three different scenarios:
• Default IGP/BGP routing, where BGP routers choose their best routes based on the shortest AS-path
• BGP combined with the SRC strategy at the 12 source domains
• BGP combined with randomized IRC systems at the 12 source domains

For a more comprehensive comparison between the different route control strategies, we performed the simulations for three different network loads. We considered the following load factors (L):
• L = 0.450, low load corresponding to an average occupancy of 45 percent of the egress link capacity
• L = 0.675, medium load corresponding to an average occupancy of 67.5 percent of the egress link capacity
• L = 0.900, high load corresponding to an average occupancy of 90 percent of the egress link capacity

Simulation Conditions — The simulation tests were conducted using traffic aggregates sent from the source domains to each target destination d. These traffic aggregates were composed of a variable number of multiplexed Pareto flows as a way to generate the traffic demands, as well as to control the network load during the tests. The flow arrivals were modeled according to a Poisson process and were independently and uniformly distributed during the simulation run time. This approach aims at generating sufficient traffic variability to support the assessment of the different route control strategies.

In addition, we used the following method to generate traffic demands for the remaining Internet traffic, usually referred to as background traffic. We started by randomly picking four nodes in the network. The first one chosen acts as the origin (O) node, and the remaining three nodes act as destinations (D) of the background traffic. We assigned one Pareto flow for each O-D pair. This process continues until all the nodes are assigned three outgoing flows (including those in the multihomed stub domains and those in the ISPs). All background connections were active during the simulation run time.

Furthermore, the frequency and size of the probes sent by the route controllers were correlated with the outbound traffic being controlled, just as conventional route controllers do today [2–4].

Finally, we assume that the route controllers have pre-established performance bounds (i.e., the MTTs) for the traffic under control. For instance, Recommendation G.114 of the International Telecommunication Union-Telecommunication Standardization Sector (ITU-T) suggests a one-way-delay (OWD) bound of 150 milliseconds to maintain high-quality VoIP communication over the Internet. Thus, for VoIP traffic, the maximum RTT tolerated was chosen as twice this OWD bound, that is, 300 ms.

Objectives of the Performance Evaluation
Our evaluations have two main objectives.

Assess the Number of Path Switches — The first objective of the simulation study is to demonstrate that the sociable nature of our SRC strategy contributes to drastically reducing the potential interference between competing route controllers. To this end, we compared the number of path switches that occurred during the simulation run time for the 300 competing IRC flows for the SRC and randomized IRC scenarios. The number of path switches is obtained by adding the number of route changes that are required to meet the desired RTT bound for each target destination d.

It is worth emphasizing that in both the randomized IRC and SRC strategies, the route controllers operate independently and compete for the same network resources. This allows us to evaluate the overall impact on the traffic caused by the interference between several standalone route controllers running at different stub domains. Thus, when analyzing the results for the different route control strategies, it is important to keep in mind that we take into account all the competing route controllers present in the network.

To contrast the number of path switches under fair conditions, we made the following decisions. First, both the randomized IRC and SRC controllers are endowed with the same (explicit) randomization technique [5, 13]. This approach avoids the appearance of persistent oscillations that might lead to a large number of path switches in the case of conventional IRC [5]. Second, both types of controllers follow a controlled proactivity approach. We have conducted the simulations modeling the same triggering condition Rth for both of them. The main difference is that in the SRC case, the social adaptability of the controllers can result in the trigger being reached more often, or less often, depending on the variability of the RTTs on the network.

End-to-End Traffic Performance — The second objective of the simulation study is to demonstrate that the drastic reduction in the number of path switches obtained with our SRC strategy can be achieved without penalizing the end-to-end traffic performance. To this end, we compared the RTTs obtained for the 300 flows in the three different scenarios, namely, default IGP/BGP, SRC, and randomized IRC.

Main Results
The top of Fig. 3 illustrates the total number of path switches performed by both the randomized IRC and SRC strategies, in all the stub networks, and for the three different load factors: L = 0.450 (left), L = 0.675 (center), and L = 0.900 (right). The number of path switches is contrasted for different triggering conditions, that is, for different values of the threshold Rth (shown on a logarithmic scale).

Several conclusions can be drawn from the results shown in Fig. 3. In the first place, the results confirm that SRC drastically reduces the number of path switches compared to a randomized IRC technique.6 An important result is that the reductions are significant for all the load factors assessed. For instance, when compared with randomized IRC, our SRC strategy contributes to reductions of up to:
• 77 percent for Rth = 1 and 71 percent for Rth = 2 when L = 0.450
• 75 percent for Rth = 1 and 74 percent for Rth = 2 when L = 0.675
• 34 percent for Rth = 1 and 36 percent for Rth = 2 when L = 0.900

The second observation is that the reductions in the number of path switches offered by the SRC strategy become more and more evident as the proactivity of the controllers increases, that is, for low values of Rth, which is precisely the region where IRC solutions operate today. It is worth recalling that these results were obtained when both route control strategies were complemented by the same randomized decisions. This confirms that in a competitive environment, SRC is much more effective than pure randomization in reducing the potential interference between route controllers.

On the other hand, our results show that when the route control strategies become less proactive, that is, for higher values of Rth, randomized IRC and SRC tend to behave comparably, so SRC does not introduce any benefit over a randomized IRC technique.

To assess the effectiveness of SRC, it is mandatory to confirm that the reductions obtained in the number of path switches are not so excessive that they have a negative impact on the end-to-end traffic performance. To this end, we first analyze the performance of randomized IRC and our SRC “globally,” that is, by averaging the RTTs obtained by “all” competing route controllers. This is shown at the bottom of Fig. 3 and in Fig. 4. The end-to-end performance obtained by “each” route controller individually is shown in Fig. 5.

[Figure 4 plots not reproduced. Each panel shows P(RTT ≥ x) versus RTTs (ms) for SRC, randomized IRC, and IGP/BGP routing.]

Figure 4. Complementary cumulative distribution function (CCDF) of the RTTs for the 300 competing IRC flows, for Rth = 1, and for L = 0.450 (left), L = 0.675 (center), and L = 0.900 (right).

The bottom of Fig. 3 reveals that, as expected, both SRC and randomized IRC perform much better than IGP/BGP for all values of L and Rth, and the improvements in the achieved performance become more evident as the network utilization increases. In particular, SRC is capable of improving the 〈RTTs〉7 by more than 40 percent for L = 0.675 and by more than 35 percent for L = 0.900 when compared with IGP/BGP. Moreover, the 〈RTTs〉 obtained by SRC and IRC are comparable, and particularly for L = 0.675, SRC not only drastically reduces the number of path switches, but also improves the end-to-end performance for almost all the triggering conditions assessed. It is worth emphasizing that a low value of Rth together with a load factor of L = 0.675 reasonably reflects the conditions in which IRC currently operates in the Internet.

Our results also reveal an important aspect: by allowing more path switches, some route controllers can slightly improve their end-to-end performance, but such actions have no major effect on the overall 〈RTTs〉. Indeed, a certain number of path switches is always required, and this number of path switches is what actually ensures the average performance observed in the RTTs at the bottom of Fig. 3 (this becomes clear as the proactivity decreases).

6 Clearly, no results are shown for the default IGP/BGP routing scenario here because BGP does not perform path switching actively.

7 As mentioned previously, this average is computed over the RTTs obtained by all competing route controllers in the network.

By analyzing Fig. 3 as a whole, it becomes evident that the


selection of the best triggering condition actually depends on the load present in the network. For this particular case, the best trade-offs are Rth = 30 for L = 0.450, Rth = 10 for L = 0.675, and Rth = 7 for L = 0.900, which is a reasonable progression to lower values of Rth because the route controllers require less proactivity when the network utilization is low. The corollary is that the triggering condition should be adaptively adjusted as well, depending on the amount of traffic carried through the egress links of the domain. We plan to investigate this in the future.

Figure 4 compares the distribution of the RTTs obtained by IGP/BGP, SRC, and randomized IRC for the 300 competing IRC flows, for the three different load factors assessed, and for Rth = 1, which as mentioned above is in the range of operation of the IRC solutions presently deployed in the Internet. To facilitate the interpretation of the results, we use the complementary cumulative distribution function (CCDF).

An important observation is that under high egress link utilization, that is, L = 0.900, there is a fraction of 〈RTTs〉 for which the bound of 300 ms is exceeded in the case of IGP/BGP; whereas both SRC and the randomized IRC fulfill the targeted bound.

To complete the analysis, Fig. 5 provides a more granular picture than Fig. 4 because it shows the CCDFs of the RTTs obtained by each of the 12 competing route controllers. The figure shows the results for the three studied scenarios and for all the load factors assessed when Rth = 1. Our results show that the targeted bound of 300 ms is satisfied by both SRC and randomized IRC in all cases and for all controllers. IGP/BGP, however, shows a distribution of large delays given that the shortest AS-paths are not necessarily the best performing paths. Figure 5 also shows that when considering boxes individually, randomized IRC achieves slightly better end-to-end performance for some of them but at the price of a much larger number of path switches:
• ≈ 435 percent larger for L = 0.450
• ≈ 400 percent larger for L = 0.675
• ≈ 80 percent larger for L = 0.900 when Rth = 1.

[Figure 5 plots not reproduced. Each panel shows P(RTT ≥ x) versus RTTs (ms).]

Figure 5. CCDFs for IGP/BGP routing (top), SRC (center), and randomized IRC (bottom), for L = 0.450 (left), L = 0.675 (center), and L = 0.900 (right).

Conclusion
In this article, we examined the strengths and weaknesses of randomized IRC techniques in a competitive environment. We proposed a way to blend randomization with a sociable route control (SRC) strategy, where by sociable, we mean a route control strategy that explicitly considers the potential


implications of its decisions in the performance of the network and with the ability to adaptively restrain its intrinsic selfishness depending on the network conditions. We have shown that in a competitive scenario, our SRC strategy is capable of drastically reducing the potential interference between controllers without penalizing the end-to-end traffic performance. This makes SRC more scalable and promising than pure randomization, given the proliferation of IRC systems in the Internet.

SRC strategies, like the one described in this article, also have a number of practical advantages; for example, they do not require any kind of coordination between the competing IRC boxes; and they can be supported by a lightweight software implementation based on well-known filtering techniques, with no additional requirements to be adopted other than a software upgrade of existing IRC systems.

Among the open issues in the area, the most important is the lack of a stochastic model characterizing the distribution of path switches in a competitive environment. Studies like [5] have shown that randomized techniques are effective in desynchronizing some route controllers when their measurement windows are sufficiently overlapped; however, they cannot guarantee stability. Only after characterizing the distribution of path switches will it be possible to formally study the local and global stability aspects of competitive IRC. Furthermore, the proposals and results described here apply to the optimization of VoIP traffic, but the conception of blending randomization with an SRC strategy is general in scope so our work can be extended to control other kinds of traffic flows concurrently, as well as consider other performance metrics besides the RTT.

References
[1] A. Akella et al., “A Measurement-Based Analysis of Multihoming,” Proc. ACM SIGCOMM, Karlsruhe, Germany, Aug. 2003.
[2] Avaya, Inc., “Converged Network Analyzer.”
[3] Cisco Systems, Inc., “Optimized Edge Routing.”
[4] Internap Networks, Inc., “Flow Control Platform.”
[5] R. Gao, C. Dovrolis, and E. W. Zegura, “Avoiding Oscillations Due to Intelligent Route Control Systems,” Proc. IEEE INFOCOM 2006, Barcelona, Spain, Apr. 2006.
[6] C. Labovitz et al., “Delayed Internet Routing Convergence,” Proc. ACM SIGCOMM, Stockholm, Sweden, Aug. 2000.
[7] M. Yannuzzi, X. Masip-Bruin, and O. Bonaventure, “Open Issues in Interdomain Routing: A Survey,” IEEE Network, vol. 19, no. 6, Nov.–Dec. 2005, pp. 49–56.
[8] A. Akella et al., “A Comparison of Overlay Routing and Multihoming Route Control,” Proc. ACM SIGCOMM, Portland, OR, Aug. 2004.
[9] D. K. Goldenberg et al., “Optimizing Cost and Performance for Multihoming,” Proc. ACM SIGCOMM, Portland, OR, Aug. 2004.
[10] F. Guo et al., “Experiences in Building a Multihoming Load Balancing System,” Proc. IEEE INFOCOM ’04, Hong Kong, China, Mar. 2004.
[11] A. Akella, S. Seshan, and A. Shaikh, “Multihoming Performance Benefits: An Experimental Evaluation of Practical Enterprise Strategies,” USENIX Annual Technical Conf., Boston, MA, June 2004.
[12] M. Yannuzzi, “Strategies for Internet Route Control: Past, Present, and Future,” Ph.D. diss., Tech. Univ. of Catalonia, Barcelona, Spain, 2007.
[13] M. Yannuzzi et al., “A Proposal for Inter-Domain QoS Routing Based on Distributed Overlay Entities and QBGP,” Proc. QoFIS ’04, LNCS 3266, Barcelona, Spain, Oct. 2004.
[14] J-Sim homepage; http://www.j-sim.org
[15] A. Medina et al., “BRITE: An Approach to Universal Topology Generation,” Proc. MASCOTS, Aug. 2001.
[16] B. Waxman, “Routing of Multipoint Connections,” IEEE JSAC, Dec. 1988.

Biographies
MARCELO YANNUZZI (yannuzzi@ac.upc.edu) received a degree in electrical engineering from the University of the Republic (UdelaR), Uruguay, in 2001, and DEA (M.Sc.) and Ph.D. degrees in computer science from the Department of Computer Architecture, Technical University of Catalonia (UPC), Spain, in 2005 and 2007, respectively. He is with the Advanced Network Architectures Lab at UPC, where he is an assistant professor. He held previous positions with the Physics Department of the School of Engineering, UdelaR, from 1997 to 2003, and with the Electrical Engineering Department of the same university from 2003 until 2006. He worked in industry for 10 years at the national telco in Uruguay (1993–2003).

XAVI MASIP-BRUIN (xmasip@ac.upc.edu) received M.S. and Ph.D. degrees from UPC, both in telecommunications engineering, in 1997 and 2003, respectively. He is currently an associate professor of computer science at UPC. His current research interests are in broadband communications, QoS management and provision, and traffic engineering. His publications include around 60 papers in national and international refereed journals and conferences. Since 2000 he has participated in many research projects: IST projects E-NEXT, NOBEL, and EuQoS; and Spanish research projects SABA, SABA2, SAM, and TRIPODE.

EVA MARIN-TORDERA (eva@ac.upc.edu) received M.S. degrees in physics in 1993 and electronic engineering in 1998, both from Barcelona University, and a Ph.D. from UPC in 2007, where she works as an assistant professor. She has published many papers in national and international conferences. Her main interests focus on QoS provisioning and optical networks. She is now actively participating in the BONE and DICONET international projects, and in the national project CATARO.

JORDI DOMINGO-PASCUAL (jordid@ac.upc.edu) is a full professor of computer science and communications at UPC. He is co-founder of and a researcher at the Advanced Broadband Communications Center (CCABA) of the university. His research topics are broadband communications and applications, IP/ATM integration, QoS management and provision, traffic engineering, IP traffic analysis and characterization, and QoS measurements.

ALEXANDRE FONTE (afonte@dei.uc.pt) graduated in electrical engineering from the University of Coimbra, Portugal, in 1995, and received his M.Sc. degree in electronic and telecommunications engineering (distributed systems specialty) from the University of Aveiro, Portugal, in 2000. He is currently a Ph.D. student in computer engineering at the Department of Informatics Engineering, University of Coimbra. His Ph.D. research activity is focused on interdomain quality of service routing and traffic engineering in IP networks.

EDMUNDO MONTEIRO (edmundo@dei.uc.pt) is an associate professor at the University of Coimbra, Portugal, from which he graduated in 1984 and received a Ph.D. in electrical engineering (computer specialty) in 1995. His research interests are computer communications, QoS, mobility, routing, resilience, and security.
