

Architecture for Network Processing Applications

SEAH JIA CHEN

Research Proposal

Universiti Teknologi Malaysia

TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION
1.1 Background of the Problem
1.2 Statement of the Problem
1.3 Objectives of the Study
1.4 Scope of the Study
1.5 Significance of the Study

CHAPTER 2 LITERATURE REVIEW
2.1 Introduction
2.2 Complications in Packet Processors
2.3 Elements of a Network Switch
2.4 The Fast Path / Slow Path Paradigm

CHAPTER 3 RESEARCH METHODOLOGY
3.1 Proposed Fast Path / Slow Path Platform
3.2 Design Tools and Environment
3.3 Research Planning and Schedule (Gantt Chart)

CHAPTER 4 EXPECTED FINDINGS
4.1 Expected Findings

REFERENCES

CHAPTER 1

INTRODUCTION

1.1 Background of the Problem

Modern computer communication networks require sophisticated packet
processing for security and quality of service (QoS), giving rise to the
deployment of software defined networking [1], middleboxes [2], software routers
[3] and hardware accelerated packet processing [4]. Together, these network
appliances offer promising approaches to improving network extensibility and
scalability. To date, efforts have focused almost exclusively on traditional
layer 2 and 3 functions such as forwarding and routing [5]. While these
fundamental tasks merit attention, it can be argued that this focus overlooks an
important factor: how network deployments evolve in response to changing
applications, workloads and policy requirements.

Most of the earliest network security solutions attempt to secure internet users
by running anti-virus software and firewalls at end nodes. Unfortunately, these
solutions must be widely deployed within the network to protect against network
attacks. The problem with end-node protection is that it does very little to
mitigate bandwidth or Distributed Denial of Service (DDoS) attacks: the attack
traffic can be blocked at the end node, but it still consumes so much network
bandwidth that the network eventually becomes unusable [6]. This makes the case
for in-network protection, specifically middleboxes. The case is backed by
recent surveys showing that the number of middleboxes in industry is on par with
the number of routers and switches [5]. Beyond in-network protection, modern
networks deploy a wide range of network applications, such as WAN optimizers and
proxies, in middleboxes. In other words, the middlebox is already a critical
part of the network and is expected to remain so for the foreseeable future,
which justifies the need for innovation in middleboxes.

Recent innovations, however, have blurred the roles of middleboxes, with more and
more middlebox functions being incorporated into hardware as well as software
routers [7, 8]. The issue with incorporating middlebox functions into routers is
that routers were traditionally implemented purely in software. A pure software
router is not feasible for high speed links, as it is limited by the clock speed
of a general purpose processor. To accommodate higher packet rates, a high speed
interconnection mechanism was designed to provide a fast alternate data path for
moving data among interfaces [9], leaving the standard CPU to process only
exceptions and control related duties. This gave birth to the second generation
of network processors [10]. This innovation has helped network processing
continue to scale over the past few decades, motivating research into a good
framework for packet processing on middleboxes.

1.2 Statement of the Problem

Existing work shows that remote dynamic reconfiguration is able to cope with
rapid system-level functional changes in FPGA based systems. The idea of using
partial reconfiguration to enhance flexibility and extensibility has recently
gained momentum [11, 12, 13]. However, these efforts have been hampered in the
domain of network protection because partial reconfiguration often takes a long
time to complete [14, 15]. Unless traffic is rerouted or an alternate processing
path exists, a huge buffer is also required to act as temporary storage while
waiting for the middlebox to reconfigure. This indicates a scalability problem
in extending the idea of partial reconfiguration to terabit network processing,
and calls for an alternative methodology to reintegrate dynamic
reconfigurability into network processing.
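
To make the buffering concern concrete, the following minimal sketch estimates
how much packet storage would be needed to hold line-rate traffic while a
middlebox is being reconfigured. The line rates and the 100 ms reconfiguration
time are illustrative assumptions, not figures taken from the cited works.

```python
def buffer_bytes(line_rate_bps: float, reconfig_time_s: float) -> float:
    """Bytes that arrive at full line rate during one reconfiguration."""
    return line_rate_bps * reconfig_time_s / 8


# Assumed figures, for illustration only.
RECONFIG_TIME_S = 0.1  # hypothetical partial reconfiguration time (100 ms)
LINE_RATES_BPS = {"1 Gb/s": 1e9, "10 Gb/s": 10e9, "1 Tb/s": 1e12}

for name, rate in LINE_RATES_BPS.items():
    megabytes = buffer_bytes(rate, RECONFIG_TIME_S) / 1e6
    print(f"{name}: {megabytes:,.1f} MB of buffering")
```

Under these assumptions the required buffer grows linearly with the line rate,
which is why a stall that is tolerable at gigabit speeds becomes impractical at
terabit speeds.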

1.3 Objectives of the Study

Based on the background, the purpose of the thesis is to design as well as
implement the proposed fast path and slow path paradigm. The objectives of
this research work are:
1. To design and implement fast path and slow path processing on
NetFPGA. The developed platform would leave integration of an external
host to be optional, so that the design would be able to function as a
standalone device.
2. To design and implement a fast reconfigurable middlebox for
network processing.

1.4 Scope of the Study

Based on the research objectives and the resources made available, the scope of
this research is stated below:

1. The design of the fast path / slow path excludes the integration of a general
purpose processor (GPP). Software integration is not part of the scope as it is
optional, and this feature can easily be added to the platform as part of the
slow path processing whenever the application requires it.
2. The remote dynamically reconfigurable platform excludes the investigation of
an authentication mechanism. The authentication mechanism is an optional feature
that would be briefly described in chapter 5 and is a worthy problem left as
future work.
3. The case study covered for the fast path / slow path processing will only be
stateless. However, other applications can still be built with the proposed
method, as the fast path / slow path paradigm itself promotes extensibility.

1.5 Significance of the Study

The first contribution of this thesis is the proposal of an architecture for the
fast path / slow path co-design paradigm on NetFPGA. The architecture allows for
a higher degree of parallelism and flexibility, which will help in scaling
packet processing without an external compute unit or personal computer (PC).
The second contribution is an architecture that allows for a remotely and
dynamically reconfigurable firewall on the NetFPGA platform. The proposed
architecture would be able to reconfigure the firewall in as few as 10 cycles
once the packet is received by the updating mechanism, which allows faster
adaptation to network attacks. Together, the two contributions result in a
better design that is hoped to continue the scaling of packet processing into
the terabit era.
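
As a rough sense of scale for a 10 cycle update, the sketch below converts a
cycle count into time and into the amount of traffic that arrives meanwhile.
The 125 MHz clock and the 10 Gb/s link are assumptions chosen for illustration;
the proposal itself does not fix either value.

```python
def cycles_to_ns(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count into nanoseconds at the given clock frequency."""
    return cycles * 1e9 / clock_hz


def bytes_in_flight(line_rate_bps: float, duration_ns: float) -> float:
    """Bytes arriving at line rate during the given time window."""
    return line_rate_bps * duration_ns / 8e9


ASSUMED_CLOCK_HZ = 125e6       # assumption: 125 MHz data-path clock
ASSUMED_LINE_RATE_BPS = 10e9   # assumption: a single 10 Gb/s link

latency_ns = cycles_to_ns(10, ASSUMED_CLOCK_HZ)
backlog = bytes_in_flight(ASSUMED_LINE_RATE_BPS, latency_ns)
print(f"10 cycles = {latency_ns:.0f} ns, about {backlog:.0f} bytes at line rate")
```

With these assumed values the traffic arriving during an update is on the order
of a single small packet, so no large holding buffer would be needed.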

CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

This chapter focuses on the theoretical background as well as the ground that
has been covered in related works. It begins by describing the complications in
packet processors, followed by coverage of the NetFPGA platform, an introduction
to the fast path / slow path design paradigm, and dynamic reconfigurable
architectures.

2.2 Complications in Packet Processors

Network processing can basically be divided into 3 categories [19]:
1. Basic: These operations are usually compulsory services that must be
performed in office/home network equipment. They include switching, packet
routing, and MAC address learning.
2. Byte manipulation: These operations transform some or most of the bytes of a
packet, such as compression [20] and cryptography [21]. What byte manipulation
operations have in common is that they are often repetitive and follow a
certain pattern, such as being iterative or recursive, which is also why they
benefit the most from hardware accelerators.
3. Control-flow intensive: Modern network processing tasks go far beyond
routing, with network applications such as storage virtualization, load
balancing, and deep packet inspection. These operations reach deeper into the
payload of the packets [22], looking through Layer 2 to Layer 7, or at least to
Layer 4, of the OSI model [23] in order to perform content-based routing [24],
access control, or bandwidth allocation. They also include sophisticated network
protection such as firewalls, intrusion detection and bandwidth management
systems, which need to recognize applications, scan for known malicious patterns
and identify incoming network attacks. However, different control-flow intensive
processing functions consume vastly different amounts of processing, memory and
link resources. Works have shown that intrusion detection functionality is
often bounded by processing speed [25], software routers bottleneck on memory
bandwidth when processing small packets [26], and forwarding of large packets
with small headers is often limited by the link bandwidth.

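
The dependence on packet size can be illustrated with a short calculation of
the packet rate on an Ethernet link. This is a minimal sketch: the 10 Gb/s line
rate and the 3 GHz processor clock are assumptions chosen for illustration,
while the 20 bytes of per-frame overhead (preamble, start-of-frame delimiter
and inter-frame gap) follow standard Ethernet framing.

```python
ETHERNET_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap


def packets_per_second(line_rate_bps: float, frame_bytes: int) -> float:
    """Maximum frame rate on a link for a given Ethernet frame size."""
    return line_rate_bps / ((frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8)


def cycles_per_packet(clock_hz: float, line_rate_bps: float, frame_bytes: int) -> float:
    """Processor cycles available per packet when keeping up with line rate."""
    return clock_hz / packets_per_second(line_rate_bps, frame_bytes)


LINE_RATE_BPS = 10e9  # assumption: 10 Gb/s link
CPU_CLOCK_HZ = 3e9    # assumption: 3 GHz general purpose processor

for size in (64, 1518):
    pps = packets_per_second(LINE_RATE_BPS, size)
    budget = cycles_per_packet(CPU_CLOCK_HZ, LINE_RATE_BPS, size)
    print(f"{size:>4}-byte frames: {pps / 1e6:.2f} Mpps, {budget:.0f} cycles per packet")
```

Small frames leave only a few hundred cycles per packet under these
assumptions, so per-packet work dominates, whereas large frames shift the limit
toward the link itself.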
2.3 Elements of a Network Switch

This section discusses how the NetFPGA relates to a network switch. From a
functional perspective, a network switch can be logically viewed as a collection
of modules, where each module implements a set of related functions to perform
packet forwarding. A switch consists of several functional modules [32]:
1. Network Interfaces: The network interface contains several ports that
provide connectivity to the physical layer. When a packet arrives, it decodes
the electrical signals into a stream of 1s and 0s and frames them into a layer 2
packet. On the NetFPGA, the network interface consists of two logical
components: the first is the MAC interface, which converts the SGMII signals,
and the second is a bus adapter, which converts the data stream to comply with
the on-chip bus protocol for processing by other functional units.
2. Forwarding Engine: The forwarding engine is responsible for deciding to
which network interface an incoming packet should be forwarded. Whenever a
packet reaches the network interface, the port decapsulates the Layer 2 headers
and sends the entire IP packet or the IP header to the forwarding engine. The
forwarding engine in turn consults the forwarding table, which is a MAC address
to port mapping, to determine which network interface the packet should be
forwarded to. Other functions of the forwarding engine include decrementing the
time to live (TTL) and updating the checksum. This functionality is implemented
in the output port lookup module in the NetFPGA.
3. Queue Manager: This component provides a buffer which acts as temporary
storage for packets when an incoming or outgoing queue is overloaded. If the
buffer queues overflow, the queue manager selectively drops packets. The queue
manager is not implemented in any of the NetFPGA reference designs; instead, if
the input or output buffers are full, any incoming packets are dropped.
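
The interplay of the three elements can be sketched in software as follows.
This is a toy model under stated assumptions, not the NetFPGA implementation:
the class name, the string MAC addresses and the tail-drop policy are all
chosen for illustration.

```python
from collections import deque


class LearningSwitch:
    """Toy model of a switch: ports, a forwarding table and output queues."""

    def __init__(self, num_ports: int, queue_depth: int):
        self.num_ports = num_ports
        self.queue_depth = queue_depth
        self.mac_table = {}  # forwarding table: MAC address -> port number
        self.queues = [deque() for _ in range(num_ports)]  # one buffer per port
        self.dropped = 0

    def receive(self, in_port: int, src_mac: str, dst_mac: str, payload: bytes):
        """Process one frame and return the list of ports it was sent towards."""
        self.mac_table[src_mac] = in_port  # learn where the sender lives
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            targets = [p for p in range(self.num_ports) if p != in_port]
        elif out_port == in_port:
            targets = []  # destination is on the ingress segment, nothing to do
        else:
            targets = [out_port]
        for port in targets:
            self._enqueue(port, (src_mac, dst_mac, payload))
        return targets

    def _enqueue(self, port: int, frame: tuple) -> None:
        """Tail drop: discard the frame when the output buffer is already full."""
        if len(self.queues[port]) >= self.queue_depth:
            self.dropped += 1
        else:
            self.queues[port].append(frame)


switch = LearningSwitch(num_ports=4, queue_depth=2)
print(switch.receive(0, "aa", "bb", b"hello"))  # destination unknown, so flooded
print(switch.receive(1, "bb", "aa", b"reply"))  # "aa" was learned on port 0
```

The simple drop-when-full behaviour in `_enqueue` mirrors the description of
the reference designs above, where no selective queue manager is present.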

2.4 The Fast Path / Slow Path Paradigm

This part describes the evolution of the fast path / slow path in packet
processing.

Conventional Fast Path and Slow Path

In order to overcome the limitations of a software router, a high speed
interconnect was introduced to provide an alternate processing path for time
critical packets, while non-time-critical operations continue to be processed in
software. Time critical operations are those that affect the majority of
packets, so that any slowdown in them reduces the overall forwarding rate.
Non-time-critical packets are usually those used for maintenance and
management.
1. Roles of fast path: Examples of fast path processing are header processing,
packet forwarding (unicast, multicast and broadcast), packet classification and
packet scheduling. These processes are usually implemented in custom ASICs.
Although such implementations are less flexible, this is acceptable because
there have been only relatively small changes in the IP packet format.
2. Roles of slow path: Examples of slow path processing include Address
Resolution Protocol (ARP) processing, fragmentation and reassembly, and advanced
IP processing such as route recording, time stamping and ICMP error generation.
The main reason for implementing such functions in the slow path is that the
packets requiring them are rare exceptions.
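
The division of roles above can be sketched as a small classifier that decides
which path a frame should take. This is a minimal sketch, assuming untagged
Ethernet II frames carrying IPv4; the function names are hypothetical, and
treating every packet with IP options or an expiring TTL as slow path traffic
is a simplification made for illustration.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806
ETH_HEADER_LEN = 14
MIN_IPV4_FRAME = ETH_HEADER_LEN + 20


def classify(frame: bytes) -> str:
    """Return 'fast' or 'slow' for an untagged Ethernet II frame."""
    if len(frame) < ETH_HEADER_LEN:
        return "slow"  # malformed frame: treat as an exception
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETHERTYPE_ARP:
        return "slow"  # ARP processing
    if ethertype != ETHERTYPE_IPV4 or len(frame) < MIN_IPV4_FRAME:
        return "slow"  # anything the fast path does not understand
    ihl = frame[ETH_HEADER_LEN] & 0x0F
    (flags_frag,) = struct.unpack_from("!H", frame, ETH_HEADER_LEN + 6)
    ttl = frame[ETH_HEADER_LEN + 8]
    if ihl > 5:
        return "slow"  # IP options such as route recording or time stamping
    if flags_frag & 0x2000 or flags_frag & 0x1FFF:
        return "slow"  # fragment: More Fragments set or non-zero offset
    if ttl <= 1:
        return "slow"  # TTL expiry requires ICMP error generation
    return "fast"  # ordinary forwarding


def make_ipv4_frame(ihl: int = 5, flags_frag: int = 0, ttl: int = 64) -> bytes:
    """Build a minimal test frame with zeroed addresses (illustration only)."""
    eth = bytes(12) + struct.pack("!H", ETHERTYPE_IPV4)
    ip = struct.pack("!BBHHHBBH4s4s", (4 << 4) | ihl, 0, 20, 0,
                     flags_frag, ttl, 17, 0, bytes(4), bytes(4))
    return eth + ip


print(classify(make_ipv4_frame()))                   # ordinary packet
print(classify(make_ipv4_frame(flags_frag=0x2000)))  # first fragment
```

Because the checks only inspect a few fixed header fields, a classifier of this
kind maps naturally onto hardware, while everything it rejects is handed to
software.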

Hardware / software co-design has gained so much momentum in the research field
throughout the last decade that it has become a mainstream technology [34]. By
now it is understood that timing problems with simple, repetitive and recursive
processes such as digital signal processing, encryption, matrix operations, and
multimedia coding and decoding are best solved with hardware based
implementations [35], while variable and complex computations, such as graphical
user interface display and operations that require user input processing, are
better off with software processing. It can be observed that while different
tasks require different solutions, these parts are not clearly disjoint. A
versatile and effective system design would use both hardware and software to
achieve better performance and flexibility.

The hardware / software co-design paradigm seeks to achieve better performance
and time to market through the synergy of both partitions. The fast path / slow
path paradigm seeks a similar synergy and can be argued to be a flavor of
hardware / software co-design. Recent work [36] has redefined the roles of fast
path and slow path processing. The fast path works as a high performance
flow-through coprocessor for the slow path. This is done by having the fast
path forward the packet, or a copy of the packet, to the slow path for critical
packet processing. In the IXP 2XXX series [37], the fast path consists of a
programmable microengine, while the slow path provides execution instructions
and reprograms the microengine, acting as a control plane.


CHAPTER 3

RESEARCH METHODOLOGY

3.1 Proposed Fast Path / Slow Path Platform

In order to achieve our research objectives, we propose two types of fast path /
slow path processing paradigm.

Type 1: The fast path forwards a copy of the packet or a manipulated packet
(features/signatures) to the slow path for processing. With this, the slow
path, which is usually complex, can have the luxury of working at a higher level
of abstraction. Type 1 fast path / slow path network processing seeks to
increase efficiency by using a custom hardware solution to preprocess packets
before sending them to the slow path for further processing.
Type 2: The slow path acts as a co-processing engine or an offload engine to
assist the fast path. This paradigm seeks to accelerate the computation through
parallel processing with additional logic resources. Besides that, having an
alternate processing path opens up opportunities such as hardware context
switching and multiple context processing.
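
The difference between the two types can be sketched in software as follows.
This is only an illustration under stated assumptions: the `Packet` record, the
function names, the size-based policy and the CRC-32 offload task are all
hypothetical stand-ins, and the sketch runs sequentially whereas the proposed
hardware paths would operate in parallel.

```python
import zlib
from typing import Callable, NamedTuple, Optional, Tuple


class Packet(NamedTuple):
    src: str
    dst: str
    protocol: str
    payload: bytes


# Type 1: the fast path preprocesses, the slow path decides on features only.
def extract_features(pkt: Packet) -> dict:
    """Fast path: reduce a packet to a compact feature record."""
    return {"flow": (pkt.src, pkt.dst, pkt.protocol), "size": len(pkt.payload)}


def slow_path_decide(features: dict, max_size: int = 1500) -> str:
    """Slow path: applies a policy without touching the raw packet bytes."""
    return "drop" if features["size"] > max_size else "forward"


# Type 2: the slow path acts as an offload engine for the fast path.
def slow_path_offload(payload: bytes) -> int:
    """Slow path: a heavier per-byte task, here a CRC-32 over the payload."""
    return zlib.crc32(payload)


def fast_path_type2(pkt: Packet,
                    needs_offload: Callable[[Packet], bool]) -> Tuple[str, Optional[int]]:
    """Fast path: forward directly unless the packet needs the offload engine."""
    if needs_offload(pkt):
        return ("forward", slow_path_offload(pkt.payload))
    return ("forward", None)


pkt = Packet("10.0.0.1", "10.0.0.2", "udp", b"example payload")
print(slow_path_decide(extract_features(pkt)))
print(fast_path_type2(pkt, lambda p: p.protocol == "udp"))
```

In Type 1 the slow path never sees raw bytes, only the feature record, while in
Type 2 the fast path stays in control and calls on the slow path only for the
packets that need extra work.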

3.2 Design Tools and Environment

This section describes the design environment and the design tools used for the
project development.
1. ModelSim: ModelSim [47] is a software tool commonly used for hardware design
simulation and verification. It provides a simulation environment for hardware
designs described in Verilog, VHDL, SystemC or SystemVerilog. Simulation results
can be shown in a waveform viewer or as text output.
2. Python: Python [48] is an open-source programming language with a wealth of
library support. Most importantly, Python is developed under an OSI-approved
license, meaning that it is freely usable and distributable even for commercial
purposes.
3. Wireshark: Wireshark [46] is an open-source packet analyzer commonly used in
university research. It captures network packets on network interfaces and
displays the packet content for the user to analyze.
4. IPerf: IPerf [49] was developed as a tool to measure the maximum TCP
bandwidth achievable by a particular system. The tool allows tuning of various
parameters and TCP characteristics. At the end of the test, IPerf reports the
bandwidth, delay jitter and datagram loss (if any).
5. Xilinx Integrated Software Environment (ISE): Xilinx ISE is a platform for
hardware design development. It interprets hardware description languages or
schematics and converts them to bitstreams for Xilinx FPGAs. Xilinx ISE is
bundled with a set of tools; those used in this research are CORE Generator
(COREGEN) and iMPACT. COREGEN is used to generate the Xilinx IP cores used in
the NetFPGA framework, while iMPACT is used to load bitstreams into the FPGA.

3.3 Research Planning and Schedule (Gantt Chart)

Table 3.1. Research planning and schedule (Gantt Chart)

No.  Activities
1    Literature review
2    Analysis of existing study
3    Submission of research proposal
4    Appointment of supervisor
5    Research proposal presentation
6    Questionnaires/interviews
7    Preparation of compliance checklist
8    Site assessment using the checklist
9    Data collection and statistical analysis
10   Analysis and evaluation
11   Project presentation
12   Writing up
13   Final editing of thesis
14   Final thesis submission

[Timeline columns: 2016/20, 9, 10, 11, 12; schedule bars not recoverable]

CHAPTER 4

EXPECTED FINDINGS

4.1 Expected Findings

The fast path can act as a preprocessor for the slow path, either by providing
hardware acceleration for repetitive tasks or by providing higher abstractions
so that the slow path can act as a control plane.
A multipath processing architecture opens up possibilities for multi-context
processing applications and enables operations at different OSI layers to be
executed simultaneously.
Remote dynamic reconfiguration allows for adaptation to an ever-changing
environment, where security policies can be updated at run time.
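
The last finding can be pictured with a small software model of a stateless
firewall whose rule table is rewritten by in-band update packets. This is a
sketch under stated assumptions, not the proposed hardware design: the packet
dictionaries, the "update" packet type and the port-based rules are
hypothetical, and update packets are deliberately left unauthenticated, in line
with the scope stated in Section 1.4.

```python
class StatelessFirewall:
    """Port-based filter whose rules can be changed while traffic is flowing."""

    def __init__(self):
        self.blocked_ports = set()  # current security policy

    def handle(self, packet: dict) -> str:
        """Apply an update packet to the rule table, or filter a data packet."""
        if packet.get("type") == "update":  # hypothetical control packet
            if packet["action"] == "block":
                self.blocked_ports.add(packet["port"])
            elif packet["action"] == "unblock":
                self.blocked_ports.discard(packet["port"])
            return "updated"
        # Stateless decision: depends only on this packet and the rule table.
        return "drop" if packet["dst_port"] in self.blocked_ports else "forward"


firewall = StatelessFirewall()
print(firewall.handle({"type": "data", "dst_port": 23}))
print(firewall.handle({"type": "update", "action": "block", "port": 23}))
print(firewall.handle({"type": "data", "dst_port": 23}))
```

The data path is never paused in this model: the policy changes between one
packet and the next, which is the behaviour run-time reconfiguration aims for.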


REFERENCES
1. Kreutz, D., Ramos, F. M., Esteves Verissimo, P., Esteve Rothenberg, C.,
Azodolmolky, S. and Uhlig, S. Software-defined networking: A
comprehensive survey. Proceedings of the IEEE, 2015. 103(1): 14-76.
2. Srisuresh, P., Kuthan, J., Rayhan, A., Rosenberg, J. and Molitor, A.
Middlebox communication architecture and framework. 2002.
3. Dobrescu, M., Egi, N., Argyraki, K., Chun, B.-G., Fall, K., Iannaccone,
G., Knies, A., Manesh, M. and Ratnasamy, S. RouteBricks: exploiting
parallelism to scale software routers. Proceedings of the ACM SIGOPS
22nd symposium on Operating systems principles. ACM. 2009. 15-28.
4. Jang, K., Han, S., Han, S., Moon, S. B. and Park, K. SSLShader: Cheap
SSL Acceleration with Commodity Processors. NSDI. 2011.
5. Sekar, V., Ratnasamy, S., Reiter, M. K., Egi, N. and Shi, G. The
middlebox manifesto: enabling innovation in middlebox deployment.
Proceedings of the 10th ACM Workshop on Hot Topics in Networks.
ACM. 2011. 21.
6. Kompella, R. R., Singh, S. and Varghese, G. On scalable attack detection
in the network. Proceedings of the 4th ACM SIGCOMM conference on
Internet measurement. ACM. 2004. 187-200.
7. Labrecque, M., Steffan, J. G., Salmon, G., Ghobadi, M. and Ganjali, Y.
NetThreads: Programming NetFPGA with threaded software. NetFPGA
Dev. Workshop'09. 2009.
8. Lockwood, J. W., McKeown, N., Watson, G., Gibb, G., Hartke, P., Naous,
J., Raghuraman, R. and Luo, J. NetFPGA - An Open Platform for
Gigabit-Rate Network Switching and Routing. Microelectronic Systems
Education, 2007. MSE'07. IEEE International Conference on. IEEE.
2007. 160-161.
9. Comer, D. E. Network systems design using network processors. 2006.
10. Aweya, J. IP router architectures: an overview. International Journal of
Communication Systems, 2001. 14(5): 447-475.
11. Labrecque, M. Overlay Architectures for FPGA-Based Software Packet
Processing. Ph.D. Dissertation. University of Toronto. 2011.
12. Deutsch, L. P. GZIP file format specification version 4.3. 1996.
13. Frankel, S., Glenn, R. and Kelly, S. The AES-CBC cipher algorithm and
its use with IPsec. Technical report. RFC 3602, September. 2003.
14. Dharmapurikar, S., Krishnamurthy, P., Sproull, T. and Lockwood, J. Deep
packet inspection using parallel bloom filters. High Performance
Interconnects, 2003. Proceedings. 11th Symposium on. IEEE. 2003. 44-51.
15. Raiciu, C., Iyengar, J., Bonaventure, O. et al. Recent advances in reliable
transport protocols. SIGCOMM ebook on Recent Advances in
Networking, 2013.
16. Carzaniga, A., Rutherford, M. J. and Wolf, A. L. A routing scheme for
content-based networking. INFOCOM 2004. Twenty-third Annual Joint
Conference of the IEEE Computer and Communications Societies. IEEE.
2004, vol. 2. 918-928.
17. Dreger, H., Feldmann, A., Paxson, V. and Sommer, R. Predicting the
resource consumption of network intrusion detection systems. Recent
Advances in Intrusion Detection. Springer. 2008. 135-154.
18. Argyraki, K., Fall, K., Iannaccone, G., Knies, A., Manesh, M. and
Ratnasamy, S. Understanding the packet forwarding capability of
general-purpose processors. Technical report. Technical Report IRB-TR-08-44,
Intel Research Berkeley. 2008.
19. Kachris, C. and Kulkarni, C. Configurable transactional memory.
Field-Programmable Custom Computing Machines, 2007. FCCM 2007.
15th Annual IEEE Symposium on. IEEE. 2007. 65-72.
20. Medhi, D. Network routing: algorithms, protocols, and architectures.
Morgan Kaufmann. 2010.
21. Lombardo, A., Panarello, C., Reforgiato, D., Santagati, E. and Schembra,
G. A module for packet hijacking in NetFPGA platform. Digital System
Design (DSD), 2011 14th Euromicro Conference on. IEEE. 2011. 283-286.
22. Wolf, W. A decade of hardware/software codesign. Computer, 2003.
36(4): 38-43.
23. Nematbakhsh, S., Stitt, G. and Vahid, F. Hardware/Software Partitioning.
24. Varghese, G., Fingerhut, J. A. and Bonomi, F. Detecting evasion attacks at
high speeds without reassembly. ACM SIGCOMM Computer
Communication Review, 2006. 36(4): 327-338.
25. Johnson, E. and Kunze, A. IXP 2xxx Programming, 2003.
26. Hennessy, J. L. and Patterson, D. A. Computer architecture: a quantitative
approach. Elsevier. 2011.
27. George, L. and Blume, M. Taming the IXP network processor. ACM
SIGPLAN Notices. ACM. 2003, vol. 38. 26-37.
28. Combs, G. et al. Wireshark. Web page: http://www.wireshark.org/ last
modified, 2007: 12-02.
29. Mentor Graphics. ModelSim, 2007.
30. Van Rossum, G. and Drake, F. L. Python language reference manual.
Network Theory. 2003.
31. Tirumala, A., Qin, F., Dugan, J., Ferguson, J. and Gibbs, K. Iperf: The
TCP/UDP bandwidth measurement tool. http://dast.nlanr.net/Projects,
2005.
