
Research Challenges in SDN & NFV


Raouf Boutaba
David R. Cheriton School of Computer Science
University of Waterloo

CS 856, October 09, 2014

Outline
! Control & Data Plane
! Programming Languages, Verification, Consistent
Network Update
! Debugging, Fault-Tolerance, Security
! Monitoring & Traffic Engineering
! SDN in WAN
! Network Function Virtualization (NFV)

Control & Data Plane


! Control plane
! Decides how to handle network flows
! Implemented in a logically centralized controller

! Data plane
! Switches are considered to be simple forwarding devices
! Forwards flows according to the decision made by the
control plane

Conventional Architecture
[Figure: architecture diagram with five packet-forwarding elements and three numbered annotations]
1. Open interface for packet forwarding
2. At least one Network OS, probably many (open- and closed-source)
3. Consistent, up-to-date global network view

Limitations of Conventional
Architecture
! Scalability
! Controller: flow setup requests served per second
! Control plane traffic: the first packet of each new flow is
forwarded to the controller
! Flow table size: each switch has a limited flow table size & if
the table is full the switch will drop packets

! Flow setup delay


! Processing delay at the controller
! Propagation delay between the controller and switch
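
The reactive flow setup path and its limits can be illustrated with a toy model. This is a minimal sketch, not OpenFlow code: the class names `Controller` and `ReactiveSwitch`, the table capacity, and the delay values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical controller: maps a destination to an output port."""
    routes: dict
    processing_delay_ms: float = 1.0   # assumed value, for illustration only
    requests_served: int = 0

    def flow_setup(self, dst):
        self.requests_served += 1
        return self.routes.get(dst)


@dataclass
class ReactiveSwitch:
    """Hypothetical switch with a bounded flow table."""
    controller: Controller
    table_capacity: int
    propagation_delay_ms: float = 0.5  # assumed one-way delay to the controller
    flow_table: dict = field(default_factory=dict)

    def handle_packet(self, dst):
        """Return (out_port, extra_delay_ms); out_port is None on a drop."""
        if dst in self.flow_table:
            return self.flow_table[dst], 0.0   # table hit: controller not involved
        if len(self.flow_table) >= self.table_capacity:
            return None, 0.0                   # table full: packet is dropped
        # Table miss: the first packet of the flow is sent to the controller.
        port = self.controller.flow_setup(dst)
        delay = 2 * self.propagation_delay_ms + self.controller.processing_delay_ms
        if port is not None:
            self.flow_table[dst] = port
        return port, delay


ctrl = Controller(routes={"h1": 1, "h2": 2, "h3": 3})
sw = ReactiveSwitch(controller=ctrl, table_capacity=2)
first = sw.handle_packet("h1")    # miss: goes to the controller
second = sw.handle_packet("h1")   # hit: served from the flow table
sw.handle_packet("h2")            # miss: fills the table
dropped = sw.handle_packet("h3")  # table full: dropped
```

In this model only the first packet of a flow pays the setup delay (round-trip propagation plus controller processing), and the controller's request counter grows with the number of new flows, not with the number of packets.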

Limitations of Conventional
Architecture (cont)
! Single vs. multiple controllers
! A single controller causes
! Single point of failure
! Scaling limitations
! In case of multiple controllers
! How to maintain a consistent state across the network?
! How many controllers are sufficient to meet user
requirements?
! Where to place the controllers?

Limitations of Conventional
Architecture (cont)
! Distribution of network state and configuration data
! How to store this information?
! Strong vs. weak consistency?
! What level of security is required?

! Failure resilience
! Controller and switch failures

! What level of abstraction should be provided by NOS?


! It should abstract device-specific characteristics and
provide a common set of functionality

Proposals
! Hierarchically distributed controllers
! Kandoo

! Distributed controllers where each controller is responsible for a specific zone
! Onix, DCP, ONOS, OpenDaylight

! Push down some functionality back to the data plane


! DevoFlow, DIFANE

Kandoo
! Two levels of
controllers
! Local controllers
handle frequent
events
! Root (centralized)
controller handles
rare events

! Root controller
installs rules through
local controllers
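
The two-level split can be sketched as a simple dispatch rule. This is an illustrative sketch only: the class names and the event types (`flow_stats`, `elephant_flow`) are hypothetical and are not taken from the Kandoo implementation.

```python
class RootController:
    """Hypothetical root controller with a network-wide view."""

    def __init__(self):
        self.handled = []

    def handle(self, event):
        self.handled.append(event)
        return "root"


class LocalController:
    """Hypothetical local controller that sees only its own switches."""

    def __init__(self, root, local_event_types):
        self.root = root
        self.local_event_types = set(local_event_types)
        self.handled = []

    def handle(self, event):
        # Frequent events that need only local state are handled here.
        if event["type"] in self.local_event_types:
            self.handled.append(event)
            return "local"
        # Rare events that need the network-wide view are relayed to the root.
        return self.root.handle(event)


root = RootController()
local = LocalController(root, local_event_types={"flow_stats"})
a = local.handle({"type": "flow_stats", "switch": "s1"})
b = local.handle({"type": "elephant_flow", "switch": "s1"})
```

The root controller only sees the events that local controllers cannot resolve, which is what keeps its load low.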

Onix

! Onix is a distributed controller that runs on a cluster of physical servers
! Provides a general-purpose API that can be used to implement
control applications

Onix (cont)

! Onix keeps network state in the Network Information Base (NIB); fast-changing
state such as device statistics is stored in a Distributed Hash Table (DHT)
! The NIB is replicated and distributed between multiple controller
instances

DIFANE
! DIFANE recognizes some switches as authority switches
! They can cache rules for other switches

! The DIFANE controller distributes rules to authority switches


! The authority switch handles the packet in the data plane and
sends feedback to the ingress switch to cache the relevant
rule(s) locally.
! Subsequent packets
matching the cached rules
can be encapsulated and
forwarded directly to the
egress switch

Research Challenges
! Achieving scalability
! Controller, control traffic, switch flow table

! Understanding the tradeoff between different architectures
! Centralized vs. distributed vs. functionality offloading

! Ensuring QoS
! Translating user requirements into suitable rule distribution


Research Challenges (cont)


! Reactive vs. proactive policies
! Pre-install rules to reduce flow setup requests

! Ensuring high availability


! For controller, switch and link failures

! Managing device status and statistics


! Consistency model
! Storage model



Programming Languages
! Programming languages provide higher-level
abstractions
! Control applications can be developed much more easily in a
higher-level language (compared to OpenFlow) like:
! Pyretic, Nettle, Procera, etc.

! Some languages can also detect overlapping rules


! Frenetic, Pyretic, NetCore, etc.


Verification
! Verification tools can detect and avoid
! Forwarding loops
! Black holes

! Verification can be done at different layers


! Control app
! Controller
! Network device


Verification Tools
! Test packet generators test all possible events, corner
cases, and race conditions
! NICE, OFLOPS

! Tools to verify correctness property violations


! Header Space Analysis (HSA), FlowChecker, OFTEN, VeriFlow
! They can check for
! Reachability issues
! Configuration updates
! Forwarding loops
! Black holes


VeriFlow
! VeriFlow sits between the
controller and the network
devices
! Checks every rule entering the
network
! If a rule violates any invariants
then it is rejected and the
violation logged.
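
The check-then-install idea can be sketched for a single class of packets heading to one destination. This is a simplified illustration, not VeriFlow's actual algorithm or data structures: the function names and the `next_hop` dictionary representation are assumptions.

```python
def find_violation(next_hop, dst):
    """Walk from every switch toward dst for one class of packets.

    next_hop maps a switch to its next hop (another switch or dst).
    Returns a string naming the violated invariant, or None if every
    walk reaches dst.
    """
    for start in next_hop:
        seen = set()
        node = start
        while node != dst:
            if node in seen:
                return f"forwarding loop through {node}"
            seen.add(node)
            if node not in next_hop:
                return f"black hole at {node}"
            node = next_hop[node]
    return None


def try_install(next_hop, switch, new_next, dst, log):
    """Accept the rule only if the resulting state keeps the invariants."""
    candidate = dict(next_hop)
    candidate[switch] = new_next
    violation = find_violation(candidate, dst)
    if violation:
        log.append((switch, new_next, violation))
        return next_hop      # rejected: forwarding state is unchanged
    return candidate         # accepted


log = []
state = {"s1": "s2", "s2": "s3", "s3": "h"}
state = try_install(state, "s1", "s3", "h", log)   # accepted
state = try_install(state, "s3", "s1", "h", log)   # rejected: creates a loop
```

The second rule never reaches the network: it is rejected and the violation is recorded in `log`.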


Network Update
! Traffic flows within a network are constantly changing
due to
! Switch failure/upgrade
! VM migration
! Traffic engineering
! Optimization

! Updating an old flow rule to a new one may cause transient inconsistent
forwarding of packets due to:
! Communication delay between controller and switch


Research Challenges
! OpenFlow is similar to assembly language
! Mimics the behavior of the hardware
! Developers need to spend too much time on details
! Overlapping rules
! Priority ordering of rules
! Transient inconsistencies when flow rules are being installed

! High-level languages can overcome these difficulties, but they have
problems of their own
! Yet another level of abstraction (overhead)
! Might cause slow response to events
! Portability to different platforms

Research Challenges (cont)


! Existing verification tools
! Do not scale to large networks
! Processing is slow
! Different tools offer different features
! No one tool to rule them all

! Network update
! Tradeoff between fast update & transient inconsistent packet forwarding
! How to perform atomic updates across multiple switches?
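
One approach to the atomic-update question from the consistent-update literature is a two-phase, version-tagged update. The slides do not describe it; the sketch below is an assumption-laden illustration with hypothetical names (`Switch`, `two_phase_update`, `trace`), in which every packet is stamped with a version at ingress and switches match only rules of that version.

```python
class Switch:
    def __init__(self, name):
        self.name = name
        self.rules = {}   # version tag -> next hop for packets with that tag


def two_phase_update(switches, ingress, new_rules, new_version):
    # Phase 1: install the new rules next to the old ones, on every switch.
    for name, next_hop in new_rules.items():
        switches[name].rules[new_version] = next_hop
    # Phase 2: only now start stamping incoming packets with the new version.
    ingress["version"] = new_version


def trace(switches, version, first_hop, dst):
    """Follow only the rules that match the packet's version tag."""
    path, node = [], first_hop
    while node != dst:
        path.append(node)
        node = switches[node].rules[version]
    return path


switches = {n: Switch(n) for n in ("s1", "s2", "s3")}
for name, hop in {"s1": "s2", "s2": "h", "s3": "h"}.items():
    switches[name].rules[1] = hop
ingress = {"version": 1}

two_phase_update(switches, ingress, {"s1": "s3", "s2": "h", "s3": "h"}, 2)
old_path = trace(switches, 1, "s1", "h")                   # stamped before the flip
new_path = trace(switches, ingress["version"], "s1", "h")  # stamped after the flip
```

Each packet follows either the old path or the new path end to end, never a mix. The cost is that both rule sets occupy flow table space until the old version can be removed.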



Debugging
! A Software Defined Network (SDN) is run by software
! We should be able to debug an SDN the way we debug a program
! Available tools
! ndb
! NetSight
! DEFINED, etc.


ndb
! ndb supports
! Network breakpoint
! Packet backtracking
! Network breakpoints are implemented as special rules that
forward a copy of a matching packet to the postcard
collector
! The postcard collector stores the packet headers along with
the timestamp
! Programmer can analyze the postcards to find the root
cause
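
The postcard idea can be sketched as a small collector that reconstructs a packet's path from timestamps. This is an illustrative sketch, not ndb's implementation: the `PostcardCollector` class, its method names, and the packet identifiers are hypothetical.

```python
from collections import defaultdict


class PostcardCollector:
    """Hypothetical collector: stores one postcard per switch a packet hits."""

    def __init__(self):
        self._by_packet = defaultdict(list)

    def receive(self, packet_id, switch, timestamp, headers):
        self._by_packet[packet_id].append((timestamp, switch, headers))

    def backtrace(self, packet_id):
        """Return the switches the packet visited, most recent first."""
        cards = self._by_packet.get(packet_id, [])
        ordered = sorted(cards, key=lambda card: card[0], reverse=True)
        return [switch for _, switch, _ in ordered]


collector = PostcardCollector()
# Postcards may arrive out of order; timestamps restore the path.
collector.receive("p1", "s2", 10.2, {"dst": "10.0.0.2"})
collector.receive("p1", "s1", 10.1, {"dst": "10.0.0.2"})
collector.receive("p1", "s3", 10.3, {"dst": "10.0.0.2"})
path = collector.backtrace("p1")
```

Reading the result from the breakpoint backwards is analogous to reading a stack trace in a conventional debugger.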


Fault-Tolerance
! There are multiple approaches to fault-tolerance
! Hardware (server, switch or link) failure
! A highly explored area in traditional networking
! SDN might provide more flexible ways to achieve fault-tolerance
! Software failure
! The controller itself may fail
! One or more modules of the controller may fail
! One or more control applications may fail


FatTire
! A programming language for writing fault-tolerant network
programs
! Regular-expression-based programming constructs for specifying
! The set of paths packets may take
! The required degree of fault tolerance
! Utilizes the in-network fast-failover mechanisms of OpenFlow
! A FatTire program is compiled to OpenFlow switch
configurations


FatTire (cont)
! An example FatTire program

! This program has three components:


! Line 1: (Security) all SSH traffic must traverse the IDS
! Line 2: (Fault-tolerance) forwarding must be resilient to a single
link failure
! Line 3: (Routing) traffic from the gateway (GW) must be
forwarded to the access switch (A), along any path.


FatTire (cont)
! OpenFlow fast-failover

! Fast-failover is a conditional rule whose forwarding behavior depends on
the local state of the switch
! In this example
! SSH traffic from port GW is forwarded to Group 1
! Group 1 is a Fast-Failover (FF) group
! If the out-port IDS is active then traffic is forwarded through it
! Otherwise traffic is forwarded through the out-port S2
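
The behavior of the fast-failover group in this example can be sketched as "use the first bucket whose watched port is live". This is a minimal model, not switch code: the function name `select_bucket` and the `(watch_port, out_port)` tuple layout are assumptions.

```python
def select_bucket(buckets, live_ports):
    """Return the out-port of the first bucket whose watched port is live.

    buckets is an ordered list of (watch_port, out_port) pairs.
    """
    for watch_port, out_port in buckets:
        if watch_port in live_ports:
            return out_port
    return None   # no live bucket: the packet is dropped


# Group 1 from the example: prefer the IDS port, fall back to S2.
group_1 = [("IDS", "IDS"), ("S2", "S2")]
normal = select_bucket(group_1, live_ports={"IDS", "S2"})
failed_over = select_bucket(group_1, live_ports={"S2"})
```

The decision uses only the switch's local port state, so failover does not wait for a round trip to the controller.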


Security
! There are two aspects of security in SDN
! Using SDN to improve network security
! Improving the security of SDN itself

! SDN allows policy enforcement at the entry point of the network
! Malicious actions can be blocked before entering the critical
region of the network

! SDN itself opens up many security issues


! Attack on control plane communication can halt the network
! If the controller is compromised the entire network is
compromised

Research Challenges
! Debugging
! Tradeoff between accuracy & network load
! What kind of statistics should be collected for a given debugging
objective?
! Scalability for large scale networks

! Fault-Tolerance
! Detect faults quickly
! Reactive vs. proactive selection of failover element
! Tradeoff between fast failover & choosing the optimal failover element


Research Challenges (cont)


! Security
! Are existing security measures adequate for ensuring control
plane security?
! Identifying the security loopholes in OpenFlow messaging
! Preventing information exposure
! Side channel attack targeting the flow setup process
! How to avoid resource depletion attacks?
! An attacker can simply keep sending random packets to
the controller to overload it



Monitoring
! Monitoring is crucial for network management
! SDN (OpenFlow) provides flexible mechanisms to collect
statistics from the network devices
! One common feature provided by almost all OpenFlow
controllers is the collection of connectivity information
! An OpenFlow switch maintains two counters per flow entry
! How many packets matched a flow entry
! How many bytes were forwarded by a flow entry


Proposals
! Most work on SDN monitoring focuses on optimizing
resource utilization
! FlowSense
! Measures link utilization with zero overhead
! Collected data is not accurate
! Payless
! Provides adaptive sampling for measuring link utilization
! Tradeoff between accuracy & network overhead

! Some proposals change the data plane for better monitoring


! OpenSketch


FlowSense
! Passive monitoring
! Parser module captures control traffic and
sends it to the utilization monitor
! The utilization monitor updates utilization
values at every checkpoint
! The utilization table keeps track of link
utilization for all links


FlowSense (cont)

! Checkpointing is done at each flow_removed event


! The flow_removed message contains two things
! How many packets matched the rule
! How many bytes were forwarded by this rule

! Link utilization is calculated accordingly
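
A simplified version of this calculation is sketched below. It is not FlowSense's actual algorithm: it assumes each flow sent at a constant rate over its lifetime, and it uses the flow duration that the OpenFlow flow_removed message carries alongside the byte count. The function name and the record layout are hypothetical.

```python
def utilization_bps(flow_records, t_start, t_end):
    """Average load on one link in bits/s over the window [t_start, t_end).

    flow_records holds (removed_at_s, duration_s, byte_count) tuples taken
    from the flow_removed messages of flows that crossed the link.
    """
    total_bits = 0.0
    for removed_at, duration, byte_count in flow_records:
        if duration <= 0:
            continue
        began = removed_at - duration
        # Portion of the flow's lifetime that falls inside the window.
        overlap = min(removed_at, t_end) - max(began, t_start)
        if overlap > 0:
            total_bits += 8 * byte_count * overlap / duration
    return total_bits / (t_end - t_start)


records = [
    (10.0, 10.0, 1250),   # active from t=0 to t=10, 1250 bytes
    (20.0, 10.0, 2500),   # active from t=10 to t=20, 2500 bytes
]
early = utilization_bps(records, 0.0, 10.0)
middle = utilization_bps(records, 5.0, 15.0)
```

Because the numbers only become available when a flow expires, the estimate for a window is complete only after every flow that was active in it has been removed. That delay is the price of adding no monitoring traffic.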



Payless
! Active monitoring framework
! Payless provides an API for developing
monitoring apps
! It collects flow statistics at different
aggregation levels
! flow, packet and port

! It uses an adaptive polling algorithm


! Same level of accuracy as continuous
polling
! With much less communication
overhead
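
The idea behind adaptive polling can be sketched as follows: poll a flow more often when its counters change quickly and back off when they do not. The thresholds, the halving and doubling factors, and the interval bounds below are illustrative assumptions, not the parameters used by Payless.

```python
def next_interval(interval_s, byte_delta, low, high, min_s=0.5, max_s=5.0):
    """Return the next polling interval for one flow.

    byte_delta is the change in the flow's byte counter since the last poll.
    """
    if byte_delta > high:
        interval_s = interval_s / 2    # traffic is changing fast: poll sooner
    elif byte_delta < low:
        interval_s = interval_s * 2    # traffic is quiet: back off
    # Keep the interval within the configured bounds.
    return max(min_s, min(max_s, interval_s))


busy = next_interval(2.0, byte_delta=10_000, low=100, high=1_000)
quiet = next_interval(2.0, byte_delta=10, low=100, high=1_000)
steady = next_interval(2.0, byte_delta=500, low=100, high=1_000)
```

Quiet flows are polled rarely, which is where the savings in communication overhead come from, while busy flows are still sampled often enough to track their rate.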

Traffic Engineering
! Traffic Engineering (TE) is done for different objectives
! Maximizing network utilization
! Ensuring QoS
! Load balancing
! Minimizing power consumption

! SDN is a very good fit for TE as


! the controller maintains the state of the entire network
! the optimal path for a traffic flow can be determined at a
central point
! traffic can be re-routed easily by installing new rules


Proposals
! Google deployed the B4 network for its inter-data-center
communication
! Other proposals include:
! Hedera
! QNOX
! Aster*x
! MicroTE


Aster*x
! Aster*x is a load balancer
! Content requests are load balanced
among the servers
! It has 3 main components
! Host Manager: tracks server state
and load.
! Net Manager: tracks topology
and link utilization
! Flow Manager: routes flows based
on network state and server load
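
How a flow manager might combine the two inputs can be sketched as below. The cost function (the larger of server load and path utilization, both normalized to the range 0 to 1) is an assumption made for illustration and is not Aster*x's actual policy; the function name is hypothetical.

```python
def choose_server(server_load, path_utilization):
    """Pick the server whose bottleneck (server or path) is least loaded.

    server_load comes from the host manager, path_utilization from the
    net manager; both map a server name to a value between 0 and 1.
    """
    def cost(server):
        return max(server_load[server], path_utilization[server])

    # sorted() makes tie-breaking deterministic (alphabetical).
    return min(sorted(server_load), key=cost)


load = {"a": 0.9, "b": 0.3, "c": 0.2}
paths = {"a": 0.1, "b": 0.4, "c": 0.8}
best = choose_server(load, paths)
```

Server `c` is the least loaded but sits behind a congested path, so the request goes to `b`. A policy that looked at server load alone would miss this.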

[Figure: a client sending content requests to a set of content servers]


Research Challenges for Monitoring


! Tradeoff between monitoring accuracy & monitoring
traffic volume
! Active vs. passive monitoring
! How to extend OpenFlow monitoring features
! Are the packet and byte counters enough?

! Lack of proper monitoring APIs for


! Application aware network control plane
! Network aware applications
! or both


Research Challenges for TE


! Scale
! Many switches, hosts, and virtual machines

! Dynamism
! Large number of component failures
! Virtual Machine (VM) migration

! Traffic characteristics
! High traffic volume and dense traffic matrix
! Volatile, unpredictable traffic patterns

! Performance requirements
! Delay-sensitive applications
! Resource isolation between tenants



SDN in WAN
! SDN is mostly used in data center networks
! Possibility of SDN in a WAN gained a lot of momentum
after Google deployed the B4 network
! In a WAN environment SDN is used for
! Traffic steering
! Achieving high link utilization
! Implementing advanced policies
! e.g., application specific peering between ISPs


Proposals
! Google's B4 network for inter-data-center
communication
! MSR's Software-Driven WAN (SWAN)
! A system to boost the utilization of inter-datacenter networks
! Re-configures the network's data plane to match current
traffic demand
! Avoids transient congestion during rule updates by
leveraging a small amount of scratch capacity on links
! Updates are applied in a congestion-free manner
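
Why spare capacity helps can be shown with a small check. While switches apply an update asynchronously, each flow may be on its old or its new allocation, so the worst-case load on a link is the sum over flows of the larger of the two. The sketch below is an illustration of that condition only, not SWAN's actual algorithm; the function name and the example topology are assumptions.

```python
def transient_safe(old, new, capacity):
    """True if no link can be overloaded while moving from old to new.

    old and new map flow -> {link: rate}; capacity maps link -> rate.
    """
    flows = set(old) | set(new)
    for link, cap in capacity.items():
        worst = sum(
            max(old.get(f, {}).get(link, 0.0), new.get(f, {}).get(link, 0.0))
            for f in flows
        )
        if worst > cap:
            return False
    return True


capacity = {"l1": 10.0, "l2": 10.0}
start = {"A": {"l1": 6.0}, "B": {"l2": 6.0}}
middle = {"A": {"l1": 3.0, "l2": 3.0}, "B": {"l1": 3.0, "l2": 3.0}}
goal = {"A": {"l2": 6.0}, "B": {"l1": 6.0}}

direct = transient_safe(start, goal, capacity)
staged = transient_safe(start, middle, capacity) and transient_safe(middle, goal, capacity)
```

Swapping the two flows in one step could put 12 units on a 10-unit link. Going through the intermediate split keeps the worst case at 9 units per link, which is possible only because each link had unused capacity to begin with.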


B4
! World-wide deployment


B4 (cont)
! Each site represents a DC
! Traffic demands and
current network state are
collected by Central TE
! Central TE analyses the
data and makes traffic
steering decisions
! These decisions are
forwarded to the site
controllers through the
gateway


Research Challenges
! SDN in WANs faces new challenges
! Switches can be at distant locations in the network
! Controller to switch propagation latency can be huge

! In a WAN environment, the SDN controller may become overloaded and
complex due to
! Huge volume of traffic
! Diversity in traffic types

! Number of forwarding rules supported by commodity switches may not be
enough for WAN traffic


Network Function Virtualization (NFV)


! Replace proprietary and vertically integrated hardware
middleboxes with software running on commodity servers
! Within VMs or containers or as processes

! Software middleboxes are called Virtualized Network Functions (VNFs)
! Research areas include
! Infrastructure and VNF management
! VNF chain orchestration
! Virtualization platforms for NFV


ETSI Reference Architecture


[Figure: ETSI NFV reference architecture, annotated with three research areas: VNF chain orchestration, infrastructure & VNF management, and virtualization platform]

Source: ETSI document no. ETSI GS NFV 002 V1.1.1 (2013-10)



Infrastructure and VNF Management


! Challenges in infrastructure and VNF management
include:
! Ensure fault tolerance for VNFs
! Auto-scaling VNFs to match traffic load
! VNF state management
! State synchronization across VNF instances
! State migration during VNF scaling
! State restoration during failure recovery
! Resource isolation between VNFs
! Orchestrate network and host resources


Proposals
! OpenNF [SIGCOMM '14]
! A control plane for NFV
! Triggers auto scaling
! Orchestrates VNF state migration
! Collaborates with SDN controller for network provisioning

! Elastic Edge (E2) [SOSP '15]
! A framework for deploying VNFs, inspired by cloud applications
! A framework saves users from re-implementing common tasks
! E2 provides common functionality such as VNF placement,
scaling, resource isolation, failure recovery, etc.
! Unlike OpenNF, E2 is an integrated network and VNF control
plane


VNF Chain Orchestration


! VNFs can be chained to provide higher level services
! Firewall → IDS → Web Cache service for web hosting
! Chains can have Service Level Agreements (SLAs)
! e.g., 100Mbps minimum bandwidth along a chain

! Research problems include


! VNF chain placement
! Place a VNF chain to ensure SLA is met with minimal cost
! Cost: energy, bandwidth, CPU, etc.
! Routing in a VNF chain
! Steer the right subset of traffic through each VNF chain
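
The placement problem can be made concrete with a small feasibility check. This sketch is illustrative only: the function names, the per-server cost model, and the three-node topology are assumptions, and it simply enumerates a handful of candidate placements, which does not scale to realistic instances.

```python
def meets_sla(placement, link_bw_mbps, min_bw_mbps):
    """placement lists, in chain order, the server hosting each VNF."""
    hops = zip(placement, placement[1:])
    # Consecutive VNFs on the same server need no network bandwidth.
    return all(
        a == b or link_bw_mbps.get((a, b), 0) >= min_bw_mbps
        for a, b in hops
    )


def cheapest_feasible(candidates, link_bw_mbps, min_bw_mbps, server_cost):
    """Return the lowest-cost candidate that meets the SLA, or None."""
    feasible = [p for p in candidates if meets_sla(p, link_bw_mbps, min_bw_mbps)]
    if not feasible:
        return None
    return min(feasible, key=lambda p: sum(server_cost[s] for s in set(p)))


# Chain: Firewall -> IDS -> Web Cache, with a 100 Mbps minimum bandwidth.
link_bw = {("n1", "n2"): 1000, ("n2", "n3"): 50, ("n1", "n3"): 200}
cost = {"n1": 1, "n2": 2, "n3": 5}
candidates = [
    ["n1", "n2", "n3"],   # n2 -> n3 offers only 50 Mbps: violates the SLA
    ["n1", "n1", "n3"],   # feasible, cost 1 + 5
    ["n1", "n2", "n2"],   # feasible, cost 1 + 2
]
best = cheapest_feasible(candidates, link_bw, 100, cost)
```

Here the cost of a placement is the sum of the costs of the distinct servers it uses. Real formulations also account for energy, bandwidth, and CPU, as listed above.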


Proposals
! Bari et al., "On Orchestrating Virtual Network Functions,"
CNSM '15
! Optimally place a set of VNF chains while satisfying traffic
demand (NP-hard problem)
! Dynamically adapt chain placement to network traffic dynamics

! Qazi et al., "SIMPLE-fying Middlebox Policy Enforcement Using SDN,"
SIGCOMM '13
! Steer traffic through VNF chains using SDN
! Tag packets with custom labels to identify middlebox chains and
traversal state
! Centralized controller makes tag management easier
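
The tagging idea can be sketched as a label that carries a chain identifier and the position reached so far. This is a simplified illustration, not SIMPLE's actual tag encoding or rule layout: the `CHAINS` table, the tag format, and the function names are hypothetical.

```python
# Hypothetical chain table that a central controller would maintain.
CHAINS = {7: ["firewall", "ids", "web_cache"]}


def steer(tag):
    """Return (next hop, updated tag) for a packet carrying the given tag."""
    chain_id, position = tag
    chain = CHAINS[chain_id]
    if position == len(chain):
        return "egress", None              # chain finished: strip the tag
    return chain[position], (chain_id, position + 1)


def traverse(chain_id):
    """Follow a packet from chain entry until its tag is removed."""
    visited, tag = [], (chain_id, 0)
    while tag is not None:
        hop, tag = steer(tag)
        visited.append(hop)
    return visited


hops = traverse(7)
```

Because the traversal state travels with the packet, a switch that sees the same packet more than once can still tell which middlebox comes next.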


Virtualization Platform for NFV


! Hardware middleboxes perform packet processing at line
speed
! VNFs running in VMs should be able to work at line speed
! VMs are still far from achieving line rate throughput on current
hypervisors

! Hypervisors need to support a high density (on the order of hundreds) of
VMs, in contrast to cloud applications
! Current hypervisors can support tens of VMs per machine

! NFV needs a virtualization platform where


! VMs can achieve near line rate throughput
! Hypervisors can host a high density of VMs

Proposals
! NetVM [NSDI '14]
! Modified hypervisor that reduces VM-to-VM communication
overhead by offloading the NIC

! ClickOS [NSDI '14]
! Modified Xen hypervisor paired with Click-based modified MiniOS
! Provides close to line rate throughput and can run hundreds of
ClickOS VMs at the same time


Summary
! Covered some key research topics in SDN & NFV
! Detailed exploration in coming weeks
! SDN & NFV are both active research areas
! Many key issues need to be solved
! New research directions are yet to be discovered
