
CMPE 252A : Computer Networks

An Introduction to
Software Defined Networking (SDN)
&
OpenFlow

Huazhe Wang
huazhe.wang@ucsc.edu

Some slides are borrowed from http://www.cs.princeton.edu/courses/archive/spr12/cos461/
and http://www-net.cs.umass.edu/kurose-ross-ppt-6e/

The Internet: A Remarkable Story

 Tremendous success
  From research experiment to global infrastructure
 Brilliance of under-specifying
  Network: best-effort packet delivery
  Hosts: arbitrary applications
 Enables innovation in applications
  Web, P2P, VoIP, social networks, virtual worlds
 But, change is easy only at the edge…
Inside the ‘Net: A Different Story…

 Closed equipment
 Software bundled with hardware
 Vendor-specific interfaces

 Over specified
 Slow protocol standardization

 Few people can innovate


 Equipment vendors write the code
 Long delays to introduce new features

Impacts performance, security, reliability, cost…


1-3
Networks are Hard to Manage

 Operating a network is expensive
  More than half the cost of a network
  Yet, operator error causes most outages
 Buggy software in the equipment
  Routers with 20+ million lines of code
  Cascading failures, vulnerabilities, etc.
 The network is “in the way”
  Especially a problem in data centers
  … and home networks
Traffic engineering: difficult for traditional routing

[Figure: network graph with nodes u, v, w, x, y, z and per-link weights]

Q: What if the network operator wants u-to-z traffic to flow along uvwz, and x-to-z traffic to flow along xwyz?

A: Need to define link weights so the traffic routing algorithm computes routes accordingly (or need a new routing algorithm)!
Traffic engineering: difficult

[Figure: same network graph with nodes u, v, w, x, y, z and per-link weights]

Q: What if the network operator wants to split u-to-z traffic along uvwz and uxyz (load balancing)?

A: Can’t do it (or need a new routing algorithm)
Traffic engineering: difficult

[Figure: same network graph, with blue and red traffic flows both traversing w]

Q: What if w wants to route blue and red traffic differently?

A: Can’t do it (with destination-based forwarding, and LS, DV routing)
Rethinking the “Division of Labor”

Network-layer functions

Recall: two network-layer functions:
 forwarding: move packets from router’s input to appropriate router output (data plane)
 routing: determine route taken by packets from source to destination (control plane)

Two approaches to structuring the network control plane:
 per-router control (traditional)
 logically centralized control (software defined networking)
Traditional Computer Networks

Data plane: packet streaming

Forward, filter, buffer, mark, rate-limit, and measure packets
Traditional Computer Networks

Per-router control plane: distributed algorithms

Track topology changes, compute routes, install forwarding rules
Traditional Computer Networks

Management plane: human time scale

Collect measurements and configure the equipment
Software Defined Networking (SDN)

Logically-centralized control: smart, slow controller

API to the data plane (e.g., OpenFlow)

Dumb, fast switches
Software defined networking (SDN)

1. Generalized “flow-based” forwarding (e.g., OpenFlow)
2. Control, data plane separation
3. Control plane functions external to data-plane switches
4. Programmable control applications (routing, access control, load balancing)

[Figure: remote controller in the control plane, with control agents (CAs) on the data-plane switches]
Software defined networking (SDN)

Why a logically centralized control plane?
 easier network management: avoid router misconfigurations, greater flexibility of traffic flows
 table-based forwarding allows “programming” routers
  centralized “programming” is easier: compute tables centrally and distribute
  distributed “programming” is more difficult: compute tables as the result of a distributed algorithm (protocol) implemented in each and every router
 open (non-proprietary) implementation of the control plane
Analogy: mainframe to PC evolution *

Mainframe era (vertically integrated): specialized applications on a specialized operating system on specialized hardware. Closed, proprietary; slow innovation; small industry.

PC era (horizontal): many apps over an open interface; Windows, Mac OS, or Linux over an open interface; commodity microprocessor. Open interfaces; rapid innovation; huge industry.

* Slide courtesy: N. McKeown
SDN perspective: data plane switches

Data plane switches:
 fast, simple, commodity switches implementing generalized data-plane forwarding in hardware
 switch flow table computed, installed by controller
 API for table-based switch control (e.g., OpenFlow)
  defines what is controllable and what is not
 protocol for communicating with controller (e.g., OpenFlow)

[Figure: network-control applications (routing, access control, load balance) over a northbound API; SDN controller (network operating system); southbound API down to SDN-controlled switches in the data plane]
SDN perspective: SDN controller

SDN controller (network OS):
 maintains network state information
 interacts with network-control applications “above” via northbound API
 interacts with network switches “below” via southbound API
 implemented as a distributed system for performance, scalability, fault-tolerance, robustness
SDN perspective: control applications

Network-control apps:
 “brains” of control: implement control functions using the lower-level services and API provided by the SDN controller
 unbundled: can be provided by a 3rd party, distinct from the routing vendor or the SDN controller
SDN: control/data plane interaction example
Dijkstra’s link-state routing

[Figure: SDN controller layers — communication layer (OpenFlow, SNMP); network-wide management layer (link-state info, host info, switch info, statistics, flow tables); application interface layer (network graph, RESTful API, intent) — above switches s1–s4]

1. s1, experiencing link failure, uses an OpenFlow port status message to notify the controller
2. SDN controller receives the OpenFlow message, updates link status info
3. Dijkstra’s routing algorithm application has previously registered to be called whenever link status changes. It is called.
4. Dijkstra’s routing algorithm accesses network graph info and link state info in the controller, computes new routes
SDN: control/data plane interaction example
Dijkstra’s link-state routing (continued)

5. Link-state routing app interacts with the flow-table-computation component in the SDN controller, which computes the new flow tables needed
6. Controller uses OpenFlow to install new tables in the switches that need updating
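Step 3 above is just a publish/subscribe pattern inside the controller. A toy Python sketch of that pattern follows; all names (Controller, on_link_change, port_status) are illustrative, not any real controller's API:

# Toy sketch of steps 1-3: a routing app registers a callback that the
# controller invokes whenever link status changes (names illustrative).
class Controller:
    def __init__(self):
        self.link_state = {}   # link -> up/down
        self.listeners = []    # callbacks registered by control apps

    def on_link_change(self, callback):
        self.listeners.append(callback)

    def port_status(self, link, up):
        self.link_state[link] = up     # steps 1-2: update link status info
        for cb in self.listeners:
            cb(link, up)               # step 3: call back registered apps

ctrl = Controller()
ctrl.on_link_change(lambda link, up: print("recompute routes:", link, up))
ctrl.port_status(("s1", "s2"), False)  # simulate s1 reporting a failed link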
OpenFlow (OF) defines the communication protocol that enables the SDN controller to directly interact with the forwarding plane of network devices.

OpenFlow data plane abstraction
 flow: defined by header fields
 generalized forwarding: simple packet-handling rules
  Pattern: match values in packet header fields
  Actions: drop, forward, or modify matched packet, or send matched packet to controller
  Priority: disambiguate overlapping patterns
  Counters: #bytes and #packets

* : wildcard
1. src=1.2.*.*, dest=3.4.5.* → drop
2. src=*.*.*.*, dest=3.4.*.* → forward(2)
3. src=10.1.2.3, dest=*.*.*.* → send to controller
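To make the pattern/priority semantics concrete, here is a minimal Python sketch of priority-based wildcard matching over the three rules above (the rule encoding is illustrative, not OpenFlow's wire format):

# Minimal sketch: highest-priority matching rule wins.
def matches(pattern, value):
    # '*' octets in the pattern are wildcards
    return all(p in ('*', v) for p, v in zip(pattern.split('.'), value.split('.')))

rules = [  # (priority, src pattern, dst pattern, action)
    (3, '1.2.*.*',  '3.4.5.*',  'drop'),
    (2, '*.*.*.*',  '3.4.*.*',  'forward(2)'),
    (1, '10.1.2.3', '*.*.*.*',  'send_to_controller'),
]

def lookup(src, dst):
    for _, s, d, action in sorted(rules, reverse=True):  # priority order
        if matches(s, src) and matches(d, dst):
            return action
    return 'send_to_controller'   # table miss

print(lookup('1.2.9.9', '3.4.5.6'))   # -> 'drop' (rule 1 outranks rule 2)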
OpenFlow data plane abstraction

match+action: unifies different kinds of devices

 Router
  match: longest destination IP prefix
  action: forward out a link
 Switch
  match: destination MAC address
  action: forward or flood
 Firewall
  match: IP addresses and TCP/UDP port numbers
  action: permit or deny
 NAT
  match: IP address and port
  action: rewrite address and port
Examples

Destination-based forwarding:

Switch Port | MAC src | MAC dst | Eth type | VLAN ID | IP Src      | IP Dst   | IP Prot | TCP sport | TCP dport | Action
*           | *       | *       | *        | *       | *           | 51.6.0.8 | *       | *         | *         | port6

IP datagrams destined to IP address 51.6.0.8 should be forwarded to router output port 6

Firewall:

Switch Port | MAC src | MAC dst | Eth type | VLAN ID | IP Src      | IP Dst   | IP Prot | TCP sport | TCP dport | Action
*           | *       | *       | *        | *       | *           | *        | *       | *         | 22        | drop

do not forward (block) all datagrams destined to TCP port 22

Switch Port | MAC src | MAC dst | Eth type | VLAN ID | IP Src      | IP Dst   | IP Prot | TCP sport | TCP dport | Action
*           | *       | *       | *        | *       | 128.119.1.1 | *        | *       | *         | *         | drop

do not forward (block) all datagrams sent by host 128.119.1.1
OpenFlow protocol

 operates between controller and switch
 TCP used to exchange messages
 OpenFlow messages: controller-to-switch and switch-to-controller (next two slides)
OpenFlow: controller-to-switch messages

 features: controller queries switch features, switch replies
 configure: controller queries/sets switch configuration parameters
 modify-state: add, delete, modify flow entries in the OpenFlow tables
 packet-out: controller can send this packet out of a specific switch port
OpenFlow: switch-to-controller messages

 packet-in: transfer packet (and its control) to controller. See packet-out message from controller
 flow-removed: flow table entry deleted at switch
 port status: inform controller of a change on a port

Fortunately, network operators don’t “program” switches by creating/sending OpenFlow messages directly. Instead, they use higher-level abstractions at the controller.
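As one example of such an abstraction, the POX controller (listed on the next slide) lets an application react to packet-in events and push flow entries without hand-crafting OpenFlow messages. A minimal hub-style sketch, assuming a standard POX install:

# Minimal POX app: flood every new flow (a hub). Run as: ./pox.py <module>
from pox.core import core
import pox.openflow.libopenflow_01 as of

def _handle_PacketIn(event):
    # packet-in: the switch had no matching rule for this packet
    msg = of.ofp_flow_mod()                       # a modify-state message
    msg.match = of.ofp_match.from_packet(event.parsed, event.port)
    msg.idle_timeout = 10                         # expiry triggers flow-removed
    msg.actions.append(of.ofp_action_output(port=of.OFPP_FLOOD))
    event.connection.send(msg)                    # controller-to-switch

def launch():
    core.openflow.addListenerByName("PacketIn", _handle_PacketIn)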
OpenFlow in the Wild

 Open Networking Foundation
  Google, Facebook, Microsoft, Yahoo, Verizon, Deutsche Telekom, and many other companies
 Commercial OpenFlow switches
  HP, NEC, Quanta, Dell, IBM, Juniper, …
 Network operating systems
  NOX, Beacon, Floodlight, OpenDaylight, POX, Pyretic
 Network deployments
  Eight campuses, and two research backbone networks
  Commercial deployments (e.g., Google backbone)
Challenges: Heterogeneous Switches

 Number of packet-handling rules
 Range of matches and actions
 Multi-stage pipeline of packet processing
  e.g., access control → MAC look-up → IP look-up
 Offload some control-plane functionality (?)
Controller Delay and Overhead

 Controller is much slower than the switch
 Processing packets at the controller leads to delay and overhead
 Need to keep most packets in the “fast path”
Distributed Controller

For scalability and reliability: run multiple controllers (each an application over a network OS), and partition and replicate state between them.
Testing and Debugging

 OpenFlow makes programming possible
  Network-wide view at controller
  Direct control over data plane
 Plenty of room for bugs
  Still a complex, distributed system
 Need for testing techniques
  Controller applications
  Controller and switches
  Rules installed in the switches
Evolution of OpenFlow

[Figure: OpenFlow version evolution]

 P4 (“OpenFlow 2.0”)
Run OpenFlow Experiments

 Mininet
 Open vSwitch (OVS)
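A quick way to try these out: a minimal Mininet script (assuming Mininet is installed and an OpenFlow controller such as POX is listening on the default port) that builds a two-host, one-switch topology and runs a connectivity test:

#!/usr/bin/env python
# Minimal Mininet experiment: 2 hosts, 1 OVS switch, remote controller.
from mininet.net import Mininet
from mininet.topo import SingleSwitchTopo
from mininet.node import RemoteController
from mininet.log import setLogLevel

setLogLevel('info')
net = Mininet(topo=SingleSwitchTopo(k=2),
              controller=lambda name: RemoteController(name, ip='127.0.0.1',
                                                       port=6633))
net.start()
net.pingAll()   # succeeds only if the controller installs forwarding rules
net.stop()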
Hedera: Dynamic Flow Scheduling for Data Center Networks

Mohammad Al-Fares, Sivasankar Radhakrishnan, Barath Raghavan*, Nelson Huang, Amin Vahdat

UC San Diego, *Williams College

Slides modified from https://people.eecs.berkeley.edu/~istoica/classes/cs294/.../21-Hedera-TD.pptx


Problem
 MapReduce/Hadoop-style workloads have substantial BW requirements
 ECMP-based multi-path load balancing often leads to oversubscription --> jobs bottlenecked by the network

Key insight/idea
 Identify large flows and periodically rearrange them to balance the load across core switches

Challenges
 Estimate bandwidth demand of flows
 Find optimal allocation of network paths for flows
ECMP Paths
 Many equal-cost paths going up to the core switches
 Only one path down from each core switch
 Randomly allocate paths to flows using a hash
  Agnostic to available resources
  Long-lasting collisions between long (elephant) flows

[Figure: fat-tree topology with a source S and destination D]
Collisions of elephant flows

 Collisions possible in two different ways
  Upward path
  Downward path

[Figure: fat-tree with flows S1→D1 through S4→D4 colliding on upward and downward links]
Hedera Scheduler

Pipeline: 1. Detect Large Flows → 2. Estimate Flow Demands → 3. Schedule Flows

 Detect Large Flows
  Flows that need bandwidth but are network-limited
 Estimate Flow Demands
  Use max-min fairness to allocate bandwidth between src-dst pairs
 Place Flows
  Use estimated demands to heuristically find better placement of large flows on the ECMP paths
Elephant Detection

 Scheduler continually polls edge switches for flow byte-counts
  Flows exceeding a B/s threshold are “large”
  > 10% of host’s link capacity (i.e., > 100 Mbps)
 What if only mice on host?
  Default ECMP load-balancing is efficient for small flows
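A sketch of that polling loop in Python, assuming 1 Gbps host links and cumulative byte counters read from the edge switches (names and the counter source are illustrative):

LINK_CAPACITY_BPS = 10**9                  # assumed 1 Gbps host links
THRESHOLD_BPS = 0.10 * LINK_CAPACITY_BPS   # "large" = >10% of link capacity

def detect_elephants(counts_now, counts_prev, interval_s):
    # counts_*: dict flow -> cumulative bytes from edge-switch counters
    elephants = []
    for flow, total in counts_now.items():
        rate_bps = 8 * (total - counts_prev.get(flow, 0)) / interval_s
        if rate_bps > THRESHOLD_BPS:
            elephants.append(flow)
    return elephants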
Demand Estimation

 Measured flow rate is misleading
 Need to find a flow’s “natural” bandwidth requirement when not limited by the network
  i.e., find the max-min fair bandwidth allocation
Demand Estimation

 Given a traffic matrix of large flows, modify each flow’s size at its source and destination iteratively…
  Senders equally distribute bandwidth among outgoing flows that are not receiver-limited
  Network-limited receivers decrease exceeded capacity equally between incoming flows
  Repeat until all flows converge
 Guaranteed to converge in O(|F|) time
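The iteration can be sketched in a few lines of Python. This is a simplified reconstruction (unit host-link capacities, flows as (src, dst) pairs), not the authors' code:

def estimate_demands(flows, max_iters=100):
    demand = {f: 0.0 for f in flows}
    converged = {f: False for f in flows}      # receiver-limited flows
    for _ in range(max_iters):
        # Sender pass: each source splits its capacity equally among
        # its flows that are not yet receiver-limited.
        for src in {s for s, _ in flows}:
            out = [f for f in flows if f[0] == src]
            fixed = sum(demand[f] for f in out if converged[f])
            free = [f for f in out if not converged[f]]
            for f in free:
                demand[f] = (1.0 - fixed) / len(free)
        # Receiver pass: an oversubscribed destination caps its incoming
        # flows at an equal share and marks them converged.
        changed = False
        for dst in {d for _, d in flows}:
            incoming = [f for f in flows if f[1] == dst]
            if sum(demand[f] for f in incoming) > 1.0:
                share = 1.0 / len(incoming)
                for f in incoming:
                    if demand[f] > share:
                        demand[f], converged[f], changed = share, True, True
        if not changed:
            break
    return demand

# estimate_demands([("A","X"), ("A","Y"), ("B","X")]) -> 0.5 for each flow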


Flow Placement Heuristics

 Two approaches:
  Global First Fit: greedily choose a path that has sufficient unreserved b/w
  Simulated Annealing: iteratively find a globally better mapping of paths to flows
Global First-Fit

[Figure: scheduler tracking Flows A, B, C across core switches 0-3]

 New flow detected: linearly search all possible paths from S to D
 Place flow on the first path whose component links can fit that flow
Global First-Fit

 Once a flow ends, its entries + reservations time out
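A compact sketch of the first-fit search, with illustrative data structures (per-link reservations and capacities):

def global_first_fit(candidate_paths, reserved, capacity, demand):
    # candidate_paths: equal-cost paths (lists of links) from S to D
    for path in candidate_paths:
        if all(reserved[l] + demand <= capacity[l] for l in path):
            for l in path:
                reserved[l] += demand   # reserved until the flow times out
            return path
    return None   # nothing fits; leave the flow on its default ECMP path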


Simulated Annealing

 Simulated annealing: a numerical optimization technique for searching for a solution in a space otherwise too large for ordinary search methods to yield results
 Probabilistic search for good flow-to-core mappings


Simulated Annealing

 State: a set of mappings from destination hosts to core switches
 Neighbor state: swap core switches between 2 hosts
  within same pod,
  within same edge,
  etc.
Simulated Annealing

 Function/Energy: total exceeded b/w capacity
  using the estimated demand of flows
  minimize the exceeded capacity
 Temperature: iterations left
  fixed number of iterations (1000s)
 Achieves a good core-to-flow mapping
  sometimes very close to the global optimum
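A generic simulated-annealing skeleton (not Hedera's exact implementation), where energy(state) would be the total exceeded capacity and neighbor(state) a random core-switch swap:

import math, random

def simulated_annealing(initial, energy, neighbor, n_iters=1000):
    state, e = initial, energy(initial)
    best, best_e = state, e
    for t in range(n_iters, 0, -1):            # temperature = iterations left
        cand = neighbor(state)
        ce = energy(cand)
        # Always accept improvements; accept regressions with a
        # temperature-scaled probability to escape local minima.
        if ce < e or random.random() < math.exp((e - ce) / t):
            state, e = cand, ce
            if e < best_e:
                best, best_e = state, e
    return best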
Simulated Annealing

[Figure: example run with 3 flows over core switches 0-3, core assignments changing each iteration]

 Example run: 3 flows, 3 iterations


Simulated Annealing

 Final state is published to the switches and used as the initial state for the next round
Simulated Annealing

 Optimizations
  Assign a single core switch to each destination host
  Incremental calculation of exceeded capacity
  Use the previous round’s best result as the initial state
Evaluation

Data Shuffle

                          ECMP    GFF     SA      Control
Total Shuffle Time (s)    438.4   335.5   336.0   306.4
Avg. Completion Time (s)  358.1   258.7   262.0   226.6
Avg. Bisection BW (Gbps)  2.81    3.89    3.84    4.44
Avg. host goodput (MB/s)  20.9    29.0    28.6    33.1

 16 hosts: 120 GB all-to-all in-memory shuffle
 Hedera achieves 39% better bisection BW over ECMP, 88% of an ideal non-blocking switch
Reactiveness

 Demand estimation:
  27K hosts, 250K flows: converges in < 200 ms
 Simulated annealing:
  Asymptotically dependent on # of flows + # of iterations
  50K flows and 10K iterations: 11 ms
  Most of the final bisection BW is achieved in the first few hundred iterations
 Scheduler control loop:
  Polling + Estimation + SA = 145 ms for 27K hosts


Limitations

[Figure: traffic-matrix stability vs. flow size, ECMP vs. Hedera]

 Dynamic workloads: large-flow turnover faster than the control loop
  Scheduler will be continually chasing the traffic matrix
 Need to include a penalty term for unnecessary SA flow re-assignments
Conclusion

 Rethinking networking
  Open interfaces to the data plane
  Separation of control and data
  Leveraging techniques from distributed systems
 Significant momentum
  In both research and industry
