Switch Fabric
1.1 Overview
Switch fabric, one of the most important modern communication technologies, is used to
transmit information from the sender to the receiver. The simplest method of transmitting
information from one point to another is to connect the two points with a single link. When
multiple terminals need to communicate with each other in point-to-point mode, every pair of
terminals must be connected. As the number of terminals to be interconnected increases,
the required links multiply. If N terminals need to be interconnected, the number of required
links is N x (N-1) / 2. For example, if 100 terminals need to be interconnected, up to 4950
links are required. To reduce the work of installing links, a device that can automatically
connect terminals was introduced. After a terminal is connected to the device, the device
automatically connects the terminal to other terminals. This device is called a switch. A
switch reduces the number of required links from N x (N-1) / 2 to N, greatly reducing the
cost of installing links.
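The link-count arithmetic above can be checked with a short sketch:

```python
def full_mesh_links(n):
    """Links needed to interconnect n terminals in point-to-point mode."""
    return n * (n - 1) // 2

def switched_links(n):
    """With a switch, each terminal needs only one link (to the switch)."""
    return n

print(full_mesh_links(100))  # 4950
print(switched_links(100))   # 100
```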
Routers are the core devices on IP networks. The switch fabric unit (SFU) is the core
component that determines the performance of a router. Generally, a switch fabric technology
is chosen before a new router is designed. Over the more than twenty-year history of
routers, switch fabric technology has played a significant role in expanding router
capacity and improving router performance.
Switch fabric technology has gone through three phases: shared bus switch, shared
memory switch, and crossbar switch. Correspondingly, routers have gone through three
phases: shared bus routers, shared memory routers, and crossbar routers. With the information
explosion on networks, core routers are required to have ever larger capacity. To meet
such requirements, the single router has evolved into the cluster, and the single-stage switch
has evolved into the multi-stage switch.
The word fabric itself means a cloth produced especially by knitting, weaving, or felting fibers.
Fabric in the switch fabric technology refers to the switch unit or chip. The words switching,
switching fabric, switch fabric, and fabric in this document have the same meaning and all refer to
the SFU on routers.
NE80Es/NE40Es are used as an example in this document to illustrate the principles of the switch
fabric. The principles of the SFUs on the CX600-8 and NE40E-8 are similar, so are those on the
CX600-16 and NE80E, on the CX600-X3/ME60-X3 and NE40E-X3, on the CX600-X8/ME60-X8
and NE40E-X8, and on the CX600-X16/ME60-X16 and NE40E-X16.
This document describes the switch fabric on routers from the following aspects:
Switch fabric indicators
Switch fabric classification
Introduction to various switch fabric technologies
Introduction to Huawei switch fabric
Just as the main road of a highway carries various vehicles traveling to and fro, the backplane carries
various services transmitted to and fro. Therefore, the backplane must provide sufficient scalability.
Planning 8 lanes based on a 4-lane requirement is reasonable: as traffic grows, the 4 reserved lanes
will be used one day.
Currently, SerDes links are used in the industry to transmit data on the backplane. The rate of
SerDes links varies with the design; for example, it can be 2.5 Gbit/s,
3.125 Gbit/s, 6.25 Gbit/s, or 12.5 Gbit/s.
Due to engineering restrictions, engineers cannot freely add physical links on the backplane. To extend
the link capacity, the link rate must be upgraded.
Backplane capacity = Number of SerDes links between LPUs and SFUs x Rate of each SerDes link
In Figure 1-1, the backplane houses 16 LPUs and four SFUs. An LPU is connected to an SFU
through 18 SerDes links (nine SerDes links for data receiving and the other nine SerDes links
for data sending). The rate of each SerDes link is 6.25 Gbit/s.
The backplane capacity is calculated as [2 x (9 x 4 x 16)] x 6.25 Gbit/s = 7.2 Tbit/s.
The value 2 indicates the bidirectional (receiving and sending) capacity. The
value 9 indicates the number of SerDes links connecting each LPU and SFU. The value 4
indicates the number of SFUs. The value 16 indicates the number of LPUs. The value 6.25
Gbit/s indicates the rate of each SerDes link.
[Figure 1-1: LPU0 through LPU15 connected across the backplane to SFUN0 through SFUN3]
The backplane connects the LPUs to the SFUs, transmits signals along various control channels, and
distributes the power output by the power modules to the LPUs.
A Serializer/Deserializer (SerDes pronounced sir-deez) is a pair of functional blocks commonly
used in high speed communications to compensate for limited input/output. These blocks convert
data between serial data and parallel interfaces in each direction. The term "SerDes" generically
refers to interfaces used in various technologies and applications. The primary use of a SerDes is to
provide data transmission over a single/differential line in order to minimize the number of I/O pins
and interconnects.
As an analogy, the SFU on the backplane functions like the toll gate on a highway. The toll
gate stops vehicles, charges tolls, and then shows them a green light, effectively relieving traffic
congestion.
Switching capacity of a device = Number of interfaces on SFUs x Rate of each SerDes link x SerDes
coding efficiency
In Figure 1-1, the switching capacity is calculated as [2 x (9 x 4 x 16)] x 6.25 Gbit/s x 0.8 =
5.76 Tbit/s, where 0.8 is the SerDes coding efficiency.
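Both capacity formulas can be reproduced with the Figure 1-1 numbers (a minimal sketch; the parameter names are illustrative):

```python
LPUS = 16                 # LPUs on the backplane
SFUS = 4                  # SFUs on the backplane
LINKS_PER_DIRECTION = 9   # SerDes links per LPU-SFU pair, per direction
LINK_RATE_GBPS = 6.25     # rate of each SerDes link
CODING_EFFICIENCY = 0.8   # SerDes coding efficiency

# Backplane capacity: bidirectional (x2), all links between LPUs and SFUs.
backplane_gbps = 2 * (LINKS_PER_DIRECTION * SFUS * LPUS) * LINK_RATE_GBPS
# Switching capacity: the same links, discounted by the coding efficiency.
switching_gbps = backplane_gbps * CODING_EFFICIENCY

print(backplane_gbps)  # 7200.0 Gbit/s = 7.2 Tbit/s
print(switching_gbps)  # 5760.0 Gbit/s = 5.76 Tbit/s
```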
[Figure: an LPU (packet processor and fabric interface, ports GE0 through GE99) connected across the backplane to SFU0 through SFU3 over 9 x 6.25 Gbit/s SerDes links]
A greater speedup factor does not necessarily indicate higher SFU performance. Sometimes a large
speedup factor merely compensates for an over-simple switching algorithm.
If no more than M SFUs are faulty, the remaining normal SFUs take over
the services on the faulty SFUs to prevent service interruption. If the number of faulty SFUs
exceeds M, the switch fabric performance of the system deteriorates.
Multicast
Multicast allows a router to copy data packets to multiple channels. In multicast mode, a
server can forward one data packet to a large number of clients that request the data packet at
the same time. Only one copy of a data packet needs to traverse each shared link, which greatly
reduces the number of data packets transmitted on networks. Therefore, multicast improves network
usage and reduces transmission costs.
Multicast in chips on an SFU is also called spatial multicast. In spatial multicast mode, the
SFU copies one data packet from one interface to multiple interfaces based on the multicast
group ID.
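A minimal sketch of spatial multicast, assuming a hypothetical table that maps multicast group IDs to outbound interfaces:

```python
# Hypothetical multicast group table: group ID -> member outbound interfaces.
MULTICAST_GROUPS = {
    7: [1, 4, 5],
}

def spatial_multicast(cell, group_id):
    """Copy one cell from one inbound interface to every interface
    in the multicast group identified by group_id."""
    return [(port, cell) for port in MULTICAST_GROUPS[group_id]]

copies = spatial_multicast(b"payload", 7)
print([port for port, _ in copies])  # [1, 4, 5] - one copy per member
```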
Backpressure
Backpressure is a method of unidirectional flow control. By notifying the upstream interface
of traffic congestion on the downstream interface, backpressure prevents traffic congestion
from deteriorating. For example, interfaces A and B are communicating with each other. If
interface A detects traffic congestion in its memory, interface A sends a special data frame, a
backpressure frame, to interface B. After receiving the backpressure frame, interface B does
not send data packets to interface A until the memory resources of interface A are available
again. As a public component on a router, the SFU is prone to traffic congestion. Different
internal backpressure mechanisms are designed for SFUs with different implementations. In
addition, the backpressure mechanism applies to the upstream or downstream LPUs
connected to the SFU.
Backpressure cannot prevent traffic congestion; it is a response to traffic congestion.
When backpressure occurs on a router, congestion has already occurred on the router.
Backpressure is used to keep congestion from deteriorating and to help the upstream interfaces
adjust their traffic based on the traffic status of the downstream interface. As an analogy, fever is a
response to a virus infection: when a person runs a fever, the viruses have already invaded. The fever
suppresses the viruses through high temperature and helps the person recover.
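The interface A/B exchange described above can be modeled with a toy queue (a sketch only; real backpressure frames and thresholds are implementation specific):

```python
from collections import deque

class Interface:
    """Toy model of interface A: limited memory, asserts backpressure
    when its queue is full (capacity must be > 0)."""
    def __init__(self, capacity):
        self.queue = deque()
        self.capacity = capacity

    @property
    def backpressured(self):
        return len(self.queue) >= self.capacity

    def enqueue(self, packet):
        self.queue.append(packet)

    def process_one(self):
        if self.queue:
            self.queue.popleft()

def transmit(packets, downstream):
    """Interface B: hold each packet while downstream asserts backpressure,
    resume once memory is available again. Returns how many sends were held."""
    held = 0
    for pkt in packets:
        while downstream.backpressured:
            held += 1
            downstream.process_one()  # downstream frees memory over time
        downstream.enqueue(pkt)
    return held

a = Interface(capacity=2)
print(transmit(list(range(5)), a))  # 3 sends were held by backpressure
```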
Figure 1-3 Switch fabric classification based on the type of packets sent to the switch fabric:
cell switch and packet switch
Virtual output queuing: Cells destined for different outbound interfaces are put in different queues on the
inbound interfaces, preventing head-of-line (HOL) blocking between cells destined for different outbound
interfaces. Virtual output queuing is not a new caching method but an improvement on input queuing (IQ).
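An illustrative VOQ structure at one inbound interface (a sketch; the scheduler that arbitrates between queues is omitted):

```python
from collections import deque

class VirtualOutputQueues:
    """One queue per outbound interface at a single inbound interface, so a
    cell blocked toward one output cannot block cells toward other outputs."""
    def __init__(self, num_outputs):
        self.queues = [deque() for _ in range(num_outputs)]

    def enqueue(self, cell, output):
        self.queues[output].append(cell)

    def dequeue_for(self, output):
        """Serve an output independently of the others (no HOL blocking)."""
        return self.queues[output].popleft() if self.queues[output] else None

voq = VirtualOutputQueues(num_outputs=2)
voq.enqueue("a", output=0)
voq.enqueue("b", output=1)
# Even if output 0 is busy, the cell for output 1 is not blocked:
print(voq.dequeue_for(1))  # b
```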
CIOQ fabric
Cells are cached partly on the inbound interface and partly on the outbound interface of
an SFU. The CIOQ fabric resolves the HOL problem on the inbound interface and does
not impose high requirements on the memory of the outbound interface. Therefore, the CIOQ fabric is
widely used.
Figure 1-4 Switch fabric classification based on the location of the memory on the switch fabric:
IQ, CIOQ, and multi-stage switch
1.3.4 Based on the Mode in Which Data Packets Pass the Switch Fabric
Based on the mode in which data packets pass the switch fabric, switch fabrics can be
classified as cut-through switch fabrics and store-and-forward switch fabrics.
The cut-through switch fabric does not wait for a data packet (a cell of specific length or a
packet with additional fields) to be fully received but forwards the received portion to
the outbound interface immediately. Theoretically, the cut-through switch fabric boasts a high
forwarding rate and low switching latency.
The store-and-forward switch fabric sends the received data packets to the outbound
interface only after verifying them. The store-and-forward switch fabric
boasts excellent fault tolerance.
The cut-through switch fabric has defects in interface rate adaptation and fault tolerance.
Therefore, the store-and-forward switch fabric is more commonly used.
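The latency difference between the two modes can be illustrated with back-of-the-envelope arithmetic (the frame size, decision window, and link rate below are illustrative assumptions, not values from this document):

```python
def store_and_forward_latency_us(frame_bytes, rate_gbps):
    """The whole frame must arrive (and be verified) before forwarding starts."""
    bits_per_us = rate_gbps * 1000  # 1 Gbit/s = 1000 bits per microsecond
    return frame_bytes * 8 / bits_per_us

def cut_through_latency_us(decision_bytes, rate_gbps):
    """Forwarding starts once enough of the frame (assumed here: the header)
    has arrived to make a forwarding decision."""
    bits_per_us = rate_gbps * 1000
    return decision_bytes * 8 / bits_per_us

# 1500-byte frame, 64-byte decision window, 10 Gbit/s link
print(store_and_forward_latency_us(1500, 10))  # 1.2 us
print(cut_through_latency_us(64, 10))          # 0.0512 us
```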
[Figure 1-6: shared bus router: CPU, routing table unit, and LPUs interconnected over a shared bus]
Shared bus
In a shared bus switch, non-blocking switching means that the sum of all interface bandwidth must
not exceed the shared bus bandwidth. In other words, the switch performance of a router
is determined by the shared bus bandwidth. In addition, the switch performance of a router is
affected by the CPU capability. In Figure 1-6, when LPU 1 is communicating with LPU N, the
shared bus resources are occupied. As a result, LPU 2 cannot communicate with other LPUs.
Therefore, the router performance is determined by the shared bus capacity.
First-generation routers, such as the Huawei NE16E, generally use the shared bus switch
technology.
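The non-blocking condition for a shared bus can be expressed as a one-line check (a sketch with illustrative numbers):

```python
def shared_bus_is_nonblocking(interface_rates_gbps, bus_gbps):
    """A shared bus is non-blocking when the total interface bandwidth
    does not exceed the shared bus bandwidth."""
    return sum(interface_rates_gbps) <= bus_gbps

print(shared_bus_is_nonblocking([1.0] * 8, 10.0))  # True: 8 <= 10
print(shared_bus_is_nonblocking([2.5] * 8, 10.0))  # False: 20 > 10
```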
[Figure: shared memory switch: a memory controller and shared memory between the inbound and outbound interfaces]
Figure 1-8 shows a typical shared memory router architecture. After LPU 1 receives data, the
distributed memory management unit writes the data into the distributed memory. Based on
the result of searching the routing table for a destination outbound interface, LPU 1 sends the
data to LPU N.
[Figure 1-8: shared memory router: LPUs connected over the backplane to a distributed memory management unit, distributed memory, and routing table]
For a non-blocking shared memory switch architecture, the bandwidth for writing data into
the memory must be greater than the sum bandwidth of all inbound interfaces, and the
bandwidth for reading data from the memory must be greater than the sum bandwidth of all
outbound interfaces.
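That condition can be written directly as a check (illustrative values):

```python
def shared_memory_is_nonblocking(inbound_gbps, outbound_gbps,
                                 write_bw_gbps, read_bw_gbps):
    """Non-blocking shared memory switch: write bandwidth must exceed the
    total inbound bandwidth, and read bandwidth the total outbound bandwidth."""
    return (write_bw_gbps > sum(inbound_gbps)
            and read_bw_gbps > sum(outbound_gbps))

# 16 x 1 Gbit/s interfaces each way, 20 Gbit/s memory bandwidth per direction
print(shared_memory_is_nonblocking([1.0] * 16, [1.0] * 16, 20.0, 20.0))  # True
print(shared_memory_is_nonblocking([1.0] * 16, [1.0] * 16, 10.0, 20.0))  # False
```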
crossbar switch is a good choice. As the number of inbound interfaces (N) increases, the
complexity of the crossbar grows in proportion to N x N.
[Figure: crossbar switch: an N x N switch matrix with cross-points, driven by a crossbar controller that handles control signals (request and grant) and switch matrix reconfiguration]
The crossbar switch technology is generally used on single-stage switch fabrics, especially in
a single chassis, such as the Huawei NE80E.
The single-stage crossbar switch is simple in design and reduces costs, but it cannot meet
the expansion requirements of the next-generation Internet. The multi-stage crossbar switch is
complex to operate, but it supports thousands of interfaces. The multi-stage crossbar is
necessary for multi-chassis routers.
Different single-stage crossbar switch fabrics and inter-stage connections yield different
multi-stage crossbar switch fabrics. The most common multi-stage crossbar switch fabrics are
Benes and Clos, both of which are named after their inventors.
Benes switch fabric
The Benes switch fabric was invented by Benes in 1964. In the Benes switch architecture,
each single-stage switch fabric uses an N x N matrix, and N/d connections are provided
between the inbound and outbound interfaces. The middle stage supports nonstop
maintenance. The Benes switch fabric, however, cannot ensure that cells are transmitted
in sequence. Therefore, additional packet sequencing operations are required.
On a common 3-stage Benes switch fabric, N = d. The first stage fragments packets into
cells, and the second and third stages send the cells to the destination interfaces.
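The packet sequencing that the Benes fabric requires can be sketched as a reorder buffer keyed by per-flow sequence numbers (a minimal illustration, not the actual chip mechanism):

```python
class Resequencer:
    """Reorder buffer that releases cells strictly in sequence-number order,
    compensating for out-of-order delivery across the fabric."""
    def __init__(self):
        self.expected = 0   # next sequence number to release
        self.pending = {}   # out-of-order cells waiting for earlier ones

    def arrive(self, seq, cell):
        """Accept a (possibly out-of-order) cell; return cells now in order."""
        self.pending[seq] = cell
        released = []
        while self.expected in self.pending:
            released.append(self.pending.pop(self.expected))
            self.expected += 1
        return released

r = Resequencer()
print(r.arrive(1, "b"))  # [] - cell 0 has not arrived yet
print(r.arrive(0, "a"))  # ['a', 'b'] - both released in order
```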
[Figure: switch matrix types: non-square switch matrix and square switch matrix]
Crossbar switches without internal memories include the Tandem Banyan. A crossbar switch
without internal memories requires little logic, but it requires a large number of
intra-chip or inter-chip connections to achieve the same performance as a crossbar
switch with internal memories.
[Figure: methods of extending the switching capacity and interface quantity: single-stage single-plane switching, multi-plane switching, and multi-stage multi-plane switching]
Each FIC is connected to all switch planes to ensure that cells can be evenly allocated to each switch
plane. This not only facilitates load balancing but also is conducive to system fault tolerance.
2. After the cells reach the crossbar unit, the crossbar schedules the cells to the outbound
interfaces of the SFU and sends them to the FIC on the LPU. The cell switching is then
complete.
3. After the cells reach the FIC, the FIC reassembles the cells into IP data packets and
sends them to the outbound interface on the LPU. The single-stage switching of IP data
packets on the router is then complete.
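The fragmentation and reassembly steps can be sketched as follows (the cell size and tagging scheme are illustrative; real cell formats are implementation specific):

```python
CELL_PAYLOAD = 64  # bytes of payload per cell (illustrative value)

def fragment(packet: bytes):
    """TM on the ingress LPU: split a packet into fixed-size cells, tagging
    each with its index and a last-cell flag."""
    cells = []
    for i in range(0, len(packet), CELL_PAYLOAD):
        chunk = packet[i:i + CELL_PAYLOAD]
        is_last = i + CELL_PAYLOAD >= len(packet)
        cells.append((i // CELL_PAYLOAD, is_last, chunk))
    return cells

def reassemble(cells):
    """FIC/TM on the egress LPU: restore the packet from its cells."""
    ordered = sorted(cells, key=lambda c: c[0])
    assert ordered[-1][1], "last cell missing"
    return b"".join(chunk for _, _, chunk in ordered)

pkt = bytes(range(200))
print(len(fragment(pkt)))                 # 4 cells (64 + 64 + 64 + 8 bytes)
print(reassemble(fragment(pkt)) == pkt)   # True - packet restored intact
```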
1. After a data packet reaches a physical interface of the LPU on the CLC A, the data
processing unit on the LPU processes the packet and sends it to the traffic manager (TM).
The TM fragments the packet into cells. After being cached and scheduled in queues, the
cells are sent to S1 (SFU on the CLC A).
Each TM is connected to the switch planes through one or more connections so that cells can be evenly
allocated to various switch planes.
2. After cells reach S1, the switch fabric evenly allocates the cells to S2 (SFU on the CCC).
S2 sends the cells to the SFU on the destination CLC, S3 on CLC B. After receiving the
cells, S3 sends the cells to the destination LPU.
The principle of processing data packets on the single-stage switch is similar to that on the multi-stage
switch. However, the multi-stage switch makes it easier to build a large-capacity switching network and
improve the switching performance of the system.
3. After the cells reach the TM on the destination LPU, the TM reassembles the cells into
IP data packets and sends them to the outbound interface on the LPU. The 3-stage
switching of IP data packets on the router is then complete.
1.6 FAQ
1.6.1 How to Calculate the LPU Capacity Based on the SFU Capacity?
Generally the backplane capacity of a router is greater than the SFU capacity, and the SFU
capacity is greater than the LPU capacity. The ratio of the SFU capacity to the LPU capacity
is the speedup factor.
To reduce costs and improve system scalability, the backplane capacity is generally planned
quite large for further LPU expansion.
Usually the SFU capacity is just enough for the current LPU specifications and some further LPU
expansion. For example, if the NE40E-8 houses two SFUDs and two SRUs, the SFU capacity
is 4 (number of SFUs) x 8 (number of interfaces on each SFU) x 4 (number of SerDes links on
each interface) x 3.2 Gbit/s (rate of each SerDes link) x 0.8 (SerDes coding efficiency) =
327.68 Gbit/s. The capacity allocated to each LPU is 327.68 Gbit/s / 8 = 40.96 Gbit/s. With a
speedup factor of 2, each LPU slot on the NE40E-8 supports an LPU with a maximum capacity
of 20G. Similarly, if the NE40E-8 houses four SFUGs, the SFU capacity is 4 (number of SFUs)
x 16 (number of interfaces on each SFU) x 8 (number of SerDes links on each interface) x
3.2 Gbit/s (rate of each SerDes link) x 0.8 (SerDes coding efficiency) = 1.31 Tbit/s. The
capacity allocated to each LPU is 1.31 Tbit/s / 16 = 81.92 Gbit/s. With a speedup factor of 2,
each LPU slot on the NE40E-8 supports an LPU with a maximum capacity of 40G.
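The two calculations in this answer can be reproduced with a short script (the parameter names are illustrative):

```python
def sfu_capacity_gbps(num_sfus, interfaces_per_sfu, serdes_per_interface,
                      serdes_rate_gbps, coding_efficiency=0.8):
    """Total SFU capacity: interfaces x SerDes links x rate x coding efficiency."""
    return (num_sfus * interfaces_per_sfu * serdes_per_interface
            * serdes_rate_gbps * coding_efficiency)

def lpu_capacity_gbps(sfu_gbps, num_slots, speedup=2):
    """Capacity available per LPU slot, discounted by the speedup factor."""
    return sfu_gbps / num_slots / speedup

sfud = sfu_capacity_gbps(4, 8, 4, 3.2)    # 2 SFUDs + 2 SRUs: 327.68 Gbit/s
sfug = sfu_capacity_gbps(4, 16, 8, 3.2)   # 4 SFUGs: 1310.72 Gbit/s
print(lpu_capacity_gbps(sfud, 8))          # 20.48 -> supports ~20G LPUs
print(lpu_capacity_gbps(sfug, 16))         # 40.96 -> supports ~40G LPUs
```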
The capacity generally refers to the bidirectional capacity. For example, the receiving and sending
rates of a common GE interface are each 1 Gbit/s. Therefore, the GE interface capacity is 2
Gbit/s in terms of the bidirectional capacity and 1 Gbit/s in terms of the unidirectional capacity.
Unless otherwise specified, the capacity in this document refers to the unidirectional capacity.
If the unidirectional capacity is to be calculated, the backplane, SFU, and LPU capacities used
in the calculation must all be unidirectional. If the bidirectional capacity is to be calculated,
they must all be bidirectional. The capacity standard must be consistent.