Switch Fabric
1.1 Overview
Switch fabric, one of the most important modern communication technologies, is used to
transmit information from the sender to the receiver. The simplest method of transmitting
information from one point to another is to connect the two points with a single link. When
multiple terminals need to communicate with each other in point-to-point mode, every pair of
terminals must be connected. As the number of terminals to be interconnected increases,
the required links multiply. If N terminals need to be interconnected, the number of required
links is N x (N-1) / 2. For example, if 100 terminals need to be interconnected, up to 4950
links are required. To reduce the work of installing links, a device that can automatically
connect terminals was introduced. After a terminal is connected to the device, the device
automatically connects the terminal to other terminals. This device is called a switch. A
switch reduces the number of required links from N x (N-1) / 2 to N, greatly reducing the
cost of installing links.
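The link-count arithmetic above can be checked with a short sketch:

```python
def full_mesh_links(n):
    """Links needed to interconnect n terminals in point-to-point mode."""
    return n * (n - 1) // 2

def switched_links(n):
    """With a switch, each terminal needs only one link (to the switch)."""
    return n

print(full_mesh_links(100))  # 4950
print(switched_links(100))   # 100
```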
Routers are the core devices on IP networks. The switch fabric unit (SFU) is the core
component that determines the performance of a router. Generally, a switch fabric technology
is chosen before a new router is designed. Over the more than twenty-year history of
routers, switch fabric technology has played a significant role in expanding router
capacity and improving router performance.
Switch fabric technology has gone through three phases: shared bus switch, shared
memory switch, and crossbar switch. Correspondingly, routers have gone through three
phases: shared bus routers, shared memory routers, and crossbar routers. With the information
explosion on networks, core routers are required to have ever larger capacity. To meet
such requirements, the single router has evolved into the cluster, and the single-stage switch
has evolved into the multi-stage switch.
The word fabric itself means a cloth produced especially by knitting, weaving, or felting fibers.
Fabric in the switch fabric technology refers to the switch unit or chip. The words switching,
switching fabric, switch fabric, and fabric in this document have the same meaning and all refer to
the SFU on routers.
NE80Es/NE40Es are used as an example in this document to illustrate the principles of the switch
fabric. The principles of the SFUs on the CX600-8 and NE40E-8 are similar, so are those on the
CX600-16 and NE80E, on the CX600-X3/ME60-X3 and NE40E-X3, on the CX600-X8/ME60-X8
and NE40E-X8, and on the CX600-X16/ME60-X16 and NE40E-X16.
This document describes the switch fabric on routers from the following aspects:
Switch fabric indicators
Switch fabric classification
Introduction to various switch fabric technologies
Introduction to Huawei switch fabric
Just as the main road of a highway carries various vehicles traveling to and fro, the backplane carries
various services transmitted to and fro. Therefore, the backplane must provide sufficient scalability.
Planning 8 lanes based on a 4-lane requirement is reasonable: as traffic grows, the 4 reserved lanes
will be used one day.
Currently, SerDes links are used in the industry to transmit data on the backplane. The rate of
SerDes links varies with the design; for example, it can be 2.5 Gbit/s,
3.125 Gbit/s, 6.25 Gbit/s, or 12.5 Gbit/s.
Due to engineering restrictions, engineers cannot freely add physical links on the backplane. To extend
the link capacity, the link rate must be upgraded.
Backplane capacity = Number of SerDes links between LPUs and SFUs x Rate of each SerDes link
In Figure 1-1, the backplane houses 16 LPUs and four SFUs. An LPU is connected to an SFU
through 18 SerDes links (nine SerDes links for data receiving and the other nine SerDes links
for data sending). The rate of each SerDes link is 6.25 Gbit/s.
The backplane capacity is calculated as [2 x (9 x 4 x 16)] x 6.25 Gbit/s = 7.2 Tbit/s.
The value 2 indicates the bidirectional (receiving and sending) capacity. The
value 9 indicates the number of SerDes links connecting each LPU and SFU. The value 4
indicates the number of SFUs. The value 16 indicates the number of LPUs. The value 6.25
Gbit/s indicates the rate of each SerDes link.
[Figure 1-1: LPU0 through LPU15 connected across the backplane to SFUN0 through SFUN3]
The backplane connects the LPUs to the SFUs, transmits signals along various control channels, and
distributes the power output by the power modules to the LPUs.
A Serializer/Deserializer (SerDes pronounced sir-deez) is a pair of functional blocks commonly
used in high speed communications to compensate for limited input/output. These blocks convert
data between serial data and parallel interfaces in each direction. The term "SerDes" generically
refers to interfaces used in various technologies and applications. The primary use of a SerDes is to
provide data transmission over a single/differential line in order to minimize the number of I/O pins
and interconnects.
As an analogy, the SFU on the backplane functions like the toll gate on a highway. The toll
gate stops vehicles, charges tolls, and then shows them a green light, effectively relieving traffic
congestion.
Switching capacity of a device = Number of interfaces on SFUs x Rate of each SerDes link x SerDes
coding efficiency
In Figure 1-1, the switching capacity is calculated as [2 x (9 x 4 x 16)] x 6.25 Gbit/s x 0.8 =
5.76 Tbit/s, where 0.8 is the SerDes coding efficiency.
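Both capacity formulas can be reproduced with the Figure 1-1 numbers (a minimal sketch; the parameter names are illustrative):

```python
LPUS = 16                 # LPUs on the backplane
SFUS = 4                  # SFUs on the backplane
LINKS_PER_DIRECTION = 9   # SerDes links per LPU-SFU pair, per direction
LINK_RATE_GBPS = 6.25     # rate of each SerDes link
CODING_EFFICIENCY = 0.8   # SerDes coding efficiency

# Backplane capacity: bidirectional (x2), all links between LPUs and SFUs.
backplane_gbps = 2 * (LINKS_PER_DIRECTION * SFUS * LPUS) * LINK_RATE_GBPS
# Switching capacity: the same links, discounted by the coding efficiency.
switching_gbps = backplane_gbps * CODING_EFFICIENCY

print(backplane_gbps)  # 7200.0 Gbit/s = 7.2 Tbit/s
print(switching_gbps)  # 5760.0 Gbit/s = 5.76 Tbit/s
```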
[Figure: an LPU (packet processor and fabric interface, ports GE0 through GE99) connected across the backplane to SFU0 through SFU3 over 9 x 6.25 Gbit/s SerDes links]
A greater speedup factor does not necessarily indicate higher SFU performance. Sometimes a large
speedup factor merely compensates for an over-simple switching algorithm.
If no more than M SFUs are faulty, the remaining normal SFUs take over
the services on the faulty SFUs to prevent service interruption. If the number of faulty SFUs
exceeds M, the switch fabric performance of the system deteriorates.
Multicast
Multicast allows a router to copy data packets to multiple channels. In multicast mode, a
server can forward one data packet to a large number of clients that request the data packet at
the same time. Only one copy of a data packet needs to traverse each shared link, which greatly
reduces the number of data packets transmitted on networks. Therefore, multicast improves network
usage and reduces transmission costs.
Multicast in chips on an SFU is also called spatial multicast. In spatial multicast mode, the
SFU copies one data packet from one interface to multiple interfaces based on the multicast
group ID.
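A minimal sketch of spatial multicast, assuming a hypothetical table that maps multicast group IDs to outbound interfaces:

```python
# Hypothetical multicast group table: group ID -> member outbound interfaces.
MULTICAST_GROUPS = {
    7: [1, 4, 5],
}

def spatial_multicast(cell, group_id):
    """Copy one cell from one inbound interface to every interface
    in the multicast group identified by group_id."""
    return [(port, cell) for port in MULTICAST_GROUPS[group_id]]

copies = spatial_multicast(b"payload", 7)
print([port for port, _ in copies])  # [1, 4, 5] - one copy per member
```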
Backpressure
Backpressure is a method of unidirectional flow control. By notifying the upstream interface
of traffic congestion on the downstream interface, backpressure prevents traffic congestion
from deteriorating. For example, interfaces A and B are communicating with each other. If
interface A detects traffic congestion in its memory, interface A sends a special data frame, a
backpressure frame, to interface B. After receiving the backpressure frame, interface B does
not send data packets to interface A until the memory resources of interface A are available
again. As a public component on a router, the SFU is prone to traffic congestion. Different
internal backpressure mechanisms are designed for SFUs with different implementations. In
addition, the backpressure mechanism applies to the upstream or downstream LPUs
connected to the SFU.
Backpressure cannot prevent traffic congestion; it is a response to traffic congestion.
When backpressure occurs on a router, congestion has already occurred on the router.
Backpressure is used to keep congestion from deteriorating and to help the upstream interfaces
adjust their traffic based on the traffic status of the downstream interface. As an analogy, fever is a
response to a virus infection: when a person runs a fever, the viruses have already invaded. The fever
suppresses the viruses through high temperature and helps the person recover.
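The interface A/B exchange described above can be modeled with a toy queue (a sketch only; real backpressure frames and thresholds are implementation specific):

```python
from collections import deque

class Interface:
    """Toy model of interface A: limited memory, asserts backpressure
    when its queue is full (capacity must be > 0)."""
    def __init__(self, capacity):
        self.queue = deque()
        self.capacity = capacity

    @property
    def backpressured(self):
        return len(self.queue) >= self.capacity

    def enqueue(self, packet):
        self.queue.append(packet)

    def process_one(self):
        if self.queue:
            self.queue.popleft()

def transmit(packets, downstream):
    """Interface B: hold each packet while downstream asserts backpressure,
    resume once memory is available again. Returns how many sends were held."""
    held = 0
    for pkt in packets:
        while downstream.backpressured:
            held += 1
            downstream.process_one()  # downstream frees memory over time
        downstream.enqueue(pkt)
    return held

a = Interface(capacity=2)
print(transmit(list(range(5)), a))  # 3 sends were held by backpressure
```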
Figure 1-3 Switch fabric classification based on the type of packets sent to the switch fabric:
cell switch and packet switch
Virtual output queuing: Cells destined for different outbound interfaces are put in different queues on the
inbound interfaces, preventing head-of-line (HOL) blocking between cells destined for different outbound
interfaces. Virtual output queuing is not a new caching method but an improvement on input queuing (IQ).
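An illustrative VOQ structure at one inbound interface (a sketch; the scheduler that arbitrates between queues is omitted):

```python
from collections import deque

class VirtualOutputQueues:
    """One queue per outbound interface at a single inbound interface, so a
    cell blocked toward one output cannot block cells toward other outputs."""
    def __init__(self, num_outputs):
        self.queues = [deque() for _ in range(num_outputs)]

    def enqueue(self, cell, output):
        self.queues[output].append(cell)

    def dequeue_for(self, output):
        """Serve an output independently of the others (no HOL blocking)."""
        return self.queues[output].popleft() if self.queues[output] else None

voq = VirtualOutputQueues(num_outputs=2)
voq.enqueue("a", output=0)
voq.enqueue("b", output=1)
# Even if output 0 is busy, the cell for output 1 is not blocked:
print(voq.dequeue_for(1))  # b
```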
CIOQ fabric
Cells are cached partly on the inbound interface and partly on the outbound interface of
an SFU. The CIOQ fabric resolves the HOL problem on the inbound interface and does
not impose high requirements on the memory of the outbound interface. Therefore, the CIOQ fabric is
widely used.
Figure 1-4 Switch fabric classification based on the location of the memory on the switch fabric:
IQ, CIOQ, and multi-stage switch
1.3.4 Based on the Mode in Which Data Packets Pass the Switch Fabric
Based on the mode in which data packets pass the switch fabric, switch fabrics can be
classified as cut-through switch fabrics and store-and-forward switch fabrics.
The cut-through switch fabric does not wait for a data packet (a cell of specific length or a
packet with additional fields) to be fully received but forwards the received portion to
the outbound interface immediately. Theoretically, the cut-through switch fabric boasts a high
forwarding rate and low switching latency.
The store-and-forward switch fabric sends the received data packets to the outbound
interface only after verifying them. The store-and-forward switch fabric
boasts excellent fault tolerance.
The cut-through switch fabric has defects in interface rate adaptation and fault tolerance.
Therefore, the store-and-forward switch fabric is more commonly used.
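The latency difference between the two modes can be illustrated with back-of-the-envelope arithmetic (the frame size, decision window, and link rate below are illustrative assumptions, not values from this document):

```python
def store_and_forward_latency_us(frame_bytes, rate_gbps):
    """The whole frame must arrive (and be verified) before forwarding starts."""
    bits_per_us = rate_gbps * 1000  # 1 Gbit/s = 1000 bits per microsecond
    return frame_bytes * 8 / bits_per_us

def cut_through_latency_us(decision_bytes, rate_gbps):
    """Forwarding starts once enough of the frame (assumed here: the header)
    has arrived to make a forwarding decision."""
    bits_per_us = rate_gbps * 1000
    return decision_bytes * 8 / bits_per_us

# 1500-byte frame, 64-byte decision window, 10 Gbit/s link
print(store_and_forward_latency_us(1500, 10))  # 1.2 us
print(cut_through_latency_us(64, 10))          # 0.0512 us
```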
[Figure 1-6: shared bus router: CPU, routing table unit, and LPUs interconnected over a shared bus]
Shared bus
In a shared bus switch, non-blocking switching means that the sum of all interface bandwidth must
not exceed the shared bus bandwidth. In other words, the switch performance of a router
is determined by the shared bus bandwidth. In addition, the switch performance of a router is
affected by the CPU capability. In Figure 1-6, when LPU 1 is communicating with LPU N, the
shared bus resources are occupied. As a result, LPU 2 cannot communicate with other LPUs.
Therefore, the router performance is determined by the shared bus capacity.
First-generation routers, such as the Huawei NE16E, generally use the shared bus switch
technology.
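The non-blocking condition for a shared bus can be expressed as a one-line check (a sketch with illustrative numbers):

```python
def shared_bus_is_nonblocking(interface_rates_gbps, bus_gbps):
    """A shared bus is non-blocking when the total interface bandwidth
    does not exceed the shared bus bandwidth."""
    return sum(interface_rates_gbps) <= bus_gbps

print(shared_bus_is_nonblocking([1.0] * 8, 10.0))  # True: 8 <= 10
print(shared_bus_is_nonblocking([2.5] * 8, 10.0))  # False: 20 > 10
```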
[Figure: shared memory switch: a memory controller and shared memory between the inbound and outbound interfaces]
Figure 1-8 shows a typical shared memory router architecture. After LPU 1 receives data, the
distributed memory management unit writes the data into the distributed memory. Based on
the result of searching the routing table for a destination outbound interface, LPU 1 sends the
data to LPU N.
[Figure 1-8: shared memory router: LPUs connected over the backplane to a distributed memory management unit, distributed memory, and routing table]
For a non-blocking shared memory switch architecture, the bandwidth for writing data into
the memory must be greater than the sum bandwidth of all inbound interfaces, and the
bandwidth for reading data from the memory must be greater than the sum bandwidth of all
outbound interfaces.
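That condition can be written directly as a check (illustrative values):

```python
def shared_memory_is_nonblocking(inbound_gbps, outbound_gbps,
                                 write_bw_gbps, read_bw_gbps):
    """Non-blocking shared memory switch: write bandwidth must exceed the
    total inbound bandwidth, and read bandwidth the total outbound bandwidth."""
    return (write_bw_gbps > sum(inbound_gbps)
            and read_bw_gbps > sum(outbound_gbps))

# 16 x 1 Gbit/s interfaces each way, 20 Gbit/s memory bandwidth per direction
print(shared_memory_is_nonblocking([1.0] * 16, [1.0] * 16, 20.0, 20.0))  # True
print(shared_memory_is_nonblocking([1.0] * 16, [1.0] * 16, 10.0, 20.0))  # False
```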
crossbar switch is a good choice. As the number of inbound interfaces (N) increases, the
complexity of the crossbar grows in proportion to N x N.
[Figure: crossbar switch: an N x N switch matrix with cross-points, driven by a crossbar controller that handles control signals (request and grant) and switch matrix reconfiguration]
The crossbar switch technology is generally used on single-stage switch fabrics, especially in
a single chassis, such as the Huawei NE80E.
The single-stage crossbar switch is simple in design and reduces costs, but it cannot meet
the expansion requirements of the next-generation Internet. The multi-stage crossbar switch is
complex to operate, but it supports thousands of interfaces. The multi-stage crossbar is
necessary for multi-chassis routers.
Different single-stage crossbar switch fabrics and inter-stage connections yield different
multi-stage crossbar switch fabrics. The most common multi-stage crossbar switch fabrics are
Benes and Clos, both of which are named after their inventors.
Benes switch fabric
The Benes switch fabric was invented by Benes in 1964. In the Benes switch architecture,
each single-stage switch fabric uses an N x N matrix, and N/d connections are provided
between the inbound and outbound interfaces. The middle stage supports nonstop
maintenance. The Benes switch fabric, however, cannot ensure that cells are transmitted
in sequence. Therefore, additional packet sequencing operations are required.
On a common 3-stage Benes switch fabric, N = d. The first stage fragments packets into
cells, and the second and third stages send the cells to the destination interfaces.
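The packet sequencing that the Benes fabric requires can be sketched as a reorder buffer keyed by per-flow sequence numbers (a minimal illustration, not the actual chip mechanism):

```python
class Resequencer:
    """Reorder buffer that releases cells strictly in sequence-number order,
    compensating for out-of-order delivery across the fabric."""
    def __init__(self):
        self.expected = 0   # next sequence number to release
        self.pending = {}   # out-of-order cells waiting for earlier ones

    def arrive(self, seq, cell):
        """Accept a (possibly out-of-order) cell; return cells now in order."""
        self.pending[seq] = cell
        released = []
        while self.expected in self.pending:
            released.append(self.pending.pop(self.expected))
            self.expected += 1
        return released

r = Resequencer()
print(r.arrive(1, "b"))  # [] - cell 0 has not arrived yet
print(r.arrive(0, "a"))  # ['a', 'b'] - both released in order
```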
[Figure: switch matrix types: non-square switch matrix and square switch matrix]
Crossbar switches without internal memories include the Tandem Banyan. A crossbar switch
without internal memories requires little logic, but it requires a large number of
intra-chip or inter-chip connections to achieve the same performance as a crossbar
switch with internal memories.
[Figure: methods of extending the switching capacity and interface quantity: single-stage single-plane switching, multi-plane switching, and multi-stage multi-plane switching]
Each FIC is connected to all switch planes to ensure that cells can be evenly allocated to each switch
plane. This not only facilitates load balancing but also is conducive to system fault tolerance.
2. After the cells reach the crossbar unit, the crossbar schedules the cells to the outbound
interfaces of the SFU and sends them to the FIC on the LPU. The cell switching is then
complete.
3. After the cells reach the FIC, the FIC reassembles the cells into IP data packets and
sends them to the outbound interface on the LPU. The single-stage switching of IP data
packets on the router is then complete.
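The fragmentation and reassembly steps can be sketched as follows (the cell size and tagging scheme are illustrative; real cell formats are implementation specific):

```python
CELL_PAYLOAD = 64  # bytes of payload per cell (illustrative value)

def fragment(packet: bytes):
    """TM on the ingress LPU: split a packet into fixed-size cells, tagging
    each with its index and a last-cell flag."""
    cells = []
    for i in range(0, len(packet), CELL_PAYLOAD):
        chunk = packet[i:i + CELL_PAYLOAD]
        is_last = i + CELL_PAYLOAD >= len(packet)
        cells.append((i // CELL_PAYLOAD, is_last, chunk))
    return cells

def reassemble(cells):
    """FIC/TM on the egress LPU: restore the packet from its cells."""
    ordered = sorted(cells, key=lambda c: c[0])
    assert ordered[-1][1], "last cell missing"
    return b"".join(chunk for _, _, chunk in ordered)

pkt = bytes(range(200))
print(len(fragment(pkt)))                 # 4 cells (64 + 64 + 64 + 8 bytes)
print(reassemble(fragment(pkt)) == pkt)   # True - packet restored intact
```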
1. After a data packet reaches a physical interface of the LPU on the CLC A, the data
processing unit on the LPU processes the packet and sends it to the traffic manager (TM).
The TM fragments the packet into cells. After being cached and scheduled in queues, the
cells are sent to S1 (SFU on the CLC A).
Each TM is connected to the switch planes through one or more connections so that cells can be evenly
allocated to various switch planes.
2. After cells reach S1, the switch fabric evenly allocates the cells to S2 (SFU on the CCC).
S2 sends the cells to the SFU on the destination CLC, S3 on CLC B. After receiving the
cells, S3 sends the cells to the destination LPU.
The principle of processing data packets on the single-stage switch is similar to that on the multi-stage
switch. However, the multi-stage switch makes it easier to build a large-capacity switching network and
improve the switching performance of the system.
3. After the cells reach the TM on the destination LPU, the TM reassembles the cells into
IP data packets and sends them to the outbound interface on the LPU. The 3-stage
switching of IP data packets on the router is then complete.
1.6 FAQ
1.6.1 How to Calculate the LPU Capacity Based on the SFU Capacity?
Generally the backplane capacity of a router is greater than the SFU capacity, and the SFU
capacity is greater than the LPU capacity. The ratio of the SFU capacity to the LPU capacity
is the speedup factor.
To reduce costs and improve system scalability, the backplane capacity is generally planned
quite large for further LPU expansion.
Usually the SFU capacity is just enough for the current LPU specifications and some further LPU
expansion. For example, if the NE40E-8 houses two SFUDs and two SRUs, the SFU capacity
is 4 (number of SFUs) x 8 (number of interfaces on each SFU) x 4 (number of SerDes links on
each interface) x 3.2 Gbit/s (rate of each SerDes link) x 0.8 (SerDes coding efficiency) =
327.68 Gbit/s. The capacity allocated to each LPU is 327.68 Gbit/s / 8 = 40.96 Gbit/s. With a
speedup factor of 2, each LPU slot on the NE40E-8 supports an LPU with a maximum capacity
of 20G. Similarly, if the NE40E-8 houses four SFUGs, the SFU capacity is 4 (number of SFUs)
x 16 (number of interfaces on each SFU) x 8 (number of SerDes links on each interface) x
3.2 Gbit/s (rate of each SerDes link) x 0.8 (SerDes coding efficiency) = 1.31 Tbit/s. The
capacity allocated to each LPU is 1.31 Tbit/s / 16 = 81.92 Gbit/s. With a speedup factor of 2,
each LPU slot on the NE40E-8 supports an LPU with a maximum capacity of 40G.
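The two calculations in this answer can be reproduced with a short script (the parameter names are illustrative):

```python
def sfu_capacity_gbps(num_sfus, interfaces_per_sfu, serdes_per_interface,
                      serdes_rate_gbps, coding_efficiency=0.8):
    """Total SFU capacity: interfaces x SerDes links x rate x coding efficiency."""
    return (num_sfus * interfaces_per_sfu * serdes_per_interface
            * serdes_rate_gbps * coding_efficiency)

def lpu_capacity_gbps(sfu_gbps, num_slots, speedup=2):
    """Capacity available per LPU slot, discounted by the speedup factor."""
    return sfu_gbps / num_slots / speedup

sfud = sfu_capacity_gbps(4, 8, 4, 3.2)    # 2 SFUDs + 2 SRUs: 327.68 Gbit/s
sfug = sfu_capacity_gbps(4, 16, 8, 3.2)   # 4 SFUGs: 1310.72 Gbit/s
print(lpu_capacity_gbps(sfud, 8))          # 20.48 -> supports ~20G LPUs
print(lpu_capacity_gbps(sfug, 16))         # 40.96 -> supports ~40G LPUs
```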
The capacity generally refers to the bidirectional capacity. For example, the receiving and sending
rates of a common GE interface are each 1 Gbit/s. Therefore, the GE interface capacity is 2
Gbit/s in terms of the bidirectional capacity and 1 Gbit/s in terms of the unidirectional capacity.
Unless otherwise specified, the capacity in this document refers to the unidirectional capacity.
If the unidirectional capacity is to be calculated, the backplane, SFU, and LPU capacities used
in the calculation must all be unidirectional. If the bidirectional capacity is to be calculated,
they must all be bidirectional. The capacity standard must be consistent.