V800R010C00
Issue 02
Date 2018-06-20
Huawei and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.
All other trademarks and trade names mentioned in this document are the property of their respective
holders.
Notice
The purchased products, services and features are stipulated by the contract made between Huawei and the
customer. All or part of the products, services and features described in this document may not be within the
purchase scope or the usage scope. Unless otherwise specified in the contract, all statements, information,
and recommendations in this document are provided "AS IS" without warranties, guarantees or
representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.
Website: http://www.huawei.com
Email: support@huawei.com
Contents
3 LMSP
3.1 Introduction
3.2 Principles
4 MPLS OAM
4.1 MPLS OAM Introduction
4.2 MPLS OAM Principles
4.2.1 Basic Detection
4.2.2 Auto Protocol
4.3 MPLS OAM Applications
4.3.1 MPLS OAM Application in the IP RAN Layer 2 to Edge Scenario
4.3.2 Application of MPLS OAM in VPLS Networking
4.4 MPLS OAM Terms and Abbreviations
5 MPLS-TP OAM
5.1 Introduction
5.2 Principles
5.2.1 Basic Concepts
5.2.2 Continuity Check and Connectivity Verification
5.2.3 Packet Loss Measurement
5.2.4 Frame Delay Measurement
5.2.5 Remote Defect Indication
5.2.6 Loopback
5.3 Applications
5.3.1 MPLS-TP OAM Application in the IP RAN Layer 2 to Edge Scenario
5.3.2 Application of MPLS-TP OAM in VPLS Networking
5.4 Terms and Abbreviations
6 VRRP
6.1 VRRP Introduction
6.2 Principles
6.2.1 Basic VRRP Concepts
6.2.2 VRRP Packets
6.2.3 VRRP Operating Principles
6.2.4 Basic VRRP Functions
6.2.5 mVRRP
6.2.6 Association Between VRRP and a VRRP-disabled Interface
6.2.7 VRRP Tracking an Interface Monitoring Group
6.2.8 BFD for VRRP
Purpose
This document describes the network reliability feature in terms of its overview, principles,
and applications.
Related Version
The following table lists the product version related to this document.
l U2000: V200R017C60
l eSight: V300R009C00
Intended Audience
This document is intended for:
l Network planning engineers
l Commissioning engineers
l Data configuration engineers
l System maintenance engineers
Security Declaration
l Encryption algorithm declaration
The encryption algorithms DES/3DES/RSA (RSA-1024 or lower)/MD5 (in digital
signature scenarios and password encryption)/SHA1 (in digital signature scenarios) are
of low security and may pose security risks. If the protocols allow, using more secure
encryption algorithms, such as AES/RSA (RSA-2048 or higher)/SHA2/HMAC-SHA2, is
recommended.
l Password configuration declaration
– Do not set both the start and end characters of a password to "%^%#". This causes
the password to be displayed directly in the configuration file.
– To further improve device security, periodically change the password.
l Personal data declaration
Your purchased products, services, or features may use some of users' personal data
during service operation or fault locating. You must define user privacy policies in
compliance with local laws and take proper measures to fully protect personal data.
l Feature declaration
– The NetStream feature may be used to analyze the communication information of
terminal customers for network traffic statistics and management purposes. Before
enabling the NetStream feature, ensure that it is performed within the boundaries
permitted by applicable laws and regulations. Effective measures must be taken to
ensure that information is securely protected.
– The mirroring feature may be used to analyze the communication information of
terminal customers for a maintenance purpose. Before enabling the mirroring
function, ensure that it is performed within the boundaries permitted by applicable
laws and regulations. Effective measures must be taken to ensure that information is
securely protected.
– The packet header obtaining feature may be used to collect or store some
communication information about specific customers for transmission fault and
error detection purposes. Huawei cannot offer services to collect or store this
information unilaterally. Before enabling the function, ensure that it is performed
within the boundaries permitted by applicable laws and regulations. Effective
measures must be taken to ensure that information is securely protected.
l Reliability design declaration
Network planning and site design must comply with reliability design principles and
provide device-level and solution-level protection. Device-level protection includes
dual-network and inter-board dual-link planning principles to avoid single points of
failure and single-link failures. Solution-level protection refers to fast convergence
mechanisms, such as FRR and VRRP.
Special Declaration
l This document serves only as a guide. The content is written based on device
information gathered under lab conditions. The content provided by this document is
intended to be taken as general guidance, and does not cover all scenarios. The content
provided by this document may be different from the information on user device
interfaces due to factors such as version upgrades and differences in device models,
board restrictions, and configuration files. The actual user device information takes
precedence over the content provided by this document. The preceding differences are
beyond the scope of this document.
l The maximum values provided in this document are obtained in specific lab
environments (for example, only a certain type of board or protocol is configured on a
tested device). The actually obtained maximum values may be different from the
maximum values provided in this document due to factors such as differences in
hardware configurations and carried services.
l Interface numbers used in this document are examples. Use the existing interface
numbers on devices for configuration.
l The pictures of hardware in this document are for reference only.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Change History
Updates between document issues are cumulative. Therefore, the latest document issue
contains all updates made in previous issues.
l Changes in Issue 03 (2018-04-10)
This issue is the third official release. The software version of this issue is
V800R010C00SPC200.
l Changes in Issue 02 (2018-02-28)
This issue is the second official release. The software version of this issue is
V800R010C00SPC200.
l Changes in Issue 01 (2017-11-30)
This issue is the first official release. The software version of this issue is
V800R010C00SPC100.
2 BFD
Purpose
To minimize the impact of device faults on services and improve network availability, a
network device must be able to quickly detect faults in communication with adjacent devices.
Measures can then be taken to promptly rectify the faults to ensure service continuity.
On a live network, link faults can be detected using either of the following mechanisms:
l Hardware detection: For example, the Synchronous Digital Hierarchy (SDH) alarm
function can be used to quickly detect link faults.
l Hello detection: If hardware detection is unavailable, Hello detection can be used to
detect link faults.
However, the two mechanisms have the following issues:
l Only certain media support hardware detection.
l Hello detection takes more than 1 second to detect a fault. When traffic is transmitted at
gigabit rates, such slow detection causes packet loss.
l On a Layer 3 network, the Hello packet detection mechanism cannot detect faults for all
routes, such as static routes.
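A rough calculation (not from the source) illustrates why second-level Hello detection is too slow at gigabit rates, while millisecond-level BFD detection sharply reduces the loss:

```python
# Illustrative arithmetic: traffic lost while a link fault goes undetected,
# for a given line rate and detection time. Values are examples only.

def traffic_lost_bytes(line_rate_bps: float, detection_time_s: float) -> float:
    """Bytes of traffic dropped while the fault remains undetected."""
    return line_rate_bps * detection_time_s / 8

# Hello-based detection (>= 1 s) on a 1 Gbit/s link:
hello_loss = traffic_lost_bytes(1e9, 1.0)    # 125 MB lost
# BFD-style detection at 30 ms on the same link:
bfd_loss = traffic_lost_bytes(1e9, 0.03)     # 3.75 MB lost

print(f"Hello: {hello_loss / 1e6:.2f} MB, BFD: {bfd_loss / 1e6:.2f} MB")
```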
Benefits
BFD offers the following benefits:
l BFD rapidly monitors link or IP route connectivity to improve network performance.
l Adjacent systems running BFD rapidly detect communication failures and establish a
backup channel to restore communications, which improves network reliability.
l Asynchronous mode: a major BFD detection mode. In this mode, both systems
periodically send BFD control packets to each other. If one system fails to receive BFD
control packets consecutively, the system considers the BFD session Down.
The echo function can be used with either mode. When the echo function is activated, the local
system sends BFD echo packets and the remote system loops them back through its
forwarding channel. If several consecutive echo packets are not received, the session is
declared Down.
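The asynchronous-mode detection rule described above can be sketched as follows. This is a hypothetical simplification (following the RFC 5880 model, where the detection time is the detect multiplier times the negotiated receive interval); the class and method names are illustrative, not a device API:

```python
# Sketch of the asynchronous-mode detection rule: if no BFD control packet
# arrives within the detection time (detect multiplier x negotiated receive
# interval), the session is declared Down. Names are illustrative.

class AsyncDetector:
    def __init__(self, detect_mult: int, rx_interval_ms: int):
        self.detect_time_ms = detect_mult * rx_interval_ms
        self.last_rx_ms = 0
        self.state = "Up"

    def on_packet(self, now_ms: int) -> None:
        """A BFD control packet was received; restart the detection timer."""
        self.last_rx_ms = now_ms

    def on_tick(self, now_ms: int) -> str:
        """Periodic check: declare the session Down if the timer expired."""
        if now_ms - self.last_rx_ms > self.detect_time_ms:
            self.state = "Down"
        return self.state

d = AsyncDetector(detect_mult=3, rx_interval_ms=10)   # 30 ms detection time
d.on_packet(now_ms=0)
print(d.on_tick(now_ms=25))   # within the detection time -> "Up"
print(d.on_tick(now_ms=31))   # detection time exceeded -> "Down"
```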
Static mode: BFD session parameters, such as the local and remote discriminators, are
manually configured and delivered for BFD session establishment.
NOTE
In static mode, configure unique local and remote discriminators for each
BFD session. This mode prevents incorrect discriminators from affecting
BFD sessions that have correct discriminators and prevents BFD sessions
from alternating between Up and Down.
l Down: A BFD session is in the Down state or a request has been sent.
l Init: The local end can communicate with the remote end, and the local end expects the
BFD session to go Up.
l Up: A BFD session is successfully established.
l AdminDown: A BFD session is in the AdminDown state.
Session status changes are transmitted using the State field carried in a BFD control packet.
The system changes its session status based on the local session status and received remote
session status from the peer system.
When a BFD session is to be established or deleted, the BFD state machine implements a
three-way handshake to ensure that the two systems detect the status change.
Figure 2-1 shows the status change process of the state machine during the establishment of a
BFD session.
1. BFD configured on both Device A and Device B independently starts state machines.
The initial status of BFD state machines is Down. Device A and Device B send BFD
control packets with the State field set to Down. If BFD sessions are established in static
mode, the value of Your Discriminator in BFD control packets is manually specified. If
BFD sessions are established in dynamic mode, the value of Your Discriminator is set to
0.
2. After receiving a BFD control packet with the State field set to Down, Device B switches
the session status to Init and sends a BFD control packet with the State field set to Init.
NOTE
After the local BFD session status of Device B changes to Init, Device B no longer processes the
received BFD control packets with the State field set to Down.
3. The BFD session status change of Device A is the same as that of Device B.
4. After receiving a BFD control packet with the State field set to Init, Device B changes
the local session status to Up.
5. The BFD session status change of Device A is the same as that of Device B.
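The three-way handshake above can be sketched as a small state-transition function. This is a simplified illustration in the spirit of RFC 5880 (timers, AdminDown handling, and discriminator learning are omitted):

```python
# Simplified sketch of the BFD session state machine used in the three-way
# handshake above. Timers and AdminDown handling are intentionally omitted.

def next_state(local: str, received: str) -> str:
    """Return the new local state after receiving a packet carrying
    the peer's state in the State field."""
    if local == "Down":
        if received == "Down":
            return "Init"        # step 2: answer a Down packet with Init
        if received == "Init":
            return "Up"          # the peer already saw our Down packet
    elif local == "Init":
        if received in ("Init", "Up"):
            return "Up"          # step 4: the handshake completes
    elif local == "Up":
        if received == "Down":
            return "Down"        # the peer signalled a failure
    return local                 # otherwise, keep the current state

# Walk both ends through session establishment:
a, b = "Down", "Down"
b = next_state(b, "Down")   # B receives A's Down packet -> Init
a = next_state(a, "Init")   # A receives B's Init packet -> Up
b = next_state(b, "Up")     # B receives A's Up/Init packet -> Up
print(a, b)                 # both ends are now Up
```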
BFD for IP detects single- and multi-hop IPv4 and IPv6 links:
l Single-hop BFD checks the IP continuity between directly connected systems. The single
hop refers to a hop on an IP link. Single-hop BFD allows only one BFD session to be
established for a specified data protocol on a specified interface.
l Multi-hop BFD detects all paths between two systems. Each path may contain multiple
hops, and these paths may partially overlap.
[Figure 2-2: single-hop BFD session between Device A (If1, 10.1.1.1/25) and Device B (If1, 10.1.1.2/25)]
Typical application 2:
As shown in Figure 2-3, BFD monitors the multi-hop IPv4 path between Device A and
Device C, and BFD sessions are bound only to peer IP addresses.
[Figure 2-3: multi-hop IPv4 BFD session between Device A and Device C]
[Figure 2-4: single-hop IPv6 BFD session between Device A (If1, 2001::1/64) and Device B (If1, 2001::2/64)]
Typical application 4:
As shown in Figure 2-5, BFD monitors the multi-hop IPv6 path between Device A and
Device C, and BFD sessions are bound only to peer IP addresses.
[Figure 2-5: multi-hop IPv6 BFD session between Device A and Device C]
In BFD for IP scenarios, BFD for PST is configured on a device. If a link fault occurs, BFD
detects the fault and triggers the PST to go Down. If the device restarts and the link fault
persists, BFD is in the AdminDown state and does not notify the PST of BFD Down. As a
result, the PST is not triggered to go Down and the interface bound to BFD is still Up.
After multicast BFD is configured, multicast BFD packets are sent using the IP layer. If the
link is reachable, the remote interface receives the multicast BFD packets and forwards them
to the BFD module. In this manner, the BFD module detects that the link is normal. If
multicast BFD packets are sent over a trunk member link, they are delivered to the data link
layer for link continuity check. The remote IP address used in a multicast BFD session is the
default known multicast IP address (224.0.0.107 to 224.0.0.250). Any packet with the default
known multicast IP address is sent to the BFD module for IP forwarding.
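The dispatch rule described above can be sketched as follows, using the multicast address range stated in the text; the function and return values are illustrative stand-ins, not a device API:

```python
import ipaddress

# Sketch of the dispatch rule above: packets addressed to the default known
# multicast range (224.0.0.107 to 224.0.0.250, per the text) are handed to
# the BFD module; other packets are forwarded normally. Names are invented.

LOW = ipaddress.IPv4Address("224.0.0.107")
HIGH = ipaddress.IPv4Address("224.0.0.250")

def dispatch(dst_ip: str) -> str:
    addr = ipaddress.IPv4Address(dst_ip)
    return "bfd_module" if LOW <= addr <= HIGH else "normal_forwarding"

print(dispatch("224.0.0.184"))  # in range -> bfd_module
print(dispatch("10.1.1.2"))     # unicast  -> normal_forwarding
```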
Usage Scenario
[Figure 2-6: multicast BFD session between Device A and Device B over interfaces If1]
As shown in Figure 2-6, multicast BFD is configured on both Device A and Device B. BFD
sessions are bound to the outbound interface If1, and the default multicast address is used.
After the configuration is complete, multicast BFD quickly checks the continuity of the link
between interfaces.
Usage Scenario
[Figure 2-7: single-hop BFD session between Device A and Device B over interfaces If1]
In Figure 2-7, a BFD session is established between Device A and Device B, and the default
multicast address is used to check the continuity of the single-hop link connected to the
interface If1. After BFD for PIS is configured and BFD detects a link fault, BFD immediately
sends a message indicating the Down state to the associated interface. The interface then
enters the BFD Down state.
[Figure 2-8: BFD for link-bundle session over an Eth-Trunk, with BFD sub-sessions 1, 2, and 3 monitoring the member links]
On the network shown in Figure 2-8, a BFD for link-bundle session consists of one main
session and multiple sub-sessions.
l Each sub-session independently monitors an Eth-Trunk member interface and reports the
monitoring results to the main session. Each sub-session uses the same monitoring
parameters as the main session.
l The main session creates a BFD sub-session for each Eth-Trunk member interface,
summarizes the sub-session monitoring results, and determines the status of the Eth-
Trunk.
– The main session is Up as long as at least one sub-session is Up.
– If no sub-session is available, the main session goes Down and the Unknown state
is reported to applications. The status of the Eth-Trunk port is not changed.
– If the Eth-Trunk has only one member interface and the corresponding sub-session
is Up, the main session goes Down when the member interface exits the Eth-Trunk.
The status of the Eth-Trunk is Up.
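The aggregation rule above can be sketched as follows. This is a minimal illustration of how the main session could summarize sub-session results; the function name and state strings are assumptions, not a device API:

```python
# Sketch of the main-session aggregation rule: the main session is Up as
# long as at least one sub-session is Up, and Unknown is reported when no
# sub-session is available. Names are illustrative.

def main_session_state(sub_states: list) -> str:
    """Return the state the main session reports to applications."""
    if not sub_states:
        return "Unknown"   # no sub-session available; Eth-Trunk state unchanged
    return "Up" if any(s == "Up" for s in sub_states) else "Down"

print(main_session_state(["Down", "Up", "Down"]))  # one member link alive -> Up
print(main_session_state(["Down", "Down"]))        # all member links down -> Down
print(main_session_state([]))                      # no sub-sessions -> Unknown
```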
The main session's local discriminator is allocated from the range 0x00100000 to
0x00103fff without occupying the original BFD session discriminator range. The main
session does not learn the remote discriminator because it does not send or receive packets. A
sub-session's local discriminator is allocated from the original dynamic BFD session
discriminator range using the same algorithm as a dynamic BFD session.
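A minimal sketch of allocating main-session local discriminators from the dedicated range stated above (a real implementation would also track and reuse freed values; this illustrative allocator only hands out fresh ones):

```python
# Sketch: main-session local discriminators come from the dedicated range
# 0x00100000-0x00103fff, separate from the dynamic-session range.
# The class and its behavior are illustrative assumptions.

MAIN_DISCR_LOW, MAIN_DISCR_HIGH = 0x00100000, 0x00103FFF

class MainDiscriminatorPool:
    def __init__(self):
        self._next = MAIN_DISCR_LOW

    def allocate(self) -> int:
        if self._next > MAIN_DISCR_HIGH:
            raise RuntimeError("main-session discriminator range exhausted")
        value = self._next
        self._next += 1
        return value

pool = MainDiscriminatorPool()
print(hex(pool.allocate()))  # first main session -> 0x100000
```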
Only sub-sessions consume BFD session resources per board. A sub-session must select the
board on which the physical member interface bound to this sub-session resides as a state
machine board. If no BFD session resources are available on the board, board selection fails.
In this situation, the sub-session's status is not used to determine the main session's status.
[Figure 2-9: passive BFD echo session between Device A (Port1, 10.1.1.1/25) and Device B (Port1, 10.1.1.2/25)]
The process of establishing a passive BFD echo session as shown in Figure 2-9 is as follows:
1. Device B functions as a BFD session initiator and sends an asynchronous BFD packet to
Device A. The Required Min Echo RX Interval field carried in the packet is a nonzero
value, which specifies that Device A must support BFD echo.
2. After receiving the packet, Device A finds that the value of the Required Min Echo RX
Interval field carried in the packet is a nonzero value. If Device A has passive BFD echo
enabled, it checks whether any ACL that restricts passive BFD echo is referenced. If an
ACL is referenced, only BFD sessions that match specific ACL rules can enter the
asynchronous echo mode. If no ACL is referenced, BFD sessions immediately enter the
asynchronous echo mode.
3. Device B periodically sends BFD echo packets, and Device A sends BFD echo packets
(the source and destination IP addresses are the local IP address, and the destination
physical address is Device B's physical address) at the interval specified by the Required
Min RX Interval field. Both Device A and Device B start a receive timer, with a receive
interval that is the same as the interval at which they each send BFD echo packets.
4. After Device A and Device B receive BFD echo packets from each other, they
immediately loop back the packets at the forwarding layer. Device A and Device B also
send asynchronous BFD packets to each other at an interval that is much less than that
for sending echo packets.
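The decision made in step 2 above can be sketched as a small predicate. The function name and the ACL callable are hypothetical stand-ins for the device's ACL matching:

```python
# Sketch of the passive-echo decision in step 2: a nonzero Required Min Echo
# RX Interval requests echo; the receiver enters asynchronous echo mode only
# if passive echo is enabled and any referenced ACL permits the session.

def enters_echo_mode(required_min_echo_rx: int,
                     passive_echo_enabled: bool,
                     acl_match=None) -> bool:
    if required_min_echo_rx == 0 or not passive_echo_enabled:
        return False
    if acl_match is not None:      # an ACL restricting passive echo is referenced
        return acl_match()         # only sessions matching ACL rules enter echo mode
    return True                    # no ACL referenced: enter echo mode immediately

print(enters_echo_mode(1000, True))                           # -> True
print(enters_echo_mode(1000, True, acl_match=lambda: False))  # -> False
print(enters_echo_mode(0, True))                              # -> False
```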
Similarities and Differences Between Passive BFD Echo and One-Arm BFD Echo
To ensure that passive BFD echo or one-arm BFD echo can take effect, disable strict URPF
on devices that send BFD echo packets.
Strict URPF prevents attacks that use spoofed source IP addresses. If strict URPF is enabled
on a device, the device obtains the source IP address and inbound interface of a packet and
searches the forwarding table for an entry with the destination IP address set to the source IP
address of the packet. The device then checks whether the outbound interface for the entry
matches the inbound interface. If they do not match, the device considers the source IP
address invalid and discards the packet. After a device enabled with strict URPF receives a
BFD echo packet that is looped back, it checks the source IP address of the packet. As the
source IP address of the echo packet is a local IP address of the device, the packet is sent to
the platform without being forwarded at the lower layer. As a result, the device considers the
packet invalid and discards it.
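The strict URPF check described above can be sketched as follows. The forwarding table is a toy model (a dict of exact source addresses, not prefixes), and it shows why a looped-back echo packet, whose source is a local address, fails the check:

```python
# Sketch of the strict URPF check: accept a packet only if the route back to
# its source address leaves through the interface it arrived on. The FIB is
# a toy dict model; all names are illustrative.

def strict_urpf_pass(fib: dict, src_ip: str, in_interface: str) -> bool:
    out_interface = fib.get(src_ip)
    return out_interface == in_interface

fib = {"10.1.1.2": "If1", "10.1.1.1": "local"}
# Normal packet from the peer, arriving on If1: passes.
print(strict_urpf_pass(fib, "10.1.1.2", "If1"))   # -> True
# Looped-back echo packet: its source is a local address, so the "route"
# does not point back out of If1, and strict URPF discards it.
print(strict_urpf_pass(fib, "10.1.1.1", "If1"))   # -> False
```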
Table 2-3 Differences between BFD echo sessions and common static single-hop sessions
BFD Session Type | Supported IP Type | Session Descriptor | Negotiation Prerequisite | IP Header
– If a single-hop BFD session is established and the session is bound to a board that is
BFD-incapable in hardware but BFD-capable in software, the BFD session can be
processed by this board.
l Integrated mode
If single-hop BFD sessions are established and the sessions are bound to boards that are
BFD-incapable in hardware but BFD-capable in software, the sessions will be distributed
to the two load-balancing integrated boards. The load-balancing integrated board with
more available BFD resources will be preferentially selected.
NOTE
Boards that are BFD-incapable in hardware but BFD-capable in software are selected in the following
conditions:
l Boards that are BFD-capable in hardware are unavailable.
l The integrated mode is not configured, and BFD for IP sessions bound to a physical interface or its
sub-interfaces are single-hops.
If boards that are BFD-incapable in hardware but BFD-capable in software are already selected and the
integrated mode is configured, sessions will enter the AdminDown state and then be bound to an
integrated board.
Table 2-4 describes the board selection rules for BFD sessions.
l Multi-hop session: The board with the interface that receives BFD negotiation packets
is preferentially selected. If the board does not have available BFD resources, a load-
balancing integrated board will be selected. If no load-balancing integrated board is
available, board selection fails.
l Single-hop session bound to a physical interface or its sub-interfaces:
– If the board on which the bound interface or sub-interfaces reside is BFD-capable
in hardware, this board is selected. If the board does not have available BFD
resources, board selection fails.
– If the board on which the bound interface or sub-interfaces reside is BFD-incapable
in hardware but BFD-capable in software and the integrated mode is configured, a
load-balancing integrated board will be selected. If no load-balancing integrated
board is available, board selection fails.
– If the board on which the bound interface or sub-interfaces reside is BFD-incapable
in hardware but BFD-capable in software, the board is still selected. If the board
does not have available BFD resources, board selection fails.
l Single-hop session bound to a trunk interface: A board is selected from the boards on
which trunk member interfaces reside. If none of the boards has available BFD
resources, board selection fails.
– If none of these boards is BFD-capable in hardware, a specified integrated board
will be selected based on load balancing.
– If any of these boards are BFD-capable in hardware and the others are BFD-
incapable in hardware, a specified integrated board will be selected. If board
selection fails, a board is selected from those that are BFD-capable in hardware.
– If all of these boards are BFD-capable in hardware, one will be selected based on
load balancing.
l BFD for VLANIF session: The board with the interface that receives BFD negotiation
packets is selected. If the board does not have available BFD resources, board
selection fails.
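The rules for the physical-interface row of Table 2-4 can be sketched as follows. This is a simplified illustration; the board attributes and function signature are invented for the sketch, not a device API:

```python
# Simplified sketch of the Table 2-4 board-selection rules for a single-hop
# session bound to a physical interface. Board attributes ('hw_bfd',
# 'sw_bfd', 'free_resources') are illustrative assumptions.

def select_board(board, integrated_mode, integrated_boards):
    """Return the selected board, or None if board selection fails."""
    if board["hw_bfd"]:
        # Hardware-capable board: use it if it has resources, else fail.
        return board if board["free_resources"] else None
    if board["sw_bfd"]:
        if integrated_mode:
            # Integrated mode: pick a load-balancing integrated board.
            for b in integrated_boards:
                if b["free_resources"]:
                    return b
            return None
        # No integrated mode: the software-capable board is still selected.
        return board if board["free_resources"] else None
    return None

hw_board = {"hw_bfd": True, "sw_bfd": True, "free_resources": True}
print(select_board(hw_board, False, []) is hw_board)  # -> True
```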
NOTE
For details about BFD, see the HUAWEI NetEngine40E Universal Service Router Feature Description -
Reliability.
Table 2-5 Differences before and after BFD for RIP is configured
Item | Link Fault Detection Mechanism | Convergence Speed
Related Concepts
The BFD mechanism bidirectionally monitors data protocol connectivity over the link
between two routers. After BFD is associated with a routing protocol, BFD can rapidly detect
a fault (if any) and notify the protocol module of the fault, which speeds up route convergence
and minimizes traffic loss.
BFD is classified into the following modes:
l Static BFD
In static BFD mode, BFD session parameters (including local and remote discriminators)
must be configured, and requests must be delivered manually to establish BFD sessions.
Static BFD is applicable to networks on which only a few links require high reliability.
l Dynamic BFD
In dynamic BFD mode, the establishment of BFD sessions is triggered by routing
protocols, and the local discriminator is dynamically allocated, whereas the remote
discriminator is obtained from BFD packets sent by the neighbor.
When a new neighbor relationship is set up, a BFD session is established based on the
neighbor and detection parameters, including source and destination IP addresses. When
a fault occurs on the link, the routing protocol associated with BFD can detect the BFD
session Down event. Traffic is switched to the backup link immediately, which
minimizes data loss.
Dynamic BFD is applicable to networks that require high reliability.
Implementation
For details about BFD implementation, see "BFD" in Universal Service Router Feature
Description - Reliability. Figure 2-10 shows a typical network topology for BFD for RIP.
l Dynamic BFD for RIP implementation:
a. RIP neighbor relationships are established among Device A, Device B, and Device
C and between Device B and Device D.
b. BFD for RIP is enabled on Device A and Device B.
c. Device A calculates routes, and the next hop along the route from Device A to
Device D is Device B.
d. If a fault occurs on the link between Device A and Device B, BFD will rapidly
detect the fault and report it to Device A. Device A then deletes the route whose
next hop is Device B from the routing table.
e. Device A recalculates routes and selects a new path Device C → Device B →
Device D.
f. After the link between Device A and Device B recovers, a new BFD session is
established between the two routers. Device A then reselects an optimal link to
forward packets.
l Static BFD for RIP implementation:
a. RIP neighbor relationships are established among Device A, Device B, and Device
C and between Device B and Device D.
b. Static BFD is configured on the interface that connects Device A to Device B.
c. If a fault occurs on the link between Device A and Device B, BFD will rapidly
detect the fault and report it to Device A. Device A then deletes the route whose
next hop is Device B from the routing table.
d. After the link between Device A and Device B recovers, a new BFD session is
established between the two routers. Device A then reselects an optimal link to
forward packets.
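The reaction in the fault steps above can be sketched with a toy routing table: when BFD reports the session to a next hop as Down, routes through that next hop are deleted and traffic falls back to the backup path. The table model and names are illustrative:

```python
# Sketch of the BFD-triggered route withdrawal described above: on a BFD
# Down event, delete routes whose next hop failed, leaving backup paths.
# The routing table is a toy model; device and path names are illustrative.

routes = {
    # destination: list of (next_hop, cost), best path first
    "DeviceD": [("DeviceB", 1), ("DeviceC", 10)],
}

def on_bfd_down(failed_next_hop: str) -> None:
    """Remove routes through the failed next hop from the routing table."""
    for dest, paths in routes.items():
        routes[dest] = [p for p in paths if p[0] != failed_next_hop]

on_bfd_down("DeviceB")          # BFD reports the link to Device B as Down
print(routes["DeviceD"][0][0])  # traffic to Device D now goes via DeviceC
```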
[Figure 2-10: BFD for RIP networking with Device C on an alternative path; link costs 10 and 1]
Usage Scenario
BFD for RIP is applicable to networks that require high reliability.
Benefits
BFD for RIP improves network reliability and enables devices to rapidly detect link faults,
which speeds up route convergence on RIP networks.
Purpose
A link fault or a topology change causes routers to recalculate routes. Routing protocol
convergence must be as quick as possible to improve network availability. Link faults are
inevitable, and therefore a solution must be provided to quickly detect faults and notify
routing protocols.
BFD for Open Shortest Path First (OSPF) associates BFD sessions with OSPF. After BFD for
OSPF is configured, BFD quickly detects link faults and notifies OSPF of the faults. BFD for
OSPF accelerates OSPF response to network topology changes.
Table 2-6 describes OSPF convergence speeds before and after BFD for OSPF is configured.
Table 2-6 OSPF convergence speeds before and after BFD for OSPF is configured
Item | Link Fault Detection Mechanism | Convergence Speed
Principles
[Figure 2-11: BFD for OSPF networking: DeviceA and DeviceB connected directly and through DeviceC via interface 1 and interface 2]
Figure 2-11 shows a typical network topology with BFD for OSPF configured. The principles
of BFD for OSPF are described as follows:
Definition
Bidirectional Forwarding Detection (BFD) is a mechanism to detect communication faults
between forwarding engines.
To be specific, BFD detects the connectivity of a data protocol along a path between two
systems. The path can be a physical link, a logical link, or a tunnel.
In BFD for OSPFv3, a BFD session is associated with OSPFv3. The BFD session quickly
detects a link fault and then notifies OSPFv3 of the fault, which speeds up OSPFv3's response
to network topology changes.
Purpose
A link fault or a topology change causes routers to recalculate routes. Routing protocol
convergence must be as quick as possible to improve network availability. Link faults are
inevitable, and therefore a solution must be provided to quickly detect faults and notify
routing protocols.
BFD for Open Shortest Path First version 3 (OSPFv3) associates BFD sessions with OSPFv3.
After BFD for OSPFv3 is configured, BFD quickly detects link faults and notifies OSPFv3 of
the faults. BFD for OSPFv3 accelerates OSPFv3 response to network topology changes.
Table 2-7 describes OSPFv3 convergence speeds before and after BFD for OSPFv3 is
configured.
Table 2-7 OSPFv3 convergence speeds before and after BFD for OSPFv3 is configured
Principles
[Figure 2-12: BFD for OSPFv3 networking: DeviceA and DeviceB connected directly and through DeviceC via interface 1 and interface 2]
Figure 2-12 shows a typical network topology with BFD for OSPFv3 configured. The
principles of BFD for OSPFv3 are described as follows:
Without BFD, a device can detect neighbor faults only at the second level. As a result, link faults on a high-speed network may cause a large number of packets to be discarded.
BFD, which can be used to detect link faults on lightly loaded networks at the millisecond
level, is introduced to resolve the preceding issue. With BFD, two systems periodically send
BFD packets to each other. If a system does not receive BFD packets from the other end
within a specified period, the system considers the bidirectional link between them Down.
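The timeout logic described above can be sketched as follows; the class, method, and parameter names are illustrative assumptions, not part of any BFD implementation.

```python
import time

class BfdSession:
    """Minimal sketch of BFD asynchronous detection (names are illustrative)."""

    def __init__(self, rx_interval_ms, detect_multiplier):
        # A session is declared Down if no packet arrives within
        # rx_interval * multiplier (the detection period).
        self.detect_time = rx_interval_ms * detect_multiplier / 1000.0
        self.last_rx = time.monotonic()
        self.state = "Up"

    def on_packet(self):
        # Each received BFD packet refreshes the detection timer.
        self.last_rx = time.monotonic()

    def poll(self):
        # Called periodically; reports Down once the timer expires.
        if time.monotonic() - self.last_rx > self.detect_time:
            self.state = "Down"
        return self.state
```

In a real implementation, the Down transition would also notify the routing protocol (IS-IS, OSPF, or BGP) that is associated with the session.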
BFD for IS-IS enables BFD sessions to be dynamically established. After detecting a fault,
BFD notifies IS-IS of the fault. IS-IS sets the neighbor status to Down, quickly updates link
state protocol data units (LSPs), and performs the partial route calculation (PRC). BFD for IS-
IS implements fast IS-IS route convergence.
NOTE
Instead of replacing the Hello mechanism of IS-IS, BFD works with IS-IS to rapidly detect the faults
that occur on neighboring devices or links.
BFD detects only the one-hop link between IS-IS neighbors because IS-IS establishes only one-hop neighbor relationships.
l Response to the Down event of a BFD session
When BFD detects a link failure, it generates a Down event and informs IS-IS of the
Down event through the route management (RM) module. IS-IS then suppresses neighbor relationships and
recalculates routes. This process speeds up network convergence.
Usage Scenario
NOTICE
Dynamic BFD needs to be configured based on the actual network. If the time parameters are
not configured correctly, network flapping may occur.
BFD for IS-IS speeds up route convergence through rapid link failure detection. The
following is a networking example for BFD for IS-IS.
Networking
As shown in Figure 2-14, Device A and Device B belong to ASs 100 and 200, respectively.
The two routers are directly connected and establish an External Border Gateway Protocol
(EBGP) peer relationship.
BFD is enabled to detect the EBGP peer relationship between Device A and Device B. If the
link between Device A and Device B fails, BFD can quickly detect the fault and notify BGP.
Background
If a node or link along an LDP LSP that is transmitting traffic fails, traffic switches to a
backup LSP. The path switchover speed depends on the detection duration and traffic
switchover duration. A delayed path switchover causes traffic loss. LDP fast reroute (FRR)
can be used to speed up the traffic switchover, but not the detection process.
As shown in Figure 2-15, a local label switching router (LSR) periodically sends Hello
messages to notify each peer LSR of the local LSR's presence and establish a Hello adjacency
with each peer LSR. The local LSR constructs a Hello hold timer to maintain the Hello
adjacency with each peer. Each time the local LSR receives a Hello message, it updates the
Hello hold timer. If the Hello hold timer expires before a Hello message arrives, the LSR
considers the Hello adjacency disconnected. The Hello mechanism cannot rapidly detect link
faults, especially when a Layer 2 device is deployed between the local LSR and its peer.
The rapid, light-load BFD mechanism is used to quickly detect faults and trigger a primary/
backup LSP switchover, which minimizes data loss and improves service reliability. If BFD
detects a fault on the monitored LSP, it triggers a traffic switchover. When BFD monitors a
unidirectional LDP LSP, the reverse path of the LDP LSP can be an IP link, an LDP LSP, or a
traffic engineering (TE) tunnel.
A BFD session that monitors LDP LSPs is negotiated in either static or dynamic mode:
l Static configuration: The negotiation of a BFD session is performed using the local and
remote discriminators that are manually configured for the BFD session to be
established. On a local LSR, you can bind an LSP with a specified next-hop IP address
to a BFD session with a specified peer IP address.
l Dynamic establishment: The negotiation of a BFD session is performed using the BFD
discriminator type-length-value (TLV) in an LSP ping packet. You must specify a policy
for establishing BFD sessions on a local LSR. The LSR automatically establishes BFD
sessions with its peers and binds the BFD sessions to LSPs using either of the following
policies:
– Host address-based policy: The local LSR uses all host addresses to establish BFD
sessions. You can specify a next-hop IP address and an outbound interface name of
LSPs and establish BFD sessions to monitor the specified LSPs.
– Forwarding equivalence class (FEC)-based policy: The local LSR uses host
addresses listed in a configured FEC list to automatically establish BFD sessions.
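In static mode, the manually configured discriminators must be bound crosswise: each end's local discriminator must equal the other end's remote discriminator. A small check illustrates this; the dictionary field names are assumptions, not a real configuration format.

```python
def discriminators_match(local_cfg, peer_cfg):
    """Check the crosswise discriminator binding required for a
    statically negotiated BFD session (sketch; field names assumed)."""
    return (local_cfg["local_disc"] == peer_cfg["remote_disc"] and
            local_cfg["remote_disc"] == peer_cfg["local_disc"])

# A correctly configured static session between two LSRs:
lsr_a = {"local_disc": 100, "remote_disc": 200}
lsr_b = {"local_disc": 200, "remote_disc": 100}
```

Dynamic establishment avoids this manual bookkeeping by carrying the discriminator in the BFD discriminator TLV of an LSP ping packet.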
BFD uses the asynchronous mode to check LSP continuity. That is, the ingress and egress
periodically send BFD packets to each other. If one end does not receive BFD packets from
the other end within a detection period, BFD considers the LSP Down and sends an LSP
Down message to the LSP management (LSPM) module.
NOTE
Although BFD for LDP is enabled on a proxy egress, a BFD session cannot be established for the
reverse path of a proxy egress LSP on the proxy egress.
To address this issue, BFD for LDP tunnel is used. LDP tunnels include the primary LSP and
FRR bypass LSP. The BFD for LDP tunnel mechanism establishes a BFD session that can
simultaneously monitor the primary and FRR bypass LSPs or the primary and load-balancing
LSPs. If both the primary and FRR bypass LSPs fail or both the primary and load-balancing
LSPs fail, BFD rapidly detects the failures and instructs the LDP upper-layer application to
perform a protection switchover, which minimizes traffic loss.
BFD for LDP tunnel uses the same mechanism as BFD for LDP LSP to monitor the
connectivity of each LSP in an LDP tunnel. Unlike BFD for LDP LSP, BFD for LDP tunnel
has the following characteristics:
Usage Scenarios
BFD for LDP LSP can be used in the following scenarios:
l Primary and bypass LDP FRR LSPs are established.
l Primary and bypass virtual private network (VPN) FRR LSPs are established.
Benefits
BFD for LDP LSP provides a rapid, light-load fault detection mechanism for LDP LSPs,
which improves network reliability.
Background
No tunnel protection is provided in the NG-MVPN over P2MP TE function or VPLS over
P2MP TE function. If a tunnel fails, traffic can only be switched using route change-induced
hard convergence, which renders low performance. This function provides dual-root 1+1
protection for the NG-MVPN over P2MP TE function and VPLS over P2MP TE function. If a
P2MP TE tunnel fails, BFD for P2MP TE rapidly detects the fault and switches traffic, which
improves fault convergence performance and reduces traffic loss.
Principles
In Figure 2-16, BFD is enabled on the root node PE1 and the backup root node PE2. Leaf
nodes UPE1 through UPE4 are enabled to passively create BFD sessions. Both PE1 and PE2
send BFD packets to all leaf nodes along P2MP TE tunnels. The leaf nodes receive the BFD
packets transmitted only on the primary tunnel. If a leaf node receives detection packets
within a specified interval, the link between the root node and the leaf node is working
properly. If a leaf node fails to receive BFD packets within a specified interval, the link
between the root node and the leaf node has failed. The leaf node then rapidly switches traffic
to a protection tunnel, which reduces traffic loss.
Traditional detection mechanisms, such as RSVP Hello and Srefresh, detect faults slowly.
BFD rapidly sends and receives packets to detect faults in a tunnel. If a fault occurs, BFD
triggers a traffic switchover to protect traffic.
On the network shown in Figure 2-17, BFD is disabled. If LSRE fails, LSRA and LSRF
cannot promptly detect the fault because a Layer 2 switch exists between them. Although the
Hello mechanism eventually detects the fault, detection takes a long time.
After BFD is configured, if LSRE fails, LSRA and LSRF detect the fault rapidly, and traffic
switches to the path LSRA -> LSRB -> LSRD -> LSRF.
BFD for TE detects faults in a CR-LSP. After detecting a fault in a CR-LSP, BFD for TE
immediately notifies the forwarding plane of the fault to rapidly trigger a traffic switchover.
BFD for TE is usually used together with a hot-standby CR-LSP.
l Static BFD session: established by manually setting the local and remote discriminators.
The local discriminator on a local node must match the remote discriminator on a remote
node. The minimum intervals at which BFD packets are sent and received are
changeable after a static BFD session is established.
l Detection period: an interval at which the system checks the BFD session status. If no
packet is received from the remote end within a detection period, the BFD session is
considered Down.
A BFD session is bound to a CR-LSP. A BFD session is set up between the ingress and
egress. A BFD packet is sent by the ingress to the egress along a CR-LSP. Upon receipt, the
egress responds to the BFD packet. The ingress can rapidly monitor the status of links through
which the CR-LSP passes based on whether a reply packet is received.
If a link fault is detected, BFD notifies the forwarding module of the fault. The forwarding
module searches for a backup CR-LSP and switches traffic to the backup CR-LSP. In
addition, the forwarding module reports the fault to the control plane. If static BFD for TE
CR-LSP is used, a BFD session is created manually to detect faults in the backup CR-LSP if
necessary.
On the network shown in Figure 2-18, a BFD session is set up to detect faults in the link
through which the primary CR-LSP passes. If a link fault occurs, the BFD session on the
ingress immediately notifies the forwarding plane of the fault. The ingress switches traffic to
the bypass CR-LSP and sets up a new BFD session to detect faults in the bypass CR-LSP.
Background
When a Layer 2 device is deployed on a link between two RSVP nodes, an RSVP node can
only use the Hello mechanism to detect a link fault. For example, on the network shown in
Figure 2-20, a switch exists between P1 and P2. If a fault occurs on the link between the
switch and P2, P1 keeps sending Hello packets and detects the fault after it fails to receive
replies to the Hello packets. The fault detection latency causes seconds of traffic loss. To
minimize packet loss, BFD for RSVP can be configured. BFD rapidly detects a fault and
triggers TE FRR switching, which improves network reliability.
Implementation
BFD for RSVP monitors RSVP neighbor relationships.
Unlike BFD for CR-LSP and BFD for TE that support multi-hop BFD sessions, BFD for
RSVP establishes only single-hop BFD sessions between RSVP nodes to monitor the network
layer.
BFD for RSVP, BFD for OSPF, BFD for IS-IS, and BFD for BGP can share a BFD session.
When protocol-specific BFD parameters are set for a BFD session shared by RSVP and other
protocols, the smallest values take effect. The parameters include the minimum intervals at
which BFD packets are sent, minimum intervals at which BFD packets are received, and local
detection multipliers.
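The smallest-value rule for a shared BFD session can be sketched as follows; the function and field names are assumptions for illustration only.

```python
def effective_bfd_params(per_protocol):
    """When RSVP, OSPF, IS-IS, and BGP share one BFD session, the
    smallest configured values take effect (sketch; names assumed)."""
    return {
        "min_tx_ms":  min(p["min_tx_ms"]  for p in per_protocol),
        "min_rx_ms":  min(p["min_rx_ms"]  for p in per_protocol),
        "multiplier": min(p["multiplier"] for p in per_protocol),
    }
```

For example, if RSVP configures 100 ms intervals with a multiplier of 3 and OSPF configures 10 ms intervals with a multiplier of 4, the shared session runs with 10 ms intervals and a multiplier of 3.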
Usage Scenario
BFD for RSVP applies to a network on which a Layer 2 device exists between the TE FRR
point of local repair (PLR) on a bypass CR-LSP and an RSVP node on the primary CR-LSP.
Benefits
BFD for RSVP improves reliability on MPLS TE networks with Layer 2 devices.
Bidirectional Forwarding Detection (BFD) can rapidly detect faults in links or IP routes. BFD
for VRRP enables a master/backup VRRP switchover to be completed within 1 second,
preventing user traffic loss. A BFD session is established between the master and backup
devices in a VRRP backup group and is bound to the VRRP backup group. BFD immediately
detects communication faults in the VRRP backup group and instructs the VRRP backup
group to perform a master/backup switchover, minimizing service interruptions.
The association between a VRRP backup group and BFD works in either of the following
modes:
l Association between a VRRP backup group and a common BFD session: A backup
device monitors the status of the master device in a VRRP backup group. A common
BFD session is used to monitor the link between the master and backup devices. Static
BFD sessions or static BFD sessions with automatically negotiated discriminators can be
used. If the BFD session detects a fault and goes Down, the BFD module notifies the
VRRP backup group of the status change. After receiving the notification, the VRRP
backup group changes priorities of VRRP devices and determines whether to perform a
master/backup VRRP switchover. VRRP devices must be enabled with BFD.
l Association between a VRRP backup group and link and peer BFD sessions: The master
and backup devices monitor the link and peer BFD sessions. A peer BFD session is
established between the master and backup devices. A link BFD session is established
between a downstream switch and each VRRP device. BFD helps the VRRP backup
group detect faults in the link between a VRRP device and the downstream switch.
Static BFD sessions or static BFD sessions with automatically negotiated discriminators
can be used. If the link or peer BFD session goes Down, BFD notifies the VRRP backup
group of the fault. After receiving the notification, the VRRP backup group immediately
performs a master/backup VRRP switchover. VRRP devices and the downstream switch
must be enabled with BFD.
Figure 2-21 Association between a VRRP backup group and a common BFD session
Association Between a VRRP Backup Group and Link and Peer BFD Sessions
As shown in Figure 2-22, the master and backup devices monitor the status of link and peer
BFD sessions to identify local or remote faults.
Device A and Device B run VRRP. A peer BFD session is established between Device A and
Device B to detect link and device failures. Link BFD sessions are established between
Device A and Device E and between Device B and Device E to detect link and device
failures. After Device B detects that the peer BFD session goes Down while the Link2 BFD
session remains Up, Device B's VRRP status changes from Backup to Master, and Device B takes over.
Figure 2-22 Association between a VRRP backup group and link and peer BFD sessions
NOTE
A Link2 fault does not affect Device A's VRRP status, and Device A continues to forward upstream
traffic. However, Device B's VRRP status becomes Master if both the peer BFD session and Link2 BFD
session go Down, and Device B detects the peer BFD session status change before detecting the Link2
BFD session status change. After Device B detects the Link2 BFD session status change, Device B's
VRRP status becomes Initialize.
Figure 2-23 shows the state machine for the association between a VRRP backup group and
link and peer BFD sessions.
Figure 2-23 State machine for the association between a VRRP backup group and link and
peer BFD sessions
The state machine includes the Initialize, Master, and Backup states. The key transition is
from Backup to Master, which occurs when the peer BFD session goes Down and the link
BFD session goes Up; transitions into and out of the Initialize state depend on the link BFD
session status and the VRRP priority.
The preceding process shows that, after link and peer BFD for VRRP is deployed, the backup
device immediately preempts the Master state if a fault occurs. Link and peer BFD for VRRP
implements a millisecond-level master/backup VRRP switchover.
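The backup device's behavior described in the preceding sections can be summarized in a small decision helper; the function and argument names are illustrative, not part of any VRRP implementation.

```python
def backup_device_state(peer_bfd_up, link_bfd_up):
    """Sketch of the backup device's decision logic for link and
    peer BFD for VRRP (names are illustrative)."""
    if not link_bfd_up:
        # The backup device's own uplink has failed: it cannot forward.
        return "Initialize"
    if not peer_bfd_up:
        # Master is unreachable but the local link is healthy: preempt.
        return "Master"
    return "Backup"
```

The helper mirrors the state machine: a peer-session failure alone triggers preemption, whereas a local link failure forces the device into the Initialize state regardless of the peer session.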
Benefits
BFD for VRRP speeds up master/backup VRRP switchovers if faults occur.
– Static BFD for multi-segment PW (MS-PW): The remote peer address of the MS-
PW to be detected must be specified. BFD packets can pass through multiple
superstratum provider edge devices (SPEs) to reach the destination, regardless of
whether the control word is enabled for the PW.
l Static BFD for PW in non-TTL mode: The TTL of BFD packets is fixed at 255. BFD
packets are encapsulated with PW labels and transmitted over PWs. A PW must have the
control word enabled and differentiate control packets from data packets by checking
whether these packets carry the control word. Static BFD for PW in non-TTL mode can
detect only end-to-end (E2E) SS-PWs.
Networking Description
Figure 2-24 shows an IP radio access network (RAN) that consists of the following device
roles:
l Cell site gateway (CSG): CSGs form the access network. On the IP RAN, CSGs function
as user-end provider edge devices (UPEs) to provide access services for NodeBs.
l Aggregation site gateway (ASG): On the IP RAN, ASGs function as SPEs to provide
access services for UPEs.
l Radio service gateway (RSG): ASGs and RSGs form the aggregation network. On the IP
RAN, RSGs function as network provider edge devices (NPEs) to connect to the radio
network controller (RNC).
The primary PW is along CSG1–ASG3–RSG5, and the secondary PW is along CSG1–CSG2–
ASG4–RSG6. If the primary PW fails, traffic switches to the secondary PW.
Feature Deployment
Configure static BFD for PW on the IP RAN as follows:
1. On CSG1, configure static BFD for the primary and secondary PWs.
2. On RSG5, configure static BFD for the primary PW.
3. On RSG6, configure static BFD for the secondary PW.
NOTE
When you configure static BFD for PW, note the following points:
l When you configure static BFD for the primary PW, ensure that the local discriminator on CSG1 is
the remote discriminator on RSG5 and that the remote discriminator on CSG1 is the local
discriminator on RSG5.
l When you configure static BFD for the secondary PW, ensure that the local discriminator on CSG1
is the remote discriminator on RSG6 and that the remote discriminator on CSG1 is the local
discriminator on RSG6.
After you configure static BFD for PW on CSG1 and primary/secondary RSGs, services can
quickly switch to the secondary PW if the primary PW fails.
Networking Description
Figure 2-25 shows a dual-root 1+1 protection scenario in which PE-AGG1 is the master root
node and PE-AGG2 is the backup root node. Each root node sets up a complete MPLS
multicast tree to the UPEs (leaf nodes). The two MPLS multicast trees do not have
overlapping paths. After multicast flows reach PE-AGG1 and PE-AGG2, PE-AGG1 and PE-
AGG2 send the multicast flows along their respective P2MP tunnels to UPEs. Each UPE
receives two copies of multicast flows and selects one to send to users.
The network configurations are as follows:
1. An IGP runs between the UPEs, SPEs, and PE-AGGs to implement Layer 3 reachability.
2. Each PE-AGG sets up a P2P tunnel (a TE tunnel or LDP LSP) to each UPE. VPLS PWs
are set up using BGP-AD. In addition, BGP-AD is used to set up P2MP LSPs from PE-
AGG1 and PE-AGG2 to the UPEs. VPLS PWs are iterated to the P2MP LSPs.
3. A protection group is configured on each UPE for P2MP tunnels so that each UPE can
select one from the two copies of multicast flows it receives.
4. BFD for multicast VPLS is deployed for P2MP tunnels to implement protection
switching when BFD detects a fault. On the PE-AGGs, BFD is configured to track the
upstream AC interfaces. If the AC between NPE1 and PE-AGG1 fails, the UPEs receive
multicast flows from NPE2.
BFD for multicast VPLS sessions are set up as follows:
1. A root node triggers the establishment of a BFD session of the MultiPointHead type.
Once established, the BFD session is initially Up and requires no negotiation. BFD
triggers the root node to periodically send LSP ping packets along the P2MP tunnels and
to send BFD detection packets at a configured BFD detection interval.
2. A leaf node receives LSP ping packets and triggers the establishment of a BFD session
of the MultiPointTail type. Once established, the BFD session is initially Down. After
the leaf node receives BFD detection packets indicating that the BFD session on the root
node is Up, the leaf node changes its BFD session to the Up state and starts BFD
detection.
NOTE
BFD for multicast VPLS sessions support only one-way detection. The BFD session of the
MultiPointHead type on a root node only sends packets, whereas the BFD session of the
MultiPointTail type on a leaf node only receives packets.
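The asymmetric head/tail behavior can be sketched as follows; the class and method names are illustrative assumptions, not a real BFD implementation.

```python
class MultiPointHead:
    """Root-node session: Up on creation, transmit-only (sketch)."""
    def __init__(self):
        self.state = "Up"  # no negotiation is required

    def detect_packet(self):
        # The root periodically sends packets carrying its state.
        return {"head_state": self.state}

class MultiPointTail:
    """Leaf-node session: initially Down, receive-only (sketch)."""
    def __init__(self):
        self.state = "Down"

    def on_detect_packet(self, pkt):
        # The tail comes Up once it sees the head's Up state.
        if pkt["head_state"] == "Up":
            self.state = "Up"
```

This captures the one-way nature of the detection: the head never listens, and the tail never transmits.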
On the network shown in Figure 2-25, if link 1 (an AC) fails, BFD on the master root node
detects that the AC interface is Down and stops sending BFD detection packets. The leaf
nodes cannot receive BFD detection packets, and therefore report the Down event, which
triggers protection switching. The leaf nodes then receive multicast flows from the backup
multicast tunnel. Similarly, if node 2, link 3, node 4, or link 5 fails, the leaf nodes also receive
multicast flows from the backup multicast tunnel. After the fault is rectified, BFD sessions are
reestablished. The leaf nodes then receive multicast flows from the master multicast tunnel
again.
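The leaf node's protection switching reduces to a selection driven by the BFD session toward the master root; a minimal sketch, with illustrative names:

```python
class LeafProtectionGroup:
    """Tunnel selection on a leaf node, driven by BFD (sketch)."""
    def __init__(self):
        self.active = "primary"

    def on_bfd_state(self, primary_session_up):
        # Switch to the backup tree on a Down event and revert once
        # the session toward the master root is re-established.
        self.active = "primary" if primary_session_up else "backup"
```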
For delay-sensitive services, such as voice, a delay of one second or more is also
unacceptable.
l Other detection mechanisms: Different protocols or manufacturers may provide
proprietary detection mechanisms, but it is difficult to deploy proprietary mechanisms
when systems are interconnected for interworking.
Bidirectional Forwarding Detection (BFD) is a unified detection mechanism that can detect a
fault in milliseconds on a network. BFD is compatible with all types of transmission media
and protocols. BFD implements fault detection by establishing a BFD session between two
systems and periodically sending BFD control packets along the path between them. If one
system does not receive BFD control packets within a specified period, the system considers
that a fault has occurred on the path.
In multicast scenarios, if the DR on a shared network segment is faulty and the neighbor
relationship times out, other PIM neighbors start a new DR election. Consequently, multicast
data transmission is interrupted for a few seconds.
BFD for PIM can detect a link's status on a shared network segment within milliseconds and
respond quickly to a fault on a PIM neighbor. If the interface configured with BFD for PIM
does not receive any BFD packets from the current DR within a configured detection period,
the interface considers that a fault has occurred on the designated router (DR). The BFD
module notifies the route management (RM) module of the session status, and the RM
module notifies the PIM module. Then, the PIM module triggers a new DR election
immediately rather than waiting for the neighbor relationship to time out. This minimizes
service interruptions and improves the multicast network reliability.
NOTE
Currently, BFD for PIM can be used on both IPv4 PIM-SM/Source-Specific Multicast (SSM) and IPv6
PIM-SM/SSM networks.
As shown in Figure 2-26, on the shared network segment where user hosts reside, a PIM
BFD session is set up between the downstream interface Port 2 of Device B and the
downstream interface Port 1 of Device C. Both ports send BFD packets to detect the status of
the link between them.
Port 2 of Device B is elected as a DR for forwarding multicast data to the receiver. If Port 2
fails, BFD immediately notifies the RM module of the session status and the RM module then
notifies the PIM module. The PIM module triggers a new DR election. Port 1 of Device C is
then elected as a new DR to forward multicast data to the receiver.
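The DR election that BFD accelerates follows the standard PIM-SM rule: the neighbor with the highest DR priority wins, with the highest IP address as the tie-breaker. A minimal sketch, with an illustrative data layout and priorities:

```python
from ipaddress import ip_address

def elect_dr(neighbors):
    """Standard PIM-SM DR election: highest DR priority wins; ties
    are broken by the highest IP address (data layout is illustrative)."""
    return max(neighbors, key=lambda n: (n["priority"], ip_address(n["ip"])))

# Hypothetical neighbors on the shared segment from the example:
neighbors = [
    {"port": "DeviceB-Port2", "ip": "10.1.1.2", "priority": 10},
    {"port": "DeviceC-Port1", "ip": "10.1.1.3", "priority": 1},
]
```

When BFD reports that the current DR is down, the remaining neighbors simply rerun this election instead of waiting for the neighbor relationship to time out.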
3 LMSP
3.1 Introduction
3.2 Principles
3.3 Applications
3.1 Introduction
Definition
Linear multiplex section protection (LMSP) is an SDH interface-based protection technique
that uses an SDH interface to protect services on another SDH interface. If a link failure
occurs, LMSP enables a device to send a protection switching request over K bytes to its peer
device. The peer device then returns a switching bridge reply.
NOTE
LMSP is often referred to as low speed APS protection.
Purpose
Large numbers of low-speed links still exist on the user side. These links may be unstable due
to aging. These links have a small capacity and may fail to work properly due to congestion in
traffic burst scenarios. Therefore, a protection technique is required to provide reliability and
stability for these low-speed links.
LMSP is an inherent feature of an SDH network. When a mobile bearer network is deployed,
a router must be connected to an add/drop multiplexer (ADM) or RNC, both of which support
LMSP. As the original protection function of the router cannot properly protect the
communication channel between the router and ADM or RNC, LMSP is introduced to resolve
this issue.
Benefits
LMSP offers the following benefits:
l Improves the reliability and security of low-speed links and enhances product
credibility and market competitiveness by reducing labor costs (automatic switching)
and decreasing network interruption time (rapid switching).
l Improves user experience by increasing user access success rates.
3.2 Principles
LMSP is a redundancy protection mechanism that uses a backup channel to protect services
on a channel. LMSP is defined in ITU-T G.783 and G.841 and used to protect multiplex
section (MS) layers in linear networking mode. LMSP applies to point-to-point physical
networks.
NOTE
LMSP can protect services against disconnection of the optical fiber on which the working MS resides,
regenerator failures, and MS performance deterioration. It does not protect against node failures.
Linear MS Mode
Linear MS modes are classified as 1+1 or 1:N protection modes by protection structure (of
the 1:N modes, only 1:1 protection is implemented).
l In 1+1 protection mode, each working link has a dedicated protection link as its backup.
In a process called bridging, a transmit end transmits data on both the working and
protection links simultaneously. In normal circumstances, a receive end receives data
from the working link. If the working link fails and the receive end detects the failure,
the receive end receives data from the protection link. Generally, only a receive end
performs a switching action, along with single-ended protection. K1 and K2 bytes are
not required for LMSP negotiation.
The 1+1 protection mode has advantages such as rapid traffic switching and high
reliability. However, this mode has a low channel usage (about 50%). Figure 3-1 shows
the 1+1 protection mode.
Figure 3-1 1+1 protection mode (normal condition: one signal is chosen per pair; failure
condition: the "best" signal is chosen)
l In 1:N protection mode, a protection link provides traffic protection for N working links
(1 ≤ N ≤ 14). In normal circumstances, a transmit end transmits data on a working link.
The protection link can transmit low-priority data or it may not transmit any data. If the
working link fails, the transmit end bridges data onto the protection link. The receive end
then receives data from the protection link. If the transmit end is transmitting low-
priority data on the protection link, it will stop the data transmission and start
transmitting high-priority protected data. Figure 3-2 shows the 1:N protection mode.
Figure 3-2 1:N protection mode (normal condition: the protection channel is empty; failure
condition: the protection channel carries traffic from the failed link)
If several working links fail at the same time, only data on the working link with the
highest priority can be switched to the protection link. Data on other faulty working links
is lost.
When N is 1, the 1:N protection mode becomes the 1:1 protection mode.
The 1:N protection mode requires both a transmit end and a receive end to perform
switching. Therefore, K1 and K2 bytes are required for negotiation. The 1:N protection
mode has a high channel usage but poorer reliability than the 1+1 protection mode.
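The rule that only the highest-priority failed working link is switched to the single protection link can be sketched as follows; representing higher priority with a smaller number is an assumption made for illustration.

```python
def link_to_protect(failed_working_links):
    """In 1:N mode the single protection link can carry only one
    failed working link: the highest-priority one (sketch; a lower
    number means higher priority here, which is an assumption)."""
    if not failed_working_links:
        return None
    return min(failed_working_links, key=lambda link: link["priority"])
```

Traffic on the other failed working links is lost until they are repaired or the protection link becomes free.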
Linear MS K Bytes
LMSP uses APS to control bridging, switching, and recovery actions. APS information is
transmitted over the K1 and K2 bytes in the MS overhead in an SDH frame structure. Table
3-1 lists the bit layout of the K1 and K2 bytes.
K1 byte: bits 7 to 0. K2 byte: bits 7 to 0.
l Bits 3, 2, 1, and 0 of the K1 byte: switching request channel numbers. The value 0
indicates a protection channel. The values 1 to 14 indicate working channels (the value
can be only 1 in 1+1 protection mode). The value 15 indicates an extra service channel
(the value can be 15 only in 1:N protection mode).
l Bits 7, 6, 5, and 4 of the K2 byte: bridging/switching channel numbers. The value
meanings of a bridging channel number are the same as those of a switching request
channel number.
l Bit 3 of the K2 byte: protection mode. The value 0 indicates 1+1 protection, and the
value 1 indicates 1:1 protection.
l Bits 2, 1, and 0 of the K2 byte: MS status code. The values are as follows:
– 000: idle state
– 111: multiplex section alarm indication signal (MS-AIS)
– 110: multiplex section remote defect indication (MS-RDI)
– 101: dual-ended
– 100: single-ended (not defined by standards)
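The bit layout above can be decoded with a short helper; this is a sketch covering only the fields listed here (the request-code bits 7 to 4 of the K1 byte are omitted).

```python
def decode_k_bytes(k1, k2):
    """Decode the LMSP fields listed above from the K1 and K2 bytes
    (sketch; K1 request-code bits are not covered)."""
    ms_status = {0b000: "idle", 0b111: "MS-AIS", 0b110: "MS-RDI",
                 0b101: "dual-ended", 0b100: "single-ended"}
    return {
        "request_channel": k1 & 0x0F,           # K1 bits 3-0
        "bridge_channel": (k2 >> 4) & 0x0F,     # K2 bits 7-4
        "mode": "1:1" if k2 & 0x08 else "1+1",  # K2 bit 3
        "ms_status": ms_status.get(k2 & 0x07),  # K2 bits 2-0
    }
```

For example, K1 = 0x01 and K2 = 0x1D decode to a switching request for working channel 1, bridging channel 1, 1:1 protection, and the dual-ended status code.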
1. Device B receives a signal failure message and sends a bridge request to device A
through the protection channel.
2. After receiving the bridge request, device A sends a response to device B through the
protection channel.
3. After receiving the response, device B performs switching and bridging actions and
sends a switching acknowledgement to device A through the protection channel.
4. After receiving the switching acknowledgement, device A performs bridging and
switching actions. The switching is complete when LMSP enters the stable state.
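The four-step procedure can be captured as a simple message/action trace; this is a sketch of the sequence described above, not a protocol implementation.

```python
def lmsp_1to1_handshake():
    """Message/action trace of the four-step 1:1 switching procedure
    described above (sketch only)."""
    return [
        ("B->A", "bridge request"),             # step 1, over the protection channel
        ("A->B", "response"),                   # step 2
        ("B",    "switch and bridge"),          # step 3: B acts first
        ("B->A", "switching acknowledgement"),  # step 3, continued
        ("A",    "bridge and switch"),          # step 4: LMSP enters the stable state
    ]
```

The ordering matters: device B performs its switching and bridging actions before device A does, so traffic is bridged onto the protection channel at both ends only at the end of the exchange.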
The 1+1 switching procedure is simpler: one end sends a switching request, the other end
responds, and switching is performed.
PGP
MC-LMSP is implemented between main control boards over PGP. The connection mode is
UDP. Figure 3-5 shows the communication process.
1. The interface board of the master device sends a message to the main control board
through the IPC.
2. The main control board of the master device constructs a PGP packet and sends the
packet from the main control board to the interface board over the VP.
3. The master device sends the packet through an interface to the backup device.
4. The backup device sends the packet to the main control board over the VP.
5. The main control board of the backup device performs APS PGP processing and sends a
message to the interface board through the IPC.
6. The interface board of the backup device sends the packet back to the master device.
7. The master device sends the packet from the interface board to the main control board.
1. The interfaces on TPE2 and TPE3 form an MC-LMSP group. TPE2 and TPE3 are
configured as the working and protection NEs, respectively. The LMSP state machine
runs on TPE3.
2. PW1 and PW2 form an inter-device PW APS group.
3.3 Applications
l On the access side, a NodeB/BTS is connected to the router over an E1 or SDH link, and
a microwave or SDH device is connected to the router over an optical fiber. Single-
chassis LMSP is configured for the STM-1 link between the router and microwave or
SDH device.
l On the network side, the router is connected to PEs. Single-chassis LMSP is configured
on POS or CPOS interfaces.
Access Side
Scenario 1: On the network shown in Figure 3-8, a base station is connected to the router
through the microwave devices and then over the IMA/TDM link (CPOS interface) that has
LMSP configured. The RNC is connected to the device over the IMA/TDM link (CPOS
interface). After base station data reaches the router, the base station can interwork with the
RNC over the PW between the router and device.
(Figure 3-8: E1 IMA links, an IMA/TDM link on a CPOS interface with LMSP, a PW across the network, and an IMA/TDM link on a CPOS interface to the RNC.)
Scenario 2: On the network shown in Figure 3-9, a base station is connected to the router
through the microwave devices and then over the IMA link (CPOS interface) that has LMSP
configured. The RNC is connected to the device over the ATM link. After base station data
reaches the router, the base station can interwork with the RNC over the PW between the
router and device.
(Figure 3-9: E1 IMA links from the microwave devices, an IMA link on a CPOS interface with LMSP to the router, a PW to the device, and an ATM link to the RNC.)
Network Side
Scenario 1: On the network shown in Figure 3-10, the router's network-side interface is a
CPOS interface on which a global MP group is configured. Single-chassis LMSP is
configured on the CPOS interface. The router is connected to another device to carry PW/
L3VPN/MPLS/DCN services.
Scenario 2: On the network shown in Figure 3-11, the router's network-side interface is a
POS interface. Single-chassis LMSP is configured on the POS interface. The router is
connected to another device to carry PW/VPLS/L3VPN/MPLS/DCN services.
(Figure: Network with MC-LMSP 1:1 protection deployed. A primary PW and a bypass PW run toward Device C.)
Figure 3-13 Network with MC-LMSP 1+1 protection and two bypass PWs deployed
(Figure 3-13: A primary PW and double bypass PWs run toward Device C, protected by MC-LMSP 1+1.)
(Figure: E-PW APS 1:1 with MC-LMSP 1:1. Device B connects through ports A, B, and C.)
Port C
of traffic and multicasts them to the RNC through port B or to Device C through port C.
Device C has an implementation process similar to Device B.
Figure 3-15 shows the E-PW APS and MC-LMSP 1+1 application.
(Figure 3-15: E-PW APS 1+1 with MC-LMSP 1+1. Device B connects through ports A, B, and C.)
(Figure: Device A (PE1) and Device B (PE2) run MPLS with OSPF Area 0 and connect to a BSC through a MUX over an SDH link protected by LMSP.)
4 MPLS OAM
OAM is an important means to reduce network maintenance costs. The MPLS OAM
mechanism manages operation and maintenance of MPLS networks.
For details about the MPLS OAM background, see ITU-T Recommendation Y.1710. For details about the MPLS OAM implementation mechanism, see ITU-T Recommendation Y.1711.
Purpose
The server-layer protocols, such as Synchronous Optical Network (SONET)/Synchronous Digital Hierarchy (SDH), are below the MPLS layer; the client-layer protocols, such as IP and ATM, are above the MPLS layer. These protocols have their own OAM mechanisms, but failures in the MPLS network cannot be completely rectified through the OAM mechanisms of other layers. In addition, the layered network architecture requires MPLS to have an independent OAM mechanism that reduces the dependency between layers.
The MPLS OAM mechanism can detect, identify, and locate a defect at the MPLS layer
effectively. Then, the MPLS OAM mechanism reports and handles the defect. In addition, if a
failure occurs, the MPLS OAM mechanism triggers protection switching.
MPLS offers an OAM mechanism totally independent of any upper or lower layer. The
following OAM features are enabled on the MPLS user plane:
l Monitors link connectivity.
l Evaluates network usage and performance.
l Performs a traffic switchover if a fault occurs so that services meet service level
agreements (SLAs).
Benefit
l MPLS OAM can rapidly detect link faults or monitor the connectivity of links, which
helps measure network performance and minimizes OPEX.
l If a link fault occurs, MPLS OAM rapidly switches traffic to the standby link to restore
services, which shortens the defect duration and improves network reliability.
Reverse Tunnel
A reverse tunnel is bound to an LSP that is monitored using MPLS OAM. The reverse tunnel
can transmit BDI packets to notify the ingress of an LSP defect.
A reverse tunnel and the LSP to which the reverse tunnel is bound must have the same
endpoints.
The reverse tunnel transmitting BDI packets can be either a private or a shared reverse LSP.
ITU-T Recommendations Y.1710 and Y.1711 have the following drawbacks:
l If OAM is enabled on the ingress of an LSP later than that on the egress or if OAM is
enabled on the egress but disabled on the ingress, the egress generates a loss of
connectivity verification defect (dLOCV) alarm.
l Before the OAM detection packet type or the interval at which detection packets are sent is changed, OAM must be disabled on both the ingress and egress.
l OAM parameters (such as a detection packet type and an interval at which detection
packets are sent) must be set on both the ingress and egress, which may cause parameter
inconsistency.
The NE40E implements the OAM auto protocol to resolve these drawbacks.
The OAM auto protocol is configured on the egress. With this protocol, the egress can
automatically start OAM functions after receiving the first OAM packet. In addition, the
egress can dynamically stop running the OAM state machine after receiving an FDI packet
sent by the ingress.
Background
The Multiprotocol Label Switching (MPLS) operation, administration and maintenance
(OAM) mechanism effectively detects and locates MPLS link faults. The MPLS OAM
mechanism also triggers a protection switchover after detecting a fault.
Related Concepts
l MPLS OAM packets
Table 4-1 describes MPLS OAM packets.
Backward defect indication (BDI) packet: Sent by the egress to notify the ingress of an LSP defect.
l Channel defects
Table 4-2 describes channel defects that MPLS OAM can detect.
l Reverse tunnel
A reverse tunnel is bound to an LSP that is monitored using MPLS OAM. The reverse
tunnel can transmit BDI packets to notify the ingress of an LSP defect. A reverse tunnel
and the LSP to which the reverse tunnel is bound must have the same endpoints, and
they transmit traffic in opposite directions. The reverse tunnels transmitting BDI packets
include private or shared LSPs. Table 4-3 lists the two types of reverse tunnel.
Private reverse LSP: Bound to only one LSP. The binding between the private reverse LSP and its forward LSP is stable but may waste LSP resources.
Implementation
MPLS OAM periodically sends CV or FFD packets to monitor TE LSPs, PWs, or ring
networks.
(Figure: The ingress LSR sends CV/FFD packets along the monitored LSP to the egress LSR; the egress returns BDI packets to the ingress.)
Figure 4-2 illustrates a network on which MPLS OAM monitors TE LSP connectivity.
The process of using MPLS OAM to monitor TE LSP connectivity is as follows:
a. The ingress sends a CV or FFD packet along a TE LSP to be monitored. The packet
passes through the TE LSP and arrives at the egress.
b. The egress compares the packet type, frequency, and TTSI in the received packet
with the locally configured values to verify the packet. In addition, the egress
collects the number of correct and incorrect packets within a detection interval.
c. If the egress detects an LSP defect, the egress analyzes the defect type and sends a
BDI packet carrying defect information to the ingress along a reverse tunnel. The
ingress can then be notified of the defect. If a protection group is configured, the
ingress switches traffic to a backup LSP.
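Steps a through c amount to a small verification loop on the egress. The sketch below illustrates it; the packet fields, the TTSI format, and the one-bad-packet defect rule are illustrative assumptions, not the NE40E implementation:

```python
# Sketch of the egress-side CV/FFD verification in steps a-c.
# Packet fields, the TTSI format, and the one-bad-packet defect rule
# are illustrative assumptions, not the NE40E implementation.
from dataclasses import dataclass

@dataclass
class OamPacket:
    ptype: str       # "CV" or "FFD"
    interval: float  # sending interval, in seconds
    ttsi: str        # trail termination source identifier

class EgressVerifier:
    def __init__(self, ptype, interval, ttsi):
        self.expected = (ptype, interval, ttsi)
        self.correct = 0    # correct packets in the detection interval
        self.incorrect = 0  # incorrect packets in the detection interval

    def receive(self, pkt):
        # Step b: compare type, frequency, and TTSI with local config.
        if (pkt.ptype, pkt.interval, pkt.ttsi) == self.expected:
            self.correct += 1
        else:
            self.incorrect += 1

    def defect_detected(self):
        # Step c (simplified): any incorrect packet counts as a defect;
        # the egress would then send a BDI packet along the reverse tunnel.
        return self.incorrect > 0

v = EgressVerifier("FFD", 0.05, "lsr-a:lsp-1")
v.receive(OamPacket("FFD", 0.05, "lsr-a:lsp-1"))
v.receive(OamPacket("FFD", 0.05, "lsr-b:lsp-9"))  # unexpected TTSI
print(v.correct, v.incorrect, v.defect_detected())
```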
l MPLS OAM for PWs
(Figure: MPLS OAM for PWs. PW signals travel over the PW, which is carried in a tunnel between the ACs.)
On the NE40E, the OAM auto protocol can address the following problems, which occur
because of drawbacks of ITU-T Recommendations Y.1710 and Y.1711:
l A dLOCV defect occurs if the OAM function is enabled on the ingress on an LSP later
than that on the egress or if OAM is enabled on the egress and disabled on the ingress.
l The dLOCV defect also occurs when OAM is disabled. OAM must be disabled on the
ingress and egress before the OAM detection packet type or the interval at which
detection packets are sent can be changed.
l OAM parameters, including a detection packet type and an interval at which detection packets are sent, must be set on both the ingress and egress. This is likely to cause parameter inconsistency.
The OAM auto protocol enabled on the egress provides the following functions:
l Triggers OAM
– If CC parameters (including the detection packet type and the interval at which packets are sent) are not configured on the sink node, then upon receipt of the first CV or FFD packet, the sink node automatically records the packet type and sending interval and uses these parameters in the CC detection that starts.
– If the OAM function-enabled sink node does not receive CV or FFD packets within
a specified period of time, the sink node generates a BDI packet and notifies the
NMS of the BDI defect.
l Dynamically stops running the OAM. If the detection packet type or interval at which
detection packets are sent is to be changed on the source node, the source node sends an
FDI packet to instruct the sink node to stop the OAM state machine. If an OAM function
is to be disabled on the source node, the source node also sends an FDI packet to instruct
the sink node to stop the OAM state machine.
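The two functions above can be condensed into a toy sink-node state machine. This is a sketch of the described behavior only; class and field names are illustrative assumptions:

```python
# Sketch of the OAM auto protocol on the sink node (egress): start CC
# from the first CV/FFD packet, stop on an FDI packet. State and field
# names are illustrative assumptions, not the NE40E implementation.

class OamAutoSink:
    def __init__(self):
        self.running = False   # whether the OAM state machine runs
        self.ptype = None      # learned detection packet type
        self.interval = None   # learned sending interval, in seconds

    def on_packet(self, ptype, interval=None):
        if ptype in ("CV", "FFD") and not self.running:
            # First CV/FFD packet: record its type and interval and
            # start continuity check with these parameters.
            self.ptype, self.interval = ptype, interval
            self.running = True
        elif ptype == "FDI":
            # The source node is changing parameters or disabling OAM:
            # stop the OAM state machine.
            self.running = False

sink = OamAutoSink()
sink.on_packet("FFD", 0.05)   # CC starts with the learned parameters
sink.on_packet("FDI")         # CC stops
```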
(Figure 4-4: A BTS/NodeB connects over FE/GE, IMA E1, and STM-1 links to PE1, which connects over GE to the MPLS network.)
Figure 4-4 illustrates an IP RAN in the Layer 2 to edge scenario. The MPLS OAM
implementation is as follows:
l The BTS, NodeB, BSC, and RNC can be directly connected to an MPLS network.
l A TE tunnel between PE1 and PE4 is established. PWs are established over the TE
tunnel to transmit various services.
l MPLS OAM is enabled on PE1 and PE4, and OAM parameters are configured on both ends of the PW. These PEs can send and receive OAM detection packets, which allows OAM to monitor the PW between PE1 and PE4 and obtain basic PW information. If OAM detects a defect, PE4 sends a BDI packet to PE1 over a reverse tunnel. The PEs notify the user-side BTS, NodeB, RNC, and BSC of fault information so that the user-side devices can use the information to maintain networks.
Service Overview
The operation and maintenance of virtual leased line (VLL) and virtual private LAN service
(VPLS) services require an operation, administration and maintenance (OAM) mechanism.
MPLS OAM provides a mechanism to rapidly detect and locate faults, which facilitates network operation and maintenance and reduces network maintenance costs.
Networking Description
As shown in Figure 4-5, a user-end provider edge (UPE) on the access network is dual-
homed to SPE1 and SPE2 on the aggregation network. A VLL supporting access links of
various types is deployed on the access network. A VPLS is deployed on the aggregation
network to form a point-to-multipoint leased line network. Additionally, Fast Protection
Switching (FPS) is configured on the UPE; MPLS tunnel automatic protection switching
(APS) is configured on SPE1 and SPE2 to protect the links between the virtual switching
instances (VSIs) created on the two superstratum provider edges (SPEs).
(Figure 4-5: A NodeB connects to the UPE, which is dual-homed to SPE1 and SPE2 (each running a VSI) over PWs with FPS; tunnel APS protects the SPE-to-PE links toward the RNC.)
Feature Deployment
To deploy MPLS OAM to monitor link connectivity of VLL and VPLS pseudo wires (PWs),
configure maintenance entity groups (MEGs) and maintenance entities (MEs) on the UPE,
SPE1, and SPE2 and then enable one or more of the continuity check (CC), loss measurement
(LM), and delay measurement (DM) functions. The UPE monitors link connectivity and
performance of the primary and secondary PWs.
l After the primary PW recovers, the UPE switches traffic from the secondary PW back to
the primary PW. Meanwhile, the UPE sends a MAC Withdraw packet, in which the
value of the PE-ID field is SPE2's ID, to SPE1. After receiving the MAC Withdraw
packet, SPE1 transparently forwards the packet to the NPE and the NPE deletes the
MAC address it has learned from SPE2. After that, the NPE learns a new MAC address
from the new primary PW.
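The MAC Withdraw handling on the NPE can be sketched as follows; the table layout and function name are illustrative assumptions, not the device implementation:

```python
# Sketch of the MAC Withdraw handling described above: the NPE deletes
# every MAC address it learned from the PE named in the packet's PE-ID
# field. The table layout and function name are illustrative assumptions.

def handle_mac_withdraw(mac_table, withdrawn_pe_id):
    """Return the MAC table with entries learned from the named PE removed."""
    return {mac: pe for mac, pe in mac_table.items() if pe != withdrawn_pe_id}

# NPE's MAC table before the switchback: one entry learned via SPE2.
npe_macs = {"00-11-22-33-44-55": "SPE2", "66-77-88-99-AA-BB": "SPE1"}
npe_macs = handle_mac_withdraw(npe_macs, "SPE2")
# The withdrawn address will be relearned from the new primary PW.
```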
reverse: A direction opposite to the direction that traffic flows along the monitored service link.
forward: The direction that traffic flows along the monitored service link.
path merge LSR: An LSR that receives the traffic transmitted on the protection path in MPLS OAM protection switching. If the path merge LSR is not the traffic destination, it merges the traffic transmitted on the protection path onto the working path. If the path merge LSR is the traffic destination, it sends the traffic to the upper-layer protocol for handling.
path switch LSR: An LSR that switches or replicates traffic between the primary service link and the bypass service link.
user plane: A set of traffic forwarding components through which a traffic flow passes. An OAM CV or FFD packet is periodically inserted into this traffic flow to monitor the forwarding component status. In IETF drafts, the user plane is also called the data plane.
Ingress: An LSR from which the forward LSP originates and at which the reverse LSP terminates.
Egress: An LSR at which the forward LSP terminates and from which the reverse LSP originates.
CV connectivity verification
DM delay measurement
SD signal deterioration
SF signal failure
5 MPLS-TP OAM
5.1 Introduction
5.2 Principles
5.3 Applications
5.4 Terms and Abbreviations
5.1 Introduction
Definition
Multiprotocol Label Switching Transport Profile (MPLS-TP) is a transport technique
that integrates MPLS packet switching with traditional transport network features. MPLS-TP
networks are poised to replace traditional transport networks in the future. MPLS-TP
Operation, Administration, and Maintenance (MPLS-TP OAM) works on the MPLS-TP client
layer. It can effectively detect, identify, and locate faults in the client layer and quickly switch
traffic when links or nodes become defective. OAM is an important part of any plan to reduce
network maintenance expenditures.
Purpose
Both networks and services are part of an ongoing process of transformation and integration.
New services like triple play services, Next Generation Network (NGN) services, carrier
Ethernet services, and Fiber-to-the-x (FTTx) services are constantly emerging from this
process. Such services demand more investment and have higher OAM costs. They require
state of the art QoS, full service access, and high levels of expansibility, reliability, and
manageability of transport networks. Traditional transport network technologies such as
Multi-Service Transfer Platform (MSTP), Synchronous Digital Hierarchy (SDH), or
Wavelength Division Multiplexing (WDM) cannot meet these requirements because they lack
a control plane. Unlike traditional technologies, MPLS-TP does meet these requirements
because it can be used on next-generation transport networks that can process data packets, as
well as on traditional transport networks.
Because traditional transport networks or Optical Transport Node (OTN) networks have high
reliability and maintenance benchmarks, MPLS-TP must provide powerful OAM capabilities.
MPLS-TP OAM provides the following functions:
l Fault management
l Performance monitoring
l Triggering protection switching
Benefits
l MPLS-TP OAM can rapidly detect link faults or monitor the connectivity of links, which
helps measure network performance and minimizes OPEX.
l If a link fault occurs, MPLS-TP OAM rapidly switches traffic to the standby link to
restore services, which shortens the defect duration and improves network reliability.
(Figure 5-1: MEs, such as ME1 and ME2, configured on an LSP.)
l ME
A maintenance entity (ME) is created for a transport link to be monitored.
l MEG
A maintenance entity group (MEG) comprises one or more MEs that are created for a
transport link. If the transport link is a point-to-point bidirectional path, such as a
bidirectional co-routed LSP or pseudo wire (PW), a MEG comprises only one ME.
l MEP
A MEP is the source or sink node in a MEG. Figure 5-2 shows ME node deployment.
– For a bidirectional LSP, only the ingress label edge router (LER) and egress LER
can function as MEPs, as shown in Figure 5-2.
– For a PW, only user-end provider edges (UPEs) can function as MEPs.
MEPs trigger and control MPLS-TP OAM operations. OAM packets can be generated or
terminated on MEPs.
Fault Management
Table 5-1 lists the MPLS-TP OAM fault management functions supported by the NE40E.
Performance Monitoring
Table 5-2 lists the MPLS-TP OAM performance monitoring functions supported by the
NE40E.
Loss measurement (LM): Collects statistics about lost frames. LM includes the following functions:
l Single-ended frame loss measurement
l Dual-ended frame loss measurement
Delay measurement (DM): Collects statistics about delays and delay variations (jitter). DM includes the following functions:
l One-way frame delay measurement
l Two-way frame delay measurement
5.2 Principles
(Figure: OAM layers on an MPLS-TP network. Section OAM runs at the section layer, LSP OAM at the LSP layer, and PW OAM at the PW layer; MEG end points and MEG intermediate points are deployed at each layer.)
MEG end point (MEP): A MEP is the source or sink node in a MEG.
l Section layer: Each LSR can function as a MEP.
l LSP layer: Only an LER can function as a MEP. LSRs A, D, E, and G are LERs functioning as MEPs.
l PW layer: Only PW terminating provider edge (T-PE) LSRs can function as MEPs. LSRs A and G are T-PEs functioning as MEPs.
Usage Scenario
MPLS-TP OAM monitors the following types of links:
l Static bidirectional co-routed CR-LSPs
l Static VLL PWs and VPLS PWs
CC
CC is a proactive OAM operation. It detects loss of continuity (LOC) faults between any two MEPs in a MEG. A MEP sends continuity check messages (CCMs) to its remote MEP (RMEP) at specified intervals. If the RMEP does not receive a CCM within a period 3.5 times the specified interval, it considers the connection between the two MEPs faulty. The RMEP then reports an alarm, enters the Down state, and triggers automatic protection switching (APS) on both MEPs. After receiving a CCM from the MEP again, the RMEP clears the alarm and exits the Down state.
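A minimal sketch of the 3.5-interval LOC rule follows; timer handling is simplified and the names are illustrative, not the NE40E implementation:

```python
# Sketch of the 3.5x-interval loss-of-continuity (LOC) detection
# described above; field names are illustrative assumptions.

class RemoteMep:
    def __init__(self, ccm_interval):
        self.ccm_interval = ccm_interval  # CCM sending interval, seconds
        self.last_ccm_time = None
        self.down = False

    def on_ccm(self, now):
        self.last_ccm_time = now
        self.down = False          # a fresh CCM clears the alarm

    def poll(self, now):
        """Declare LOC if no CCM arrived within 3.5 intervals."""
        if self.last_ccm_time is not None and \
                now - self.last_ccm_time > 3.5 * self.ccm_interval:
            self.down = True       # report alarm, trigger APS
        return self.down

rmep = RemoteMep(ccm_interval=1.0)
rmep.on_ccm(now=0.0)
assert rmep.poll(now=3.0) is False   # within 3.5 intervals
assert rmep.poll(now=3.6) is True    # LOC: connection considered faulty
```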
CV
CV is also a proactive OAM operation. It enables a MEP to report alarms when unexpected or
error packets are received. For example, if a CV-enabled MEP receives a packet from an LSP
and finds that this packet has been transmitted in error along an LSP, the MEP will report an
alarm indicating a forwarding error.
l TxFCf: the local TxFCl value recorded when the local MEP sent a CCM.
l RxFCb: the local RxFCl value recorded when the local MEP received a CCM.
l TxFCb: the TxFCf value carried in a received CCM. This TxFCb value is the local
TxFCl when the local MEP receives a CCM.
(Figure: Dual-ended LM. MEPs on the MPLS-TP network exchange CCMs carrying TxFCf, RxFCb, and TxFCb.)
After receiving CCMs carrying packet count information, both MEPs use the following
formulas to measure near- and far-end packet loss values:
l TxFCf[tc], RxFCb[tc], and TxFCb[tc] are the TxFCf, RxFCb, and TxFCb values,
respectively, which are carried in the most recently received CCM. RxFCl[tc] is the local
RxFCl value recorded when the local MEP received the CCM.
l TxFCf[tp], RxFCb[tp], and TxFCb[tp] are the TxFCf, RxFCb, and TxFCb values,
respectively, which are carried in the previously received CCM. RxFCl[tp] is the local
RxFCl value recorded when the local MEP received the previous CCM.
l tc is the time a current CCM was received.
l tp is the time the previous CCM was received.
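With these counters, the dual-ended near- and far-end loss values follow the standard formulas of ITU-T Y.1731, reproduced here as a reference sketch consistent with the definitions above:

```latex
\text{Far-end frame loss}  = \bigl|TxFCf[t_c] - TxFCf[t_p]\bigr| - \bigl|RxFCb[t_c] - RxFCb[t_p]\bigr| \\
\text{Near-end frame loss} = \bigl|TxFCb[t_c] - TxFCb[t_p]\bigr| - \bigl|RxFCl[t_c] - RxFCl[t_p]\bigr|
```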
After receiving an LMM, the RMEP responds to the local MEP with loss measurement replies
(LMRs) carrying the following information:
l TxFCf: equal to the TxFCf value carried in the LMM.
l RxFCf: the local RxFCl value recorded when the LMM was received.
l TxFCb: the local TxFCl value recorded when the LMR was sent.
(Figure: Single-ended LM. The local MEP sends an LMM carrying TxFCf; the RMEP replies with an LMR carrying TxFCf, RxFCf, and TxFCb.)
After receiving an LMR, the local MEP uses the following formulas to calculate near- and far-
end packet loss values:
l TxFCf[tc], RxFCf[tc], and TxFCb[tc] are the TxFCf, RxFCf, and TxFCb values,
respectively, which are carried in the most recently received LMR. RxFCl[tc] is the local
RxFCl value recorded when the most recent LMR arrives at the local MEP.
l TxFCf[tp], RxFCf[tp], and TxFCb[tp] are the TxFCf, RxFCf, and TxFCb values,
respectively, which are carried in the previously received LMR. RxFCl[tp] is the local
RxFCl value recorded when the previous LMR arrived at the local MEP.
l tc is the time a current LMR was received.
l tp is the time the previous LMR was received.
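Likewise for single-ended LM, the standard ITU-T Y.1731 formulas matching these counter definitions are:

```latex
\text{Far-end frame loss}  = \bigl|TxFCf[t_c] - TxFCf[t_p]\bigr| - \bigl|RxFCf[t_c] - RxFCf[t_p]\bigr| \\
\text{Near-end frame loss} = \bigl|TxFCb[t_c] - TxFCb[t_p]\bigr| - \bigl|RxFCl[t_c] - RxFCl[t_p]\bigr|
```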
The link delay time can be measured using either one- or two-way frame delay measurement.
Table 5-5 describes these frame delay measurement functions.
One-way frame delay measurement: Measures the network delay time on a unidirectional link between MEPs. It can be used only on a unidirectional link. The MEP and its RMEP on both ends of the link must have synchronous time.
Two-way frame delay measurement: Measures the network delay time on a bidirectional link between MEPs. It can be used on a bidirectional link between a local MEP and its RMEP. The local MEP does not need to synchronize its time with its RMEP.
(Figure: One-way DM. The local MEP sends the RMEP a 1DM carrying TxTimeStampf.)
After the RMEP receives a 1DM, it subtracts the TxTimeStampf value from the RxTimef value to calculate the delay time:
Frame delay = RxTimef - TxTimeStampf
The frame delay value can be used to measure the delay variation, which is the absolute difference between two delay time values.
One-way frame delay measurement can only be performed when the two MEPs on both ends
of a link have synchronous time. If these MEPs have asynchronous time, they can only
measure the delay variation.
the DMR was sent). The value in every field of the DMM is copied exactly to the DMR, with
the exception that the source and destination MAC addresses are interchanged.
(Figure: Two-way DM. The local MEP sends a DMM carrying TxTimeStampf; the RMEP replies with a DMR carrying TxTimeStampb.)
Upon receipt of the DMR, the local MEP calculates the two-way frame delay time using the
following formula:
Frame delay = RxTimeb (the time the DMR was received) - TxTimeStampf
Two-way frame delay measurement supports both delay and delay variation measurement
even if these MEPs do not have synchronous time. The frame delay time is the round-trip
delay time. If both MEPs have synchronous time, the round-trip delay time can be calculated
by combining the two delay values using the following formulas:
l MEP-to-RMEP delay time = RxTimeStampf - TxTimeStampf
l RMEP-to-MEP delay time = RxTimeb - TxTimeStampb
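The two-way delay arithmetic above can be checked with a few lines; timestamps are in milliseconds and the function names are illustrative:

```python
# Check of the two-way DM arithmetic described above, with timestamps
# in milliseconds. Function names are illustrative assumptions.

def two_way_delay(tx_time_stamp_f, rx_time_b):
    # Frame delay = RxTimeb (time the DMR was received)
    #             - TxTimeStampf (time the DMM was sent)
    return rx_time_b - tx_time_stamp_f

def component_delays(tx_time_stamp_f, rx_time_stamp_f,
                     tx_time_stamp_b, rx_time_b):
    # With synchronized clocks the round trip splits into:
    #   MEP-to-RMEP delay = RxTimeStampf - TxTimeStampf
    #   RMEP-to-MEP delay = RxTimeb - TxTimeStampb
    return (rx_time_stamp_f - tx_time_stamp_f,
            rx_time_b - tx_time_stamp_b)

# DMM sent at t=10000, received at t=10004; DMR sent at t=10005,
# received back at t=10009.
assert two_way_delay(10000, 10009) == 9
assert component_delays(10000, 10004, 10005, 10009) == (4, 4)
```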
l After a local MEP detects a link fault using the continuity check (CC) function, the local
MEP sets the RDI flag to 1 in CCMs and sends the CCMs along a reverse path to notify
its RMEP of the fault.
l After the fault is rectified, the local MEP sets the RDI flag to 0 in CCMs and sends them
to inform the RMEP that the fault is rectified.
NOTE
l The RDI function is associated with the proactive continuity check function and takes effect only after the
continuity check function is enabled.
l The RDI function applies only to bidirectional links. In the case of a unidirectional LSP, before RDI can
be used, a reverse path must be bound to the LSP.
5.2.6 Loopback
Background
On a multiprotocol label switching transport profile (MPLS-TP) network, a virtual circuit may traverse multiple switching devices (nodes), including maintenance association end points (MEPs) and maintenance association intermediate points (MIPs). A fault on any node or link in a virtual circuit may lead to the unavailability of the entire virtual circuit, and the fault cannot be located. Loopback (LB) can be configured on a source device (MEP) to detect or locate faults in links between the MEP and a MIP or between MEPs.
Related Concepts
LB and continuity check (CC) are both connectivity monitoring tools on an MPLS-TP network. Table 5-6 describes the differences between CC and LB.
Implementation
The loopback function monitors the connectivity of bidirectional links between a MEP and a
MIP and between MEPs.
1. The source MEP sends the destination a loopback message (LBM) carrying the target MIP ID or MEP ID.
2. After the destination receives the LBM, it checks whether the target MIP ID or MEP ID
matches the local MIP ID or MEP ID. If they do not match, the destination discards the
LBM. If they match, the destination responds with a loopback reply (LBR).
3. If the source MEP receives the LBR within a specified period of time, it considers the
destination reachable and the loopback test successful. If the source MEP does not
receive the LBR after the specified period of time elapses, it records a loopback test
timeout and log information that is used to analyze the connectivity failure.
(Figure 5-8: An LBM travels from the MEP to the MIP, which returns an LBR.)
Figure 5-8 illustrates a loopback test. LSRA initiates a loopback test to LSRC on an LSP. The
loopback test process is as follows:
1. LSRA sends LSRC an LBM carrying a specified TTL and a MIP ID. LSRB
transparently transmits the LBM to LSRC.
2. Upon receipt, LSRC finds that the TTL carried in the LBM has expired and checks whether the target MIP ID carried in the LBM matches the local MIP ID. If they do not match, LSRC discards the LBM. If they match, LSRC responds with an LBR.
3. If LSRA receives the LBR within a specified period of time, it considers LSRC
reachable. If LSRA fails to receive the LBR after a specified period of time elapses,
LSRA considers LSRC unreachable and records log information that is used to analyze
the connectivity failure.
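The loopback test in steps 1 through 3 can be simulated as follows; the node records and the reply convention are illustrative assumptions, not the device implementation:

```python
# Simulation of the TTL-based loopback test described above
# (LSRA -> LSRB -> LSRC); node and field names are illustrative.

def run_loopback(path, target_mip_id, ttl):
    """Send an LBM with the given TTL along the path; the node at which
    the TTL expires checks the target MIP ID and may return an LBR."""
    for hops, node in enumerate(path, start=1):
        if hops < ttl:
            continue                      # LBM transparently forwarded
        # TTL expires here: the node checks the target MIP ID.
        if node["mip_id"] == target_mip_id:
            return "LBR"                  # destination reachable
        return None                       # mismatching ID: LBM dropped
    return None                           # no reply: loopback timeout

path = [{"name": "LSRB", "mip_id": "mip-b"},
        {"name": "LSRC", "mip_id": "mip-c"}]
assert run_loopback(path, "mip-c", ttl=2) == "LBR"   # LSRC reachable
assert run_loopback(path, "mip-x", ttl=2) is None    # ID mismatch, dropped
```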
5.3 Applications
(Figure 5-9: A BTS/NodeB connects over FE/GE, IMA E1, and STM-1 links to PE1, which connects over GE to a TE tunnel across the MPLS-TP network.)
In Figure 5-9, in the Layer 2 to edge scenario on an IP RAN, mature PWE3 techniques are used to carry services. The process of transmitting services between a BTS/NodeB and an RNC/BSC is as follows:
l The BTS, NodeB, BSC, and RNC can be directly connected to an MPLS-TP network.
l A TE tunnel between PE1 and PE4 is established. PWs are established over the TE tunnel to transmit various services.
l MPLS-TP OAM is enabled on PE1 and PE4, and OAM parameters are configured on both ends of the PW. These PEs can send and receive OAM detection packets, which allows OAM to monitor the PW between PE1 and PE4 and obtain basic PW information. If OAM detects a defect, PE4 sends an RDI packet to PE1 over a reverse tunnel. The PEs notify the user-side BTS, NodeB, RNC, and BSC of fault information so that the user-side devices can use the information to maintain networks.
Service Overview
The operation and maintenance of virtual leased line (VLL) and virtual private LAN service
(VPLS) services require an operation, administration and maintenance (OAM) mechanism.
MultiProtocol Label Switching Transport Profile (MPLS-TP) OAM provides a mechanism to
rapidly detect and locate faults, which facilitates network operation and maintenance and
reduces the network maintenance costs.
Networking Description
As shown in Figure 5-10, a user-end provider edge (UPE) on the access network is dual-
homed to SPE1 and SPE2 on the aggregation network. A VLL supporting access links of
various types is deployed on the access network. A VPLS is deployed on the aggregation
network to form a point-to-multipoint leased line network. Additionally, Fast Protection
Switching (FPS) is configured on the UPE; MPLS tunnel automatic protection switching
(APS) is configured on SPE1 and SPE2 to protect the links between the virtual switching
instances (VSIs) created on the two superstratum provider edges (SPEs).
(Figure 5-10: A NodeB connects to the UPE, which is dual-homed to SPE1 and SPE2 (each running a VSI) over PWs with FPS; tunnel APS protects the SPE-to-PE links toward the RNC.)
Feature Deployment
To deploy MPLS-TP OAM to monitor link connectivity of VLL and VPLS pseudo wires
(PWs), configure maintenance entity groups (MEGs) and maintenance entities (MEs) on the
UPE, SPE1, and SPE2 and then enable one or more of the continuity check (CC) and
loopback (LB) functions. The UPE monitors link connectivity and performance of the primary
and secondary PWs.
Abbreviations
Abbreviation Full Name
CC Continuity Check
CV Connectivity Verification
DM Delay Measurement
LB Loopback
LM Loss Measurement
LT Linktrace
PW Pseudo-Wires
SPE Superstratum PE
TST Test
UPE Underlayer PE
6 VRRP
NOTE
In this document, if a VRRP function supports both IPv4 and IPv6, the implementation of this VRRP
function is the same for IPv4 and IPv6 unless otherwise specified.
VRRP is a fault-tolerant protocol defined in relevant standards. VRRP allows logical devices
to work separately from physical devices and implements route selection among multiple
egress gateways.
On the network shown in Figure 6-1, VRRP is enabled on two routers. One is the master and
the other is the backup. The two routers form a virtual router and this virtual router is assigned
a virtual IP address and a virtual MAC address. Hosts monitor only the presence of the virtual
router. The hosts communicate with devices on other network segments through the virtual
router.
A virtual router consists of a master router and one or more backup routers. Only the master
router forwards packets. If the master router fails, a backup router is elected as the master
router and takes over.
(Figure 6-1: Two user networks connect to the Internet through the master and backup routers, which form a virtual router.)
On a multicast or broadcast LAN (for example, an Ethernet), VRRP uses a logical VRRP gateway to ensure reliability for key links. VRRP prevents service interruptions if a physical VRRP gateway fails, providing high reliability. VRRP configuration is simple and takes effect without modifying other configurations, such as routing protocol configurations.
Purpose
As networks rapidly develop and applications become diversified, various value-added
services, such as Internet Protocol television (IPTV) and video conferencing, have become
widespread. Demands for network infrastructure reliability are increasing, especially in
nonstop network transmission.
Generally, hosts use one default gateway to communicate with external networks. If the
default gateway fails, communication between the hosts and external networks is interrupted.
System reliability can be improved using dynamic routing protocols (such as RIP and OSPF)
or ICMP Router Discovery Protocol (IRDP). However, this method requires complex
configurations and each host must support dynamic routing protocols.
VRRP resolves this issue by enabling several routers to be grouped into a virtual router, also
called a VRRP backup group. In normal circumstances, the master router in the VRRP backup
group functions as a default gateway and provides access services for users. If the master
router fails, VRRP elects a backup router from the VRRP backup group to provide access
services for users.
Hosts on a local area network (LAN) are usually connected to an external network through a
default gateway. When the hosts send packets destined for addresses out of the local network
segment, these packets follow a default route to an egress gateway. A provider edge (PE)
functions as an egress gateway on the network shown in Figure 6-2. The PE forwards packets
to the external network so that the hosts can communicate with the external network.
(Figure 6-2: User networks reach the Internet through a PE acting as the default gateway.)
If the PE fails, the hosts connected to it cannot communicate with the external network. The communication failure persists even if another router is added to the LAN, because most hosts on a LAN allow only a single default gateway, which forwards all data packets destined for devices outside the local network segment. Hosts send packets only through the default gateway even though they are connected to multiple routers.
VRRP prevents communication failures in a better way than the preceding two methods.
VRRP is configured only on routers to implement gateway backup, without any networking
changes or burden on hosts.
Benefits
VRRP offers the following benefits to carriers:
(Figure: A user network connects to the Internet through the master router and backup routers Backup 1 through Backup n.)
(Figure 6-4: A user network connects to the Internet through PE1 and PE2, which run VRRP backup groups 1 and 2 in load balancing mode.)
Multiple VRRP backup groups can be configured to implement load balancing. A single
router can be a member of multiple backup groups. On the network shown in Figure 6-4, the
VRRP backup groups work in load balancing mode.
l PE1 is the master device in VRRP backup group 1 and the backup device in VRRP
backup group 2.
l PE2 is the master device in VRRP backup group 2 and the backup device in VRRP
backup group 1.
NOTE
VRRP load balancing is classified as multi-gateway or single-gateway load balancing. For details about
VRRP load balancing, see the chapter "VRRP" in HUAWEI NetEngine40E Universal Service Router
Feature Description - Network Reliability.
6.2 Principles
(Figure: CEs on the user network connect through a virtual router with virtual IP address 10.1.1.10; a member router interface uses 10.1.1.1/24.)
– Virtual router ID (VRID): ID of a virtual router. Routers with the same VRID form
a virtual router.
– Virtual IP address: IP address of a virtual router. A virtual router can have one or
more virtual IP addresses, which are manually assigned.
– Virtual MAC address: MAC address generated by a virtual router based on a VRID. A virtual router has one virtual MAC address, in the format 00-00-5E-00-01-{VRID} (VRRP for IPv4) or 00-00-5E-00-02-{VRID} (VRRP for IPv6). After a virtual router receives an ARP (VRRP for IPv4) or NS (VRRP for IPv6) request, it responds to the request with the virtual MAC address rather than the actual MAC address.
l IP address owner: VRRP router that uses the virtual IP address as its interface IP address.
If an IP address owner is available, it functions as the master router.
l Primary IP address: IP address selected from actual interface IP addresses, which is
usually the first IP address that is configured. The primary IP address is used as the
source IP address in a VRRP Advertisement packet.
l VRRP router: device running VRRP. A VRRP router can join one or more VRRP backup
groups. A VRRP backup group consists of the following VRRP routers:
– Master router: forwards packets and responds to ARP requests.
– Backup router: does not forward packets when the master router is working
properly, but can be elected as the new master router if the master router fails.
l Priority: priority of a router in a VRRP backup group. A VRRP backup group elects the
master and backup routers based on router priorities.
l VRRP working modes:
– Preemption mode: A backup router with a higher priority than the master router
preempts the Master state.
– Non-preemption mode: When the master router is working properly, a backup
router does not preempt the Master state even if it has a priority higher than the
master router.
l VRRP timers:
– Adver_Interval timer: The master router sends a VRRP Advertisement packet each
time the Adver_Interval timer expires. The default timer value is 1 second.
– Master_Down timer: A backup router preempts the Master state after the
Master_Down timer expires. The Master_Down timer value (in seconds) is
calculated using the following equation:
Master_Down timer value = (3 x Adver_Interval timer value) + Skew_Time
where
Skew_Time = (256 - Backup router's priority)/256
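The virtual MAC format and the timer equations above can be sketched in a few lines of Python (a minimal illustration; the helper names are hypothetical):

```python
def virtual_mac(vrid, ipv6=False):
    """Virtual MAC for a VRID: 00-00-5E-00-01-{VRID} for VRRP for IPv4,
    00-00-5E-00-02-{VRID} for VRRP for IPv6 ({VRID} is one hex byte)."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1-255")
    return "00-00-5E-00-%02X-%02X" % (2 if ipv6 else 1, vrid)

def skew_time(backup_priority):
    # Skew_Time = (256 - Backup router's priority) / 256, in seconds
    return (256 - backup_priority) / 256.0

def master_down_time(adver_interval, backup_priority):
    # Master_Down timer value = (3 x Adver_Interval timer value) + Skew_Time
    return 3 * adver_interval + skew_time(backup_priority)

print(virtual_mac(1))            # 00-00-5E-00-01-01
print(master_down_time(1, 100))  # 3.609375 with the default 1-second interval
```

Because Skew_Time shrinks as the priority grows, the Master_Down timer of the highest-priority backup router expires first, which is why that router becomes the new master.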
For an IPv4 network, VRRP packets are encapsulated into IPv4 packets and sent to an IPv4
multicast address assigned to a VRRP4 backup group. In an IPv4 packet header:
l The source address is the primary IPv4 address of the interface that sends the packet.
l The destination address is 224.0.0.18.
l The time to live (TTL) value is 255.
l The protocol number is 112.
For an IPv6 network, VRRP packets are encapsulated into IPv6 packets and sent to an IPv6
multicast address assigned to a VRRP6 backup group. In an IPv6 packet header:
l The source address is the link-local address of the interface that sends the packet.
l The destination address is FF02::12.
l The hop count is 255.
l The protocol number is 112.
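The fixed outer-header values listed above can be checked with a small sketch (a hypothetical helper for illustration, not a full packet parser):

```python
def vrrp_outer_header_ok(ip_version, dst, ttl_or_hop_count, protocol):
    """Validate the constant outer-header fields of a VRRP packet:
    protocol number 112, TTL/hop count 255, and the version-specific
    multicast destination address described above."""
    if protocol != 112 or ttl_or_hop_count != 255:
        return False
    if ip_version == 4:
        return dst == "224.0.0.18"
    if ip_version == 6:
        return dst.upper() == "FF02::12"
    return False

print(vrrp_outer_header_ok(4, "224.0.0.18", 255, 112))  # True
print(vrrp_outer_header_ok(4, "224.0.0.18", 254, 112))  # False: TTL must be 255
```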
NOTE
The NE40E allows you to manually switch the VRRP version. In this document, VRRP packets refer to
VRRPv2 packets, unless otherwise specified.
As shown in Figure 6-6 and Figure 6-7, the main differences between VRRPv2 and VRRPv3
are as follows:
l VRRPv2 supports authentication, whereas VRRPv3 does not.
l VRRPv2 supports a second-level interval between sending VRRP Advertisement
packets, whereas VRRPv3 supports a centisecond-level interval.
(Figure: VRRP state machine, with the Initialize, Master, and Backup states)
l Initialize -> Master: a Startup event is received and the router's priority is 255.
l Initialize -> Backup: a Startup event is received and the router's priority is lower than 255.
l Master -> Initialize or Backup -> Initialize: a Shutdown event is received.
l Backup -> Master: the Master_Down timer expires.
l Master -> Backup: the priority carried in the received packet is higher than the local
priority, or the priorities are the same and the IP address carried in the received packet is
greater than the local IP address.
Master
A router in the Master state provides the following functions:
l Sends a VRRP Advertisement packet each time the Adver_Interval timer expires.
l Responds to an ARP request with an ARP reply carrying the virtual MAC address.
l Forwards IP packets sent to the virtual MAC address.
l Allows ping to a virtual IP address by default.
The master router changes its status as follows:
l Changes from Master to Backup if the VRRP priority in a received VRRP Advertisement
packet is higher than the local VRRP priority.
l Remains in the Master state if the VRRP priority in a received VRRP Advertisement
packet is the same as the local VRRP priority.
l Changes from Master to Initialize after it receives a Shutdown event, indicating that the
VRRP-enabled interface has been shut down.
NOTE
If devices in a VRRP backup group are in the Master state and a device receives a VRRP Advertisement
packet with the same priority as the local VRRP priority, the device compares the IP address in the
packet with the local IP address. If the IP address in the packet is greater than the local IP address, the
device switches to the Backup state. If the IP address in the packet is less than or equal to the local IP
address, the device remains in the Master state.
Backup
A router in the Backup state provides the following functions:
l Receives VRRP Advertisement packets from the master router and checks whether the
master router is working properly based on information in the packets.
l Does not respond to an ARP request carrying a virtual IP address.
l Discards IP packets sent to the virtual MAC address.
l Discards IP packets sent to virtual IP addresses.
A backup router changes its status as follows:
l Changes from Backup to Master after it receives a Master_Down timer timeout event.
l Changes from Backup to Initialize after it receives a Shutdown event, indicating that the
VRRP-enabled interface has been shut down.
l If, in preemption mode, it receives a VRRP Advertisement packet carrying a VRRP
priority lower than the local VRRP priority, it preempts the Master state after a specified
preemption delay.
l If, in non-preemption mode, it receives a VRRP Advertisement packet carrying a VRRP
priority lower than the local VRRP priority, it remains in the Backup state.
l Resets the Master_Down timer but does not compare IP addresses if it receives a VRRP
Advertisement packet carrying a VRRP priority higher than or equal to the local VRRP
priority.
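The transitions described above can be condensed into an event-driven sketch (simplified: IP addresses are compared as integers, and only the events named in the text are modeled; all names are hypothetical):

```python
def vrrp_transition(state, event, local_prio=100, pkt_prio=0,
                    local_ip=0, pkt_ip=0, preempt=True):
    """Next VRRP state of a router, following the Master/Backup rules above."""
    if event == "shutdown":                      # VRRP-enabled interface shut down
        return "Initialize"
    if state == "Master" and event == "advert":
        if pkt_prio > local_prio:
            return "Backup"
        if pkt_prio == local_prio and pkt_ip > local_ip:
            return "Backup"                      # larger IP address wins the tie
        return "Master"
    if state == "Backup":
        if event == "master_down":               # Master_Down timer expired
            return "Master"
        if event == "advert" and preempt and pkt_prio < local_prio:
            return "Master"                      # preemption mode (after the delay)
        return "Backup"
    return state

print(vrrp_transition("Master", "advert", local_prio=100, pkt_prio=120))  # Backup
print(vrrp_transition("Backup", "master_down"))                           # Master
```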
host. After the update is complete, user traffic is switched to the new master router. The
switching process is transparent to users.
5. If the original master router recovers and its priority is 255, it immediately switches to
the Master state. If the original master router recovers and its priority is lower than 255,
it switches to the Backup state and restores its previously configured priority.
6. If a backup router's priority is higher than the master router's priority, VRRP determines
whether to reelect a new master router, depending on the backup router's working mode
(preemption or non-preemption).
To ensure that the master and backup routers work properly, VRRP must implement the
following functions:
l Master router election
VRRP determines the master or backup role of each router in a VRRP backup group
based on router priorities. VRRP selects the router with the highest priority as the master
router.
If routers in the Initialize state receive a Startup event and their priorities are lower than
255, they switch to the Backup state. The router whose Master_Down timer first expires
switches to the Master state. The router then sends a VRRP Advertisement packet to
other routers in the VRRP backup group to obtain their priorities.
– If a router finds that the VRRP Advertisement packet carries a priority higher than
or equal to its priority, this router remains in the Backup state.
– If a router finds that the VRRP Advertisement packet carries a priority lower than
its priority, the router may switch to the Master state or remain in the Backup state,
depending on its working mode. If the router is working in preemption mode, it
switches to the Master state; if the router is working in non-preemption mode, it
remains in the Backup state.
NOTE
l If multiple VRRP routers enter the Master state at the same time, they exchange VRRP
Advertisement packets to determine the master or backup role. The VRRP router with the highest
priority remains in the Master state, and VRRP routers with lower priorities switch to the Backup
state. If these routers have the same priority, the router whose VRRP backup group is configured on the
interface with the largest primary IP address becomes the master router.
l If a VRRP router is the IP address owner, it immediately switches to the Master state after receiving
a Startup event.
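The election rule above (highest priority wins; an equal-priority tie goes to the largest primary IP address) can be sketched as:

```python
import ipaddress

def elect_master(routers):
    """routers: list of (priority, primary_ip) tuples; returns the winner.
    Priorities are compared first; ties are broken by the numerically
    largest primary IP address, as described above."""
    return max(routers, key=lambda r: (r[0], int(ipaddress.ip_address(r[1]))))

group = [(100, "10.1.1.1"), (120, "10.1.1.2"), (120, "10.1.1.3")]
print(elect_master(group))  # (120, '10.1.1.3')
```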
l Master router status advertisement
The master router periodically sends VRRP Advertisement packets to all backup routers
in the VRRP backup group to advertise its configurations (such as the priority) and
operating status. The backup routers determine whether the master router is operating
properly based on received VRRP Advertisement packets.
– If the master router gives up the master role (for example, the master router leaves
the VRRP backup group), it sends VRRP Advertisement packets carrying a priority
of 0 to the backup routers. Rather than waiting for the Master_Down timer to
expire, the backup router with the highest priority switches to the Master state after
a specified switching time. This switching time is called Skew_Time, in seconds.
The Skew_Time is calculated using the following equation:
Skew_Time = (256 - Backup router's priority)/256
– If the master router fails and cannot send VRRP Advertisement packets, the backup
routers cannot immediately detect the master router's operating status. In this
situation, the backup router with the highest priority switches to the Master state
after the Master_Down timer expires. The Master_Down timer value (in seconds) is
calculated using the following equation:
Master_Down timer value = (3 x Adver_Interval timer value) + Skew_Time
NOTE
If network congestion occurs, a backup router may not receive VRRP Advertisement packets from the
master router. If this situation occurs, the backup router proactively switches to the Master state. If the
new master router receives a VRRP Advertisement packet from the original master router, the new
master router will switch back to the Backup state. As a result, the routers in the VRRP backup group
frequently switch between Master and Backup. You can configure a preemption delay to resolve this
issue. After the configuration is complete, the backup router with the highest priority switches to the
Master state only when all of the following conditions are met:
l The Master_Down timer expires.
l The configured preemption delay elapses.
l The backup router does not receive VRRP Advertisement packets.
VRRP Authentication
VRRP supports different authentication modes and keys in VRRP Advertisement packets to
meet various network security requirements.
l On secure networks, you can use the non-authentication mode. In this mode, a device
does not add authentication information to VRRP Advertisement packets before sending
them. After a peer device receives VRRP Advertisement packets, it does not authenticate
them either and considers them authentic and valid.
l On insecure networks, you can use the simple or message digest algorithm 5 (MD5)
authentication mode.
– Simple authentication: Before a device sends a VRRP Advertisement packet, it adds
an authentication mode and key to the packet. After a peer device receives the
packet, the peer device checks whether the authentication mode and key carried in
the packet are the same as the locally configured ones. If they are the same, the peer
device considers the packet valid. If they are different, the peer device considers the
packet invalid and discards it.
– MD5 authentication: A device uses the MD5 algorithm to encrypt the locally
configured authentication key and saves the encrypted authentication key in the
Authentication Data field. After receiving a VRRP Advertisement packet, the
device uses the MD5 algorithm to encrypt the authentication key carried in the
packet and checks packet validity by comparing the encrypted authentication key
saved in the Authentication Data field with the encrypted authentication key carried
in the VRRP Advertisement packet.
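The MD5 comparison described above can be illustrated as follows (a simplified sketch: a real implementation derives the digest from more than the bare key, and the constant-time comparison here is a good-practice assumption, not something the text mandates):

```python
import hashlib
import hmac

def auth_data(key):
    # Digest that would be carried in the Authentication Data field (simplified)
    return hashlib.md5(key.encode()).digest()

def advert_valid(local_key, received_auth_data):
    # Recompute the digest from the locally configured key and compare
    return hmac.compare_digest(auth_data(local_key), received_auth_data)

print(advert_valid("vrrp-key", auth_data("vrrp-key")))  # True
print(advert_valid("vrrp-key", auth_data("wrong")))     # False
```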
Master/Backup Mode
A VRRP backup group comprises a master router and one or more backup routers. As shown
in Figure 6-9, Device A is the master router and forwards packets, and Device B and Device
C are backup routers and monitor Device A's status. If Device A fails, Device B or Device C
is elected as a new master router and takes over services from Device A.
1. When Device A functions properly, user traffic travels along the path Device E ->
Device A -> Device D. Device A periodically sends VRRP Advertisement packets to
notify Device B and Device C of its status.
2. If Device A fails, its VRRP functions are unavailable. Because Device B has a higher
priority than Device C, Device B switches to the Master state and Device C remains in
the Backup state. User traffic switches to the new path Device E -> Device B -> Device
D.
3. After Device A recovers, it enters the Backup state (its priority remains 120). After
receiving a VRRP Advertisement packet from Device B, the current master, Device A
finds that its priority is higher than that of Device B. Therefore, Device A preempts the
Master state after the preemption delay elapses, and sends VRRP Advertisement packets
and gratuitous ARP packets.
After receiving a VRRP Advertisement packet from Device A, Device B finds that its
priority is lower than that of Device A and changes from the Master state to the Backup
state. User traffic then switches to the original path Device E -> Device A -> Device D.
As shown in Figure 6-10, VRRP backup groups 1 and 2 are deployed on the network.
– VRRP backup group 1: Device A is the master router, and Device B is the backup
router.
– VRRP backup group 2: Device B is the master router, and Device A is the backup
router.
VRRP backup groups 1 and 2 back up each other and serve as gateways for different
users, thereby load-balancing service traffic.
l Single-gateway load balancing: A load-balance redundancy group (LBRG) with a virtual
IP address is created, and VRRP backup groups without virtual IP addresses are added to
the LBRG. The LBRG is specified as a gateway to implement load balancing for all
users.
Single-gateway load balancing, an enhancement to multi-gateway load balancing,
simplifies user-side configurations and facilitates network maintenance and
management.
Figure 6-11 shows single-gateway load balancing.
As shown in Figure 6-11, VRRP backup groups 1 and 2 are deployed on the network.
– VRRP backup group 1: an LBRG. Device A is the master router, and Device B is
the backup router.
– VRRP backup group 2: an LBRG member group. Device B is the master router, and
Device A is the backup router.
VRRP backup group 1 serves as a gateway for all users. After receiving an ARP request
packet from a user, VRRP backup group 1 returns an ARP response packet and
encapsulates its virtual MAC address or VRRP backup group 2's virtual MAC address in
the response.
6.2.5 mVRRP
Principles
A switch is dual-homed to two routers at the aggregation layer on a metropolitan area network
(MAN). Multiple VRRP backup groups can be configured on the two routers to transmit
various types of services. Because each VRRP backup group must maintain its own state
machine, a large number of VRRP Advertisement packets are transmitted between the routers.
To help reduce bandwidth and CPU resource consumption during VRRP packet transmission,
a VRRP backup group can be configured as a management Virtual Router Redundancy
Protocol (mVRRP) backup group. Other VRRP backup groups are bound to the mVRRP
backup group and become service VRRP backup groups. Only the mVRRP backup group
sends VRRP packets to negotiate the master/backup status. The mVRRP backup group
determines the master/backup status of service VRRP backup groups.
As shown in Figure 6-12, an mVRRP backup group can be deployed on the same side as
service VRRP backup groups or on the interfaces that directly connect Device A and Device
B.
Related Concepts
mVRRP backup group: has all the functions of a common VRRP backup group. Unlike a
common VRRP backup group, however, an mVRRP backup group can be tracked by service
VRRP backup groups and determines their statuses. An mVRRP backup group provides the
following functions:
l When the mVRRP backup group functions as a gateway, it determines the master/backup
status of devices and transmits services. In this situation, a common VRRP backup group
with the same ID as the mVRRP backup group must be created and assigned a virtual IP
address. The mVRRP backup group's virtual IP address is a gateway IP address set by
users.
l When the mVRRP backup group does not function as a gateway, it determines the
master/backup status of devices but does not transmit services. In this situation, the
mVRRP backup group does not require a virtual IP address. You can create an mVRRP
backup group directly on interfaces to simplify maintenance.
Service VRRP backup group: After common VRRP backup groups are bound to an mVRRP
backup group, they become service VRRP backup groups. Service VRRP backup groups do
not need to send VRRP packets to determine their states. The mVRRP backup group sends
VRRP packets to determine its state and the states of all its bound service VRRP backup
groups. A service VRRP backup group can be bound to an mVRRP backup group in either of
the following modes:
l Flowdown: The flowdown mode applies to networks on which both upstream and
downstream packets are transmitted over the same path. If the master device in an
mVRRP backup group enters the Backup or Initialize state, the VRRP module instructs
all service VRRP backup groups that are bound to the mVRRP backup group in
flowdown mode to enter the Initialize state.
l Unflowdown: The unflowdown mode applies to networks on which upstream and
downstream packets can be transmitted over different paths. If the mVRRP backup
group enters the Backup or Initialize state, the VRRP module instructs all service VRRP
backup groups that are bound to the mVRRP backup group in unflowdown mode to
enter the same state.
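The two binding modes above can be summarized in a short sketch (hypothetical helper names):

```python
def service_vrrp_state(mvrrp_state, mode):
    """State pushed to a bound service VRRP backup group, per the
    flowdown/unflowdown rules described above."""
    if mvrrp_state == "Master":
        return "Master"
    if mode == "flowdown":
        return "Initialize"   # Backup or Initialize mVRRP state forces Initialize
    return mvrrp_state        # unflowdown: follow the mVRRP state

print(service_vrrp_state("Backup", "flowdown"))    # Initialize
print(service_vrrp_state("Backup", "unflowdown"))  # Backup
```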
NOTE
Multiple service VRRP backup groups can be bound to an mVRRP backup group. However, an
mVRRP backup group cannot function as a service VRRP backup group and be bound to another
mVRRP backup group.
If a physical interface on which a service VRRP backup group is configured goes Down, the status of
the service VRRP backup group becomes Initialize, irrespective of the status of the mVRRP backup
group.
Benefits
VRRP offers the following benefits:
Background
Virtual Router Redundancy Protocol (VRRP) can monitor the status change only in the
VRRP-enabled interface on the master device. If a VRRP-disabled interface on the master
device or the uplink connecting the interface to a network fails, VRRP cannot detect the fault,
which causes traffic interruptions.
To resolve this issue, configure VRRP to monitor the VRRP-disabled interface status. If a
VRRP-disabled interface on the master device or the uplink connecting the interface to a
network fails, VRRP instructs the master device to reduce its priority to trigger a master/
backup VRRP switchover.
Related Concepts
If a VRRP-disabled interface of a VRRP device goes Down, the VRRP device changes its
VRRP priority in either of the following modes:
l Increased mode: The VRRP device increases its VRRP priority by a specified value.
l Reduced mode: The VRRP device reduces its VRRP priority by a specified value.
Implementation
As shown in Figure 6-13, a VRRP backup group is configured on Device A and Device B.
Device A is the master device, and Device B is the backup device.
Device A is configured to monitor interface 1. If interface 1 fails, Device A reduces its VRRP
priority and sends a VRRP Advertisement packet carrying a reduced priority. After Device B
receives the packet, it checks that its VRRP priority is higher than the received priority and
preempts the Master state.
After interface 1 goes Up, Device A restores the VRRP priority. After Device A receives a
VRRP Advertisement packet carrying Device B's priority in preemption mode, Device A
checks that its VRRP priority is higher than the received priority and preempts the Master
state.
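The priority change on Device A can be sketched as follows (a hypothetical helper; the configured priority of 120 and reduction of 50 are example values, and the clamp to the configurable range 1-254 is an assumption based on VRRP reserving 0 and 255):

```python
def tracked_priority(configured, interface_up, mode="reduced", delta=50):
    """VRRP priority after tracking a VRRP-disabled interface: while the
    interface is Down, the priority is reduced or increased by `delta`
    (the two modes above), clamped to the configurable range 1-254."""
    if interface_up:
        return configured
    value = configured + delta if mode == "increased" else configured - delta
    return max(1, min(254, value))

# Example: a device with configured priority 120 while the tracked
# interface is Down, in reduced mode:
print(tracked_priority(120, interface_up=False))  # 70
```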
Benefits
The association between VRRP and a VRRP-disabled interface helps trigger a master/backup
VRRP switchover if the VRRP-disabled interface fails or the uplink connecting the interface
to a network fails.
Background
To prevent failures on a VRRP-disabled interface from causing service interruptions,
configure a VRRP backup group to track the VRRP-disabled interface. However, a VRRP
backup group can track only one VRRP-disabled interface at a time. As networks expand and
the number of interfaces grows, a VRRP backup group may need to track many VRRP-disabled
interfaces, which makes the configuration workload of this approach very large.
To reduce the configuration workload, you can add multiple VRRP-disabled interfaces to an
interface monitoring group and enable a VRRP backup group to track the interface monitoring
group. When the link failure ratio of the interface monitoring group reaches a specified
threshold, the VRRP backup group performs a master/backup switchover to ensure reliable
service transmission.
Related Concepts
A VRRP backup group can track a maximum of three interface monitoring groups at the same time.
l A VRRP backup group can track two interface monitoring groups on the access side in
normal mode (link is not specified). When the link failure ratio on the access side
reaches a specified threshold, the VRRP backup group reduces the priority of the local
device to trigger the remote device to preempt the Master state.
l A VRRP backup group can track one interface monitoring group on the network side in
link mode. When the link failure ratio on the network side reaches a specified threshold,
the local device in the VRRP backup group changes to the Initialize state and sends a
VRRP Advertisement packet carrying a priority of 0 to the remote device to trigger the
remote device to preempt the Master state.
Implementation
Each interface in an interface monitoring group has a Down weight. If an interface goes
Down, the fault weight of the interface monitoring group to which the interface belongs
increases; if an interface goes Up, the fault weight of the interface monitoring group to which
the interface belongs decreases. The fault weight of an interface monitoring group reflects
link quality. VRRP can be configured to track an interface monitoring group. If the fault
weight of the interface monitoring group changes, the system notifies the VRRP module of
the change. The VRRP module calculates the VRRP priority or status based on the fault rate
of the interface monitoring group, configured monitoring mode, and priority change value.
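The fault-weight bookkeeping above can be sketched as follows (hypothetical helper names; the threshold semantics follow the description in this section):

```python
def failure_ratio(down_weights, down_interfaces):
    """down_weights: interface name -> Down weight. Returns the interface
    monitoring group's fault weight as a fraction of the total weight."""
    total = sum(down_weights.values())
    fault = sum(w for name, w in down_weights.items() if name in down_interfaces)
    return fault / total if total else 0.0

def trigger_switchover(down_weights, down_interfaces, threshold):
    # A master/backup switchover is triggered once the link failure
    # ratio reaches the specified threshold
    return failure_ratio(down_weights, down_interfaces) >= threshold

weights = {"if1": 10, "if2": 10, "if3": 10, "if4": 10}
print(trigger_switchover(weights, {"if1", "if2"}, threshold=0.5))  # True
```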
Benefits
Configuring VRRP to track an interface monitoring group on a device where a VRRP backup
group is configured helps to reduce the workload for configuring the VRRP backup group to
track VRRP-disabled interfaces.
preventing user traffic loss. A BFD session is established between the master and backup
devices in a VRRP backup group and is bound to the VRRP backup group. BFD immediately
detects communication faults in the VRRP backup group and instructs the VRRP backup
group to perform a master/backup switchover, minimizing service interruptions.
Association between a VRRP backup group and a common BFD session
l Description: A backup device monitors the status of the master device in a VRRP backup
group. A common BFD session is used to monitor the link between the master and backup
devices.
l BFD session type: static BFD sessions or static BFD sessions with automatically
negotiated discriminators.
l Implementation: If the BFD session detects a fault and goes Down, the BFD module
notifies the VRRP backup group of the status change. After receiving the notification, the
VRRP backup group changes the VRRP priorities of devices and determines whether to
perform a master/backup VRRP switchover.
l Requirement: VRRP devices must be enabled with BFD.
Association between a VRRP backup group and link and peer BFD sessions
l Description: The master and backup devices monitor the link and peer BFD sessions. A
peer BFD session is established between the master and backup devices. Link BFD
sessions are established between a downstream switch and each VRRP device. BFD helps
the VRRP backup group detect faults in the link between a VRRP device and the
downstream switch.
l BFD session type: static BFD sessions or static BFD sessions with automatically
negotiated discriminators.
l Implementation: If the link or peer BFD session goes Down, BFD notifies the VRRP
backup group of the fault. After receiving the notification, the VRRP backup group
immediately performs a master/backup VRRP switchover.
l Requirement: VRRP devices and the downstream switch must be enabled with BFD.
Figure 6-15 Association between a VRRP backup group and a common BFD session
Association Between a VRRP Backup Group and Link and Peer BFD Sessions
As shown in Figure 6-16, the master and backup devices monitor the status of link and peer
BFD sessions to identify local or remote faults.
Device A and Device B run VRRP. A peer BFD session is established between Device A and
Device B to detect link and device failures. Link BFD sessions are established between
Device A and Device E and between Device B and Device E to detect link and device
failures. After Device B detects that the peer BFD session goes Down while its Link2 BFD session
remains Up, Device B's VRRP status changes from Backup to Master, and Device B takes over.
Figure 6-16 Association between a VRRP backup group and link and peer BFD sessions
NOTE
A Link2 fault does not affect Device A's VRRP status, and Device A continues to forward upstream
traffic. However, Device B's VRRP status becomes Master if both the peer BFD session and Link2 BFD
session go Down, and Device B detects the peer BFD session status change before detecting the Link2
BFD session status change. After Device B detects the Link2 BFD session status change, Device B's
VRRP status becomes Initialize.
Figure 6-17 shows the state machine for the association between a VRRP backup group and
link and peer BFD sessions.
Figure 6-17 State machine for the association between a VRRP backup group and link and
peer BFD sessions
l Initialize -> Master: the link BFD session goes Up and the VRRP priority is 255.
l Initialize -> Backup: the link BFD session goes Up and the VRRP priority is lower than
255.
l Master -> Initialize or Backup -> Initialize: the link BFD session goes Down.
l Backup -> Master: the peer BFD session goes Down and the link BFD session goes Up.
The preceding process shows that, after link and peer BFD for VRRP is deployed, the backup
device immediately preempts the Master state if a fault occurs. Link and peer BFD for VRRP
implements a millisecond-level master/backup VRRP switchover.
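The behavior above can be condensed into a decision sketch (a simplified, hypothetical helper; a backup device's own link BFD session corresponds to Link2 in Figure 6-16):

```python
def state_after_bfd(role, peer_bfd_up, own_link_bfd_up):
    """VRRP state of a device after link and peer BFD notifications,
    following the rules described above (simplified sketch)."""
    if not own_link_bfd_up:
        return "Initialize"       # the device's own access link has failed
    if role == "backup" and not peer_bfd_up:
        return "Master"           # immediate preemption, no timer wait
    return "Master" if role == "master" else "Backup"

print(state_after_bfd("backup", peer_bfd_up=False, own_link_bfd_up=True))  # Master
```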
Benefits
BFD for VRRP speeds up master/backup VRRP switchovers if faults occur.
Implementation
NOTE
EFM can detect only local link failures. If the link between the UPE and NPE1 fails, NPE2 cannot detect
the failure. NPE2 has to wait three VRRP Advertisement packet transmission intervals before it switches
to the Master state. During this period, upstream service traffic is interrupted. To speed up master/
backup VRRP switchovers and minimize the service interruption time, configure VRRP also to track the
peer BFD session.
Figure 6-18 shows a network on which VRRP tracking EFM is configured. NPE1 and NPE2
are configured to belong to a VRRP backup group. A peer BFD session is configured to
detect the faults on the two NPEs and on the link between the two NPEs. An EFM session is
configured between the UPE and NPE1 and between the UPE and NPE2 to detect the faults
on the UPE and NPEs and on the links between the UPE and NPEs. The VRRP backup group
determines the VRRP status of NPEs based on the link status reported by EFM and the peer
BFD session.
In Figure 6-18, the following example describes how EFM and a peer BFD session affect the
VRRP status when a fault occurs and is rectified.
l NPE1 and NPE2 run VRRP.
l A peer BFD session is established between NPEs to detect link and device failures on the
link between the NPEs.
l An EFM session is established between NPE1 and the UPE and between NPE2 and the
UPE to detect link and node faults on the links between the NPEs and the UPE.
The implementation is as follows:
1. In normal circumstances, NPE1 periodically sends VRRP Advertisement packets to
inform NPE2 that NPE1 works properly. NPE1 and NPE2 both track the EFM and peer
BFD session status.
2. If NPE1 or the link between the UPE and NPE1 fails, the status of the EFM session
between the UPE and NPE1 changes to Discovery, the status of the peer BFD session
changes to Down, and the status of the EFM session between the UPE and NPE2
changes to Detect. NPE1's VRRP status directly changes from Master to Initialize, and
NPE2's VRRP status directly changes from Backup to Master.
3. After NPE1 or the link between the UPE and NPE1 recovers, the status of the peer BFD
session changes to Up, and the status of the EFM session between the UPE and NPE1
changes to Detect. If the preemption function is configured on NPE1, NPE1 changes
back to the Master state after VRRP negotiation, and NPE2 changes back to the Backup
state.
NOTE
In normal circumstances, if the link between the UPE and NPE2 fails, NPE1 remains in the Master
state and continues to forward upstream traffic. However, NPE2's VRRP status changes to Master
if NPE2 detects the Down state of the peer BFD session before it detects the Discovery state of the
link between itself and the UPE. After NPE2 detects the Discovery state of the link between itself
and the UPE, NPE2's VRRP status changes from Master to Initialize.
Figure 6-19 shows the state machine for VRRP tracking EFM.
l Initialize -> Master: the status of the EFM session is Detect and the VRRP priority is
255.
l Initialize -> Backup: the status of the EFM session is Detect and the VRRP priority is
lower than 255.
l Master -> Initialize or Backup -> Initialize: the status of the EFM session is Discovery.
l Backup -> Master: the status of the peer BFD session is Down, and the status of the EFM
session is Detect.
Benefits
VRRP tracking EFM facilitates master/backup VRRP switchovers on a network on which
UPEs do not support BFD but support 802.3ah.
Connectivity fault management (CFM) defined in 802.1ag provides functions, such as point-
to-point connectivity fault detection, fault notification, fault verification, and fault locating.
CFM can monitor the connectivity of an entire network and locate connectivity faults. CFM
can also be used together with switchover techniques to improve network reliability. VRRP
tracking CFM enables a VRRP backup group to rapidly perform a master/backup VRRP
switchover when CFM detects a link fault. This implementation minimizes the service
interruption time.
Implementation
NOTE
CFM can detect only local link failures. If the link between UPE2 and NPE1 fails, NPE2 cannot detect
the failure. NPE2 has to wait three VRRP Advertisement packet transmission intervals before it switches
to the Master state. During this period, upstream service traffic is interrupted. To speed up master/backup VRRP switchovers and minimize the service interruption time, also configure VRRP to track the peer BFD session.
Figure 6-21 shows a network on which VRRP tracks CFM and the peer BFD session.
[Figure 6-21: the UPEs connect across the IP/MPLS core to NPE1 and NPE2 (Backup); a peer BFD session and CFM for VRRP run between the devices.]
Figure 6-22 shows the state machine for VRRP tracking CFM.
[Figure 6-22: state machine with the Initialize, Master, and Backup states. Transitions are driven by the VRRP priority and by the status of the peer BFD session and the CFM session; for example, a device moves between the Master and Backup states when the status of the peer BFD session goes Down and the status of the CFM session goes Up.]
Benefits
VRRP tracking CFM prevents service interruptions caused by dual master devices in a VRRP
backup group and facilitates master/backup VRRP switchovers.
[Figure: Host A connects to the Internet through gateway devices, including Device B, Device D, and Device E.]
To resolve the preceding issue, you can associate VRRP with network quality analysis
(NQA). Using test instances, NQA sends probe packets to check the reachability of
destination IP addresses. After VRRP is associated with an NQA test instance, VRRP tracks
the NQA test instance to implement rapid master/backup VRRP switchovers. For the example
shown in the preceding figure, you can configure an NQA test instance on Device A to check
whether the IP address 20.1.1.1 of Interface 2 on Device C is reachable.
NOTE
VRRP association with an NQA test instance is required on only the local device (Device A).
Implementation
You can configure VRRP association with an NQA test instance to track a gateway router's
uplink, which is a cross-device link. If the uplink fails, NQA instructs VRRP to reduce the
gateway router's priority by a specified value. Reducing the priority enables another gateway
router in the VRRP backup group to take over services and become the master, thereby
ensuring communication continuity between hosts on the LAN served by the gateway and the
external network. After the uplink recovers, NQA instructs VRRP to restore the gateway
router's priority.
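The priority adjustment described above can be modeled with a short sketch. The function and parameter names are hypothetical (this is not an NQA or VRRP API); it only captures the reduce-on-failure, restore-on-recovery behavior.

```python
def effective_priority(configured, nqa_probe_ok, reduce_by):
    """VRRP priority after applying the tracked NQA test result.

    configured:   the device's configured VRRP priority
    nqa_probe_ok: True if the NQA probe to the destination succeeded
    reduce_by:    the configured priority reduction value
    """
    if nqa_probe_ok:
        return configured              # uplink reachable: priority restored
    return max(configured - reduce_by, 1)  # uplink down: priority reduced
```

With an illustrative configured priority of 120 and a reduction of 40, a failed probe yields an effective priority of 80; if another gateway in the backup group has priority 100, it preempts the Master state and takes over.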
Figure 6-24 illustrates VRRP association with an NQA test instance.
Benefits
VRRP association with NQA implements a rapid master/backup VRRP switchover if a cross-
device uplink fails.
Background
To improve device reliability, two user gateways working in master/backup mode are
connected to a network, and VRRP is enabled on these gateways to determine their master/
backup status. If a VRRP backup group has been configured and an uplink route to a network
becomes unreachable, access-side users still use the VRRP backup group to forward traffic
along the uplink route, which causes user traffic loss.
Association between a VRRP backup group and a route can prevent user traffic loss. A VRRP
backup group can be configured to track the uplink route to a network. If the route is
withdrawn or becomes inactive, the route management (RM) module notifies the VRRP
backup group of the change. After receiving the notification, the VRRP backup group changes
its master device's VRRP priority and performs a master/backup switchover. This process
ensures that user traffic can be forwarded along a properly functioning link.
Implementation
A VRRP backup group can be associated with an uplink route to a network to determine
whether the route is reachable. If the uplink route is withdrawn or becomes inactive after the
uplink goes Down or the network topology changes, hosts on a local area network (LAN) fail
to access the external network through gateways. The RM module notifies the VRRP backup
group of the route status change. The VRRP priority of the master device decreases by a
specified value. A backup device with a priority higher than others preempts the Master state
and takes over traffic. This process ensures communication continuity between these hosts
and the external network. After the uplink recovers, the RM module instructs the VRRP
backup group to restore the master device's VRRP priority.
As shown in Figure 6-25, a VRRP backup group is configured on Device A (master) and
Device B (backup), with Device A forwarding user traffic. The VRRP backup group on
Device A is associated with the route 100.1.2.0/24.
When the uplink from Device A to Device C goes Down, the route 100.1.2.0/24 becomes
unreachable and Device A's VRRP priority decreases. Because Device A's reduced VRRP
priority is lower than Device B's VRRP priority, Device B preempts the Master state and takes
over, which prevents user traffic loss.
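The election outcome in this example can be sketched as follows. The priority values are illustrative (the text only states that Device A's reduced priority falls below Device B's), and the name-based tiebreak merely stands in for VRRP's real highest-primary-IP-address rule.

```python
def elect_master(priorities):
    """Return the device with the highest VRRP priority.

    Ties are broken by device name here as a stand-in for the
    highest-primary-IP-address tiebreak used by real VRRP.
    """
    return max(priorities, key=lambda dev: (priorities[dev], dev))

prios = {"DeviceA": 120, "DeviceB": 100}   # illustrative priorities
assert elect_master(prios) == "DeviceA"    # uplink healthy: A is Master
prios["DeviceA"] -= 40                     # 100.1.2.0/24 withdrawn: RM lowers A's priority
assert elect_master(prios) == "DeviceB"    # B preempts the Master state
```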
[Figure 6-25: two panels showing the user network connected through Device A and Device B to Device C, Device D, and Device E across the IP/MPLS core, where the route 100.1.2.0/24 is tracked. Before the fault, Device A is the Master and carries the data flow; after the uplink fails, Device B becomes the Master and takes over the data flow.]
Benefits
Association between a VRRP backup group and a route helps implement a master/backup
VRRP switchover when an uplink route to a network is unreachable. The association also
ensures that the VRRP backup group performs a traffic switchback and minimizes traffic
downtime.
Compared with association between a VRRP backup group and the interface status, association with a route detects both faults on directly connected uplink interfaces and faults of links and devices along an uplink path that traverses multiple devices.
Figure 6-26 Association between direct routes and a VRRP backup group
[Figure 6-26: a UPE connects the user network to NPE1 (Master) and NPE2 (Backup), which reach NPE3 and NPE4 across the IP/MPLS core. With direct route tracking configured for VRRP, both user-to-network and network-to-user traffic travel through NPE1.]
Related Concepts
Direct route: a 32-bit host route or a network segment route that is generated after a device
interface is assigned an IP address and its protocol status is Up. A device automatically
generates direct routes without using a routing algorithm.
Implementation
Association between direct routes and a VRRP backup group allows VRRP interfaces to
adjust the costs of direct network segment routes based on the VRRP status. The direct route
with the master device as the next hop has the lowest cost. A dynamic routing protocol
imports the direct routes and selects the direct route with the lowest cost. For example, VRRP
interfaces on Device1 and Device2 on the network shown in Figure 1 are configured with
association between direct routes and the VRRP backup group. The implementation is as
follows:
l Device1 in the Master state sets the cost of its route to the directly connected virtual IP
network segment to 0 (default value).
l Device2 in the Backup state increases the cost of its route to the directly connected
virtual IP network segment.
A dynamic routing protocol selects the route with Device1 as the next hop because this route
costs less than the other route. Therefore, both user-to-network and network-to-user traffic
travels through Device1.
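A minimal sketch of this cost-based selection follows. The backup cost increase of 10 is illustrative; only the Master's default cost of 0 comes from the text.

```python
def route_cost(vrrp_state, backup_cost_increase=10):
    """Cost of the direct route to the virtual IP network segment.

    The Master keeps the default cost of 0; the Backup raises its cost
    by a configured amount (10 here, purely for illustration).
    """
    return 0 if vrrp_state == "Master" else backup_cost_increase

def select_next_hop(candidate_costs):
    """A dynamic routing protocol prefers the lowest-cost route."""
    return min(candidate_costs, key=candidate_costs.get)

costs = {"Device1": route_cost("Master"), "Device2": route_cost("Backup")}
assert select_next_hop(costs) == "Device1"  # both directions use Device1
```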
Usage Scenario
When a data center is used, firewalls are attached to devices in a VRRP backup group to
improve network security. Network-to-user traffic cannot pass through a firewall if it travels
over a path different than the one used by user-to-network traffic.
When an IP radio access network (RAN) is configured, VRRP is configured to set the master/
backup status of aggregation site gateways (ASGs) and radio service gateways (RSGs).
Network-to-user and user-to-network traffic may pass through different paths, complicating
network operation and management.
Association between direct routes and a VRRP backup group can address the preceding
problems by ensuring the user-to-network and network-to-user traffic travels along the same
path.
[Figure 6-27: two panels showing a NodeB/eNodeB connected to a CSG, which is connected over a primary PW to PE1 and a secondary PW to PE2. PE1 connects over the primary link to PE3 (Master in the VRRP backup group, toward RNC1), and PE2 connects over the secondary link to PE4 (Backup, toward RNC2). Upstream traffic follows the primary path.]
To meet carrier-class reliability requirements, configure devices in the VRRP backup group to
forward traffic even when they are in the Backup state. This configuration can prevent traffic
interruptions in the preceding scenario.
Implementation
As shown in Figure 6-27, upstream traffic travels along the path CSG -> PE1 -> PE3 ->
RNC1/RNC2 in normal circumstances. PE3 is in the Master state, and PE4 in the Backup
state.
If PE1 fails, traffic switches from the primary link between PE1 and PE3 to the secondary link
between PE2 and PE4. Because the speed of a primary/secondary link switchover is higher
than that of a master/backup VRRP switchover:
l If PE4 cannot forward traffic, service traffic is temporarily interrupted before the master/
backup VRRP switchover is complete.
l If PE4 can forward traffic, PE4 takes over service traffic forwarding even if the master/
backup VRRP switchover is not complete.
Benefits
Traffic forwarding by a backup device improves master/backup VRRP switchover
performance and reduces the service interruption time.
[Figure 6-28: two panels showing a UPE connecting the user network to the Internet over an active link and a standby link through interface 1; the service VRRP status (Active/Standby) follows the link roles, and the data flow uses the active link.]
Related Concept
A VRRP switchback is a process during which the original master device switches its status
from Backup to Master after a fault is rectified.
Implementation
Rapid VRRP switchback allows the original master device to switch its status from Backup to
Master without using VRRP Advertisement packets to negotiate the status. For example, on
the network shown in Figure 6-28, device configurations are as follows:
l A common VRRP backup group is configured on NPE1 and NPE2 that run VRRP. An
mVRRP backup group is configured on directly connected interfaces of NPE1 and
NPE2. The common VRRP backup group is bound to the mVRRP backup group and
becomes a service VRRP backup group. The mVRRP backup group determines the
master/backup status of the service VRRP backup group.
l NPE1 has a VRRP priority of 120 and works in the Master state in the mVRRP backup
group.
l NPE2 has a VRRP priority of 100 and works in the Backup state in the mVRRP backup
group.
l NPE1 tracks interface 1 and reduces its priority by 40 if interface 1 goes Down.
The rapid VRRP switchback process is as follows:
1. If NPE1 is working properly, NPE1 periodically sends VRRP Advertisement packets to
notify NPE2 of the Master state. NPE1 tracks interface 1 connected to the active link.
2. If the active link or interface 1 fails, interface 1 goes Down. The service VRRP backup
group on NPE1 is in the Initialize state. NPE1 reduces its mVRRP priority to 80 (120 -
40). As a result, the mVRRP priority of NPE2 is higher than that of NPE1, and NPE2
immediately preempts the Master state. NPE2 then sends a VRRP Advertisement packet
carrying a higher priority than that of NPE1. After receiving the packet, the mVRRP
backup group on NPE1 stops sending VRRP Advertisement packets and enters the
Backup state. The status of the service VRRP backup group is the same as that of the
mVRRP backup group on NPE2. User traffic switches to the path UPE -> PE1 -> PE2 ->
NPE2.
3. After the fault is rectified, interface 1 goes Up and NPE1 increases its VRRP priority to
120 (80 + 40). NPE1 immediately preempts the Master state and sends VRRP
Advertisement packets to NPE2. User traffic switches back to the path UPE -> PE1 ->
NPE1.
NOTE
If rapid VRRP switchback is not configured and NPE1 restores its priority to 120, NPE1 has to
wait until it receives VRRP Advertisement packets carrying a lower priority than its own priority
from NPE2 before preempting the Master state.
4. NPE1 then sends VRRP Advertisement packets carrying a higher priority than NPE2's
priority. After receiving the VRRP Advertisement packets, NPE2 enters the Backup
state. Both NPE1 and NPE2 restore their previous status.
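The priority arithmetic in the steps above can be checked with a short sketch; the constants are taken directly from the example configuration (NPE1 at 120, NPE2 at 100, a reduction of 40 tracked against interface 1).

```python
PRIO_NPE1, PRIO_NPE2, REDUCE = 120, 100, 40

def npe1_priority(interface1_up):
    """NPE1's mVRRP priority as a function of the tracked interface."""
    return PRIO_NPE1 if interface1_up else PRIO_NPE1 - REDUCE

# Step 2: interface 1 goes Down -> 120 - 40 = 80 < 100, so NPE2 preempts.
assert npe1_priority(False) == 80 and npe1_priority(False) < PRIO_NPE2

# Step 3: interface 1 goes Up -> priority restored to 120; with rapid
# switchback, NPE1 preempts immediately instead of waiting for NPE2's
# lower-priority VRRP Advertisement packets.
assert npe1_priority(True) == 120 and npe1_priority(True) > PRIO_NPE2
```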
Usage Scenario
Rapid VRRP switchback applies to a specific network with all of the following
characteristics:
l The master device in an mVRRP backup group tracks a VRRP-disabled interface or
feature and reduces its VRRP priority if the interface or feature status becomes Down.
l Devices in a VRRP backup group are connected to user-side devices over the active and
standby links.
l An active/standby link switchback is implemented quicker than a master/backup VRRP
switchback.
Benefits
Rapid VRRP switchback speeds up a VRRP switchback after a fault is rectified.
Implementation
The implementation of unicast VRRP is similar to that of common VRRP.
A unicast VRRP backup group cannot function as a user gateway. In addition to implementing
master/backup status negotiation between devices, unicast VRRP provides the following
extended functions:
l Security authentication: MD5 or HMAC-SHA256 authentication can be configured for a
unicast VRRP backup group to improve network security.
l Delayed preemption: This function prevents the master/backup status of devices in a
unicast VRRP backup group from changing frequently, thereby ensuring network
stability.
l Association with a VRRP-disabled interface, Carrier Grade NAT (CGN), and BFD: If the
master device in a unicast VRRP backup group fails, the backup device immediately
takes over, thereby ensuring network reliability.
l Association with an interface monitoring group: When the link failure ratio on the access
or network side reaches a specified threshold, the unicast VRRP backup group performs
a master/backup switchover to ensure network reliability.
NOTE
As an extension to association between a unicast VRRP backup group and a VRRP-disabled
interface, association between a unicast VRRP backup group and an interface monitoring group
reduces the configuration workload and implements uplink and downlink monitoring.
Usage Scenario
Unicast VRRP applies when two devices on a Layer 3 network need to use VRRP to negotiate
their master/backup status.
Benefits
Unicast VRRP allows two devices on a Layer 3 network to use VRRP to negotiate their
master/backup status. Unicast VRRP can be associated with a VRRP-disabled interface, BFD,
or CGN. If the master device in a unicast VRRP backup group fails, the backup device rapidly
detects the fault and becomes the new master device.
Service Overview
NodeBs and radio network controllers (RNCs) on an IP radio access network (IPRAN) do not
have dynamic routing capabilities. Static routes must be configured to allow NodeBs to
communicate with access aggregation gateways (AGGs) and allow RNCs to communicate
with radio service gateways (RSGs) at the aggregation level. To ensure that various value-
added services, such as voice, video, and cloud computing, are not interrupted on mobile
bearer networks, a VRRP backup group can be deployed to implement gateway redundancy.
When the master device in a VRRP backup group goes Down, a backup device takes over,
ensuring normal service transmission and enhancing device reliability at the aggregation
layer.
Networking Description
Figure 6-29 shows the network for the IPRAN gateway protection solution. A NodeB is
connected to AGGs over an access ring or is dual-homed to two AGGs. The cell site gateways
(CSGs) and AGGs are connected using the pseudo wire emulation edge-to-edge (PWE3)
technology, which ensures connection reliability. Two VRRP backup groups can be
configured on the AGGs and RSGs to implement gateway backup for the NodeB and RNC,
respectively.
[Figure 6-29: a NodeB is connected to CSGs, which connect over PWE3+VPLS to AGG1 and AGG2. The AGGs connect across the L3VPN through a P device to RSG1 and RSG2, which connect to the RNC. VRRP backup groups run on the AGGs and on the RSGs.]
Feature Deployment
Table 6-5 describes VRRP-based gateway protection applications on an IPRAN.
l Associate an mVRRP backup group with a BFD session: By default, when a VRRP backup group detects that the master device goes Down, the backup device attempts to preempt the Master state after 3 seconds (three times the interval at which VRRP Advertisement packets are broadcast). During this period, no master device forwards user traffic, which leads to traffic forwarding interruptions. BFD can detect link faults in milliseconds. After an mVRRP backup group is associated with a BFD session and BFD detects a fault, a master/backup VRRP switchover is implemented, preventing user traffic loss. When the master device goes Down, the BFD module instructs the backup device in the mVRRP backup group to preempt the Master state and take over traffic. The status of the service VRRP backup group associated with the mVRRP backup group changes accordingly. This implementation reduces service interruptions.
l Associate direct network segment routes with a service VRRP backup group: During traffic transmission between the NodeB and RNC, user-to-network and network-to-user traffic may travel through different paths, causing network operation, maintenance, and management difficulties. For example, the NodeB sends traffic destined for the RNC through the master AGG, whereas the RNC sends traffic destined for the NodeB through the backup AGG, which increases traffic monitoring costs. Association between direct network segment routes and a service VRRP backup group can be deployed to ensure that user-to-network and network-to-user traffic travels through the same path.
l Deploy VRRP backup groups on RSGs to implement gateway backup for the RNC:
– Deploy basic VRRP functions: RSGs provide gateway functions for the RNC. Basic VRRP functions can be configured on the VLANIF interface of the RSGs to implement gateway backup. In normal circumstances, the master device forwards user traffic. When the master device goes Down, the backup device takes over.
– Associate a VRRP backup group with a BFD session: A VRRP backup group can be associated with a BFD session to implement a rapid master/backup VRRP switchover when BFD detects a fault. When the master device goes Down, the BFD module instructs the backup device in the VRRP backup group to preempt the Master state and take over traffic. This implementation reduces service interruptions.
[Figures: the same IPRAN topology during and after the switchover; the AGG that was the Backup preempts the Master state when the fault occurs, and the roles are restored after recovery.]
When AGG1 recovers, it becomes the master device after a specified preemption delay
elapses. AGG2 then becomes the backup device. Traffic sent from the NodeB goes through
the CSGs to AGG1 over the previous primary PW. AGG1 sends the traffic to RSG1 through
the P device. RSG1 then sends the traffic to the RNC. The path for user-to-network traffic is
CSG -> AGG1 -> P -> RSG1 -> RNC, and the path for network-to-user traffic is RNC ->
RSG1 -> P -> AGG1 -> CSG.
PW pseudo wire
7 Ethernet OAM
OAM mechanisms for server-layer services such as synchronous digital hierarchy (SDH) and
for client-layer services such as IP cannot be used on Ethernet networks. Ethernet OAM
differs from client- or server-layer OAM and has been developed to support the following
functions:
l Monitors Ethernet link connectivity.
l Pinpoints faults on Ethernet networks.
l Evaluates network usage and performance.
These functions help carriers provide services based on service level agreements (SLAs).
Ethernet operation, administration and maintenance (OAM) provides the following functions for Ethernet networks:
l Fault management
– Ethernet OAM sends detection packets on demand or periodically to monitor
network connectivity.
– Ethernet OAM uses methods similar to Packet Internet Groper (PING) and
traceroute used on IP networks to locate and diagnose faults on Ethernet networks.
– Ethernet OAM is used together with a protection switching protocol to trigger a
device or link switchover if a connectivity fault is detected. Switchovers help
networks achieve carrier-class reliability, by ensuring that network interruptions are
less than or equal to 50 milliseconds.
l Performance management
Ethernet OAM measures network transmission parameters including packet loss ratio,
delay, and jitter and collects traffic statistics including the numbers of sent and received
bytes and the number of frame errors. Performance management is implemented on
access devices. Carriers use this function to monitor network operation and dynamically
adjust parameters in real time based on statistical data. This process reduces maintenance
costs.
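As an illustration of how such metrics are derived from raw counters and timestamps, the following sketch shows the underlying arithmetic. This is generic calculation, not a vendor API; real Y.1731 LM/DM measurements exchange protocol frames to gather these values.

```python
def loss_ratio(sent, received):
    """Packet loss ratio computed from sent/received counters."""
    return (sent - received) / sent if sent else 0.0

def delay_and_jitter(tx_times, rx_times):
    """Average one-way delay and average jitter (delay variation)
    from per-packet transmit and receive timestamps."""
    delays = [rx - tx for tx, rx in zip(tx_times, rx_times)]
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    avg = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return avg(delays), avg(diffs)
```

For example, 1000 packets sent with 990 received gives a loss ratio of 1%; per-packet delays of 5, 7, and 6 ms give an average delay of 6 ms and an average jitter of 1.5 ms.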
Network-level Ethernet OAM checks network connectivity, pinpoints connectivity faults, and monitors E2E network performance at the access and aggregation layers. Examples include IEEE 802.1ag (CFM) and Y.1731.
l IEEE 802.1ag, also known as connectivity fault management (CFM), defines OAM functions, such as continuity check (CC), loopback (LB), and linktrace (LT), for Ethernet bearer networks. CFM applies to large-scale E2E Ethernet networks. CFM is used at the access and aggregation layers of the MAN shown in Figure 7-1. For example, CFM monitors the link between a user-end provider edge (UPE) and a PE. It monitors network-wide connectivity and detects connectivity faults. CFM is used together with protection switchover mechanisms to maintain network reliability.
l Y.1731 is an OAM protocol defined by the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T). It covers items defined in IEEE 802.1ag and provides additional OAM messages for fault management and performance monitoring. Fault management includes alarm indication signal (AIS), remote defect indication (RDI), locked signal (LCK), test signal, maintenance communication channel (MCC), experimental (EXP) OAM, and vendor specific (VSP) OAM. Performance monitoring includes frame loss measurement (LM) and delay measurement (DM). Y.1731 is a CFM enhancement that applies to access and aggregation networks. In addition to the fault management functions that CFM supports, Y.1731 provides performance monitoring functions such as LM and DM.
[Figure 7-1: a MAN in which SOHO, enterprise, business center, and residential CEs attach to UPEs; the UPEs connect through PE-AGGs and NPEs/PEs to the IP/MPLS core. EFM OAM (802.3ah) applies to MAN access, and CFM (802.1ag)/Y.1731 applies from the aggregation layer to the core network.]
Benefits
P2P EFM, E2E CFM, E2E Y.1731, and their combinations are used to provide a complete
Ethernet OAM solution, which brings the following benefits:
l Ethernet is deployed near user premises using remote terminals and roadside cabinets at remote central offices or in unattended areas. Ethernet OAM allows engineers to run detection, diagnosis, and monitoring protocols and techniques from remote locations to maintain Ethernet networks. Remote maintenance eliminates the need for onsite maintenance and helps reduce maintenance and operation expenditures.
l Ethernet OAM supports various performance monitoring tools that are used to monitor
network operation and assess service quality based on SLAs. If a device using the tools
detects faults, the device sends traps to a network management system (NMS). Carriers
use statistics and trap information on NMSs to adjust services. The tools help ensure
proper transmission of voice and data services.
An EFM OAMPDU contains the Dest addr, Source addr, Type, Subtype, Flags, Code, Data/Pad, and CRC fields.
l Source addr: the source address, which is a unicast MAC address of a port on the transmit end. If no port MAC address is specified on the transmit end, the bridge MAC address of the transmit end is used.
l Subtype: the subtype of a slow protocol. The value is 0x03, which means that the slow sub-protocol is EFM.
Among the OAMPDU types:
l Event Notification OAMPDU: used to monitor links. If an errored frame event, errored symbol period event, or errored frame second summary event occurs on an interface, the interface sends an Event Notification OAMPDU to notify the remote interface of the event.
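The fixed header of such a frame can be sketched as follows. The slow-protocol destination MAC address (01-80-C2-00-00-02), EtherType (0x8809), the EFM subtype 0x03, and the Information (0x00) and Event Notification (0x01) code values come from IEEE 802.3; the builder function itself is illustrative, not a device API.

```python
import struct

SLOW_PROTO_DMAC = bytes.fromhex("0180c2000002")  # slow-protocol multicast MAC
SLOW_PROTO_ETHERTYPE = 0x8809                    # slow-protocol EtherType
EFM_SUBTYPE = 0x03                               # slow sub-protocol: EFM
CODE_INFORMATION = 0x00                          # Information OAMPDU
CODE_EVENT_NOTIFICATION = 0x01                   # Event Notification OAMPDU

def build_oampdu_header(src_mac, flags, code):
    """Build the fixed 18-byte OAMPDU header: destination MAC,
    source MAC, EtherType, Subtype, Flags (2 bytes), Code."""
    return (SLOW_PROTO_DMAC + src_mac
            + struct.pack("!HBHB", SLOW_PROTO_ETHERTYPE,
                          EFM_SUBTYPE, flags, code))

hdr = build_oampdu_header(bytes(6), 0x0050, CODE_INFORMATION)
assert hdr[12:14] == b"\x88\x09" and hdr[14] == EFM_SUBTYPE
```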
Connection Modes
EFM supports two connection modes: active and passive. Table 7-4 describes capabilities of
processing OAMPDUs in the two modes.
NOTE
l An EFM connection can be initiated only by an OAM entity working in active mode. An OAM
entity working in passive mode waits to receive a connection request from its peer entity. Two
OAM entities that both work in passive mode cannot establish an EFM connection between them.
l An OAM entity that is to initiate a loopback request must work in active mode.
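The rules in this NOTE can be captured in a couple of predicates; this is an illustrative restatement, not device logic.

```python
def can_establish_connection(mode_a, mode_b):
    """An EFM connection needs at least one entity in active mode;
    two passive entities cannot establish a connection."""
    return "active" in (mode_a, mode_b)

def can_initiate_loopback(mode):
    """Only an active-mode entity may initiate a loopback request."""
    return mode == "active"

assert can_establish_connection("active", "passive")
assert not can_establish_connection("passive", "passive")
assert not can_initiate_loopback("passive")
```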
7.2.2 Background
As telecommunication technologies develop rapidly and the demand for service diversity increases, various user-oriented teleservices are being provided over digital and intelligent
media through broadband paths. Backbone network technologies, such as synchronous digital
hierarchy (SDH), asynchronous transfer mode (ATM), passive optical network (PON), and
dense wavelength division multiplexing (DWDM), grow mature and popular. The
technologies allow the voice, data, and video services to be transmitted over a single path to
every home. Telecommunication experts and carriers focus on using existing network
resources to support new types of services and improve the service quality. The key point is to
provide a solution to the last-mile link to a user network.
A "last mile" reliability solution also needs to be provided. High-end clients, such as banks
and financial companies, demand high reliability. They expect carriers to monitor both carrier
networks and last-mile links that connect users to those carrier networks. EFM can be used to
satisfy these demands.
[Figure: maintenance levels: subscriber maintenance and service maintenance use CFM/Y.1731, and core infrastructure maintenance uses EFM.]
On the network shown in Figure 7-3, EFM is an OAM mechanism that applies to the last-
mile Ethernet access links to users. Carriers use EFM to monitor link status in real time,
rapidly locate failed links, and identify fault types if faults occur. OAM entities exchange
various OAMPDUs to monitor link connectivity and locate link faults.
[Figure: a CE is connected to PE1 over a last-mile link running EFM (Port 1 on the user side to Port 2 on PE1); PE1 connects to PE2, PE3, and PE4 on the IP/MPLS network side.]
OAM Discovery
During the discovery phase, a local EFM entity discovers and establishes a stable EFM
connection with a remote EFM entity. Figure 7-5 shows the discovery process.
Link Monitoring
Monitoring Ethernet links is difficult if network performance deteriorates while traffic is
being transmitted over physical links. To resolve this issue, the EFM link monitoring function
can be used. This function can detect data link layer faults in various environments. EFM
entities that are enabled with link monitoring exchange Event Notification OAMPDUs to
monitor links.
If an EFM entity receives a link event listed in Table 7-5, it sends an Event Notification
OAMPDU to notify the remote EFM entity of the event and also sends a trap to an NMS.
After receiving the trap on the NMS, an administrator can determine the network status and
take remedial measures as needed.
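This threshold-and-notify behavior can be sketched schematically as follows; the action strings are placeholders for the real OAMPDU transmission and trap, not device messages.

```python
def check_link_event(error_count, threshold):
    """Return the actions taken for one monitoring window: when the
    error count reaches the configured upper limit, the EFM entity
    notifies the remote entity and the NMS."""
    if error_count >= threshold:
        return ["send Event Notification OAMPDU", "send trap to NMS"]
    return []  # below the limit: no event is generated

assert check_link_event(12, 10) == ["send Event Notification OAMPDU",
                                    "send trap to NMS"]
assert check_link_event(3, 10) == []
```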
l Errored symbol period event: If the number of symbol errors that occur on a device's interface during a specified period of time reaches a specified upper limit, the device generates an errored symbol period event, advertises the event to the remote device, and sends a trap to the NMS. This event helps the device detect code errors during data transmission at the physical layer.
l Errored frame event: If the number of frame errors that occur on a device's interface during a specified period of time reaches a specified upper limit, the device generates an errored frame event, advertises the event to the remote device, and sends a trap to the NMS. This event helps the device detect frame errors that occur during data transmission at the MAC sublayer.
l Errored frame seconds summary event: An errored frame second is a one-second interval wherein at least one frame error is detected. If the number of errored frame seconds that occur during a specified period of time reaches a specified upper limit on a device's interface, the device generates an errored frame second summary event, advertises the event to the remote device, and sends a trap to the NMS. This event helps the device detect errored frame seconds that occur during data transmission at the MAC sublayer.
Fault Notification
After the OAM discovery phase finishes, two EFM entities at both ends of an EFM
connection exchange Information OAMPDUs to monitor link connectivity. If traffic is
interrupted due to a remote device failure, the remote EFM entity sends an Information
OAMPDU carrying an event listed in Table 7-6 to the local EFM entity. After receiving the
notification, the local EFM entity sends a trap to the NMS. An administrator can view the trap
on the NMS to determine link status and take measures to rectify the fault.
Link fault If a loss of signal (LoS) error occurs because the interval at
which OAMPDUs are sent elapses or a physical link fails, the
local device sends a trap to the NMS.
Remote Loopback
Figure 7-6 demonstrates the principles of remote loopback. When a local interface sends non-
OAMPDUs to a remote interface, the remote interface loops the non-OAMPDUs back to the
local interface, not to the destination addresses of the non-OAMPDUs. This process is called
remote loopback. An EFM connection must be established to implement remote loopback.
[Figure 7-6: Port 1 (active mode) and Port 2 (passive mode), with the data flow looped back from Port 2 to Port 1.]
A device enabled with remote loopback discards all data frames except OAMPDUs, causing a
service interruption. To prevent impact on services, use remote loopback to check link
connectivity and quality before a new network is used or after a link fault is rectified.
The local device calculates communication quality parameters such as the packet loss ratio on
the current link based on the numbers of sent and received packets. Figure 7-7 shows the
remote loopback process.
[Figure 7-7: the remote loopback process between the CE (in proactive mode) and PE1, summarized below.]
1. The CE sends a remote loopback request carrying a Loopback Control OAMPDU.
2. After receiving the OAMPDU, PE1 determines whether to enter the loopback state:
– If not, PE1 discards the OAMPDU and forwards the data frames as usual.
– If yes, PE1 stops forwarding data frames and goes to step 3.
3. PE1 sends an Information OAMPDU indicating that the request is accepted.
4. PE1 enters the loopback state.
5. The CE sends loopback test packets.
6. PE1 loops the test packets back to the initiator.
7. The CE compares the number of sent packets with the number of received packets and checks the link status based on the result.
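The packet-loss check performed by the initiator amounts to the following generic arithmetic (this is not a device command):

```python
def loopback_loss_ratio(sent, looped_back):
    """Link loss ratio derived from the numbers of test packets the
    initiator sent and received back from the looping device."""
    return (sent - looped_back) / sent if sent else 0.0

# e.g. 1000 test packets sent, 998 looped back -> 0.2% loss
assert abs(loopback_loss_ratio(1000, 998) - 0.002) < 1e-12
```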
When the local device needs to stop remote loopback, it sends a message to instruct the remote
device to disable remote loopback. After receiving the message, the remote device disables
remote loopback.
If remote loopback is left enabled, the remote device keeps looping back service data, causing
a service interruption. To prevent this issue, a capability can be configured to disable remote
loopback automatically after a specified timeout period. After the timeout period expires, the
local device automatically sends a message to instruct the remote device to disable remote
loopback.
Maintenance Domain
MDs are discrete areas within which connectivity fault detection is enabled. The boundary of
an MD is determined by MEPs configured on interfaces. An MD is identified by an MD
name.
To help locate faults, MDs are divided into levels 0 through 7. A larger value indicates a higher level, and a higher-level MD covers a larger area. One MD can be tangential to another MD.
Tangential MDs share a single device and this device has one interface in each of the MDs. A
lower level MD can be nested in a higher level MD. An MD must be fully nested in another
MD, and the two MDs cannot overlap. A higher level MD cannot be nested in a lower level
MD.
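These nesting constraints can be expressed as a simple check, modeling an MD as a level plus the set of devices it covers. This is an illustrative simplification of the rules above, not how a device validates configurations.

```python
def valid_nesting(inner_level, inner_devices, outer_level, outer_devices):
    """A nested MD must have a lower level than the outer MD (levels
    range from 0 to 7) and must be fully contained in it; a
    higher-level MD cannot be nested in a lower-level one."""
    return (0 <= inner_level < outer_level <= 7
            and set(inner_devices) <= set(outer_devices))

# MD2 (level 3, covering PE2-PE4) nested in MD1 (level 6): allowed.
assert valid_nesting(3, {"PE2", "PE3", "PE4"}, 6,
                     {"PE2", "PE3", "PE4", "PE5", "PE6", "PE7"})
# A higher-level MD cannot be nested in a lower-level MD.
assert not valid_nesting(6, {"PE2"}, 3, {"PE2", "PE3"})
```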
Classifying MDs based on levels facilitates fault diagnosis. MD2 is nested in MD1 on the
network shown in Figure 7-8. If a fault occurs in MD1, PE2 through PE7 and all the links
between the PEs are checked. If no fault is detected in MD2, PE2, PE3, and PE4 are working
properly. This means that the fault is on PE5, PE6, or PE7 or on a link between these PEs.
In actual network scenarios, a nested MD can monitor the connectivity of the higher level MD
in which it is nested. Level settings allow 802.1ag packets to transparently travel through a
nested MD. For example, on the network shown in Figure 7-8, MD2 with the level set to 3 is
nested in MD1 with the level set to 6. The level settings allow 802.1ag packets that monitor MD1 connectivity to transparently pass through MD2, but prevent 802.1ag packets that monitor MD2 connectivity from entering MD1. Setting levels for MDs helps locate faults.
(Figure 7-8: MD2 at level 3 is nested in MD1 at level 6; the MDs span PE1 through PE8.)
802.1ag packets are exchanged and CFM functions are implemented based on MDs. Properly
planned MDs help a network administrator locate faults.
Maintenance Association
Multiple MAs can be configured in an MD as needed. Each MA contains MEPs. An MA is
uniquely identified by an MD name and an MA name.
MIPs can be automatically generated based on rules or manually created on interfaces. Table
7-7 describes MIP creation modes.
Manual configuration: Only IEEE Std 802.1ag-2007 supports manual MIP configuration, and the MIP level must be set. Manually configured MIPs are preferable to automatically generated MIPs. Although configuring MIPs manually is easy, managing many manually configured MIPs is difficult and errors may occur.
Automatic generation: A device automatically generates MIPs based on creation rules, which are configurable. Creation rules are classified as explicit, default, or none rules, as listed in Table 7-8. With the none rule, no MIP is created.
NOTE
MIPs are separately calculated in each service instance, such as a VLAN. In a single service instance, MAs in MDs at different levels can serve the same VLAN ID.
For each service instance of each interface, the device attempts to calculate a MIP from the
lowest level MEP based on the rules listed in Table 7-7 and the following conditions:
l Each MD on a single interface has a specific level and is associated with multiple
creation rules. The creation rule with the highest priority applies. An explicit rule has a
higher priority than a default rule, and a default rule takes precedence over a none rule.
l The level of a MIP must be higher than that of any MEP on the same interface.
l An explicit rule applies to an interface only when MEPs are configured on the interface.
l Only one MIP can be generated on a single interface. If multiple rules for generating
MIPs with different levels can be used, the MIP with the lowest level is generated.
MIP creation rules help detect and locate faults by level.
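The selection conditions above can be sketched as follows. The rule names and the priority order (explicit over default over none) come from the text; the function names and data shapes are illustrative assumptions.

```python
# Rule priority from the text: explicit > default > none.
PRIORITY = {"explicit": 2, "default": 1, "none": 0}

def pick_rule(rules):
    """Return the creation rule with the highest priority."""
    return max(rules, key=lambda r: PRIORITY[r])

def mip_level(candidate_levels, mep_levels, has_mep, rule):
    """Compute the MIP level for one interface in one service instance.

    candidate_levels: MD levels that could host a MIP on this interface.
    mep_levels: levels of MEPs configured on the interface.
    has_mep: whether any MEP is configured (explicit rules require one).
    """
    if rule == "none" or (rule == "explicit" and not has_mep):
        return None
    # The MIP level must be higher than every MEP level; if several levels
    # qualify, the lowest one is used.
    valid = [lvl for lvl in candidate_levels if all(lvl > m for m in mep_levels)]
    return min(valid) if valid else None
```

For example, with candidate MD levels 6 and 7 and a MEP at level 5 under a default rule, the MIP is generated at level 6.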
For example, CCMs are sent to detect a fault in a level 7 MD on the network shown in Figure
7-10. Loopback or linktrace is used to locate the fault in the link between MIPs that are in a
level 5 MD. This process is repeated until the faulty link or device is located.
(Figure 7-10 legend: explicit rule at level 5; default rule at level 3; MEP and MIP symbols.)
The following example illustrates how to create a MIP based on a default rule defined in
IEEE Std 802.1ag-2007.
On the network shown in Figure 7-11, MD1 through MD5 are nested in MD7, and MD2
through MD5 are nested in MD1. MD7 has a higher level than MD1 through MD5, and MD1
has a higher level than MD2 through MD5. Multiple MEPs are configured on Device A in
MD1, and the MEPs belong to MDs with different levels.
(Figure 7-11: MD7 at level 7 contains MD1 at level 6; MD2 at level 5, MD3 at level 4, MD4 at level 3, and MD5 at level 2 are nested in MD1; RouterA carries VLAN1 and VLAN2.)
A default rule is configured on Device A to create a MIP in MD1. The procedure for creating
the MIP is as follows:
1. Device A compares MEP levels and finds the MEP at level 5, the highest level. The
MEP level is determined by the level of the MD to which the MEP belongs.
2. Device A selects the MD at level 6, which is higher than the level of the highest MEP (level 5).
3. Device A generates a MIP at level 6.
Hierarchical MP Maintenance
MEPs and MIPs are maintenance points (MPs). MPs are configured on interfaces and belong
to specific MAs, as shown in Figure 7-12.
(Figure 7-12 legend: MA; inward MEP; outward MEP; MIP.)
The scope of maintenance performed and the types of maintenance services depend on the
need of the organizations that use carrier-class Ethernet services. These organizations include
leased line users, service providers, and network carriers. Users purchase Ethernet services
from service providers, and service providers use their networks or carrier networks to
provide E2E Ethernet services. Carriers provide transport services.
Figure 7-13 shows locations of MEPs and MIPs and maintenance domains for users, service
providers, and carriers.
(Figure 7-13 legend: customer MD at level 6; service provider MD at level 5; inward MEP, outward MEP, and MIP symbols.)
Operator 1, operator 2, the service provider, and the customer use MDs with levels 3, 4, 5, and
6, respectively. A higher MD level indicates a larger MD.
The CFM PDU formats are as follows:
l CCM (OpCode 0x01): sequence number, MEP ID, and MA ID; defined by ITU-T Y.1731; optional CCM TLVs
l LBR (OpCode 0x02): loopback transaction ID; optional LBR TLVs
l LBM (OpCode 0x03): loopback transaction ID; optional LBM TLVs
l LTR (OpCode 0x04): LTR transaction ID, reply TTL, and relay action; additional LTM TLVs
Field 0x01, continuity check message (CCM): used for monitoring E2E link connectivity.
7.3.2 Background
IP-layer mechanisms, such as Simple Network Management Protocol (SNMP), IP ping, and
IP traceroute, are used to manage network-wide services, detect faults, and monitor
performance on traditional Ethernet networks. These mechanisms are unsuitable for client-
layer E2E Ethernet operation and management.
(Figure: CE1 and CE2 connect across an IP/MPLS core through PE1 and PE2.)
CFM supports service management, fault detection, and performance monitoring on the E2E
Ethernet network. In Figure 7-15:
l A network is logically divided into maintenance domains (MDs). For example, network
devices that a single Internet service provider (ISP) manages are in a single MD to
distinguish between ISP and user networks.
l Two maintenance association end points (MEPs) are configured on both ends of a
management network segment to be maintained to determine the boundary of an MD.
l Maintenance association intermediate points (MIPs) can be configured as needed. A
MEP initiates a test request, and the remote MEP (RMEP) or MIP responds to the
request. This process provides information about the management network segment to
help detect faults.
CFM supports level-specific MD management. An MD at a given level can manage MDs at
lower levels but cannot manage an MD at a higher level than its own. This allows a service
flow to be maintained across level-specific MDs and different types of service flows to be
maintained within one MD.
Continuity Check
CC monitors the connectivity of links between MEPs. A MEP periodically sends multicast
continuity check messages (CCMs) to an RMEP in the same MA. If an RMEP does not
receive a CCM within a period 3.5 times the interval at which CCMs are sent, the RMEP
considers the path between itself and the MEP faulty.
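The 3.5-interval detection rule can be sketched as follows. The class and method names are illustrative, not a device API; a real implementation runs inside the CFM state machine.

```python
class RmepState:
    """Sketch of the CC timeout rule: an RMEP declares the path faulty if no
    CCM arrives within 3.5 times the CCM transmission interval."""

    TIMEOUT_FACTOR = 3.5

    def __init__(self, ccm_interval_s):
        self.ccm_interval_s = ccm_interval_s
        self.last_ccm_at = None

    def on_ccm(self, now):
        # Record the arrival time of the latest CCM from the MEP.
        self.last_ccm_at = now

    def is_faulty(self, now):
        if self.last_ccm_at is None:
            return False  # nothing learned yet
        return (now - self.last_ccm_at) > self.TIMEOUT_FACTOR * self.ccm_interval_s
```

With a 1-second CCM interval, the path is declared faulty once more than 3.5 seconds pass without a CCM.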
Figure 7-16 CC (MEP1, MEP2, and MEP3 in the same MA exchange CCMs)
After receiving a CCM whose level is lower than its own MD level, a MEP does not
forward this CCM. This process prevents a lower level CCM from being sent to a higher
level MD.
Loopback
Loopback is also called 802.1ag MAC ping. Similar to IP ping, loopback monitors the
connectivity of a path between a local MEP and an RMEP.
A MEP initiates an 802.1ag MAC ping test to monitor the reachability of an RMEP or MIP
destination address. The MEP, MIP, and RMEP have the same level and they can share an
MA or be in different MAs. The MEP sends Loopback messages (LBMs) to the RMEP or
MIP. After receiving the messages, the RMEP or MIP replies with loopback replies (LBRs).
Loopback helps locate a faulty node because a faulty node cannot send an LBR in response to
an LBM. LBMs and LBRs are unicast packets.
The following example illustrates the implementation of loopback on the network shown in
Figure 7-17.
(Figure 7-17 legend: MEP; MIP; LBM data flow; LBR data flow.)
CFM is configured to monitor a path between PE1 (MEP1) and PE4 (MEP2). The MD level
of these MEPs is 6. A MIP with a level of 6 is configured on PE2 and PE3. If a fault is
detected in a link between PE1 and PE4, loopback can be used to locate the fault. Figure 7-18
illustrates the loopback process.
MEP1 can measure the network delay based on 802.1ag MAC ping results or the frame loss
ratio based on the difference between the number of LBMs and the number of LBRs.
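The measurements described above can be sketched as follows; the helper names are illustrative, and the inputs are assumed to be collected from an 802.1ag MAC ping session.

```python
def lb_frame_loss_ratio(lbm_sent, lbr_received):
    """Frame loss ratio inferred from LBMs sent versus LBRs received."""
    if lbm_sent == 0:
        return 0.0
    return (lbm_sent - lbr_received) / lbm_sent

def mean_delay(rtts_ms):
    """Mean network delay from the round-trip times of the ping results."""
    return sum(rtts_ms) / len(rtts_ms) if rtts_ms else None
```

For example, 100 LBMs answered by 97 LBRs gives a 3% loss ratio.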
Linktrace
Linktrace is also called 802.1ag MAC trace. Similar to IP traceroute, linktrace identifies a
path between two MEPs.
A MEP initiates an 802.1ag MAC trace test to monitor a path to an RMEP or MIP destination
address. The MEP, MIP, and RMEP have the same level and they can share an MA or be in
different MAs. A source MEP constructs and sends a Linktrace message (LTM) to a
destination MEP. After receiving this message, each MIP forwards it and replies with a
linktrace reply (LTR). Upon receipt, the destination MEP replies with an LTR and does not
forward the LTM. The source MEP obtains topology information about each hop on the path
based on the LTRs. LTMs are multicast packets and LTRs are unicast packets.
The following example illustrates the implementation of linktrace on the network shown in
Figure 7-19.
1. MEP1 sends MEP2 an LTM carrying a time to live (TTL) value and the MAC address of
the destination MEP2.
2. After the LTM arrives at MIP1, MIP1 reduces the TTL value in the LTM by 1 and
forwards the LTM if the TTL is not zero. MIP1 then replies with an LTR to MEP1. The
LTR carries forwarding information and the TTL value carried by the LTM when MIP1
received it.
3. After the LTM reaches MIP2 and MEP2, the process described above for MIP1 is
repeated for MIP2 and MEP2. In addition, MEP2 finds that its MAC address is the
destination address carried in the LTM and therefore does not forward the LTM.
4. The LTRs from MIP1, MIP2, and MEP2 provide MEP1 with information about the
forwarding path between MEP1 and MEP2.
If a fault occurs on the path between MEP1 and MEP2, MEP2 or a MIP cannot receive
the LTM or reply with an LTR. MEP1 can locate the faulty node based on such a
response failure. For example, if the link between MEP1 and MIP2 works properly but
the link between MIP2 and MEP2 fails, MEP1 can receive LTRs from MIP1 and MIP2
but fails to receive a reply from MEP2. MEP1 then considers the path between MIP2 and
MEP2 faulty.
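The localization logic in this example can be sketched as follows, assuming the expected hop sequence is known and the set of nodes that returned an LTR has been collected; the function name is illustrative.

```python
def locate_fault(path, ltr_senders):
    """Return the (last_ok, first_silent) pair bounding the faulty link.

    path: ordered node names from the source MEP to the destination MEP.
    ltr_senders: set of node names that replied with an LTR.
    """
    last_ok = path[0]  # the initiating MEP itself
    for node in path[1:]:
        if node not in ltr_senders:
            return (last_ok, node)  # first silent hop bounds the faulty link
        last_ok = node
    return None  # every hop replied: no fault located on this path
```

In the example above, LTRs from MIP1 and MIP2 but silence from MEP2 place the fault on the MIP2-MEP2 link.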
hwDot1agCfmRDI: A MEP receives a CCM frame with the RDI field set.
Alarm Anti-jitter
Multiple alarms and clear alarms may be generated on an unstable network enabled with CC.
These alarms consume system resources and deteriorate system performance. An RMEP
activation time can be set to prevent false alarms, and an alarm anti-jitter time can be set to
limit the number of alarms generated.
RMEP activation time setting: Prevents false alarms. A local MEP with the ability to receive CCMs can accept CCMs only after the RMEP activation time elapses.
Alarm Suppression
If different types of faults trigger more than one alarm, CFM alarm suppression allows the
alarm with the highest level to be sent to the NMS. If alarms persist after the alarm with the
highest level is cleared, the alarm with the second highest level is sent to the NMS. The
process repeats until all alarms are cleared.
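The suppression policy can be sketched as follows; the alarm names and the severity ordering in the usage example are illustrative, not the device's actual alarm table.

```python
def alarm_to_report(active_alarms, severity):
    """Return the single active alarm to send to the NMS, or None.

    active_alarms: set of currently active alarm names.
    severity: mapping from alarm name to severity rank (higher = more severe).
    """
    if not active_alarms:
        return None
    # Only the highest-severity active alarm is reported; as alarms clear,
    # the next-highest one surfaces, until all alarms are cleared.
    return max(active_alarms, key=lambda a: severity[a])
```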
7.4.1 Background
EFM and CFM are used to detect link faults. Y.1731 is an enhancement of CFM and is used to
monitor service performance.
Figure 7-20 shows typical Y.1731 networking. Y.1731 performance monitoring tools can be
used to assess the quality of the purchased Ethernet tunnel services or help a carrier conduct
regular service level agreement (SLA) monitoring.
l Single-ended frame loss measurement: collects frame loss statistics to assess the quality of links between MEPs, independent of continuity check (CC).
l Dual-ended frame loss measurement: collects frame loss statistics to assess link quality on CFM CC-enabled devices.
To collect frame loss statistics, select either single- or dual-ended frame loss measurement:
l Dual-ended frame loss measurement provides more accurate results than single-ended frame loss measurement. The interval between dual-ended frame loss measurements varies with the interval between CCM transmissions. Because the CCM transmission interval is shorter than the interval between single-ended frame loss measurements, dual-ended frame loss measurement allows for a short measurement interval.
l Single-ended frame loss measurement can be used to minimize the impact of many CCMs on the network.
l One-way frame delay measurement: measures the network delay on a unidirectional link between MEPs. To measure the link delay, select either one- or two-way frame delay measurement.
l ETH-LCK: informs the server-layer (sub-layer) MEP of administrative locking and the interruption of traffic destined for the MEP in the inner maintenance domain (MD). The ETH-LCK function must work with the ETH-Test function.
ETH-LM
Ethernet frame loss measurement (ETH-LM) enables a local MEP and its RMEP to exchange
ETH-LM frames to collect frame loss statistics on E2E links. ETH-LM modes are classified
as near- or far-end ETH-LM.
l Single-ended frame loss measurement: a MEP sends an ETH-LMM carrying an ETH-LM request and receives an ETH-LMR (loss measurement reply) carrying an ETH-LM response. Figure 7-21 illustrates the process for single-ended frame loss measurement.
(Figure 7-21: PE1 and PE2 run Y.1731 and exchange ETH-LMM and ETH-LMR messages; CEs attach to the PEs.)
After single-ended frame loss measurement is enabled, a MEP on PE1 sends an RMEP
on PE2 an ETH-LMM carrying an ETH-LM request. The MEP then receives an ETH-
LMR message carrying an ETH-LM response from the RMEP on PE2. The ETH-LMM
carries TxFCf, the value of the local transmit counter TxFCl at the time the local MEP
sent the message. After receiving the ETH-LMM, PE2 replies with
an ETH-LMR message, which carries the following information:
– TxFCf: copied from the ETH-LMM
– RxFCf: value of the local counter RxFCl at the time of ETH-LMM reception
– TxFCb: value of the local counter TxFCl at the time of ETH-LMM transmission
After receiving the ETH-LMR message, PE1 measures near- and far-end frame loss
based on the following values:
– The received ETH-LMR message's TxFCf, RxFCf, and TxFCb values, and the value of the local counter RxFCl at the time tc when this ETH-LMR message was received. These values are represented as TxFCf[tc], RxFCf[tc], TxFCb[tc], and RxFCl[tc].
– The previously received ETH-LMR message's TxFCf, RxFCf, and TxFCb values, and the value of the local counter RxFCl at the time tp when that ETH-LMR message was received. These values are represented as TxFCf[tp], RxFCf[tp], TxFCb[tp], and RxFCl[tp].
Far-end frame loss = |TxFCf[tc] - TxFCf[tp]| - |RxFCf[tc] - RxFCf[tp]|
Near-end frame loss = |TxFCb[tc] - TxFCb[tp]| - |RxFCl[tc] - RxFCl[tp]|
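The two equations above can be transcribed directly; counter snapshots are passed as plain integers.

```python
def far_end_loss(txfcf_tc, txfcf_tp, rxfcf_tc, rxfcf_tp):
    """Far-end frame loss = |TxFCf[tc] - TxFCf[tp]| - |RxFCf[tc] - RxFCf[tp]|"""
    return abs(txfcf_tc - txfcf_tp) - abs(rxfcf_tc - rxfcf_tp)

def near_end_loss(txfcb_tc, txfcb_tp, rxfcl_tc, rxfcl_tp):
    """Near-end frame loss = |TxFCb[tc] - TxFCb[tp]| - |RxFCl[tc] - RxFCl[tp]|"""
    return abs(txfcb_tc - txfcb_tp) - abs(rxfcl_tc - rxfcl_tp)
```

For example, 100 frames sent far-end against 95 received gives a far-end loss of 5.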
Service packets are prioritized based on 802.1p priorities and are transmitted using
different policies. Traffic passing through the P device on the network shown in Figure
7-22 carries 802.1p priorities of 1 and 2.
Single-ended frame loss measurement is enabled on PE1 to send traffic with a priority of
1 to measure frame loss on a link between PE1 and PE2. Traffic with a priority of 2 is
also sent. After receiving traffic with priorities of 1 and 2, the P device forwards traffic
with a higher priority, delaying the arrival of traffic with a priority of 1 at PE2. As a
result, the frame loss ratio is inaccurate.
802.1p priority-based single-ended frame loss measurement can be enabled to obtain
accurate results.
(Figure 7-22: Y.1731 traffic with 802.1p priorities 1 and 2 crosses the carrier network between user networks.)
(Figure 7-23: PE1 and PE2 run Y.1731 and exchange ETH-CCM messages; CEs attach to the PEs.)
After dual-ended frame loss measurement is configured, each MEP periodically sends a
CCM carrying a request to its RMEP. After receiving the CCM, the RMEP collects near-
and far-end frame loss statistics but does not forward the message. The CCM carries the
following information:
– TxFCf: value of the local counter TxFCl at the time of CCM transmission
– RxFCb: value of the local counter RxFCl at the time of the reception of the last
CCM
– TxFCb: value of TxFCf in the last received CCM
PE1 uses received information to measure near- and far-end frame loss based on the
following values:
– The received CCM's TxFCf, RxFCb, and TxFCb values, and the value of the local counter RxFCl at the time tc when this CCM was received. These values are represented as TxFCf[tc], RxFCb[tc], TxFCb[tc], and RxFCl[tc].
– The previously received CCM's TxFCf, RxFCb, and TxFCb values, and the value of the local counter RxFCl at the time tp when that CCM was received. These values are represented as TxFCf[tp], RxFCb[tp], TxFCb[tp], and RxFCl[tp].
Far-end frame loss = |TxFCb[tc] - TxFCb[tp]| - |RxFCb[tc] - RxFCb[tp]|
Near-end frame loss = |TxFCf[tc] - TxFCf[tp]| - |RxFCl[tc] - RxFCl[tp]|
ETH-DM
Delay measurement (DM) measures the delay and its variation. A MEP sends its RMEP a
message carrying ETH-DM information and receives a response message carrying ETH-DM
information from its RMEP.
ETH-DM supports the following modes:
l One-way frame delay measurement
A MEP sends its RMEP a 1DM message carrying one-way ETH-DM information. After
receiving this message, the RMEP measures the one-way frame delay and its variation.
If a MEP synchronizes its time with its RMEP, both the one-way frame delay and the
delay variation can be measured. If the time is not synchronized, only the one-way delay
variation can be measured.
One-way frame delay measurement can be implemented in either of the following
modes:
– On-demand measurement: calculates the one-way frame delay at a time or a
specific number of times for diagnosis.
– Proactive measurement: calculates the one-way frame delay periodically.
Figure 7-24 illustrates the process for one-way frame delay measurement.
(Figure 7-24: a MEP sends 1DM PDUs to its RMEP across the Y.1731-enabled network.)
One-way frame delay measurement is implemented on an E2E link between a local MEP
and its RMEP. After one-way frame delay measurement is configured, the MEP periodically
sends 1DMs carrying TxTimeStampf (the time when the 1DM was sent). After receiving
a 1DM, the RMEP parses TxTimeStampf and compares this value with RxTimef (the
time when the DM frame was received). The RMEP calculates the one-way frame delay
based on these values using the following equation:
Frame delay = RxTimef - TxTimeStampf
The frame delay can be used to measure the delay variation.
A delay variation is an absolute difference between two delays.
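The equation and the delay-variation definition above can be transcribed directly (timestamps are assumed to be in a common unit, for example microseconds):

```python
def one_way_delay(rx_time_f, tx_timestamp_f):
    """Frame delay = RxTimef - TxTimeStampf (requires synchronized clocks)."""
    return rx_time_f - tx_timestamp_f

def delay_variation(delay_curr, delay_prev):
    """Delay variation: absolute difference between two delays."""
    return abs(delay_curr - delay_prev)
```

Note that the variation cancels any constant clock offset, which is why it can be measured without time synchronization.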
802.1p priorities carried in service packets are used to prioritize services. Traffic passing
through the P device on the network shown in Figure 7-25 carries 802.1p priorities of 1
and 2.
One-way frame delay measurement is enabled on PE1 to send traffic with a priority of 1
to measure the frame delay on a link between PE1 and PE2. Traffic with a priority of 2 is
also sent. After receiving traffic with priorities of 1 and 2, the P device forwards traffic
with a higher priority, delaying the arrival of traffic with a priority of 1 at PE2. As a
result, the frame delay calculated on PE2 is inaccurate.
802.1p priority-based one-way frame delay measurement can be enabled to obtain
accurate results.
(Figures 7-25 and 7-26: Y.1731-enabled networks between user networks; 1DM PDUs and traffic with 802.1p priorities 1 and 2; CEs attach to the PEs.)
l Two-way frame delay measurement: a MEP sends its RMEP a delay measurement
message (DMM) carrying TxTimeStampf (the time when the DMM was sent). The
RMEP replies with a delay measurement reply (DMR) carrying RxTimeStampf (the
time when the DMM was received) and TxTimeStampb (the time when the DMR
was sent). The value in every field of the DMM is copied to the DMR, except that
the source and destination MAC addresses are interchanged. Upon receipt of the
DMR message, the MEP calculates the two-way frame delay using the following
equation:
Frame delay = (RxTimeb - TxTimeStampf) - (TxTimeStampb - RxTimeStampf)
The frame delay can be used to measure the delay variation.
A delay variation is an absolute difference between two delays.
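The equation above can be transcribed directly. Because the responder's processing time (TxTimeStampb - RxTimeStampf) is subtracted from the round trip, the two clocks need not be synchronized:

```python
def two_way_delay(rx_time_b, tx_timestamp_f, tx_timestamp_b, rx_timestamp_f):
    """Frame delay = (RxTimeb - TxTimeStampf) - (TxTimeStampb - RxTimeStampf).

    The first term is the full round trip on the initiator's clock; the second
    is the responder's dwell time on its own clock, so clock offsets cancel.
    """
    return (rx_time_b - tx_timestamp_f) - (tx_timestamp_b - rx_timestamp_f)
```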
802.1p priorities carried in service packets are used to prioritize services. Traffic passing
through the P device on the network shown in Figure 7-27 carries 802.1p priorities of 1
and 2.
Two-way frame delay measurement is enabled on PE1 to send traffic with a priority of 1
to measure the frame delay on a link between PE1 and PE2. Traffic with a priority of 2 is
also sent. After receiving traffic with priorities of 1 and 2, the P device forwards traffic
with a higher priority, delaying the arrival of traffic with a priority of 1 at PE2. As a
result, the frame delay calculated on PE2 is inaccurate.
802.1p priority-based two-way frame delay measurement can be enabled to obtain
accurate results.
(Figure 7-27: DMR messages cross the Y.1731-enabled carrier network between user networks; traffic carries 802.1p priorities 1 and 2.)
AIS
AIS is a protocol used to transmit fault information.
A MEP is configured in MD1 with a level of 6 on each of CE1 and CE2 access interfaces on
the user network shown in Figure 7-28. A MEP is configured in MD2 with a level of 3 on
each of PE1 and PE2 access interfaces on a carrier network.
l If CFM detects a fault in the link between AIS-enabled PEs, CFM sends AIS packet data
units (PDUs) to the CEs. After receiving the AIS PDUs, the CEs suppress alarms.
(Figure 7-28: a VLL/VPLS/VLAN carrier network with MD2 at level 3 on the PEs; the user networks run VLAN/QinQ with MD1 at level 6 on the CEs.)
ETH-Test
ETH-Test is used to perform one-way on-demand in-service or out-of-service diagnostic tests
on the throughput, frame loss, and bit errors.
ETH-Test provides two types of test modes: out-of-service ETH-Test and in-service ETH-
Test:
l Out-of-service ETH-Test mode: Client data traffic is interrupted in the diagnosed entity.
To resolve this issue, the out-of-service ETH-Test function must be used together with
the ETH-LCK function.
l In-service ETH-Test mode: Client data traffic is not interrupted, and frames with the
ETH-Test information are transmitted using part of the bandwidth.
ETH-LCK
ETH-LCK is used for administrative locking on the MEP in the outer MD with a higher level
than the inner MD, that is, preventing CC alarms from being generated in the outer MD.
When implementing ETH-LCK, a MEP in the inner MD sends frames with the ETH-LCK
information to the MEP in the outer MD. After receiving the frames with the ETH-LCK
information, the MEP in the outer MD can differentiate the alarm suppression caused by
administrative locking from the alarm suppression caused by a fault in the inner MD (the AIS
function).
To suppress CC alarms from being generated in the outer MD, ETH-LCK is implemented
with out-of-service ETH-Test. A MEP in the inner MD with a lower level initiates ETH-Test
by sending an ETH-LCK frame to a MEP in the outer MD. Upon receipt of the ETH-LCK
frame, the MEP in the outer MD suppresses all CC alarms immediately and reports an
ETH-LCK alarm indicating administrative locking. Before out-of-service ETH-Test is complete,
the MEP in the inner MD sends ETH-LCK frames to the MEP in the outer MD. After out-of-
service ETH-Test is complete, the MEP in the inner MD stops sending ETH-LCK frames. If
the MEP in the outer MD does not receive ETH-LCK frames for a period 3.5 times the
specified interval, it releases the alarm suppression and reports a clear ETH-LCK
alarm.
As shown in Figure 7-29, MD2 with the level of 3 is configured on PE1 and PE2; MD1 with
the level of 6 is configured on CE1 and CE2. When PE1's MEP1 sends out-of-service ETH-
Test frames to PE2's MEP2, MEP1 also sends ETH-LCK frames to CE1's MEP11 and CE2's
MEP22 separately to suppress MEP11 and MEP22 from generating CC alarms. When MEP1
stops sending out-of-service ETH-Test frames, it also stops sending ETH-LCK frames. If
MEP11 and MEP22 do not receive ETH-LCK frames for a period 3.5 times the specified
interval, they release the alarm suppression.
(Figure 7-29: MEP1 on PE1 sends out-of-service ETH-Test frames to MEP2 on PE2 and ETH-LCK frames toward the CEs.)
Single-ended ETH-SLM
SLM measures frame loss using synthetic frames instead of data traffic. When implementing
SLM, the local MEP exchanges frames containing ETH-SLM information with one or more
RMEPs.
Figure 7-30 demonstrates the process of single-ended SLM:
1. The local MEP sends ETH-SLM request frames to the RMEPs.
2. After receiving the ETH-SLM request frames, the RMEPs send ETH-SLM reply frames
to the local MEP.
A frame with the single-ended ETH-SLM request information is called an SLM, and a frame
with the single-ended ETH-SLM reply information is called an SLR. SLM frames carry SLM
protocol data units (PDUs), and SLR frames carry SLR PDUs.
Single-ended SLM and single-ended frame LM are differentiated as follows: On the point-to-
multipoint network shown in Figure 7-30, inward MEPs are configured on PE1's and PE3's
interfaces, and single-ended frame LM is performed on the PE1-PE3 link. Traffic coming
through PE1's interface is destined for both PE2 and PE3, and single-ended frame LM will
collect frame loss statistics for all traffic, including the PE1-to-PE2 traffic. As a result, the
collected statistics are not accurate. Unlike single-ended frame LM, single-ended SLM
collects frame loss statistics only for the PE1-to-PE3 traffic, which is more accurate.
(Figure 7-30: PE1 exchanges SLM and SLR frames with PE3; user networks attach through CE1, CE2, and CE3.)
When implementing single-ended SLM, PE1 sends SLM frames to PE3 and receives SLR
frames from PE3. SLM frames contain TxFCf, the value of TxFCl (frame transmission
counter), indicating the frame count at the transmit time. SLR frames contain the following
information:
l TxFCf: value of TxFCl (frame transmission counter), indicating the frame count on PE1
upon the SLM transmission
l TxFCb: value of RxFCl (frame receive counter), indicating the frame count on PE3 upon
the SLR transmission
After receiving the last SLR frame during a measurement period, a MEP on PE1 measures the
near-end and far-end frame loss based on the following values:
l The last received SLR's TxFCf and TxFCb values, and the value of RxFCl (frame receive
counter) indicating the frame count on PE1 upon the SLR reception. These values are
represented as TxFCf[tc], TxFCb[tc], and RxFCl[tc].
tc indicates the time when the last SLR frame was received during the measurement
period.
l The previously received SLR's TxFCf and TxFCb values, and the value of RxFCl (frame
receive counter) indicating the frame count on PE1 upon the SLR reception. These values
are represented as TxFCf[tp], TxFCb[tp], and RxFCl[tp].
tp indicates the time when the last SLR frame was received during the previous
measurement period.
Far-end frame loss = |TxFCf[tc] – TxFCf[tp]| – |TxFCb[tc] – TxFCb[tp]|
Near-end frame loss = |TxFCb[tc] – TxFCb[tp]| – |RxFCl[tc] – RxFCl[tp]|
On a network, each packet carries the IEEE 802.1p field, indicating its priority. According to
packet priority, different QoS policies will be applied. On the network shown in Figure 7-31,
the PE1-to-PE3 traffic has two priorities: 1 and 2, as indicated by the IEEE 802.1p field.
When implementing single-ended SLM for traffic over the PE1-PE3 link, PE1 sends SLM
frames with varied priorities and checks the frame loss. Based on the check result, the
network administrator can adjust the QoS policy for the link.
(Figure 7-31: PE1 runs single-ended SLM to PE3 over the Y.1731-enabled network; traffic carries 802.1p priorities 1 and 2.)
ETH-BN
Ethernet bandwidth notification (ETH-BN) enables server-layer MEPs to notify client-layer
MEPs of the server layer's connection bandwidth when routing devices connect to microwave
devices. The server-layer devices are microwave devices, which dynamically adjust the
bandwidth according to the prevailing atmospheric conditions. The client-layer devices are
routing devices. Routing devices can only function as ETH-BN packets' receive ends and
must work with microwave devices to implement this function.
As shown in Figure 7-32, server-layer MEPs are configured on the server-layer devices, and
the ETH-BN sending function is enabled. The levels of client-layer MEPs must be specified
for the server-layer MEPs when the ETH-BN sending function is enabled. Client-layer MEPs
are configured on the client-layer devices, and the ETH-BN receiving function is enabled. The
levels of the client-layer MEPs are the same as those specified for the server-layer MEPs.
l If the ETH-BN function has been enabled on the server-layer devices Device2 and
Device3 and the bandwidth of the server-layer devices' microwave links decreases, the
server-layer devices send ETH-BN packets to the client-layer devices (Device1 and
Device4). After receiving the ETH-BN packets, the client-layer MEPs can use
bandwidth information in the packets to adjust service policies, for example, to reduce
the rate of traffic sent to the degraded links.
l When the server-layer devices' microwave links work properly, whether to send ETH-
BN packets is determined by the configuration of the server-layer devices. When the
server-layer microwave devices stop sending ETH-BN packets, the client-layer devices
do not receive any ETH-BN packets. The ETH-BN data on the client-layer devices is
aged after 3.5 times the interval at which ETH-BN packets are sent.
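The receive-side behavior described above can be sketched as follows; the class name and the polling model are assumptions, not a device API.

```python
class EthBnClient:
    """Sketch of an ETH-BN receiver: keep the last advertised bandwidth and
    age it out after 3.5 times the packet-sending interval."""

    AGE_FACTOR = 3.5

    def __init__(self, send_interval_s):
        self.send_interval_s = send_interval_s
        self.bandwidth = None
        self.updated_at = None

    def on_packet(self, bandwidth, now):
        # Record the bandwidth advertised by the server-layer MEP.
        self.bandwidth = bandwidth
        self.updated_at = now

    def current_bandwidth(self, now):
        """Return the advertised bandwidth, or None once the data has aged."""
        if self.updated_at is None:
            return None
        if now - self.updated_at > self.AGE_FACTOR * self.send_interval_s:
            return None  # aged out: fall back to the default service policy
        return self.bandwidth
```

The client-layer device would consult `current_bandwidth()` when adjusting service policies, such as the rate of traffic sent toward the degraded link.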
NOTE
When planning ETH-BN, you must check that the service burst traffic is consistent with a device's
buffer capability.
(Figure 7-32: Device2 and Device3 are server-layer microwave devices whose modulation scheme ranges from QPSK to 256QAM, so the link bandwidth varies between B1 and B2; server-layer MEPs face the client-layer MEPs on the routing devices.)
Usage Scenario
Y.1731 supports performance statistics collection on both end-to-end and end-to-multi-end
links.
On the network shown in Figure 7-33, Y.1731 collects statistics about the end-to-end link
performance between the CE and PE1, between PE1 and PE2, or between the CE and PE3.
On the network shown in Figure 7-34, user-to-network traffic from different users traverses
CE1 and CE2 and is converged on CE3. CE3 forwards the converged traffic to the UPE.
Network-to-user traffic traverses CE3, and CE3 forwards the traffic to CE1 and CE2.
When Y.1731 is used to collect statistics about the link performance between the CE and the
UPE, end-to-end performance statistics collection cannot be implemented. This is because
only one inbound interface (on the UPE) sends packets but two outbound interfaces (on CE1
and CE2) receive the packets. In this case, statistics on the outbound interfaces fail to be
collected. To resolve this issue, end-to-multi-end performance statistics collection can be
implemented.
The packets carry the MAC address of CE1 or CE2. The UPE identifies the outbound
interface based on the destination MAC address carried in the packets and collects end-to-end
performance statistics.
(Figures 7-33 and 7-34: Y.1731 runs between CEs and PEs across the core network; services include VoIP and STB traffic.)
7.5.1 Background
Link detection protocols are used to monitor the connectivity of links between devices and
detect faults. A single fault detection protocol cannot detect all faults in all links on a complex
network. A combination of protocols and techniques must be used to detect link faults.
Ethernet OAM detects faults in Ethernet links and advertises fault information to interfaces or
other protocol modules. Ethernet OAM fault advertisement is implemented by an OAM
manager (OAMMGR) module, application modules, and detection modules. An OAMMGR
module associates one module with another. A detection module monitors link status and
network performance. If a detection module detects a fault, it instructs the OAMMGR module
to notify an application module or another detection module of the fault. After receiving the notification, that module responds to the fault.
Figure 7-35 Fault information advertisement between EFM and detection modules
The following example illustrates fault information advertisement between EFM and
detection modules over a path CE5 -> CE4 -> CE1 -> PE2 -> PE4 on the network shown in
Figure 7-35.
Table 7-14 Fault information advertisement between EFM and detection modules
Function Deployment | Issue to Be Resolved | Solution
Figure 7-36 Fault information advertisement between EFM and application modules
(Figure 7-36: a DSLAM and Switch2 connect through the UPE, PE-AGG1/PE-AGG2, and PE1/PE2 to NPE1 and NPE2 on the IP core; EFM for VRRP runs toward NPE1 and NPE2.)
Table 7-15 describes fault information advertisement between EFM and VRRP modules.
Table 7-15 Fault information advertisement between EFM and VRRP modules
Function Deployment | Issue to Be Resolved | Solution
NOTE
In a virtual access scenario, EFM supports virtual access, but EFM cannot be associated with BFD,
MAC entry clearing, 802.1ag, or VRRP.
Figure 7-37 Networking for fault information advertisement between CFM and detection
modules
The following example illustrates fault information advertisement between CFM and
detection modules over a path UPE1 -> PE2 -> PE4 -> PE6 -> PE8 on the network shown in
Figure 7-37.
Table 7-16 Fault information advertisement between CFM and detection modules
Function Deployment: CFM is used to monitor the link between UPE1 and PE4.
Issue to Be Resolved: Although CFM detects a fault in the link between UPE1 and PE4, CFM
cannot notify PE6 of the fault. As a result, PE6 still forwards network traffic to PE4, causing
a traffic interruption. Although port 1 on PE4 goes Down, port 1 cannot notify CE1 of the
fault. As a result, CE1 still forwards user traffic to PE4, causing a traffic interruption.
Solution: CFM can be associated with port 1.
l If CFM detects a fault, it instructs the OAMMGR module to disconnect port 1
intermittently. This operation allows other modules to detect the fault.
l If port 1 goes Down, it instructs the OAMMGR module to notify CFM of the fault. After
receiving the notification, CFM notifies PE6 of the fault.
The association between CFM and a port is used to detect faults in an active link of a link
aggregation group or in the link aggregation group in 1:1 active/standby mode. If a fault is
detected, a protection switchover is triggered.

Function Deployment: EFM is used to monitor the direct link between CE1 and UPE1, and
CFM is used to monitor the link between UPE1 and PE4.
Issue to Be Resolved: Although CFM detects a fault, CFM cannot notify CE1 of the fault. As
a result, CE1 still forwards user traffic to PE4, causing a traffic interruption.
Solution: The EFM module can be associated with the CFM module.
l If the EFM module detects a fault, it instructs the OAMMGR module to notify the CFM
module of the fault.
l If the CFM module detects a fault, it instructs the OAMMGR module to notify the EFM
module of the fault.
The association allows a module to notify another associated module of a fault and to send an
alarm to an NMS. A network administrator analyzes alarm information and takes measures to
rectify the fault.
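The association mechanism in the tables above can be sketched as a small relay: a detection module reports a fault to the OAMMGR, which notifies every module associated with it. The following Python model is purely illustrative (the class and method names are invented, not device APIs):

```python
class OAMMgr:
    """Hypothetical model of the OAMMGR module: it only relays fault
    notifications between modules that have been associated."""

    def __init__(self):
        self.assoc = {}  # module name -> set of associated module names

    def associate(self, a, b):
        # Associations work in both directions: either side may report a fault.
        self.assoc.setdefault(a, set()).add(b)
        self.assoc.setdefault(b, set()).add(a)

    def report_fault(self, source, fault):
        # Return a (module, fault) pair for every module OAMMGR notifies.
        return [(target, fault) for target in sorted(self.assoc.get(source, set()))]

mgr = OAMMgr()
mgr.associate("EFM", "CFM")  # e.g. EFM on the CE1-UPE1 link, CFM on UPE1-PE4
print(mgr.report_fault("CFM", "link UPE1-PE4 down"))
# -> [('EFM', 'link UPE1-PE4 down')]
```

A module with no association (for example, an unassociated BFD session) produces no notifications, which mirrors the NOTE above about unsupported associations.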
Table 7-17 describes fault information advertisement between CFM and VRRP modules.
Table 7-17 Fault information advertisement between CFM and VRRP modules
Function Deployment:
l A VRRP backup group is configured to determine the master/backup status of network
provider edges (NPEs).
l CFM is used to monitor links between NPEs and PE-AGGs.
Issue to Be Resolved: If a fault occurs on the link between NPE1 (the master) and PE-AGG1,
NPE2 cannot receive VRRP packets within a period of three times the interval at which
VRRP packets are sent. NPE2 then preempts the Master state. As a result, two master devices
coexist in a VRRP backup group, and the UPE receives double copies of network traffic.
Solution: CFM can be associated with the VRRP module on NPEs. If CFM detects a fault in
the link between PE-AGG1 and NPE1, it instructs the OAMMGR module to notify the VRRP
module of the fault. After receiving the notification, the VRRP module triggers a master/
backup VRRP switchover. NPE1 then changes its VRRP status to Initialize. NPE2 changes its
VRRP status from Backup to Master after a period of three times the interval at which VRRP
packets are sent. This process prevents two master devices from coexisting in the VRRP
backup group.
Figure 7-40 shows a typical MAN network. The following example illustrates Ethernet OAM
applications on a MAN.
l EFM is used to monitor P2P direct links between a digital subscriber line access
multiplexer (DSLAM) and a user-end provider edge (UPE) or between a LAN switch
(LSW) and a UPE. If EFM detects errored frames, codes, or frame seconds, it sends
alarms to the network management system (NMS) to provide information for a network
administrator. EFM uses the loopback function to assess link quality.
l CFM is used to monitor E2E links between a UPE and an NPE or between a UPE and a
provider edge-aggregation (PE-AGG). A network planning engineer groups the devices
of each Internet service provider (ISP) into an MD and maps a type of service to an MA.
A network maintenance engineer enables maintenance points to exchange CCMs to
monitor network connectivity. After receiving an alarm on the NMS, a network
administrator can enable loopback to locate faults or enable linktrace to discover paths.
l Y.1731 is used to measure packet loss and the delay on E2E links between a UPE and an
NPE or between a UPE and a PE-AGG at the aggregation layer.
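The CCM-based connectivity monitoring mentioned above follows the usual 802.1ag loss criterion: a maintenance point presumes its remote peer down when no CCM arrives within 3.5 transmission intervals. A minimal sketch (the function name is illustrative, not a device API):

```python
def remote_mep_down(last_ccm_time, now, ccm_interval):
    """802.1ag-style loss criterion: the remote maintenance point is
    presumed down when no CCM has arrived within 3.5 intervals."""
    return (now - last_ccm_time) > 3.5 * ccm_interval

# CCMs are sent every 1 s; the last one arrived 4 s ago -> connectivity fault.
print(remote_mep_down(last_ccm_time=0.0, now=4.0, ccm_interval=1.0))  # True
```

When the criterion is met, the maintenance point raises the alarm that the NMS receives, after which loopback and linktrace can be used as described above.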
A mobile backhaul network shown in Figure 7-41 consists of a transport network between a
cell site gateway (CSG) and remote service gateways (RSGs) and a wireless network between
NodeBs/eNodeBs and the CSG. Carriers operate the transport and wireless networks
separately. Therefore, traffic transmitted on the transport network of one carrier is invisible to
devices on the wireless network of another carrier.
Ethernet OAM can be used on the transport and wireless networks to identify and locate
faults.
l EFM monitors Layer 2 links between a NodeB/eNodeB and CSG1.
– EFM is used to monitor the connectivity of links between a NodeB/eNodeB and
CSG1 or between RNCs and RSGs.
– EFM detects errored codes, frames, and frame seconds on links between a NodeB/
eNodeB and CSG1 and between RNCs and RSGs. If the number of errored codes,
frames, or frame seconds exceeds a configured threshold, an alarm is sent to the
NMS. A network administrator is notified of link quality deterioration and can
assess the risk of adverse impact on voice traffic.
– Loopback is used to monitor the quality of voice links between a NodeB/eNodeB
and CSG1 or between RNCs and RSGs.
l CFM is used to locate faulty links over which E2E services are transmitted.
– CFM periodically monitors links between CSG1 and the RSGs. If CFM detects a
fault, it sends an alarm to the NMS. A network
administrator analyzes alarm information and takes measures to rectify the fault.
– Loopback and linktrace are enabled on links between CSG1 and the RSGs to help
diagnose link faults.
l Y.1731 is used together with CFM to monitor link performance and voice and data traffic
quality.
8 Dual-Device Backup
According to the types of services to be backed up, dual-device backup can be classified as
follows:
l Dual-Device ARP Hot Backup
l Unicast dual-device backup for IPv4
Unicast dual-device backup for IPv4 uses the dual-device backup technology to back up
IPoE/PPPoE user information. When the master device or the link, interface, or board of
the master device is faulty, services can be quickly switched to the slave device.
l Multicast dual-device backup for IPv4
Multicast dual-device backup for IPv4 is based on the dual-device backup for access
users.
The multicast dual-device hot backup technology provides high reliability when
the NE40E is used as the service control point and multicast replication point. This
technology mainly applies to the Internet Protocol television (IPTV) service scenario.
When deploying the IPTV service, operators use the BRAS as the multicast replication
point. Set Top Boxes (STBs) can be classified into Dynamic Host Configuration Protocol
(DHCP) STBs and Point-to-Point Protocol over Ethernet (PPPoE) STBs. Hot backup
ensures service continuity by allowing the slave BRAS to take over services when the
master BRAS is being upgraded or a fault occurs on the user side or network side of the
master BRAS.
Related Concepts
If VRRP is used as a master/backup status negotiation protocol, dual-device backup involves
the following concepts:
l VRRP
VRRP is a fault-tolerant protocol that groups several routers into a virtual router. If the
next hop of a host is faulty, VRRP switches traffic to another router, which ensures
communication continuity and reliability.
For details about VRRP, see the chapter "VRRP" in NE40E Feature Description -
Network Reliability.
l RUI
RUI is a Huawei-specific redundancy protocol that is used to back up user information
between devices. RUI, which is carried over the Transmission Control Protocol (TCP),
specifies which user information can be transmitted between devices and the format and
amount of user information to be transmitted.
l RBS
The remote backup service (RBS) is an RUI module used for inter-device backup. A
service module uses the RBS to synchronize service control data from the master device
to the backup device. When a master/backup VRRP switchover occurs, service traffic
quickly switches to a new master device.
l RBP
The remote backup profile (RBP) is a configuration template that provides a unified user
interface for dual-device backup configurations.
If E-Trunk is used as a master/backup status negotiation protocol, dual-device backup
involves the following concept:
l E-Trunk
E-Trunk implements inter-device link aggregation, providing device-level reliability. E-
Trunk aggregates data links of multiple devices to form a link aggregation group (LAG).
If a link or device fails, services are automatically switched to the other available links or
devices in the E-Trunk, improving link and device-level reliability.
For details about E-Trunk, see "E-Trunk" in NE40E Feature Description - LAN Access
and MAN Access.
Purpose
In traditional service scenarios, all users use a single device to access a network. Once the
device or the link directly connected to the device fails, all user services are interrupted, and
the service recovery time is uncertain. To resolve this issue, deploy dual-device backup to
enable the master device to back up service control data to the backup device in real time.
l The NE40E supports only dual-device hot backup for Address Resolution Protocol
(ARP) services, also called dual-device ARP hot backup.
Dual-device ARP hot backup enables the master device to back up the ARP entries at the
control and forwarding layers to the backup device in real time. When the backup device
switches to a master device, it uses the backup ARP entries to generate host routing
information without needing to relearn ARP entries, ensuring downlink traffic continuity.
– Manually triggered dual-device ARP hot backup: You must manually establish a
backup platform and backup channel for the master and backup devices. In
addition, you must manually trigger ARP entry backup from the master device to
the backup device. This backup mode has complex configurations.
– Automatically enabled dual-device ARP hot backup: You need to establish only a
backup channel between the master and backup devices, and the system
automatically implements ARP entry backup. This backup mode has simple
configurations.
l Dual-device IGMP snooping hot backup enables the master device to back up IGMP
snooping entries to the backup device in a master/backup E-Trunk scenario. If the master
device or the link between the master device and user fails, the backup device switches
to a master device and takes over, ensuring multicast service continuity.
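The dual-device ARP hot backup behavior above can be modeled as the master mirroring each learned ARP entry to the backup in real time, so that after a switchover no relearning is needed. This is an illustrative sketch with invented names, not the device implementation:

```python
class ArpHotBackup:
    """Hypothetical model of dual-device ARP hot backup: every entry the
    master learns is backed up to the backup device in real time."""

    def __init__(self):
        self.master_arp = {}
        self.backup_arp = {}

    def learn(self, ip, mac):
        self.master_arp[ip] = mac  # learned at the control/forwarding layer
        self.backup_arp[ip] = mac  # backed up to the backup device in real time

    def switchover(self):
        # The backup becomes master and forwards downlink traffic using the
        # backup entries, with no need to relearn ARP from the user side.
        return dict(self.backup_arp)

hb = ArpHotBackup()
hb.learn("10.1.1.2", "00e0.fc01.0001")
print(hb.switchover()["10.1.1.2"])  # 00e0.fc01.0001
```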
Benefits
l Benefits to users
– Improved user experience
– Dual-device backup for user access improves network reliability. If the network
fails, the slave BRAS quickly takes over user services, so users can continue to
use network resources without noticing the failure.
l Benefits to operators
Improved network reliability from the perspective of service reliability.
8.2.1 Overview
The NE40E ensures high reliability of services through the following approaches:
l Status control: Several BRASs negotiate a master BRAS through VRRP. With the help
of BFD or Ethernet OAM, the master BRAS can detect a link fault quickly and traffic
can be switched to the standby BRAS immediately.
l Service control: Information about access users is backed up to the standby BRAS from
the master BRAS through TCP. This ensures service consistency.
l Route control: By controlling routes in the address pool or user routes in a real-time
manner, the BRAS ensures that downstream traffic can reach users smoothly when an
active/standby switchover occurs.
On the LAN, hosts need to obtain only the IP address of the virtual router rather than the IP
address of each router in the backup group. The hosts set the IP address of the virtual router as
the address of their default gateway. Then, the hosts can communicate with an external
network through the virtual gateway.
VRRP dynamically associates the virtual router with a physical router that transmits services.
When the physical router fails, another router is selected to take over services and user
services are not affected. The internal network and the external network can communicate
without interruption.
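The selection of the physical router that backs the virtual router can be sketched with the standard VRRP rule: the highest priority wins, and ties are broken by the higher primary IP address (RFC 5798 semantics). The field names below are illustrative:

```python
import ipaddress

def elect_master(routers):
    """VRRP-style election sketch: highest priority wins; ties are broken
    by the highest primary IP address (per RFC 5798)."""
    return max(routers, key=lambda r: (r["priority"],
                                       int(ipaddress.ip_address(r["ip"]))))

group = [{"name": "Device1", "priority": 120, "ip": "10.1.1.1"},
         {"name": "Device2", "priority": 100, "ip": "10.1.1.2"}]
print(elect_master(group)["name"])  # Device1 (higher priority)
```

Hosts only ever see the virtual router's IP address, so the outcome of this election is invisible to them.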
2. If the management priorities of the two masters are the same, the master with a smaller
management IP address becomes the primary master, and the master with a larger
management IP address becomes the secondary master.
If dual-device hot backup is deployed in a virtual access scenario, the master device functions
as the primary master, and the standby device functions as the secondary master in the virtual
access system. If the primary master fails, the system uses Diameter to re-negotiate the
primary and secondary states of the two masters.
Figure 8-3 Diagram of the active/standby switchover for high reliability of services
As shown in Figure 8-3, the two routers negotiate the master and standby states using VRRP.
The NE40E supports active/standby status selection of interfaces and sub-interfaces.
BFD is enabled between the two routers to detect links between the two devices. BFD in this
mode is called Peer-BFD. BFD is also enabled between the router and the LSW to detect links
between the router and the LSW. BFD in this mode is called Link-BFD.
When a link fails, VRRP can negotiate new master and standby devices, but the negotiation
takes several seconds and cannot meet the requirements of carrier-grade services.
Through BFD or Eth OAM, a faulty link can be detected in several milliseconds and the
device can perform a fast active/standby switchover with the help of VRRP.
During the implementation of an active/standby switchover, VRRP has to determine device
status based on Link-BFD status and Peer-BFD status. As shown in Figure 8-3, when Link 1
fails, the Peer-BFD status and Link-BFD status of Device1 both go down and Device1
becomes the standby device. In this case, the Peer-BFD status of Device2 goes down but the
Link-BFD status of Device2 is still up. Therefore, Device2 becomes the master device.
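The Link-BFD/Peer-BFD decision described above reduces to a small rule: when the Peer-BFD session goes down, the device whose Link-BFD is still up becomes master, and the device whose access link is also down becomes standby; while both sessions are up, the VRRP-negotiated role stands. A sketch (function name and encoding are illustrative):

```python
def vrrp_role(link_bfd_up, peer_bfd_up, vrrp_role_now):
    """Decide a device's role from its Link-BFD and Peer-BFD status,
    as described for the Link 1 failure in Figure 8-3."""
    if peer_bfd_up:
        return vrrp_role_now  # no inter-device fault: keep the VRRP result
    return "master" if link_bfd_up else "standby"

# Link 1 fails: Device1 sees both sessions down; Device2 sees only Peer-BFD down.
print(vrrp_role(link_bfd_up=False, peer_bfd_up=False, vrrp_role_now="master"))  # standby
print(vrrp_role(link_bfd_up=True,  peer_bfd_up=False, vrrp_role_now="backup"))  # master
```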
In actual networking, certain LSWs may not support BFD. In this case, you have to select
another detection mechanism. Besides BFD, the NE40E also supports detection of links
connecting to LSWs through Eth OAM.
The NE40E supports monitoring of upstream links (for example, Link 3 in Figure 8-3) to
enhance reliability protection for the network side. When an upstream link fails, the NE40E
responds to the link failure quickly and performs an active/standby link switchover.
Attribute | Description
UCL-Group | UCL for user group policy control delivered by the RADIUS server.
RADIUS client IP address | Source IP address carried in a received RADIUS packet sent by a
client when the BAS device functions as a RADIUS proxy.
When backing up information about access users, ensure that the configurations of the active
and standby BRASs are consistent, including the IP address, VLAN, and QoS parameters.
Common attributes must be kept consistent through configuration; the special attributes
of a user are backed up through TCP. Figure 8-4 shows the process of backing up the special
attributes of a user. A TCP connection can be set up based on the uplinks connecting to the
MAN.
Figure 8-4 Diagram for user information backup for high service reliability
The user information backup function supports backup of information about authentication,
accounting, and authorization of users. The NE40E controls user access according to the
master/backup status negotiated through VRRP. Only the active device can handle users'
access requests and perform authentication, real-time accounting, and authorization for users.
The standby device discards users' access requests.
After a user logs on through the active device, the active device backs up information about
the user to the standby device through TCP. The standby device generates a corresponding
service based on user information. This ensures that the standby device can smoothly take
over services from the active device when the active device fails.
When the active device fails (for example, the system restarts), services are switched to the
standby device. When the active device recovers, services need to be switched back. The
active device, however, lacks information about users. Therefore, information about users on
the standby device must be backed up to the active device in batches. At present, the
maximum backup rate is 1000 user entries per second.
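At the stated maximum rate, the duration of a batch backup is simple to estimate (illustrative arithmetic, not a device API):

```python
def batch_backup_seconds(user_entries, rate=1000):
    """Time to back up user entries at the stated maximum of
    1000 entries per second."""
    return user_entries / rate

print(batch_backup_seconds(120_000))  # 120.0 seconds for 120,000 user entries
```

This matters for the switchback phase described below: the VRRP switchback is deferred until the batch backup completes, so the entry count directly bounds the switchback delay.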
As shown in Figure 8-5, the entire service control process can be divided into the following
phases:
1. Backup phase
– The two NE40Es negotiate the active device (Device1) and standby device
(Device2) using VRRP.
– A user logs on through Device1, and information about this user is backed up to
Device2 in a real-time manner.
– The two NE40Es detect the link between them through BFD or Ethernet OAM.
2. Switchover phase
– For user-to-network traffic, if a link to Device 1 fails, VRRP, with the help of BFD
or Ethernet OAM, rapidly switches Device 1 to the backup state and Device 2 to the
master state and advertises gratuitous ARP packets to update the MAC address
table on the LSW, which allows subsequent user packets to successfully reach
Device2.
– For network-to-user traffic, if a link to Device 1 fails, Device 2 forwards traffic
based on the backup ARP entry, preventing traffic loss.
3. Switchback phase
– The link on Device1 recovers, and VRRP renegotiates the active device and the
standby device. Then, Device1 acts as the active device; Device2 acts as the
standby device. In this case, Device2 needs to back up information about all users
to Device1 in batch and Device1 needs to back up information about users on it to
Device2. User entry synchronization between the two devices is bidirectional.
– Before the batch backup is completed, the VRRP switchover is not performed. At
this time, Device1 is still the standby device and Device2 is still the active device.
When the batch backup is completed, the VRRP switchover is performed. Device1
becomes the active device and sends a gratuitous ARP packet; Device2 becomes the
standby device and completes switchback of user services.
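The three phases above can be sketched as a small state machine. The key rule is in the switchback phase: after the failed device recovers, the VRRP switchback is deferred until the batch backup of user entries has completed. This is a hypothetical model with invented names:

```python
class DualDeviceBackup:
    """Sketch of the backup/switchover/switchback sequence for
    dual-device service control."""

    def __init__(self):
        self.master = "Device1"          # negotiated by VRRP in the backup phase
        self.batch_backup_done = True

    def link_failure_on_master(self):
        # BFD/Ethernet OAM detects the fault; VRRP switches over quickly.
        self.master = "Device2"

    def master_link_recovered(self):
        # Device2 starts backing up all user entries to Device1 in batches;
        # the VRRP switchback is NOT performed yet.
        self.batch_backup_done = False

    def batch_backup_complete(self):
        # Only now does VRRP switch back and Device1 resume the master role.
        self.batch_backup_done = True
        self.master = "Device1"

sm = DualDeviceBackup()
sm.link_failure_on_master()
print(sm.master)  # Device2
sm.master_link_recovered()
print(sm.master)  # Device2 (still, until the batch backup finishes)
sm.batch_backup_complete()
print(sm.master)  # Device1
```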
Figure 8-5 Flowchart for service control for high service reliability
NOTICE
The NE40E provides high reliability protection for Web authentication users. The principle of
high reliability protection for Web authentication users is similar to that for ordinary access
users. No special configuration is needed on the Web server.
After the active/standby switchover, Device1 acts as the standby device and Device2 acts as
the active device. Device1 withdraws the route and Device2 advertises the route. In this case,
traffic can be forwarded from the router to the PC through Device2.
After a switchover caused by a link or device failure, ensure that the new active device is
fault-free. Route control in this mode is based on the active/standby status of the device: by
controlling route advertisement, you ensure that traffic can be forwarded from the router to
the PC.
Route control scenarios: 1. No fault occurs. 2. A rapid active/standby switchover is performed
after a link fails.
Dual-device hot backup must be configured before multicast hot backup is configured.
b. After receiving DHCP packets, the master BRAS attempts to authenticate user
information. If authentication is successful, the master BRAS allocates an IP
address to the user. The slave BRAS does not provide access services for the user.
c. The user gets online successfully.
d. The master BRAS sends user information to the slave BRAS along a backup
channel. The slave BRAS uses the information to locally generate control and
forwarding information for the user.
Figure 8-8 Hot backup for multicast traffic sent to a DHCP STB
On the network shown in Figure 8-8, the procedure for ordering multicast programs is as
follows:
a. A DHCP STB sends an Internet Group Management Protocol (IGMP) Report
message to an aggregation switch, and the switch forwards the message to both the
master and slave BRASs.
b. Both the master and slave BRASs receive the IGMP Report message, and pull
multicast traffic from multicast sources.
c. The master BRAS replicates multicast traffic to the STB, but the slave BRAS does
not.
Figure 8-9 Multicast service hot backup for a DHCP STB using SmartLink to control
active and standby links
Figure 8-10 Multicast service hot backup for a DHCP STB using E-Trunk to control
active and standby links
To prevent a user from detecting the active link fault, NE40E-2 must use the same link-local
address and MAC address as those of NE40E-1.
On the network shown in Figure 8-12, the NE40Es act as the master and backup DHCPv6
servers by running VRRP. The master DHCPv6 server assigns an IPv6 address to the PC. The
DHCPv6 packets that the master DHCPv6 server sends carry the DHCP unique identifier
(DUID), which uniquely identifies the DHCPv6 server. If RUI is enabled for the two
DHCPv6 servers, to ensure that the new master DHCPv6 server sends correct DHCPv6
packets to the PC after a master/backup switchover, the master and backup DHCPv6 servers
must use the same DUID.
A DUID is automatically generated in the link-layer address (DUID-LL) mode using the
virtual MAC address of the VRRP backup group. This process avoids the need to configure a
DUID in the link-layer address plus time (DUID-LLT) mode or configure a DUID statically.
After the DUID is generated in the DUID-LL mode, the master and backup DHCPv6 servers
do not use the globally configured DUID, saving the process of backing up the DUID
between the servers.
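The shared DUID can be built in DUID-LL form as defined in RFC 8415: a 16-bit DUID type of 3, a 16-bit hardware type of 1 (Ethernet), then the link-layer address. Feeding in the VRRP virtual MAC gives both servers the same DUID; the MAC value below is an illustrative example, not taken from the document:

```python
import struct

def duid_ll(mac_bytes):
    """RFC 8415 DUID-LL: DUID type 3 + hardware type 1 (Ethernet)
    + link-layer address."""
    return struct.pack("!HH", 3, 1) + mac_bytes

# Example VRRP virtual MAC: the master and backup servers derive the
# same DUID from it, so no DUID backup between the servers is needed.
virtual_mac = bytes.fromhex("00005e000101")
print(duid_ll(virtual_mac).hex())  # 0003000100005e000101
```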
Figure 8-13 RUI networking where NE40Es function as DHCPv6 relay agents
On the network shown in Figure 8-13, the NE40Es act as the master and backup DHCPv6
relay agents. A unique DHCPv6 relay agent remote-ID identifies the master DHCPv6 relay
agent. In the RUI-enabled scenario, to enable the backup DHCPv6 relay agent to forward the
DHCPv6 packets after a master/backup switchover, the master and backup DHCPv6 relay
agents must use the same DHCPv6 relay agent remote-ID. This ensures that the DHCPv6
server processes the packets correctly.
The RUI-enabled PC uses the DUID that identifies the master and backup DHCPv6 servers as
the DHCPv6 relay agent remote-ID to identify both the master and backup DHCPv6 relay
agents.
NOTE
Dual-device ARP hot backup applies in both Virtual Router Redundancy Protocol (VRRP) and enhanced
trunk (E-Trunk) scenarios. This section describes the implementation of dual-device ARP hot backup in
VRRP scenarios.
Figure 8-14 shows a typical network topology in which a Virtual Router Redundancy
Protocol (VRRP) backup group is deployed. In the topology, Device A is a master device, and
Device B is a backup device. In normal circumstances, Device A forwards both uplink and
downlink traffic. If Device A or the link between Device A and the switch fails, a master/
backup VRRP switchover is triggered to switch Device B to the Master state. Device B needs
to advertise a network segment route to a device on the network side to direct downlink traffic
from the network side to Device B. If Device B has not learned ARP entries from a device on
the user side, the downlink traffic is interrupted. Device B forwards the downlink traffic only
after it learns ARP entries from a device on the user side.
Feature Deployment
To prevent downlink traffic from being interrupted because Device B does not learn ARP
entries from a device on the user side, deploy dual-device ARP hot backup on Device A and
Device B, as shown in Figure 8-15.
After the deployment, Device B backs up the ARP entries on Device A in real time. If a
master/backup VRRP switchover occurs, Device B forwards downlink traffic based on the
backup ARP entries without needing to relearn ARP entries from a device on the user side.
Feature Deployment
To prevent downlink traffic from being interrupted because Device B does not generate
multicast forwarding entries directing traffic to the user side, deploy dual-device IGMP
snooping hot backup on Device A and Device B, as shown in Figure 8-17.
After the deployment, Device A and Device B generate the same multicast forwarding entries
at the same time. If a master/backup Eth-Trunk link switchover occurs, Device B forwards
downlink traffic based on the generated multicast forwarding entries without needing to
generate the entries directing traffic to the user side.
To resolve the preceding problem, configure user data virtual backup between NE40E 1 and
NE40E 2. On the network shown in Figure 8-19, information about User1's identity is backed
up on NE40E 2. The aggregation switch S1 is single-homed to NE40E 1. VRRP is deployed
on the access side. One VRRP protection group is deployed for each pair of active and
standby links. If the VRRP group is in the Master state, users can access the network over the
corresponding link. If User1's CoA/DM and web authentication response messages are
randomly delivered to NE40E 2, user data virtual backup allows NE40E 2 to forward the
response messages to NE40E 1. Additionally, if the link between NE40E 1 and the network fails, NE40E 2
can also take over the traffic on the faulty link, preventing traffic interruption.
NOTE
Single-homing access in a multi-node backup scenario can be implemented only after user data virtual
backup is configured.
Figure 8-21 Dual-homing access through the ring network (semi-ring) formed by aggregation
switches
online using the master devices. When master devices or the links of master devices are
faulty, the slave device takes over user services.
In the topology shown in Figure 8-22, focus on the VLAN planning, and make sure that the
two NE40Es can be accessed by users simultaneously.
Figure 8-25 Load balancing based on odd and even MAC addresses
As shown in Figure 8-25, two Virtual Router Redundancy Protocol (VRRP) backup groups
are deployed on the access side. One VRRP backup group uses NE40E 1 as the master and
NE40E 2 as the backup, and the other uses NE40E 2 as the master and NE40E 1 as the
backup.
In multi-device backup scenarios, configure load balancing based on odd and even MAC
addresses to enable the master NE40E to forward only user packets carrying odd or even
MAC addresses.
To determine the forwarding path of uplink traffic and prevent packet disorder, the master and
backup NE40Es in the same virtual local area network (VLAN) must use different virtual
MAC addresses to establish sessions with hosts.
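The odd/even MAC load balancing above amounts to selecting a device by the parity of the host's MAC address. The sketch below uses the parity of the last byte; the even/odd-to-device mapping and the function name are illustrative, not device defaults:

```python
def owning_device(mac, device_for_even="Device1", device_for_odd="Device2"):
    """Pick the forwarding device from the parity of the host MAC's
    last byte (illustrative mapping for odd/even MAC load balancing)."""
    last_byte = int(mac.replace("-", "").replace(":", "")[-2:], 16)
    return device_for_even if last_byte % 2 == 0 else device_for_odd

print(owning_device("00-e0-fc-12-34-56"))  # Device1 (0x56 is even)
print(owning_device("00-e0-fc-12-34-57"))  # Device2 (0x57 is odd)
```

Because each parity class maps to exactly one master, user traffic is split roughly in half while each packet still has a deterministic forwarding path, which is what prevents packet disorder.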
As shown in Figure 8-26, the NE40Es serve as multicast replication points. Multicast hot
backup does not apply to VLAN-based or interface-based multicast replication.
primary master or the direct link to the primary master fails, BRAS users do not need to go
online again and their services are switched to the secondary master. After the fault is
rectified, the BRAS users' services are switched back to the primary master, which improves
network reliability. The BRAS users are not aware of the fault during the preceding process.
NOTE
The primary and secondary masters negotiate the primary/secondary relationship through a virtual access
inter-chassis protocol (for example, the Diameter protocol). For details, see Virtual Access Solution - Reliability.
Figure 8-27 describes a typical networking environment with dual-device hot backup for
BRAS virtual access deployed. Device A and Device B are the primary and secondary
masters, respectively. In normal circumstances, Device A forwards both upstream and
downstream traffic. The secondary master synchronizes BRAS user information from the
primary master. If Device A or the link between Device A and the AP fails, a primary/
secondary switchover is performed and Device B becomes the primary master. Device B
needs to advertise a network segment route to network-side devices so that the devices will
transmit downstream traffic to Device B. Device B forwards traffic directly based on the
backup user information, and users do not need to go online again.
Dual-device backup: A feature in which one device functions as a master device and the other
functions as a backup device. In normal circumstances, the master device provides service
access and the backup device monitors the running status of the master device. When the
master device fails, the backup device switches to a master device and provides service
access, ensuring service traffic continuity.
Remote Backup Profile: A configuration template that provides a unified user interface for
dual-system backup configurations.
Remote Backup Service: An inter-device backup channel, used to synchronize data between
two devices so that user services can smoothly switch from a faulty device to another device
during a master/backup device switchover.
DR Designated Router
TE Traffic Engineering
Purpose
The demand for network bandwidth is rapidly increasing as mobile services evolve from
narrowband voice services to integrated broadband services, including voice and streaming
media. Meeting the demand for bandwidth with traditional bearer networks dramatically
raises carriers' operation costs. To tackle the challenges posed by this rapid broadband-
oriented development, carriers urgently need mobile bearer networks that are flexible, low-
cost, and highly efficient. IP-based mobile bearer networks are an ideal choice. IP radio
access networks (IPRANs), a type of IP-based mobile bearer network, are being increasingly
widely used.
Traditional bearer networks minimize the impact of bit errors by using retransmission, or a mechanism in which one end sends multiple copies of each packet and the other end accepts only one copy. When carrying broadband services, IPRANs have higher reliability requirements than traditional bearer networks. Traditional fault detection mechanisms cannot trigger protection switching based on random bit errors. As a result, bit errors may degrade or even interrupt services on an IPRAN.
To solve this problem, configure bit-error-triggered protection switching.
NOTE
To prevent impacts on services, check whether protection links have sufficient bandwidth resources
before deploying bit-error-triggered protection switching.
Benefits
Bit-error-triggered protection switching offers the following benefits:
l Protects traffic against random bit errors, meeting high reliability requirements and
improving service quality.
l Enables devices to record bit error events. These records help carriers locate the nodes or
lines that have bit errors and take corrective measures accordingly.
Related Concepts
Bit error detection involves the following concepts:
l Bit error: deviation between a bit that is sent and the bit that is received.
l BER: number of bit errors divided by the total number of transferred bits during a certain
period. The BER can be considered as an approximate estimate of the probability of a bit
error occurring on any particular bit.
l LSP BER: calculation result based on the BER of each node on an LSP.
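These definitions can be expressed as a minimal Python sketch. The per-hop combining rule below (independent errors at each node, so the LSP BER is 1 - Π(1 - BERi)) is an assumption for illustration; the document does not give the exact formula used to derive the LSP BER from per-node BERs.

```python
def ber(bit_errors, total_bits):
    """Bit error rate: errored bits divided by total bits transferred."""
    return bit_errors / total_bits

def lsp_ber(node_bers):
    """Approximate LSP BER from per-node BERs (assumed combining rule).

    Assuming errors at each node are independent, a bit arrives intact
    only if it survives every node, so the LSP BER is 1 - prod(1 - b).
    """
    survive = 1.0
    for b in node_bers:
        survive *= 1.0 - b
    return 1.0 - survive
```

For the small BERs typical in practice, this combined value is approximately the sum of the per-node BERs.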
For dynamic services that use BFD to detect faults, a device uses BFD packets to advertise
the bit error status (including the BER). If the BER exceeds the bit error alarm threshold
configured on a device's interface, the device determines that bit errors have occurred on the
interface's link, and instructs an upper-layer application to perform a service switchover. The
device also notifies the BFD module of the bit error status, and then uses BFD packets to
advertise the bit error status to the peer device. If bit-error-triggered protection switching also
has been deployed on the peer device, the peer device performs protection switching.
If a transit node or the egress of a dynamic CR-LSP detects bit errors, the transit node or
egress must use BFD packets to advertise the BER. On the network shown in Figure 9-1, a
dynamic CR-LSP is deployed from PE1 to PE2. If both the transit node P and egress PE2
detect bit errors:
1. The P node obtains the local BER and sends PE2 a BFD packet carrying the BER.
2. PE2 obtains the local BER. After receiving the BER from the P node, PE2 calculates the
BER of the CR-LSP based on the BER received and the local BER.
3. PE2 sends PE1 a BFD packet carrying the BER of the CR-LSP.
4. After receiving the BER of the CR-LSP, PE1 determines the bit error status based on a
specified threshold. If the BER exceeds the threshold, PE1 performs protection
switching.
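The four steps above can be sketched as a single message flow. The function and packet field names are illustrative, and the BER combining rule again assumes independent per-hop errors:

```python
def advertise_and_decide(p_local_ber, pe2_local_ber, pe1_threshold):
    # Step 1: P sends PE2 a BFD packet carrying its locally measured BER.
    bfd_from_p = {"ber": p_local_ber}
    # Step 2: PE2 combines the received BER with its local BER (assuming
    # independent per-hop errors: 1 - (1 - a)(1 - b)).
    cr_lsp_ber = 1.0 - (1.0 - bfd_from_p["ber"]) * (1.0 - pe2_local_ber)
    # Step 3: PE2 sends PE1 a BFD packet carrying the BER of the CR-LSP.
    bfd_to_pe1 = {"ber": cr_lsp_ber}
    # Step 4: PE1 compares the CR-LSP BER against its configured
    # threshold; True means PE1 performs protection switching.
    return bfd_to_pe1["ber"] >= pe1_threshold
```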
[Figure 9-1: dynamic CR-LSP from PE1 through P to PE2; bit errors on a link; BFD packets advertising the BER]
For static services that use MPLS-TP OAM to detect faults, a device uses MPLS-TP OAM to
advertise the bit error status. If the BER reaches the bit error alarm threshold configured on an
interface of a device along a static CR-LSP or PW, the device determines that bit errors have
occurred on the interface's link, and notifies the MPLS-TP OAM module. The MPLS-TP
OAM module uses AIS packets to advertise the bit error status to the egress, and then APS is
used to trigger a traffic switchover.
If a transit node detects bit errors on a static CR-LSP or PW, the transit node uses AIS packets
to advertise the bit error status to the egress, triggering a traffic switchover on the static CR-
LSP or PW. On the network shown in Figure 9-2, a static CR-LSP is deployed from PE1 to
PE2. If the transit node P detects bit errors:
1. The P node uses AIS packets to notify PE2 of the bit error event.
2. After receiving the AIS packets, PE2 reports an AIS alarm to trigger local protection
switching. PE2 then sends CRC-AIS packets to PE1 and uses the APS protocol to
complete protection switching through negotiation with PE1.
3. After receiving the CRC-AIS packets, PE1 reports a CRC-AIS alarm.
[Figure 9-2: static CR-LSP from PE1 through P to PE2; bit errors; AIS packets; CRC-AIS alarm]
Implementation Principles
Trigger-section bit error detection must be enabled on an interface. After detecting bit errors
on an inbound interface, a device notifies the interface management module of the bit errors.
The link-layer protocol status of the interface then changes to bit-error-detection Down,
triggering an upper-layer application associated with the interface for a service switchover.
After the bit errors are cleared, the link-layer protocol status of the interface changes to Up,
triggering an upper-layer application associated with the interface for a service switchback.
The device also notifies the BFD module of the bit error status, and then uses BFD packets to
advertise the bit error status to the peer device.
l If bit-error-triggered section switching also has been deployed on the peer device, the bit
error status is advertised to the interface management module of the peer device. The
link-layer protocol status of the interface then changes to bit-error-detection Down or
Up, triggering an upper-layer application associated with the interface for a service
switchover or switchback.
l If bit-error-triggered section switching is not deployed on the peer device, the peer
device cannot detect the bit error status of the interface's link. In this case, the peer
device can only depend on an upper-layer application (for example, IGP) for link fault
detection.
For example, on the network shown in Figure 9-3, trigger-section bit error detection is
enabled on each interface, and nodes communicate through IS-IS routes. In normal cases, IS-
IS routes on PE1 and PE2 are preferentially transmitted over the primary link. Therefore,
traffic in both directions is forwarded over the primary link. If PE2 detects bit errors on the
interface to PE1:
If trigger-section bit error detection is not supported or enabled on PE1's interface to PE2,
PE1 can only use IS-IS to detect that the primary link is unavailable, and then performs an IS-
IS route switchover.
[Figure 9-3: primary link between PE1 and PE2 and secondary link through P; bit errors; BFD packets]
Usage Scenario
If LDP LSPs are used, deploy bit-error-triggered section switching to cope with link bit errors
on the LDP LSPs.
NOTE
After bit-error-triggered section switching is deployed, if bit errors occur on both the primary and
secondary links on an LDP LSP, the interface status changes to bit-error-detection Down on both the
primary and secondary links. As a result, services are interrupted. Therefore, it is recommended that you
deploy bit-error-triggered IGP route switching.
Background
Bit-error-triggered section switching can cope with link bit errors. If bit errors occur on both
the primary and secondary links, bit-error-triggered section switching changes the interface
status on both the primary and secondary links to bit-error-detection Down. As a result,
services are interrupted because no link is available. To resolve the preceding issue, deploy
bit-error-triggered IGP route switching. After the deployment is complete, link bit errors
trigger IGP route costs to be adjusted, preventing upper-layer applications from transmitting
service traffic to links with bit errors. Bit-error-triggered IGP route switching ensures normal
running of upper-layer applications and minimizes the impact of bit errors on services.
Implementation Principles
Link-quality bit error detection must be enabled on an interface. After detecting bit errors on
an inbound interface, a device notifies the interface management module of the bit errors. The
link quality level of the interface then changes to Low, triggering an IGP (OSPF or IS-IS) to
increase the cost of the interface's link. In this case, IGP routes do not preferentially select the
link with bit errors. After the bit errors are cleared, the link quality level of the interface
changes to Good, triggering the IGP to restore the original cost for the interface's link. In this
case, IGP routes preferentially select the link again. The device also notifies the BFD module
of the bit error status, and then uses BFD packets to advertise the bit error status to the peer
device.
l If bit-error-triggered IGP route switching also has been deployed on the peer device, the
bit error status is advertised to the interface management module of the peer device. The
link quality level of the interface then changes to Low or Good, triggering the IGP to
increase the cost of the interface's link or restore the original cost for the link. IGP routes
on the peer device then do not preferentially select the link with bit errors or
preferentially select the link again.
l If bit-error-triggered IGP route switching is not deployed on the peer device, the peer
device cannot detect the bit error status of the interface's link. Therefore, the IGP does
not adjust the cost of the link. Traffic from the peer device may still pass through the link
with bit errors. As a result, bidirectional IGP routes pass through different links. The
local device can receive traffic properly, and services are not interrupted. However, the
impact of bit errors on services cannot be eliminated.
For example, on the network shown in Figure 9-4, link-quality bit error detection is enabled
on each interface, and nodes communicate through IS-IS routes. In normal cases, IS-IS routes
on PE1 and PE2 are preferentially transmitted over the primary link. Therefore, traffic in both
directions is forwarded over the primary link. If PE2 detects bit errors on interface 1:
l PE2 adjusts the link quality level of interface 1 to Low, triggering IS-IS to increase the
cost of the interface's link to a value (for example, 40). PE2 uses a BFD packet to
advertise the bit errors to PE1.
l After receiving the BFD packet, PE1 also adjusts the link quality level of interface 1 to
Low, triggering IS-IS to increase the cost of the interface's link to a value (for example,
40).
IS-IS routes on both PE1 and PE2 preferentially select the secondary link, because the cost
(20) of the secondary link is less than the cost (40) of the primary link. Traffic in both
directions is then switched to the secondary link.
If bit-error-triggered IGP route switching is not supported or enabled on PE1, PE1 cannot
detect the bit errors. In this case, PE1 still sends traffic to PE2 through the primary link. PE2
can receive traffic properly, but services are affected by the bit errors.
If PE2 detects bit errors on both interface 1 and interface 2, PE2 adjusts the link quality levels
of the interfaces to Low, triggering the costs of the interfaces' links to be increased to 40. IS-
IS routes on PE2 still preferentially select the primary link to ensure service continuity,
because the cost (40) of the primary link is less than the cost (50) of the secondary link. To
eliminate the impact of bit errors on services, you must manually restore the link quality.
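The cost arithmetic in this example can be sketched as follows. The cost values (10, 20, 40, 50) come from the example above; treating a cost tie as keeping the primary link is an assumption based on the "still preferentially select the primary link" behavior described:

```python
def link_cost(quality, original_cost=10, raised_cost=40):
    # Low link quality raises the IGP cost; Good restores the original.
    return raised_cost if quality == "Low" else original_cost

def preferred_path(primary_cost, secondary_cost):
    # IS-IS prefers the lower total cost; on a tie the primary is kept.
    return "primary" if primary_cost <= secondary_cost else "secondary"

# Normal case: primary cost 10 < secondary cost 20 -> primary carries
# traffic. Bit errors on the primary link raise its cost to 40 -> the
# secondary (20) wins. Bit errors on both links: primary 40 < secondary
# 50 -> traffic stays on the primary link.
```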
[Figure 9-4: primary link through P and secondary link between PE1 and PE2, with link costs of 10; bit errors; BFD packets]
NOTE
When link-quality bit error detection is enabled on an interface, bit errors can be either CRC or Prefec
bit errors. Only OTN interfaces support Prefec bit errors.
Bit-error-triggered section switching and bit-error-triggered IGP route switching are mutually exclusive.
Usage Scenario
If LDP LSPs are used, deploy bit-error-triggered IGP route switching to cope with link bit
errors on the LDP LSPs. Bit-error-triggered IGP route switching ensures service continuity
even if bit errors occur on both the primary and secondary links on an LDP LSP. Therefore, it
is recommended that you deploy bit-error-triggered IGP route switching.
Implementation Principles
According to the types of protection switching triggered, bit-error-triggered trunk update is
classified as follows:
[Figure: trunk link with bit errors; BFD packets]
Link-quality bit error detection must be enabled on the member interfaces of the trunk interface. After detecting bit errors on a trunk interface's member interface, a device
advertises the bit errors to the trunk interface, triggering the trunk interface to delete the
member interface from the forwarding plane. The trunk interface then does not select the
member interface to forward traffic. After the bit errors are cleared from the member
interface, the trunk interface re-adds the member interface to the forwarding plane. The trunk
interface can then select the member interface to forward traffic. If bit errors occur on all
trunk member interfaces or the number of member interfaces without bit errors is lower than
the lower threshold for the trunk interface's Up links, the trunk interface ignores the bit errors
on the member interfaces and remains Up. However, the link quality level of the trunk
interface becomes Low, triggering an IGP (OSPF or IS-IS) to increase the cost of the trunk
interface's link. IGP routes then do not preferentially select the link. If the number of member
interfaces without bit errors reaches the lower threshold for the trunk interface's Up links, the
link quality level of the trunk interface changes to Good, triggering the IGP to restore the
original cost for the trunk interface's link. In this case, IGP routes preferentially select the link
again.
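The member-selection and cost logic described above can be sketched as follows; the cost values are illustrative, not taken from the document:

```python
def trunk_update(member_ber_ok, min_up_links, original_cost=10,
                 raised_cost=40):
    """Sketch of bit-error-triggered trunk update (illustrative values).

    member_ber_ok: one boolean per member link, True if bit-error-free.
    Returns the member indexes kept on the forwarding plane and the IGP
    cost applied to the trunk link.
    """
    healthy = [i for i, ok in enumerate(member_ber_ok) if ok]
    if len(healthy) >= min_up_links:
        # Enough healthy members: forward only over them, keep the cost.
        return healthy, original_cost
    # Too few healthy members: ignore the bit errors, keep all members
    # forwarding and the trunk Up, but mark its link quality Low so the
    # IGP raises the cost and routes avoid the trunk link.
    return list(range(len(member_ber_ok))), raised_cost
```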
The device also notifies the BFD module of the bit error status, and then uses BFD packets to
advertise the bit error status to the peer device connected to the trunk interface.
l If trunk-bit-error-triggered IGP route switching also has been deployed on the peer
device, the bit error status is advertised to the trunk interface of the peer device. The
trunk interface is then triggered to delete or re-add the member interface from or to the
forwarding plane. The link quality level of the trunk interface is also triggered to change
to Low or Good. In this case, the cost of IGP routes is adjusted, implementing
switchover or switchback synchronization with the device.
l If trunk-bit-error-triggered IGP route switching is not deployed on the peer device, the
peer device cannot detect the bit error status of the interface's link. If the trunk interface
of the device has deleted the member interface with bit errors from the forwarding plane,
the trunk interface of the peer device may still select the member interface to forward
traffic. Similarly, if the link quality level of the trunk interface on the device has changed
to Low, the IGP is triggered to increase the cost of the trunk interface's link. In this case,
IGP routes do not preferentially select the link. However, IGP on the peer device does
not adjust the cost of the link. Traffic from the peer device may still pass through the link
with bit errors. As a result, bidirectional IGP routes pass through different links. To
ensure normal running of services, the device can receive traffic from the member
interface with bit errors. However, bit errors may affect service quality.
[Figure: trunk links through P between two devices; bit errors on a trunk member link; BFD packets; primary and secondary links]
NOTE
Layer 2 trunk interfaces do not support an IGP. Therefore, bit-error-triggered IGP route switching cannot
be deployed on Layer 2 trunk interfaces. If bit errors occur on all Layer 2 trunk member interfaces or the
number of member interfaces without bit errors is lower than the lower threshold for the trunk interface's
Up links, the trunk interface remains in the Up state. As a result, protection switching cannot be
triggered. To eliminate the impact of bit errors on services, you must manually restore the link quality.
Usage Scenario
If a trunk interface is deployed, deploy bit-error-triggered trunk update to cope with bit errors
detected on trunk member interfaces. Trunk-bit-error-triggered IGP route switching is
recommended.
Implementation Principles
On the network shown in Figure 9-7, trigger-LSP bit error detection must be enabled on each
node's interfaces on the RSVP-TE tunnels. To implement dual-ended switching, configure the
RSVP-TE tunnels in both directions as bidirectional associated CR-LSPs. If a node on a CR-
LSP detects bit errors in a direction, the ingress of the tunnel obtains the BER of the CR-LSP
after BER calculation and advertisement. For details, see Bit Error Detection.
[Figure 9-7: primary CR-LSP and hot-standby CR-LSP; bit errors detected at node P2]
The ingress then determines the bit error status of the CR-LSP based on the BER thresholds configured for the RSVP-TE tunnel. For the rules used to determine the bit error status of the CR-LSP, see Figure 9-8.
l If the BER of the CR-LSP reaches or exceeds the switchover threshold of the RSVP-TE tunnel, the CR-LSP enters the excessive BER state.
l If the BER of the CR-LSP falls below the switchback threshold, the CR-LSP changes to the normalized BER state.
l If the BER lies between the two thresholds, the CR-LSP keeps its current state.
Figure 9-8 Rules for determining the bit error status of the CR-LSP
[Figure: the BER rising past the switchover threshold puts the CR-LSP in the excessive BER state; falling below the switchback threshold returns it to the normalized BER state]
After the bit error statuses of the primary and backup CR-LSPs are determined, the RSVP-TE
tunnel determines whether to perform a primary/backup CR-LSP switchover based on the
following rules:
l If the primary CR-LSP is in the excessive BER state, the RSVP-TE tunnel attempts to
switch traffic to the backup CR-LSP.
l If the primary CR-LSP changes to the normalized BER state or the backup CR-LSP is in
the excessive BER state, traffic is switched back to the primary CR-LSP.
The RSVP-TE tunnel in the opposite direction also performs the same switchover, so that
traffic in the upstream and downstream directions is not transmitted over the CR-LSP with bit
errors.
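The threshold rules and the switchover decision above can be sketched as a small hysteresis state machine; the state and function names are illustrative:

```python
def ber_state(prev_state, ber, switchover_thr, switchback_thr):
    # Hysteresis between the two thresholds: the state changes only
    # when a threshold is actually crossed.
    if ber >= switchover_thr:
        return "excessive"
    if ber < switchback_thr:
        return "normal"
    return prev_state  # between the thresholds: keep the previous state

def select_cr_lsp(primary_state, backup_state):
    # Traffic leaves the primary CR-LSP only while the primary alone is
    # in the excessive BER state; otherwise the primary carries traffic.
    if primary_state == "excessive" and backup_state != "excessive":
        return "backup"
    return "primary"
```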
Usage Scenario
If RSVP-TE tunnels are used as public network tunnels, deploy bit-error-triggered RSVP-TE
tunnel switching to cope with link bit errors along the tunnels.
Background
When PW redundancy is configured for L2VPN services, bit-error-triggered switching can be
configured. With this function, if bit errors occur, services can switch between the primary
and secondary PWs.
Principles
Trigger-LSP bit error detection must be enabled on each node's interfaces. PW redundancy
can be configured in either a single segment or multi-segment scenario.
[Figure: single-segment scenario: primary PW to PE2 and secondary PW to PE3; bit errors; BFD packets]
[Figure: multi-segment scenario: CE1 and PE1 connect to CE2 over the primary PW (PW1) through SPE1 to PE2 and over the secondary PW (PW2) through SPE2 to PE3, with a bypass PW; bit errors; BFD packets; LDP Notification messages]
After traffic switches to the secondary PW, and bit errors are removed from the primary PW,
traffic switches back to the primary PW based on a configured switchback policy.
NOTE
If an RSVP-TE tunnel is established for PWs, and bit-error-triggered RSVP-TE tunnel switching is
configured, a switchover is preferentially performed between the primary and hot-standby CR-LSPs in
the RSVP-TE tunnel. A primary/secondary PW switchover can be triggered only if the primary/hot-
standby CR-LSP switchover fails to remove bit errors in either of the following situations:
l The hot standby function is not configured.
l Bit errors occur on both the primary and hot-standby CR-LSPs.
Usage Scenario
If L2VPN is used to carry user services and PW redundancy is deployed to ensure reliability,
deploy bit-error-triggered switching for PW to minimize the impact of bit errors on user
services and improve service quality.
Principles
Trigger-LSP bit error detection must be enabled on each node's interfaces. In Figure 9-11, an
HVPN is configured on an IP/MPLS backbone network. VPN FRR is configured on a UPE. If
SPE1 detects bit errors, the processing is as follows:
l SPE1 reduces the Local Preference attribute value or increases the Multi-Exit Discriminator (MED) attribute value. The preference of the VPN route that SPE1 advertises to an NPE is then reduced. As a result, the NPE selects the VPN route to SPE2, not the VPN route to SPE1, and traffic switches to the standby link. In addition, SPE1 sends a BFD packet to notify the UPE of the bit errors.
l Upon receipt of the BFD packet, the UPE switches traffic to the standby link over the
VPN route destined for SPE2.
If the bit errors on the active link are removed, the UPE re-selects the VPN routes destined for
SPE1, and SPE1 restores the preference value of the VPN route to be advertised to the NPE.
Then the NPE also re-selects the VPN route destined for SPE1.
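The route re-selection above can be sketched with a simplified BGP-style comparison. Real BGP best-path selection has many more steps; only the two attributes the text mentions (Local Preference and MED) are modeled here, and the route tuples are illustrative:

```python
def preferred_next_hop(routes):
    """Pick the preferred VPN route from (next_hop, local_pref, med)
    tuples: higher Local Preference wins, and a lower MED breaks ties.
    """
    return max(routes, key=lambda r: (r[1], -r[2]))[0]

# Normal case: equal local-pref, SPE1's lower MED wins.
# After SPE1 detects bit errors and lowers its local-pref (or raises
# its MED), the NPE/UPE prefers the route through SPE2 instead.
```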
NOTE
If an RSVP-TE tunnel is established for an L3VPN, and bit-error-triggered RSVP-TE tunnel switching
is configured, a traffic switchover between the primary and hot-standby CR-LSPs in the RSVP-TE
tunnel is preferentially performed. An active/standby L3VPN route switchover can be triggered only if
the primary/hot-standby CR-LSP switchover fails to remove bit errors in either of the following
situations:
l The hot standby function is not configured.
l Bit errors occur on both the primary and hot-standby CR-LSPs.
Usage Scenario
If L3VPN is used to carry user services and VPN FRR is deployed to ensure reliability,
deploy bit-error-triggered L3VPN switching to minimize the impact of bit errors on user
services and improve service quality.
Implementation Principles
The MAC-layer SD alarm function (Trigger-LSP type) must be enabled on interfaces, and
then MPLS-TP OAM must be deployed to monitor CR-LSPs/PWs. Static PWs/E-PWs are
classified as SS-PWs or MS-PWs.
In an SS-PW networking scenario (see Figure 9-12), the bit error generation and clearing
process is as follows:
Bit error generation:
l If the BER on an inbound interface of the P node reaches a specified threshold, the CRC
module detects the bit error status of the inbound interface, notifies all static CR-LSP
modules, and constructs and sends AIS packets to PE2.
l Upon receipt of the AIS packets, PE2 notifies static PWs established over the CR-LSPs
of the bit errors and instructs the TP OAM module to perform APS. APS triggers a
primary/backup CR-LSP switchover, and a PW established over the new primary CR-
LSP takes over traffic.
Bit error clearing: After the bit errors are cleared, the CRC module no longer detects bit errors on the inbound interface and informs the TP OAM module that the bit errors have been cleared. Upon receipt of the notification, the TP OAM module stops sending AIS packets to PE2, which functions as the egress. If PE2 receives no AIS packets within a specified period, it determines that the bit errors have been cleared. PE2 then generates an AIS clear alarm and instructs the TP OAM module to perform APS. APS triggers a primary/backup CR-LSP switchover, and services are switched back to the PW over the primary CR-LSP.
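On the egress, the AIS-based clearing detection amounts to a timeout check: the alarm holds while AIS packets keep arriving and clears once none have been seen for the configured period. A minimal sketch with illustrative names:

```python
def ais_alarm_state(last_ais_seen, now, clear_timeout):
    # The egress keeps the AIS alarm while packets keep arriving and
    # declares the bit errors cleared once no AIS packet has been
    # received for the timeout period.
    return "alarm" if now - last_ais_seen < clear_timeout else "cleared"
```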
[Figure 9-12: SS-PW scenario with CEs, PE1 and PE2, and a DNI PW; AIS packets]
In an MS-PW networking scenario (see Figure 9-13), the bit error generation and clearing
process is as follows:
Bit error generation:
l The CRC module of an inbound interface on the SPE detects bit errors and determines whether to send an SF or SD alarm based on a specified BER threshold. The CRC module then notifies the TP OAM module of the bit errors. The TP OAM module advertises the bit error status, sends RDI packets, and performs APS. The APS module instructs the peer node to perform a traffic switchover, which triggers a primary/backup CR-LSP switchover. The PW established over the bit-error-free CR-LSP takes over traffic.
l If the BER on an inbound interface of the SPE reaches a specified threshold, the CRC
module detects the bit error status of the inbound interface, sets all static CR-LSP
modules to the bit error status, and constructs and sends AIS packets to PE2.
l Upon receipt of the AIS packets, PE2 notifies the TP OAM module. The TP OAM
module then performs APS, which triggers a primary/backup CR-LSP switchover. The
PW established over the bit-error-free CR-LSP takes over traffic.
Bit error clearing: After the bit errors are cleared, the CRC module no longer detects bit errors on the inbound interface and informs the TP OAM module that the bit errors have been cleared. Upon receipt of the notification, the TP OAM module stops sending AIS packets to PE2, which functions as the egress. If PE2 receives no AIS packets within a specified period, it determines that the bit errors have been cleared. PE2 then generates an AIS clear alarm and instructs the TP OAM module to perform APS. APS triggers a primary/backup CR-LSP switchover, and services are switched back to the PW over the primary CR-LSP.
[Figure 9-13: MS-PW scenario with CEs, PE1, and a DNI PW; AIS packets]
NOTE
If a tunnel protection group has been deployed for static CR-LSPs carrying PWs/E-PWs, bit errors
preferentially trigger static CR-LSP protection switching. Bit-error-triggered PW protection switching is
performed only when bit-error-triggered static CR-LSP protection switching fails to protect services
against bit errors (for example, bit errors occur on both the primary and backup CR-LSPs).
Usage Scenario
If static CR-LSPs/PWs/E-PWs are used to carry user services and MPLS-TP OAM is
deployed to ensure reliability, deploy bit-error-triggered APS to minimize the impact of bit
errors on user services and improve service quality.
Bit-error-triggered IGP route switching
Mechanism: If bit errors are generated or cleared on an interface, the link quality level of the interface changes to Low or Good, triggering an IGP (OSPF or IS-IS) to increase the cost of the interface's link or restore the original cost for the link. IGP routes on the peer device then do not preferentially select the link with bit errors, or preferentially select the link again.
Prerequisites: Link-quality bit error detection must be enabled on an interface. The bit error status must be advertised using BFD packets.
Dependencies: This feature can be independently deployed. When deploying trunk-bit-error-triggered IGP route switching, you must deploy bit-error-triggered IGP route switching on trunk interfaces.
Deployment: Enable bit-error-triggered IGP route switching on the interfaces at both ends of a link. When link-quality bit error detection is enabled on an interface, bit errors can be either CRC or Prefec bit errors. Only OTN interfaces support Prefec bit errors.

Bit-error-triggered trunk update: If bit errors occur on all trunk member interfaces or the number of member interfaces without bit errors is lower than the lower threshold for the trunk interface's Up links, the trunk interface remains Up. However, the link quality level of the trunk interface becomes Low, triggering an IGP to increase the cost of the trunk interface's link. IGP routes then do not preferentially select the link.

Bit-error-triggered RSVP-TE tunnel switching
Mechanism: The ingress of the primary and backup CR-LSPs determines the bit error statuses of the CR-LSPs based on link BERs. A service switchover or switchback is then performed based on the bit error statuses of the CR-LSPs.
Prerequisites: Trigger-LSP bit error detection must be enabled on an interface. The bit error status must be advertised using BFD packets.
Dependencies: This feature can be independently deployed, or deployed together with bit-error-triggered PW switching and bit-error-triggered L3VPN switching.
Deployment: To implement dual-ended switching, deploy bit-error-triggered protection switching on the RSVP-TE tunnels in both directions and configure the tunnels as bidirectional associated CR-LSPs.
Networking Description
Figure 9-14 shows typical L2VPN+L3VPN networking in an IP RAN application. A VPWS
based on an RSVP-TE tunnel is deployed at the access layer, an L3VPN based on an RSVP-
TE tunnel is deployed at the aggregation layer, and L2VPN access to L3VPN is configured on
the AGGs. To ensure reliability, deploy PW redundancy for the VPWS, configure VPN FRR
protection for the L3VPN, and configure hot-standby protection for the RSVP-TE tunnels.
[Figure 9-14: VPWS over an RSVP-TE tunnel at the access layer and an L3VPN over an RSVP-TE tunnel at the aggregation layer; traffic path through the CSG]
Feature Deployment
To prevent the impact of bit errors on services, deploy bit-error-triggered RSVP-TE tunnel
switching, bit-error-triggered PW switching, and bit-error-triggered L3VPN route switching
in the scenario shown in Figure 9-14. The deployment process is as follows:
On the network shown in Figure 9-15, if bit errors occur on location 1, the RSVP-TE tunnel
between the CSG and AGG1 detects the bit errors, triggering dual-ended switching. Both
upstream and downstream traffic is switched to the hot-standby path, preventing traffic from
passing through the link with bit errors.
[Figure 9-15: bit errors at location 1 on the access-layer RSVP-TE tunnel; traffic on the hot-standby path]
Scenario 2
On the network shown in Figure 9-16, if bit errors occur on both locations 1 and 2, both the
primary and secondary links of the RSVP-TE tunnel between the CSG and AGG1 detect the
bit errors. In this case, bit-error-triggered RSVP-TE tunnel switching cannot protect services
against bit errors. The bit errors further trigger PW and L3VPN route switching.
l After detecting the bit errors, the CSG performs a primary/secondary PW switchover and
switches upstream traffic to AGG2.
l After detecting the bit errors, AGG1 reduces the priority of VPNv4 routes advertised to
RSG1, so that RSG1 preferentially selects VPNv4 routes advertised by AGG2.
Downstream traffic is then switched to AGG2.
[Figure 9-16: bit errors at locations 1 and 2; VPWS and L3VPN traffic switched to AGG2]
[Figure 9-17: VPWS at the access layer and L3VPN at the aggregation layer; traffic path through the CSG]
Feature Deployment
To prevent the impact of bit errors on services, deploy bit-error-triggered IGP route switching
in the scenario shown in Figure 9-17. Deploy trunk-bit-error-triggered IGP route switching
on the Eth-Trunk interfaces. The deployment process is as follows:
l Enable link-quality bit error detection on each physical interface and Eth-Trunk member
interface.
l Enable bit-error-triggered IGP route switching on each physical interface and Eth-Trunk
interface.
[Figure: bit errors on the network; VPWS and L3VPN traffic path through the access and aggregation layers]
Scenario 2
On the network shown in Figure 9-19, if bit errors occur on location 2 (Eth-Trunk member
interface), AGG1 detects the bit errors.
l If the number of member interfaces without bit errors is still higher than the lower
threshold for the Eth-Trunk interface's Up links, the Eth-Trunk interface deletes the Eth-
Trunk member interface from the forwarding plane. In this case, service traffic is still
forwarded over the normal path.
l If the number of member interfaces without bit errors is lower than the lower threshold
for the Eth-Trunk interface's Up links, the Eth-Trunk interface ignores the bit errors on
the Eth-Trunk member interface and remains Up. However, the link quality level of the
Eth-Trunk interface becomes Low, triggering an IGP (OSPF or IS-IS) to increase the
cost of the Eth-Trunk interface's link. IGP routes then do not preferentially select the
link. AGG1 also uses a BFD packet to advertise the bit errors to the peer device, so that
the peer device also performs the same processing. Both upstream and downstream
traffic is then switched to the paths without bit errors.
[Figure 9-19: bit errors at location 2 (an Eth-Trunk member interface) on AGG1; CSG, AGG1, RSG1, NodeB, and RNC; traffic path]
Feature Deployment
To meet high reliability requirements of the IP RAN and protect services against bit errors,
configure bit-error-triggered protection switching for the CR-LSPs/PWs. To do so, enable bit
error detection on the interfaces along the CR-LSPs/PWs, configure the switching type as
trigger-LSP, and configure bit error alarm generation and clearing thresholds. If the BER reaches the bit error alarm threshold configured on an interface of a device along a static CR-LSP or PW, the device determines that a bit error event has occurred and notifies the MPLS-TP OAM module. The MPLS-TP OAM module uses AIS packets to advertise the bit error status to the egress, and APS is then used to trigger a traffic switchover.
BER (bit error rate): Indicates the probability that incorrect packets are received and discarded.
PW: pseudo wire
10.1 Introduction
10.2 Principles
10.3 Applications
10.4 Terms and Abbreviations
10.1 Introduction
Definition
Uninterruptible service technologies are a type of high availability (HA) technology that
ensures service continuity when a device performs a protocol restart or AMB/SMB
switchover.
Purpose
With the rapid development of networks and the diversification of applications, value-added services such as Internet Protocol television (IPTV) and video conferencing have become widespread. Any network service interruption can cause immeasurable losses to users, increasing user demands for network infrastructure reliability.
Uninterruptible service technologies prevent route flapping and ensure service continuity,
meeting carriers' HA requirements for networks.
10.2 Principles
Related Technologies
Uninterruptible service technologies include:
l Non-stop forwarding (NSF): This technology ensures data forwarding continuity if a
device's control plane fails. NSF is implemented based on graceful restart (GR). This
section mainly describes GR implementation. GR ensures service continuity when a
device performs an IP/MPLS restart or AMB/SMB switchover. GR requires the
collaboration of neighboring devices to back up and restore information such as routing
information.
l Non-stop routing (NSR): This technology enables a device to back up routing and
forwarding information from the AMB to the SMB. During an AMB/SMB switchover,
NSR enables the device to restore the information, ensuring service continuity. NSR does
not require the collaboration of neighboring devices.
GR and NSR have specific requirements on system hardware, software, and protocols. Table
10-1 describes these requirements.
l NSR: has the same requirements on system hardware and software as GR, and no special requirements on protocols.
Figure 10-1 Framework model with the main control board redundancy mechanism (after GR capability negotiation, the AMB and SMB synchronize routing/MPLS protocol state, configuration, and RIB changes over a socket/TCP link with heartbeat checks; the FIB is downloaded to the FIB copies on the interface boards).
GR Implementation
When a device performs a protocol restart or AMB/SMB switchover, GR enables the device
to instruct its neighboring devices to maintain the adjacencies and routing stability within a
specified GR time. After the protocol restart or AMB/SMB switchover is complete, the device
restores network topology, routing, and session information with the collaboration of its
neighboring devices within a short period of time. During the entire GR process, no route
flapping occurs and no forwarding paths change. Therefore, the device can forward service
data without interruptions.
Concepts related to GR implementation are as follows:
l GR restarter: A device that has two main control boards, AMB and SMB. Its routing
protocols support the GR capability. When a GR restarter performs an AMB/SMB
switchover, it notifies its neighbors of the switchover and instructs them to maintain
adjacencies with it.
l GR helper: A neighbor of a GR restarter. A GR helper must be able to identify GR
signaling. When the GR restarter performs an AMB/SMB switchover, the GR helper
maintains the adjacency with the GR restarter. After the AMB/SMB switchover is
complete, the GR helper assists the GR restarter in restoring network topology
information.
NOTE
GR restarters and GR helpers are relative concepts. When the GR capability is enabled on a GR
helper, the GR helper can function as a GR restarter.
l GR session: A GR capability negotiation process between a GR restarter and a GR
helper. A GR session includes protocol restart notification and information exchanging
during a protocol restart. The GR restarter and GR helper can use a GR session to obtain
each other's GR capability.
l GR time: A period of time taken for a GR restarter and GR helper to establish a GR
session. When finding that the GR restarter is Down, the GR helper preserves the
topology or routing information sent from the GR restarter within a specified GR time.
Table 10-2 describes GR implementation. Device A functions as a GR restarter, and Device
B, Device C, and Device D function as GR helpers.
Table 10-2 GR implementation (Device A is the GR restarter; Device B, Device C, and Device D are GR helpers):
1. GR session establishment: Device A negotiates the GR capability with Device B, Device C, and Device D over GR sessions.
2. AMB/SMB switchover on the GR restarter: Device A performs the switchover and sends GR signaling to the helpers, which maintain their adjacencies with it.
3. Information recovery: The GR restarter obtains topology or routing information from its neighbors.
4. Recovery of the network topology information on the GR restarter (Figure 10-5): The GR restarter obtains the current topology and routing information from the GR helpers, recalculates routes, updates the routing table, and ages out the old routing information.
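The GR helper side of this exchange can be modeled as a small state machine: preserve the restarter's routes for the GR time, then either help it re-synchronize or age the routes out. This is a hedged sketch under assumed names (`GrHelper`, `poll`, the returned state strings); it models the logic above, not any real protocol implementation.

```python
# Minimal sketch of GR helper behavior: preserve the restarter's routes
# within the GR time, then age them out if the restarter has not recovered.
class GrHelper:
    def __init__(self, gr_time_s):
        self.gr_time_s = gr_time_s
        self.preserved_routes = {}
        self.restart_begun_at = None

    def on_restart_signaling(self, routes, now):
        # GR signaling received: keep the adjacency and routes up instead
        # of flushing them as a normal neighbor-down event would.
        self.preserved_routes = dict(routes)
        self.restart_begun_at = now

    def poll(self, restarter_recovered, now):
        if self.restart_begun_at is None:
            return "idle"
        if restarter_recovered:
            self.restart_begun_at = None
            return "helped"  # restarter re-obtains topology/routing info
        if now - self.restart_begun_at > self.gr_time_s:
            self.preserved_routes.clear()  # GR time expired: age routes out
            self.restart_begun_at = None
            return "aged-out"
        return "preserving"
```

The key property is that no route flapping occurs while the helper is in the "preserving" state, matching the GR description above.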
NSR Implementation
NSR implementation involves the following phases:
1. Batch backup: NSR is automatically enabled right after the SMB starts. The SMB backs
up routing and forwarding information sent by the AMB in batches. Batch backup is
performed before real-time backup. NSR cannot perform an AMB/SMB switchover
when batch backup is being performed.
2. Real-time backup: Changes to the control and forwarding planes are backed up from the
AMB to the SMB in real time so that NSR will be ready to perform an AMB/SMB
switchover and allow the SMB to take over traffic from the AMB if a fault occurs.
3. AMB/SMB switchover: If the AMB fails, the SMB detects the failure and becomes the
new AMB. The SMB instructs the LPU to send packets to itself instead of the original
AMB. The AMB/SMB switchover is rapidly performed so that the routes between the
device and its neighboring devices remain reachable.
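The three NSR phases above can be summarized as a state machine in which a switchover is only safe after batch backup completes. The following Python sketch models that sequencing only; class and method names are illustrative assumptions, not device software.

```python
# Hedged sketch of the three NSR phases (batch backup, real-time backup,
# AMB/SMB switchover); this models the state machine, not a real device.
class NsrStateMachine:
    def __init__(self):
        self.phase = "batch-backup"
        self.smb_state = {}

    def batch_backup(self, amb_state):
        # Phase 1: bulk copy of routing/forwarding information AMB -> SMB.
        self.smb_state = dict(amb_state)
        self.phase = "real-time-backup"

    def realtime_update(self, key, value):
        # Phase 2: incremental changes are mirrored to the SMB as they occur.
        if self.phase != "real-time-backup":
            raise RuntimeError("real-time backup not ready")
        self.smb_state[key] = value

    def amb_failure(self):
        # Phase 3: a switchover is only possible once batch backup is done;
        # before that, the SMB cannot take over and the device must restart.
        if self.phase == "batch-backup":
            return "restart-required"
        self.phase = "smb-is-amb"
        return "switchover"
```

Note how `amb_failure()` during batch backup returns "restart-required", mirroring the rule that NSR cannot perform a switchover while batch backup is in progress.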
Figure 10-6 shows NSR networking.
(In the NSR networking figure, the AMB and SMB each run IS-IS, BGP, APP, and CFG modules and exchange packets with the LPU. If the AMB fails, the SMB detects the failure, becomes the new AMB, and instructs the LPU to send packets to itself.)
l Batch backup
The SMB starts and sends a message about its in-service status to the AMB. After
receiving the message, the AMB backs up its data to the SMB in batches.
– After batch backup is complete, the device enters the redundancy protection state. If
the AMB fails, the SMB can become the new AMB and restore data.
– If the AMB fails before batch backup is complete, the SMB cannot become the new
AMB. The fault can be rectified after the device restarts.
l Real-time backup
After batch backup is complete, the device enters the real-time backup phase. If the
neighbor status or routing information changes on the AMB, the AMB backs up the
updated information to the SMB in real time.
l AMB/SMB switchover
If the AMB's software or hardware fails, the SMB detects the failure and automatically
becomes the new AMB. The new AMB uses the backup data to forward traffic. The LPU
sends the information that has been updated during the AMB/SMB switchover to the
new AMB. Routes are reachable and traffic forwarding is uninterrupted during the
switchover.
Table 10-3 describes the differences between GR and NSR.
Table 10-3 Differences between GR and NSR (NSR row):
l Advantages: A device can perform an AMB/SMB switchover independently, so neighbor devices do not need to support NSR or detect routing information changes. NSR ensures proper traffic transmission even if the control planes of multiple nodes fail simultaneously. After a fault is rectified, NSR rapidly restores data and network topology information.
l Disadvantages: NSR imposes heavy loads and reduces performance, and software exceptions can cause NSR failures.
10.3 Applications
Uninterruptible service technologies are typically applied on provider edges (PEs), especially
when customer edges (CEs) are connected to a carrier network. As shown in Figure 10-7, if a
PE fails or a maintenance operation needs to be performed on a PE (for example, a software
version needs to be upgraded), the PE performs an AMB/SMB switchover, which causes
service interruptions. Uninterruptible service technologies can be deployed to resolve this
issue.
Figure 10-7 Typical application of uninterruptible service technologies (CE1 through CE4 in VPN A and VPN B connect to PE1 through PE4 in AS 100; the PEs run IS-IS Level-2 and full-mesh IBGP).
Generally, CEs do not have the GR helper capability. NSR uses an internal mechanism to
ensure that the routing protocol and adjacency statuses are consistent on the control planes of
the AMB and SMB. With NSR, no adjacencies need to be reestablished when an AMB/SMB
switchover occurs. AMB/SMB switchovers are transparent to neighbors. Therefore, neighbors
do not need to support NSR. NSR can be deployed to ensure service continuity on the
network shown in Figure 10-7.
NOTE
If CEs have the GR helper capability, GR can be deployed on the CEs to ensure service continuity.
Term Definition
GR restarter A device that has two main control boards: an AMB and an SMB. Its routing protocols support the GR capability. When a GR restarter performs an AMB/SMB switchover, it notifies its neighbors of the switchover and instructs them to maintain adjacencies with it.
GR graceful restart
HA high availability
11 CGN Reliability
11.1 Introduction
11.2 Principles
11.3 Applications
11.4 Terms, Acronyms, and Abbreviations
11.1 Introduction
Definition
CGN reliability allows for inter-chassis and intra-chassis hot backup on the master and slave
VSUF-80s/160s to ensure data consistency as well as enhance service reliability and device
availability.
Purpose
In the event of a fault on the master service board, master device, or a link, CGN reliability
allows services to be smoothly switched to the slave service board or device, thereby ensuring
service restoration within a short period.
Benefits
l This feature offers the following benefits to carriers:
Enhances network reliability by providing service continuity and provides reliable
support for IPv4-to-IPv6 transition networks with CGN devices deployed.
l This feature offers the following benefits to users:
Ensures service continuity without letting users perceive faults.
11.2 Principles
Basic Concepts
From a micro perspective, inter-board hot backup indicates the CPU backup between the
master and slave service boards. From a macro perspective, inter-board hot backup indicates
that a NAT device is equipped with multiple service boards where the master and slave
service boards back up each other. Inter-board hot backup allows for a master/slave
switchover upon a fault on the master service board, thereby ensuring data consistency and
service continuity as well as preventing users from perceiving faults.
Backup Principles
Inter-board hot backup uses static configuration to determine the master/slave relationship
between service boards. NAT session tables are established on the master service board, and
service traffic is transmitted over the master service board only. Once the master
VSUF-80/160 becomes faulty, the chassis detects the fault, and the interface board switches
traffic to the slave service board.
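The point of hot backup is that the slave board already holds the NAT session table when the switchover happens. The following Python sketch models that property; class names, the session key shape, and the switchover method are assumptions for illustration only.

```python
# Illustrative model of inter-board hot backup: the slave board mirrors NAT
# sessions as they are created, so a switchover keeps existing sessions.
class NatBoard:
    def __init__(self, name):
        self.name = name
        self.sessions = {}  # (private_ip, private_port) -> (public_ip, port)

class HotBackupPair:
    def __init__(self, master, slave):
        self.master, self.slave = master, slave

    def add_session(self, private, public):
        self.master.sessions[private] = public
        self.slave.sessions[private] = public  # hot backup: sync immediately

    def master_failed(self):
        # Interface boards redirect traffic to the slave; because sessions
        # were mirrored, no NAT session needs to be re-established.
        self.master, self.slave = self.slave, self.master
        return self.master.sessions
```

Under cold or warm backup, `slave.sessions` would start empty after the switchover and sessions would have to be rebuilt, which is the difference the comparison below describes.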
Currently, inter-board backup includes the cold backup, warm backup, and hot backup modes.
In comparison with cold backup and warm backup, hot backup has the following differences:
l Hot backup
The slave service board automatically synchronizes NAT sessions with the master
service board, without requiring NAT sessions to be re-established during traffic
switching. Figure 11-1 shows the internal processing path: multi-core CPU (master) <->
TM (master) <-> SFU <-> TM (slave) <-> multi-core CPU (slave).
NOTE
l Multi-core CPU: processes NAT information, including the public and private IP addresses,
range of ports to be allocated, NAT source tracing logs, and session aging processing
mechanism.
l Traffic manager (TM): processes forwarding traffic on VSUFs and other related boards.
l Switch fabric unit (SFU): provides data communication channels for all boards.
l Cold/Warm backup
The slave service board does not synchronize NAT sessions with the master service
board. When traffic is switched to the slave service board, NAT sessions need to be re-
established. The re-establishment time is determined by the number of NAT sessions.
Figure 11-2 shows the internal processing path: multi-core CPU (master) <-> TM (master).
Troubleshooting Mechanism
Table 11-1 lists the comparison between cold backup, warm backup, and hot backup. Inter-
board hot backup supports both centralized and distributed scenarios for rapid service
recovery.
Table 11-1 Comparison between inter-board cold backup, warm backup, and hot backup:
l Inter-board cold backup (centralized NAT): The master service board processes services, and the slave service board does not back up any tables. When the master service board becomes faulty, traffic is interrupted temporarily and switched to the slave service board; a public IP address is re-allocated, and NAT sessions are re-established. When the master service board recovers from the fault, it enters the delayed switchback phase, and NAT sessions are re-established.
l Inter-board warm backup (distributed NAT): The master service board processes services, and the slave service board backs up user table information in real time. When the master service board becomes faulty, traffic is interrupted temporarily and switched to the slave service board; NAT sessions are re-established using the backed-up user table information. When the master service board recovers from the fault, it enters the delayed switchback phase; user table information is backed up to the master service board, and NAT sessions are re-established using it.
l Inter-board hot backup (centralized or distributed NAT): The master service board processes services, and the slave service board backs up user entries and NAT session table information. When the master service board becomes faulty, traffic is interrupted temporarily and switched to the slave service board, which uses the backed-up user entries and NAT session table information. When the master service board recovers from the fault, it enters the delayed switchback phase; user entries and NAT session table information are backed up to the master service board and used.
NOTE
In centralized scenarios, two VSUF-80s/160s are deployed on a NAT device, and the CPUs of the two
VSUF-80s/160s form the hot backup or cold backup relationships. In distributed scenarios, two
VSUF-80s/160s are deployed on a BRAS, and the CPUs of the two VSUF-80s/160s form the hot backup
or warm backup relationship.
Basic Concepts
When two CGN devices equipped with the VSUF-80/160 are deployed on the network, the
master and slave service boards are deployed on different chassis to achieve inter-chassis
backup. Inter-chassis backup ensures data consistency and service continuity between the
master and slave devices by triggering a master/slave switchover upon a fault on the master
service, service board, public link, or private link.
Backup Principles
Inter-chassis hot backup is associated with the following features:
l CGN HRP
CGN Huawei Redundancy Protocol (HRP) is used to carry heartbeat detection and CGN
backup packets. The packets with the HRP header carrying CGN data backup
information are sent to notify the peer device of backup session tables and user tables,
with an aim to back up the link status and ensure service continuity upon device faults.
l VRRP
Virtual Router Redundancy Protocol (VRRP) is a fault-tolerant protocol that groups
several routers into a virtual router. If the next hop of a host is faulty, VRRP switches
traffic to another router, which ensures communication continuity and reliability.
Typically, two routing devices are grouped, with one as the master chassis and the other
as the slave chassis. The master/slave relationship is determined based on the priorities
of routing devices. If VRRP detects a VSUF board fault, service traffic will be switched
from the master device to the slave device.
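The CGN HRP heartbeat described above can be sketched as a simple miss-counting detector: the peer is declared down after several consecutive missed heartbeat intervals. The interval and the dead count of 3 below are illustrative assumptions, not the real protocol's timers.

```python
# Sketch of heartbeat-loss detection over a CGN HRP-style channel;
# the interval and dead_count values are illustrative assumptions.
class Heartbeat:
    def __init__(self, interval_s=1.0, dead_count=3):
        self.interval_s = interval_s
        self.dead_count = dead_count
        self.last_seen = 0.0

    def on_packet(self, now):
        # A received HRP heartbeat refreshes the liveness timestamp.
        self.last_seen = now

    def peer_alive(self, now):
        # Declare the peer down after dead_count missed intervals; that
        # verdict feeds the inter-chassis backup group's switchover logic.
        return (now - self.last_seen) < self.interval_s * self.dead_count
```

When `peer_alive()` turns false, the backup group would treat the peer chassis's CGN board as unreachable and re-evaluate the master/slave relationship.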
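The priority-based master election that VRRP performs can be sketched in a few lines: the highest priority wins, and a tie is broken by the higher interface IP address (per standard VRRP behavior). The function name and the tuple shape are assumptions for illustration.

```python
# Hedged sketch of VRRP master election: highest priority wins, ties break
# on the higher interface IP address, as standard VRRP specifies.
from ipaddress import IPv4Address

def elect_master(routers):
    """routers: list of (name, priority, ip_string); returns the master's name."""
    return max(
        routers,
        key=lambda r: (r[1], IPv4Address(r[2])),  # (priority, IP) ordering
    )[0]
```

In the inter-chassis backup scenario, lowering the master's VRRP priority (for example, after a VSUF board fault is detected) is what moves the master role, and with it the service traffic, to the slave device.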
NOTE
Inter-chassis hot backup uses HRP to check whether the CGN board in the peer chassis is well
positioned and then determine the backup relationships of members in the inter-chassis
backup group based on VRRP. As shown in Figure 11-3, CPU 0 of the VSUFs on the master
and slave devices belong to VSM HA backup group 1, and CPU 1 belongs to VSM HA
backup group 2. The service information on CPU 0 of the VSUF on the master service is
backed up to CPU 0 of the VSUF on the slave device, and the service information on CPU 1
of the VSUF on the maser device is backed up to CPU 1 of the VSUF on the slave device.
Figure 11-3 VSM HA backup groups (running CPUs on the master device are backed up by the corresponding backup CPUs on the slave device).
Inter-chassis backup includes the cold backup, warm backup, and hot backup modes. Table
11-2 lists the comparison between the cold backup, warm backup, and hot backup. Hot
backup is widely used because it delivers high reliability and minimized impact on the
network.
Troubleshooting Mechanism
Direct links are established between the master and slave NAT devices. As defined in relevant
standards, the destination IP address in VRRP packets is the multicast IP address 224.0.0.18,
and the TTL value must be 255. Therefore, VRRP packets can be transmitted over a Layer 2
network (VLAN, VLL, and VPLS), instead of a Layer 3 network. The network on which the
master and slave devices reside must be a directly connected one or a Layer 2 network. As
such, deploying direct-connect inter-chassis backup is recommended. For troubleshooting
details about inter-chassis backup for non-directly connected links, see Virtual Access over
NAT Inter-chassis Backup for Non-Directly-Connected Links.
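The two packet constraints named above (destination multicast address 224.0.0.18 and TTL 255) are what confine VRRP to directly connected or Layer 2 paths. A minimal check, with field names as illustrative assumptions rather than a real parsing API, might look like:

```python
# Minimal validator for the two VRRP packet constraints named above;
# the field names are illustrative, not a real packet-parsing API.
def vrrp_packet_valid(dst_ip, ttl):
    # Standards-defined destination multicast address and required TTL.
    # A TTL of exactly 255 proves the packet was not routed (each Layer 3
    # hop would have decremented it), so VRRP works only over a direct
    # link or a Layer 2 network (VLAN, VLL, VPLS).
    return dst_ip == "224.0.0.18" and ttl == 255
```

A packet that crossed even one router would arrive with TTL 254 and be rejected, which is why the master and slave devices must be directly connected or bridged at Layer 2.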
Figure: Inter-chassis backup for non-directly connected links (the AP is dual-homed over PW tunnels through switches to the BRASs; VRRP+BFD and VRRP/CGN HRP packets are carried over a PW tunnel between BRAS1 and BRAS2 (CGN2), which connect through the CR to the Internet).
11.3 Applications
Figure 11-10 Centralized NAT444 inter-board hot backup solution based on port pre-
allocation
(Topology: PCs behind CPE2 on network segment 10.20.0.0 connect through the access network to BRAS2 and the CR toward the ISP core, a DHCP server, and a web server; example mapping: User IP 10.20.10.12, NAT IP 172.22.22.52, port range 2048-6143.)
Figure 11-11 Distributed NAT444 inter-board hot backup based on port pre-allocation
(Topology: PCs behind CPE1 (network segment 10.10.0.0) and CPE2 (network segment 10.20.0.0) connect through access networks to BRAS1 (CGN) and BRAS2 (CGN), which connect via the SR to the ISP core, a syslog server, a RADIUS server, a DHCP server, and a web server; example mappings: User IP 10.10.10.12 -> NAT IP 172.3.12.25 and User IP 10.20.10.12 -> NAT IP 172.22.22.52, each with port range 2048-6143.)
Figure 11-12 Centralized NAT444 inter-chassis hot backup solution based on port pre-
allocation
(Topology: PCs behind CPE1 (managing network segment 10.10.0.0) and CPE2 (managing network segment 10.20.0.0) connect through access networks to BRAS1 and BRAS2, which connect via CR1 and CR2 to the ISP core, a syslog server, a DHCP server, and a web server; example user: 10.20.10.12.)
Figure 11-13 Distributed NAT444 inter-chassis hot backup solution based on port pre-
allocation
(Topology: PCs behind CPE1 (user 10.10.10.12) connect through access networks to BRAS1 (CGN) and onward to the SR and the ISP core, with syslog and RADIUS servers attached.)
Service Overview
In the virtual access over NAT inter-chassis hot backup scenario shown in Figure 11-14, the
AP is dual-homed to BRAS1 and BRAS2 equipped with CGN boards. There are no direct
links between BRAS1 and BRAS2. A PW tunnel is established between BRAS1 and BRAS2
using VE sub-interfaces to transparently transmit CGN HRP and VRRP packets. A PW tunnel
is also established between BRAS1/BRAS2 and the AP as the backup tunnel. VRRP is
deployed on VE sub-interfaces to negotiate the CGN master/slave relationship, and the CGN
service is deployed to implement NAT and to associate with the VSM HA backup group and
network-side interfaces. The user-side PWIF interface is associated with the VSM HA status
monitoring group so that service board faults are reflected in the user-side interface status.
An interface monitoring group is deployed on the network-side link, and the internal
communication interface is associated with it; the internal communication interface's status
automatically changes based on the status of the interfaces bound to the monitoring group.
BRAS1 functions as the master BRAS, and BRAS2 as the slave BRAS. The link between the
AP and BRAS1 is the master link, and the link between the AP and BRAS2 is the backup
link. Upstream traffic is sent to the network side after NAT is performed on BRAS1, and
downstream traffic arriving from the CR is sent to the user side after NAT is performed on BRAS1.
(Figures: before the switchover, BRAS1 is the master and BRAS2 the slave; upstream and downstream traffic traverses the PW tunnel between the AP and BRAS1, the virtual-ethernet 1/0/1.1 sub-interface, and the CR toward the Internet. After the switchover, BRAS2 becomes the master and BRAS1 the slave, and traffic traverses the PW tunnel between the AP and BRAS2, through its VSUF and virtual-ethernet 1/0/1.1 sub-interface, toward the CR and the Internet.)
Service Overview
On the network shown in Figure 11-19, a BRAS is equipped with two VSUF-80s/160s
to implement inter-board hot backup, and a CR is connected to a NAT device. When both
VSUF-80s/160s on the BRAS become faulty, the BRAS does not perform
distributed NAT on private network traffic. Instead, the private network traffic is forwarded
over routes to the CR and then redirected to the NAT device for centralized NAT. After the
private IP address is translated to a public IP address, the traffic goes to the Internet.
Feature Deployment
l Deploy inter-board hot backup for distributed NAT on the BRAS.
l Deploy centralized NAT on the CR.
HA High Availability