OAM Best Practices

SERVICE PROVIDER
OAM Best Practices in Mission-Critical MPLS, IP, and Carrier Ethernet Networks
A variety of Operations, Administration, and Management (OAM) protocols and tools have been developed recently for MPLS, IP, and Ethernet networks. These tools let operators proactively manage networks and customer Service-Level Agreements (SLAs). This paper reviews the OAM tools available in MPLS, IP, and Ethernet networks at various layers and describes best practices for choosing the right OAM tool for a particular network deployment.
BEST PRACTICES GUIDE
CONTENTS
Overview
  OAM Layering
  OAM Tools and Network Layers
Layer 2 OAM Tools
  Layer 2 Trace
  Port Loop Detection
  Unidirectional Link Detection
  Single-Link LACP Keep-Alive
  IEEE 802.1ag CFM
    Continuity Check Messages (CCM)
    Loopback Messages (LBM)
    Linktrace Messages (LTM)
    Brocade Implementation of 802.1ag
    Hierarchical Fault Detection using 802.1ag
    IEEE 802.1ag Configuration Example
    IEEE 802.1ag CFM versus ITU-T Y.1731 OAM
  ITU-T Y.1731 Performance Management
  IEEE 802.3ah Ethernet First Mile (EFM) Link OAM
  Layer 2 OAM Summary
MPLS OAM Tools
  LSP Ping
  LSP Traceroute
  LSP Ping and LSP Traceroute Considerations
  BFD for RSVP-TE LSPs
  MPLS OAM Summary
IP and VRF OAM Tools
  IP and VRF Ping
  IP and VRF Traceroute
  BFD for OSPFv2, OSPFv3, IS-IS, and BGP4
  IP and VRF OAM Summary
Summary
OVERVIEW
A variety of OAM tools have been developed in recent years for MPLS, IP, and Ethernet networks. These tools let an operator proactively manage networks and customer Service-Level Agreements (SLAs). They address fault detection, fault verification, and fault isolation, and they provide proactive detection of service degradation, service performance monitoring, and SLA verification. In MPLS, IP, and Ethernet networks, Operations, Administration, Management, and Provisioning (OAM&P) encompasses the Management Plane (see Figure 1), represented by Network Management Systems (NMS) and Element Management Systems (EMS), and the Network Plane, represented by Network Elements (NE) and the OAM tools that run across NEs. This white paper reviews the OAM tools available in MPLS, IP, and Ethernet networks at various layers of the networking stack and recommends best practices for choosing the right OAM tool for a particular network deployment.
Figure 1. OAM tools. OAM&P spans the Management Plane (NMS, EMS) and the Network Plane (Network Elements). The scope of this paper is OAM tools across Network Elements.
OAM Layering
OAM tools can be classified into three main types based on the OAM layer (Figure 2):
- Service Layer OAM. Tools applicable to services on an end-to-end basis
- Network Layer OAM. Tools applicable to services over a particular network
- Transport Layer OAM. Tools applicable to the transport layer of the network

Figure 2. OAM layers (Service Layer OAM, Network Layer OAM, Transport Layer OAM)

These OAM layers are hierarchical in nature. For example, in Figure 3 the Service Layer OAM for Operator A can be seen as a Transport Layer OAM for the service provider, who sees the service provided by Operator A as a transport tunnel for the customer.
NOTE: The terms customer, service provider, and operator are commonly used to reflect the business relationships that often exist among organizations and individuals. An operator provides a single Layer 2 or Layer 3 backbone network to a service provider. An operator can be identical to, or a part of the same organization as, a service provider.

The best OAM tools to use at a particular network layer depend on the type of network. For example, in Figure 3, Operator A has an MPLS network and uses MPLS OAM tools, while Operator B has an Ethernet network and uses Ethernet OAM tools.
Figure 3. Customer, operator, and service provider views of OAM layering. The service provider path runs from Customer network Site 1 through the MPLS Operator A network and the Ethernet Operator B network to Customer network Site 2. Labels: Service OAM, MPLS OAM (Operator A), Ethernet OAM (Operator B), Link OAM.
OAM Tools and Network Layers

Each network layer has its own best-suited OAM tools. Figure 4 lists common OAM tools applicable to Layer 2, MPLS, IP (Layer 3), and the Virtual Private Network (VPN), which includes Layer 2 and Layer 3 VPNs. Note that certain OAM tools, for example, 802.1ag CFM and Y.1731 PM, are applicable to Layer 2 networks and also to Layer 2 VPN services, as shown in Figure 4. The following sections address the OAM tools shown in Figure 4.
Figure 4. Each network layer has its own best-suited OAM tools
- VPN: VRF Ping and Traceroute (L3VPN); 802.1ag CFM for VPLS/VLL and Y.1731 PM for VPLS/VLL (L2VPN)
- IP: Ping and Traceroute; BFD for OSPF and IS-IS
- MPLS: LSP Ping and Traceroute; BFD for RSVP-TE LSPs
- Layer 2: Layer 2 trace; Port loop detection; UDLD; Single-link LACP keep-alive; 802.1ag CFM/Y.1731 PM; 802.3ah EFM OAM
LAYER 2 OAM TOOLS

This section addresses the Layer 2 OAM tools listed in Figure 4. These tools function in Layer 2 networks to monitor:
- Layer 2 services and connectivity (VLANs): Layer 2 Trace, Port Loop Detection, 802.1ag CFM, and Y.1731 PM
- Layer 2 links: UDLD, single-link keep-alive, and 802.3ah EFM OAM
Layer 2 Trace
Layer 2 Trace is a Brocade proprietary OAM tool that traces the traffic path in a VLAN. Layer 2 Trace is run on demand using a CLI command and can be used to trace a particular IP address, MAC address, or hostname in a given VLAN. The Layer 2 Trace command (trace-l2) probes the entire Layer 2 topology and displays the input or output ports of each hop in the path, the round-trip travel time of each hop, and each hop's Layer 2 protocol (such as STP, RSTP, 802.1w, SSTP, metro ring, or route-only). Figure 5 shows an example of the Layer 2 Trace command (trace-l2) executed for the given network configuration. The probed Layer 2 information is discarded after 10 minutes or when a new trace-l2 command is issued.

Layer 2 Trace can also display hops that form a forwarding loop in a VLAN. Figure 6 is an example in which the active topology for VLAN 2 forms a forwarding loop. In this case, Layer 2 Trace on VLAN 2 detects the forwarding loop and issues the indicated warning message.

Layer 2 Trace configuration considerations:
- The devices that will participate in the Layer 2 Trace protocol must be assigned to a VLAN, and all devices on that VLAN must be Brocade devices that support the Layer 2 Trace protocol.
- Devices that do not support the Layer 2 Trace protocol simply forward Layer 2 Trace packets without a reply and are transparent to the Layer 2 Trace protocol.
- The destination for the packet with the trace-l2 protocol must be a device that supports the Layer 2 Trace protocol. The destination cannot be a client, such as a personal computer, or a device from another vendor.
Figure 5. Layer 2 Trace example

Figure 6. Layer 2 trace in a loop topology
Port Loop Detection

Port Loop Detection is a Brocade proprietary OAM tool used to detect Layer 2 forwarding loops. Upon detecting a Layer 2 forwarding loop, the Port Loop Detection tool disables the errant port(s). The device can be configured to automatically re-enable ports after a timeout period. This OAM tool sends special protocol packets from the device and detects Layer 2 forwarding loops when these packets are received on ports on the same device.

Layer 2 Trace can also detect forwarding loops. The difference is that Port Loop Detection does not require manual interaction to detect loops. That is, Layer 2 Trace is run on demand using a CLI command, while Port Loop Detection runs continuously to provide automatic detection and reduce downtime due to misconfigurations.

Port Loop Detection supports two modes of operation:

Strict mode. Detects a Layer 2 forwarding loop where packets loop back to the same physical port, that is, a hairpin loop.

NetIron(config)#interface ethernet 1/1
NetIron(config-if-e1000-1/1)#loop-detection

Loose mode. Detects Layer 2 forwarding loops for a given VLAN or a VLAN group. Loose mode floods test packets to the entire VLAN or VLAN group. See Figure 7.

NetIron(config)#vlan 20
NetIron(config-vlan-20)#loop-detection
NetIron(config)#vlan-group 10
NetIron(config-vlan-group-10)#add-vlan 1 to 100
NetIron(config-vlan-group-10)#loop-detection
Figure 7. Port Loop Detection example (loose mode)
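The difference between strict and loose mode can be sketched in a few lines of Python. This is an illustrative model only, not the Brocade implementation: the `Probe` record and the `detect_loop` function are hypothetical names, and the model assumes a probe is identified by the port and VLAN it was sent on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical record of where a loop-detection packet was sent or received."""
    port: str  # for example "1/1"
    vlan: int


def detect_loop(sent: Probe, received: Probe, mode: str) -> bool:
    """Return True if a probe that came back to this device indicates a loop.

    strict: the probe returned on the same physical port (hairpin loop).
    loose:  the probe, flooded into a VLAN, returned on any port of that VLAN.
    """
    if mode == "strict":
        return received.port == sent.port
    if mode == "loose":
        return received.vlan == sent.vlan
    raise ValueError(f"unknown mode: {mode!r}")


# A probe sent on 1/1 in VLAN 20 that returns on 1/2 is a loop only in loose mode.
print(detect_loop(Probe("1/1", 20), Probe("1/2", 20), "strict"))  # False
print(detect_loop(Probe("1/1", 20), Probe("1/2", 20), "loose"))   # True
```

Strict mode is the narrower check, which is why it is configured per interface, while loose mode is configured per VLAN or VLAN group.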
Unidirectional Link Detection

Unidirectional Link Detection (UDLD) is a Brocade proprietary OAM tool used to monitor an Ethernet link between two Brocade NetIron devices and to provide fast detection of link failures. Ports enabled for UDLD exchange proprietary health-check packets once every keep-alive interval. The keep-alive interval can be configured between 100 ms and 6000 ms in increments of 100 ms. The default keep-alive interval is 500 ms. If a port does not receive a health-check packet from the port at the other end of the link after a number of keep-alive retry intervals, UDLD brings the port down. As a consequence, UDLD brings the ports on both ends of the link down if the link goes down in one direction. The number of keep-alive retries can be configured from 3 to 10, and the default is 5.

When UDLD is enabled on a port, the port transitions into an init state to detect whether the other end supports UDLD. The port does not go down if the other end is not UDLD-enabled. Figure 8 illustrates UDLD used to monitor a link between two nodes. Figure 9 is an example of a global show UDLD command. The show command can also display information for a specific port (not shown in the figure).

Configuration considerations include the following:
- UDLD is supported only on Ethernet ports.
- To configure UDLD on a LAG group, you must configure the feature on each port of the group individually. Configuring UDLD on a LAG group's primary port enables the feature on that port only.
- Dynamic LAG is not supported. If you want to configure a LAG group that contains ports on which UDLD is enabled, you must remove the UDLD configuration from the ports. After you create the LAG group, you can add the UDLD configuration back.

Tagged UDLD is also supported:
NetIron(config)# link-keepalive ethernet 1/18 vlan 22
Figure 8. UDLD configuration example

Figure 9. Displaying UDLD information
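The UDLD timer ranges above determine how quickly a failed link is taken down. The following Python sketch validates the two parameters and estimates the detection time, assuming the time is simply the keep-alive interval multiplied by the retry count. The text implies this relationship but does not state it exactly, and the function name is hypothetical.

```python
def udld_detection_time_ms(keepalive_ms: int = 500, retries: int = 5) -> int:
    """Estimate UDLD failure detection time as interval x retries (an assumption).

    Ranges and defaults are taken from the text: interval 100-6000 ms in
    100 ms steps (default 500 ms), retries 3-10 (default 5).
    """
    if not 100 <= keepalive_ms <= 6000 or keepalive_ms % 100 != 0:
        raise ValueError("keep-alive interval must be 100-6000 ms in 100 ms steps")
    if not 3 <= retries <= 10:
        raise ValueError("keep-alive retries must be between 3 and 10")
    return keepalive_ms * retries


print(udld_detection_time_ms())        # 2500 (defaults)
print(udld_detection_time_ms(100, 3))  # 300 (fastest setting)
```

Under this assumption, the default settings detect a failure in about 2.5 seconds and the most aggressive settings in about 300 ms.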
Single-Link LACP Keep-Alive

The Single-Link Link Aggregation Control Protocol (LACP) Keep-Alive OAM tool supports a single-port Link Aggregation Group (LAG). Single-Link LACP Keep-Alive is used to monitor an Ethernet link between two devices and to provide fast detection of link failures. This is similar to the UDLD OAM tool, except that the Single-Link LACP Keep-Alive OAM tool uses LACP, which is a standard protocol, instead of a proprietary protocol between nodes.

When should you use Single-Link LACP Keep-Alive instead of UDLD? UDLD is a proprietary protocol. Single-Link LACP Keep-Alive can be used to interoperate with third-party equipment that also supports this feature.

With Single-Link LACP Keep-Alive, LACP PDUs are exchanged between the two nodes to determine if the connection between the devices is still active. If no LACP PDUs are received from the other node after 3 lacp-timeout periods, a timeout event occurs and the port is blocked. The LACP keep-alive PDUs can be sent every 1 second (lacp-timeout short) or every 30 seconds (lacp-timeout long). Since a timeout is declared after missing 3 consecutive LACP keep-alive PDUs, a timeout can be declared in 3 seconds or 90 seconds, depending on the selected LACP keep-alive PDU interval. To configure single-link LACP keep-alive timeout intervals:
NetIron(config)# lacp-timeout short | long
Figure 10 shows an example of a single-link LACP keep-alive configuration.
Figure 10. Single-Link LACP Keep-Alive example
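The timeout arithmetic for the two lacp-timeout settings can be written out as a short Python sketch. The constant and function names are hypothetical; the values (1 second, 30 seconds, 3 missed PDUs) come from the text above.

```python
# PDU intervals in seconds for the two lacp-timeout settings described above.
LACP_PDU_INTERVAL_S = {"short": 1, "long": 30}

# A timeout is declared after this many consecutive missed keep-alive PDUs.
MISSED_PDUS_BEFORE_TIMEOUT = 3


def lacp_timeout_s(mode: str) -> int:
    """Return the time in seconds before a single-link LACP timeout is declared."""
    if mode not in LACP_PDU_INTERVAL_S:
        raise ValueError(f"mode must be 'short' or 'long', got {mode!r}")
    return LACP_PDU_INTERVAL_S[mode] * MISSED_PDUS_BEFORE_TIMEOUT


print(lacp_timeout_s("short"))  # 3
print(lacp_timeout_s("long"))   # 90
```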
IEEE 802.1ag CFM

The IEEE 802.1ag Connectivity Fault Management (CFM) OAM tool facilitates path discovery, fault detection, fault verification and isolation, fault notification, and fault recovery.

CFM terminology (see Figure 11):
- MD (Maintenance Domain). The part of a network for which faults in Layer 2 connectivity can be managed.
- MEP (Maintenance End Point). A Maintenance Point (MP) at the edge of a domain that actively sources CFM messages. There are two types of MEPs, as shown in Figure 12:
  - Up (inward) MEP: Considering a MEP on a given physical port, an up MEP sends 802.1ag messages into the node.
  - Down (outward) MEP: A down MEP sends 802.1ag messages out of the node.
  Note that up and down MEPs can be used to include or exclude more of the internal path inside a switch, as shown in Figure 13.
- MIP (Maintenance Intermediate Point). A maintenance point internal to a domain that only responds when triggered by certain CFM messages. A MIP does not actively source CFM messages.
- MA (Maintenance Association). A set of MEPs established to verify the integrity of a single service instance, for example, a VLAN or a VPLS.
- ME (Maintenance Entity). A point-to-point relationship between two MEPs within a single MA.
- MD Level. An integer from 0 to 7 in a field in a CFM PDU that is used, along with the VLAN ID, to identify which MIPs/MEPs would be interested in the contents of a CFM PDU. MD levels are used to separate the MAs of customers, service providers, and operators. The 802.1ag MD level recommendations for customers, service providers, and operators are shown in Figure 11.
- CFM Hierarchy. MD levels create a hierarchy in which 802.1ag messages sent by customers, service providers, and operators are processed only by MIPs and MEPs at the respective level of the message.

A common practice is for the service provider to set up a MIP at the customer MD level at the edge of the network, as shown in Figure 11, to allow the customer to check continuity of the Ethernet service to the edge of the network. Similarly, operators set up MIPs at the service provider level at the edge of their respective networks, as shown in Figure 11, to allow service providers to check the continuity of the Ethernet service to the edge of the operators' networks. Inside an operator network, all MIPs are at the respective operator level, also shown in Figure 11.
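The MD level conventions can be illustrated with a small Python sketch. It maps an MD level to the role recommended in Figure 11 (customer 7, 6, or 5; service provider 4 or 3; operator 2, 1, or 0) and models the rule that a maintenance point processes only CFM messages at its own level. The function names are hypothetical, and the model deliberately ignores the VLAN ID and other PDU fields.

```python
def recommended_role(md_level: int) -> str:
    """Map an MD level (0-7) to the role recommended for it in Figure 11."""
    if not 0 <= md_level <= 7:
        raise ValueError("MD level must be an integer from 0 to 7")
    if md_level >= 5:
        return "customer"
    if md_level >= 3:
        return "service provider"
    return "operator"


def processes_pdu(mp_level: int, pdu_level: int) -> bool:
    """A MIP or MEP processes a CFM PDU only if the PDU is at its own MD level."""
    return mp_level == pdu_level


print(recommended_role(7))   # customer
print(recommended_role(4))   # service provider
print(processes_pdu(7, 4))   # False: a customer-level MIP ignores provider PDUs
```

This is why a service provider that wants customers to test continuity to the network edge configures a MIP at the customer level there: a MIP at any other level would not respond to the customer's messages.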
Figure 11. IEEE 802.1ag terminology. The figure shows the Customer MA at MD level 5 (recommended levels 7, 6, or 5), the Service Provider MA at MD level 3 (4 or 3), and the Operator A and Operator B MAs at MD level 1 (2, 1, or 0) across Customer network Site 1, the MPLS Operator A network, the Ethernet Operator B network, and Customer network Site 2. Legend: Down MEP, Up MEP, MEP, MIP, ME.

Figure 12. Up and down MEPs (labels: Switch, Port, Up MEP, Down MEP)

Figure 13. Using up and down MEPs to include or exclude the path inside a switch (labels: Switch, Up MEP, Down MEP)

IEEE 802.1ag CFM supports Continuity Check Messages (CCM), Linktrace Messages (LTM), and Loopback Messages (LBM), which are described in the following sections.
Continuity Check Messages (CCM)

CCMs are periodic hello messages multicast by a MEP within the maintenance domain to detect continuity failures. If a MEP stops receiving periodic CCMs from a peer MEP on a remote bridge, it assumes that either the remote bridge has failed or the continuity of the path between the two bridges has been interrupted.
Figure 14. 802.1ag Continuity Check Messages (CCM)
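How a MEP decides that a peer has stopped sending CCMs can be sketched as a simple timer check. The sketch below assumes the loss threshold of 3.5 times the CCM interval, which comes from the IEEE 802.1ag standard and is not stated in this paper; the function and constant names are hypothetical.

```python
# Assumption: loss of continuity is declared when no CCM has arrived within
# 3.5 x the CCM interval (per IEEE 802.1ag, not stated in this paper).
CCM_LOSS_MULTIPLIER = 3.5


def peer_is_down(last_ccm_ms: float, now_ms: float, period_ms: float) -> bool:
    """Return True if the peer MEP should be considered unreachable."""
    return (now_ms - last_ccm_ms) > CCM_LOSS_MULTIPLIER * period_ms


# With a 10 ms CCM period, the peer is declared down once 35 ms have passed.
print(peer_is_down(last_ccm_ms=0.0, now_ms=30.0, period_ms=10.0))  # False
print(peer_is_down(last_ccm_ms=0.0, now_ms=36.0, period_ms=10.0))  # True
```

Under this assumption, shorter CCM periods give proportionally faster fault detection at the cost of more OAM traffic.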
Loopback Messages (LBM)

LBM is a unicast message used to verify the connectivity between a MEP and a peer MEP or MIP. Loopback messages are also used for fault localization. To verify the connectivity between a MEP and a peer MEP or MIP, the source MEP initiates an LBM with the destination MAC address set to the MAC address of the desired peer MEP or MIP. The receiving MIP or MEP responds to the LBM with a unicast Loopback Reply (LBR) addressed to the source MEP. LBM helps a MEP identify the location of a continuity fault along a given MA. A MIP in front of the continuity fault responds with a loopback reply. A MIP or MEP behind the continuity fault does not respond. For loopback to work, the MEP must know the MAC address of the target MIP or MEP. These MAC addresses can be discovered using the Linktrace Message.
Figure 15. 802.1ag Loopback Message (LBM)
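The fault localization procedure described above can be sketched as a loop over the maintenance points along the path. This is a simulation only: `send_lbm` stands in for whatever mechanism actually sends an LBM and waits for an LBR, and the MAC addresses are made-up examples (in practice they would be discovered with Linktrace).

```python
from typing import Callable, Optional, Sequence, Tuple


def localize_fault(
    path: Sequence[str],
    send_lbm: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Probe each MIP/MEP in path order with a loopback message.

    Returns (last responder, first non-responder). The fault lies between
    the two. The second element is None if every maintenance point replied.
    """
    last_ok: Optional[str] = None
    for mac in path:
        if send_lbm(mac):
            last_ok = mac
        else:
            return last_ok, mac
    return last_ok, None


# Hypothetical path of MIPs ending in the peer MEP; the fault sits after the
# second MIP, so only the first two maintenance points reply.
path = ["00:00:5e:00:53:01", "00:00:5e:00:53:02",
        "00:00:5e:00:53:03", "00:00:5e:00:53:04"]
reachable = set(path[:2])
print(localize_fault(path, lambda mac: mac in reachable))
# ('00:00:5e:00:53:02', '00:00:5e:00:53:03')
```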
Linktrace Messages (LTM)

LTM is a multicast message used by a source MEP to trace the path to other MEPs in the same MA. All reachable MIPs and MEPs respond back with a Linktrace Reply (LTR) message addressed to the source MEP. The originating MEP can then determine the MAC addresses of all MIPs and MEPs belonging to the same MA. Note that the source MEP sends a single LTM to the next hop along the trace path. However, it can receive many LTR messages from different MIPs along the trace path and different MEPs terminating the branches of the trace path. Linktrace can also be used when no faults are apparent in order to discover the routes normally taken by data through the network.
Figure 16. 802.1ag Linktrace Message (LTM)
Brocade Implementation of 802.1ag:

- CCM period: 3.3 ms, 10 ms, 100 ms, 1 sec, 1 min, 10 min
- Support for minimum CCM timers (3.3 ms) using hardware offload
- Support for MIPs and up/down MEPs
- Support for all 8 MD levels (0 to 7)
- Support for the following types of end-points/services: VLANs, VPLS, and VLL
Hierarchical Fault Detection using 802.1ag

As shown in Figure 11, 802.1ag CFM defines a domain hierarchy in which customers, service providers, and operators use different MD levels. This hierarchy is also used for fault detection. Figure 17 illustrates an example in which a customer has an Ethernet service between Sites 1 and 2. This Ethernet service is provided by Operators A and B. Operator B supports the service at the core with an MPLS network. Operator A supports the service at Metro Locations 1 and 2 using a Layer 2 Ethernet network.

In Figure 17, a service continuity fault occurs inside Operator B's network. The customer can detect an end-to-end service continuity fault using CCM, but it cannot determine the location of the fault within the operators' networks. Operator A can detect that a service continuity fault exists within Operator B's network. Operator B can detect the service continuity fault, but it cannot isolate the location of the continuity fault using 802.1ag CFM, since it has an MPLS network. Operator B needs to use MPLS OAM tools to isolate the fault location.
Figure 17. Example of 802.1ag hierarchical fault detection (refer to the numbered items below)
To simplify this example, the service provider level is not shown. If it were, the service provider would be represented by the overall network from Operator A in Location 1 through Operator B to Operator A in Location 2.

The following is an example of how this fault can be detected at the different levels of the hierarchy:
1. The customer detects a service continuity fault using CCMs.
2. Using Linktrace, the customer finds that the fault is beyond the MIPs at the border of Operator A.
3. Operator A detects a service continuity fault using CCMs.
4. Using Linktrace, Operator A determines that the fault is inside Operator B's network.
5. Operator B detects a service continuity fault using CCMs.
6. Operator B uses MPLS OAM tools to determine the location of the fault in its MPLS network. See the MPLS OAM section for details on MPLS-specific OAM tools.

Step 6 is included to emphasize that you need to use the appropriate OAM tools for the type of network being used. In this case, Operator B has an MPLS network and needs to use MPLS OAM tools. Operator A has a Layer 2 Ethernet network and can use 802.1ag CFM. Note that Operator B's MPLS network is required to support 802.1ag CFM messages over VPLS and VLL to allow customers and Operator A to use 802.1ag end-to-end (footnote 1).

Note that the customer, Operator A, and Operator B can concurrently and independently detect the continuity fault and run Linktrace to determine the location of the fault. The steps above are numbered to allow for easy reference to the respective actions depicted in Figure 17. The numbering does not imply an ordered sequence of events. That is, Operator A does not have to wait for the customer to tell it that the service is broken before it runs its own Continuity Check. Note that the CCMs shown in Figure 17 can be set up to run continuously to detect potential continuity faults, or they can be set up on demand as needed.
IEEE 802.1ag Configuration Example

In Figure 18, a customer has a point-to-point service (VLL) over an MPLS network. In this example, the customer runs CCM at 10 ms intervals at MD level 7 between CE1 and CE2. The service provider runs CCM at 10 ms intervals at MD level 4 between PE1 and PE2. Figure 19 and Figure 20 show example configurations for CE1, CE2, PE1, and PE2 shown in Figure 18.
Footnote 1: Brocade supports 802.1ag CFM over VPLS and VLL to allow Ethernet OAM to function end-to-end over an MPLS core network.
Figure 18. Example of 802.1ag configuration. The path runs from CE1 (port 1/1) through PE1, a VLL over the MPLS network, and PE2 to CE2 (port 2/1). Legend: customer down MEP and customer MIP (MD level 7), service provider up MEP (MD level 4). Figure labels: Customer CCM @ 10 sec, Service provider CCM @ 10 sec.
Figure 19. CE1 and CE2 configurations
Figure 20. PE1 and PE2 configurations
IEEE 802.1ag CFM versus ITU-T Y.1731 OAM

ITU-T Y.1731 OAM is a superset of IEEE 802.1ag CFM (footnote 2). The ITU-T Y.1731 ETH-CC (Ethernet Continuity Check), ETH-LB (Ethernet Loopback), and ETH-LT (Ethernet Linktrace) OAM functions are equivalent to 802.1ag CCM, LBM, and LTM, respectively. Devices deploying 802.1ag CCM, LBM, and LTM can interoperate with devices deploying Y.1731 ETH-CC, ETH-LB, and ETH-LT, respectively. However, Y.1731 ETH-CC supports either multicast or unicast messages, while 802.1ag CCM supports multicast messages only. Therefore, to interoperate 802.1ag CCM with Y.1731 ETH-CC, the Y.1731 device must be set up to use ETH-CC multicast messages.
ITU-T Y.1731 Performance Management

ITU-T Y.1731 Performance Management (PM) supports on-demand measurement of round-trip Frame Delay (FD) and Frame Delay Variation (FDV). These measurements are made between defined MEPs (see Figure 21). The main benefit of Y.1731 PM is for Service Level Agreement (SLA) monitoring and verification of services provided to customers in aggregation, metro, and core networks. SLA monitoring and verification is essential for delay-sensitive applications, for example, voice, and for services with SLA guarantees. The Brocade implementation supports a high-precision, hardware-based time-stamping mechanism that provides measurements with microsecond granularity. It also supports delay measurements for Layer 2 bridging services and for VPLS and VLL services.
Figure 21. Y.1731 delay measurement (ETH-DM between MEP 3 and MEP 2 on two Brocade MLX devices)
Footnote 2: Besides CFM and other functionality, ITU-T Y.1731 also includes Performance Management, which is addressed in this paper.
Figure 22 shows an example of the Y.1731 delay measurement between MEP3 and MEP2 shown in Figure 21. The command sends a selectable number (default is 10) of delay measurement PDUs (ETH-DM), which are time-stamped in hardware at the source and destination MEPs to achieve high-precision measurement independent of software delays. The command averages the individual measurements and lists the resulting minimum, average, and maximum delays.
Figure 22. Y.1731 delay measurement example
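The arithmetic behind such a measurement can be sketched in Python. The round-trip formula below follows the two-way delay calculation defined in ITU-T Y.1731, in which the time the responder holds the frame is subtracted from the total elapsed time so that the two clocks need not be synchronized. This is an assumption about how the measurement works, not a description of the Brocade command, and the function names and sample timestamps are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def round_trip_delay_us(tx_src: int, rx_dst: int, tx_dst: int, rx_src: int) -> int:
    """Two-way frame delay in microseconds from four hardware timestamps.

    tx_src / rx_src are read on the source MEP's clock, rx_dst / tx_dst on the
    destination MEP's clock. Subtracting the responder's dwell time removes
    any offset between the two clocks.
    """
    return (rx_src - tx_src) - (tx_dst - rx_dst)


def summarize_delays(delays_us: Sequence[int]) -> Tuple[int, float, int]:
    """Return (minimum, average, maximum) of the individual measurements."""
    if not delays_us:
        raise ValueError("at least one measurement is required")
    return min(delays_us), mean(delays_us), max(delays_us)


# The responder held the frame for 10 us; the source saw 130 us elapse.
print(round_trip_delay_us(tx_src=0, rx_dst=1000, tx_dst=1010, rx_src=130))  # 120
print(summarize_delays([120, 100, 140]))  # (100, 120, 140)
```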
IEEE 802.3ah Ethernet First Mile (EFM) Link OAM

IEEE 802.3ah Ethernet First Mile (EFM) link OAM monitors individual links and supports troubleshooting them. That is, 802.3ah OAM operates on a point-to-point link and does not propagate beyond a single hop. As shown in Figure 23, this IEEE standard was originally developed to monitor the link between a service provider and a customer, which is usually called the first mile link.

802.3ah EFM OAM supports the following functions:
- OAM discovery
  - Used to discover the 802.3ah EFM OAM capabilities of the peer device
- Remote failure indication (critical events)
  - Used to inform the peer node that the receive path of the link is non-operational
  - Also includes communication of conditions such as dying gasp
- Link monitoring
  - Can generate event notifications (alarms) when defined error thresholds are exceeded
- Remote loopback testing
  - Puts the peer in data loopback state

802.3ah supports two modes of operation:
- Active mode
  - Normally used by a device controlled by a service provider
  - The device can source OAM PDU packets in order to initiate an EFM OAM discovery process
- Passive mode
  - Normally used by customer devices connected to a service provider device
  - The device cannot source OAM PDU packets, but it can respond to received OAM PDUs
Figure 23. IEEE 802.3ah EFM OAM (802.3ah OAM running on individual links)

Figure 24 shows an example of the output of an 802.3ah EFM OAM show command. Note that the show command displays not only local link OAM information, but also remote link OAM information.
Figure 24. Example of 802.3ah EFM OAM show command
Layer 2 OAM Summary

Table 1 presents a summary of the Layer 2 OAM tools described in this section.
Table 1. Layer 2 OAM tools summary

Tool | Intended application | Supports | Generation | Standard
Layer 2 Trace | Layer 2 network troubleshooting and detection of misconfiguration | Layer 2 topology discovery, Layer 2 loop detection | Manual | No
Port Loop Detection | Layer 2 network troubleshooting and detection of misconfiguration | Layer 2 loop detection | Automatic | No
UDLD | Single-link keep-alive | Single-link keep-alive | Automatic | No
Single-Link Keep-Alive | Single-link keep-alive | Single-link keep-alive | Automatic | Yes
802.1ag CFM | Service verification | Layer 2 connectivity check, Linktrace, loopback | CC: auto; LT, LB: manual | Yes
Y.1731 PM | Performance (SLA) verification | One-way delay and delay variation | Manual | Yes
802.3ah EFM OAM | Customer access verification | Single-link OAM: fault detection, discovery, loopback | Auto, Manual (LB) | Yes
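When third-party interoperability matters, the summary in Table 1 can be filtered for standards-based tools. The Python sketch below encodes only the tool name and the Standard column from Table 1; the `Tool` type and the `standards_based` function are hypothetical names used for illustration.

```python
from typing import List, NamedTuple


class Tool(NamedTuple):
    name: str
    standard: bool  # "Standard" column of Table 1


# Tool names and Standard values as listed in Table 1.
LAYER2_TOOLS = [
    Tool("Layer 2 Trace", False),
    Tool("Port Loop Detection", False),
    Tool("UDLD", False),
    Tool("Single-Link Keep-Alive", True),
    Tool("802.1ag CFM", True),
    Tool("Y.1731 PM", True),
    Tool("802.3ah EFM OAM", True),
]


def standards_based(tools: List[Tool]) -> List[str]:
    """Return the names of tools that are based on a standard protocol."""
    return [t.name for t in tools if t.standard]


print(standards_based(LAYER2_TOOLS))
# ['Single-Link Keep-Alive', '802.1ag CFM', 'Y.1731 PM', '802.3ah EFM OAM']
```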
MPLS OAM TOOLS

This section addresses the MPLS OAM tools listed in Figure 4:
- LSP Ping
- LSP Traceroute
- BFD for RSVP-TE LSPs
LSP Ping
LSP Ping provides OAM functionality for MPLS networks based on RFC 4379. LSP Ping is used to detect data plane failure and to check the consistency between the data plane and the control plane. LSP Ping verifies that packets that belong to a particular Forwarding Equivalence Class (FEC) actually end their MPLS path on a Label Switching Router (LSR) that is an egress for that FEC. LSP Ping sends MPLS echo requests following the same data path that normal MPLS packets would traverse (Figure 25). LDP LSP Ping and RSVP LSP Ping are supported, as shown in Figure 26 and Figure 27 respectively.
Figure 25. LSP Ping operation (an echo request follows the LSP from the ingress LER through the transit LSR to the egress LER across the MPLS network, and an echo reply is returned)
Figure 26. LDP LSP Ping example
Figure 27. RSVP LSP Ping example
LSP Traceroute
LSP Traceroute provides OAM functionality for MPLS networks based on RFC 4379. LSP Traceroute is used to isolate a data plane failure to a particular router and to provide LSP path tracing. With LSP Traceroute, an echo request packet is sent to each transit LSR and the LER. The echo request follows the same data path that normal MPLS packets would traverse. A transit LSR or an LER receiving the echo request checks that it is indeed a transit LSR or LER for this path and returns echo replies (Figure 28). LDP LSP Traceroute and RSVP LSP Traceroute are supported, as exemplified in Figure 29 and Figure 30, respectively.
Figure 28. LSP Traceroute operation
Figure 29. LDP LSP Traceroute example
Figure 30. RSVP LSP Traceroute example
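How per-hop echo replies isolate a fault can be shown with a small simulation. The sketch below is a simplified model rather than the RFC 4379 procedure: it assumes one echo request per TTL value, and the `lsp_traceroute` function, its `last_reachable` parameter, and the router names are hypothetical.

```python
def lsp_traceroute(path, last_reachable=None):
    """Simulate per-hop echo requests along an LSP.

    path: ordered router names after the ingress, ending with the egress LER.
    last_reachable: index of the last hop that still receives packets,
        or None if the data path is healthy (an assumption of this model).
    Returns a list of (ttl, reply) tuples; reply is None when no reply arrives.
    """
    results = []
    for ttl in range(1, len(path) + 1):
        hop_index = ttl - 1
        if last_reachable is not None and hop_index > last_reachable:
            # The request is lost before it reaches this hop.
            results.append((ttl, None))
            break
        role = "egress LER" if hop_index == len(path) - 1 else "transit LSR"
        results.append((ttl, f"{path[hop_index]} ({role})"))
    return results


healthy = lsp_traceroute(["P1", "P2", "PE2"])
broken = lsp_traceroute(["P1", "P2", "PE2"], last_reachable=0)
print(healthy)
print(broken)
```

In the broken case the last reply comes from P1 and the request with TTL 2 goes unanswered, which places the failure between P1 and P2.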
LSP Ping and LSP Traceroute Considerations

The following are common considerations for LSP Ping and LSP Traceroute:
Redundant RSVP LSPs. LSP Ping or LSP Traceroute on an LSP is performed on the currently active path.
One-to-one Fast ReRoute (FRR) LSPs. LSP Ping or LSP Traceroute on a one-to-one FRR LSP is performed on the active path. If a path switchover occurs while a Ping or Traceroute is in progress, the echo request is sent out on the old active path.
FRR bypass LSPs. You can Ping or Traceroute the protected LSP and the bypass tunnel separately, for example, by specifying the name of the LSP.
Transit-originated detour. The user can initiate a Ping or Traceroute operation on a transit-originated detour LSP. Because the session name does not uniquely identify a session on a transit LSR, the user needs to specify the entire session ID (including the tunnel end-point, tunnel ID, and extended tunnel ID) of the detour LSP to which the LSP Ping or Traceroute command is applied.
LSP re-optimization. If LSP re-optimization occurs while the Ping or Traceroute is in progress, the echo request is sent out on the current LSP instance until the new instance is created.
BFD for RSVP-TE LSPs

Bidirectional Forwarding Detection (BFD) for RSVP-TE LSPs defines a method for rapid detection of a failure in the data path of an LSP (Figure 31). While LSP Ping can also be used for this purpose, BFD for RSVP-TE LSPs provides the following advantages:
It can be configured to dynamically detect data plane failures of MPLS RSVP LSPs.
It provides faster failure detection, since it does not require control plane verification as LSP Ping does.
It can concurrently detect faults on a number of LSPs without the manual interaction that LSP Ping requires.
BFD allows for the detection of a forwarding path failure in 300 milliseconds or less (depending on the configuration).
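The dependence on configuration can be made concrete. As a sketch, and assuming BFD asynchronous mode as described in RFC 5880, the detection time on the local system is the detect multiplier received from the remote system multiplied by the agreed transmit interval of the remote system, which is the larger of the local Required Min RX Interval and the remote Desired Min TX Interval. The function name and the timer values below are illustrative assumptions, not defaults of any product.

```python
def bfd_detection_time_ms(local_required_min_rx_ms,
                          remote_desired_min_tx_ms,
                          remote_detect_mult):
    """Detection time in asynchronous mode, per the RFC 5880 rule sketched above."""
    # The remote system transmits no faster than the local system is willing to receive.
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


# Illustrative values only: 100 ms intervals with a multiplier of 3.
print(bfd_detection_time_ms(100, 100, 3))
# A slower receive interval on the local side stretches the detection time.
print(bfd_detection_time_ms(300, 100, 3))
```

With 100 ms intervals and a multiplier of 3, the result is 300 ms. Lengthening either interval or raising the multiplier increases the detection time accordingly.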
Figure 31. BFD for RSVP-TE operation
BFD for RSVP-TE LSPs should be used selectively to monitor unreliable paths such as those through non-MPLS devices, for example, optical switches. In Figure 32, the LSP traverses optical switches. The optical switches keep the links to the MPLS routers up even when a failure occurs between the optical switches. This prevents the MPLS routers from performing a path switchover, since, as far as the MPLS routers are concerned, the link between them is up. BFD for RSVP-TE LSPs detects the LSP path failure and triggers a path switchover (see footnote 3). Since a link failure triggers FRR directly, the only benefit of using BFD for RSVP-TE LSPs when there are no optical switches (or other transport types that would prevent MPLS routers from detecting the physical path as down) would be to detect control plane failures.
Footnote 3: In configurations in which there is no alternative path, the LSP is brought down and the BFD session is deleted. The LSP then follows the normal retry procedures to come back up.
Figure 32. BFD for RSVP-TE LSP used to monitor paths through non-MPLS devices
BFD for RSVP-TE LSPs can be enabled or disabled on the fly at the global MPLS level (see footnote 4 and Figure 33) or for each individual RSVP LSP (see Figure 34) without affecting the LSP operational status. In addition, BFD for RSVP-TE LSP parameters can be changed on the fly without changing the state of the BFD session.
Figure 33. Enabling BFD for RSVP LSP globally
Figure 34. Enabling BFD for a specific RSVP-TE LSP
MPLS OAM Summary

Table 2 presents a summary of the MPLS OAM tools described in this section.
LSP Ping
  Intended application: To detect data plane failure and to check the consistency between the data plane and the control plane
  Supports: Connectivity verification
  Generation: Manual
  Standard: Yes
LSP Traceroute
  Intended application: To isolate the data plane failure to a particular router and to provide LSP path tracing
  Supports: Connectivity troubleshooting, fault localization
  Generation: Manual
  Standard: Yes
BFD for RSVP-TE LSPs
  Intended application: Fast data plane failure detection for RSVP LSPs
  Supports: Fast data plane failure detection (link may be up, but data path is down)
  Generation: Automatic
  Standard: Yes
Footnote 4: The number of BFD sessions supported by the system must be taken into account when enabling BFD for RSVP-TE globally.
IP AND VRF OAM TOOLS

This section addresses the IP and L3VPN (VRF) OAM tools listed in Figure 4: IP and VRF Ping; IP and VRF Traceroute; and BFD for OSPFv2, OSPFv3, IS-IS, and BGP4.
IP and VRF Ping

IP Ping is a tool used to verify connectivity at the IP level. The IP ping command sends an Internet Control Message Protocol (ICMP) echo request to the IP address or selected hostname and waits for a reply (see Figure 35). The Ping VRF option lets you ping an address on a specific L3VPN, that is, an address associated with a VRF table. Figure 36 shows an example of IPv4 Ping, while Figure 37 shows an example of IPv6 Ping. Note that Ping VRF is supported for both IPv4 and IPv6.
Figure 35. IP Ping operation
Figure 36. IPv4 Ping example
Figure 37. IPv6 Ping example
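For readers who want to see what an echo request looks like on the wire, the following Python sketch builds an ICMPv4 echo request (type 8, code 0) and computes its Internet checksum. It only constructs the bytes and does not send anything, because raw sockets usually require elevated privileges. The function names, identifier, sequence number, and payload are assumptions made for illustration. ICMPv6 differs (the echo request type is 128, and the checksum also covers an IPv6 pseudo-header), so this sketch applies to IPv4 only.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMPv4."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Return the bytes of an ICMPv4 echo request (type 8, code 0)."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"oam")
# A receiver can validate the message by checksumming it in full: the result is 0.
print(internet_checksum(packet) == 0)
```

The echo reply returned by the destination carries the same identifier, sequence number, and payload, which is how the sender matches replies to requests.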
IP and VRF Traceroute

The IP Traceroute tool identifies the path that packets take through a network on a hop-by-hop basis. The IP Traceroute tool works by sending ICMP echo packets with varying IP Time-to-Live (TTL) values to the destination (see Figure 38). The Traceroute VRF option lets you traceroute an address on a specific L3VPN, that is, an address associated with a VRF table. Figure 39 shows an example of IPv4 Traceroute, while Figure 40 shows an example of IPv6 Traceroute. Note that Traceroute VRF is supported for IPv4 and IPv6.
Figure 38. IP Traceroute operation
Figure 39. IPv4 Traceroute example
Figure 40. IPv6 Traceroute example
BFD for OSPFv2, OSPFv3, IS-IS, and BGP4

Bidirectional Forwarding Detection (BFD) defines a method for rapid detection of a forwarding path failure by checking that the next-hop router is alive. Without BFD, it can take from 3 to 30 seconds to detect that a neighboring router is not operational, and packets are lost during that time. BFD can detect data path failures when a link is up but the data path is not, for example, failures caused by misconfiguration or failures on paths through optical switches (see Figure 41). BFD allows a forwarding path failure to be detected in 300 ms or less (depending on the configuration). When BFD is enabled on a routed interface, a BFD session is automatically established when a neighbor router is discovered.
Figure 41. BFD operation
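The behavior in Figure 41 can be approximated with a simple timer model: each received BFD control packet restarts a detection timer, and the session is declared down when the timer expires even though the link itself stays up. The sketch below is a deliberately simplified illustration and not the BFD state machine. The function name, the arrival times, and the 300 ms detection time are assumptions.

```python
def session_down_at(arrival_times_ms, detection_time_ms, observe_until_ms):
    """Return the time at which the detection timer expires, or None.

    arrival_times_ms: sorted arrival times of BFD control packets.
    detection_time_ms: how long the receiver waits without a packet.
    observe_until_ms: end of the observation window in this model.
    """
    if not arrival_times_ms:
        return None  # in this simplified model the session never came up
    deadline = arrival_times_ms[0] + detection_time_ms
    for arrival in arrival_times_ms[1:]:
        if arrival >= deadline:
            return deadline  # the timer expired before this packet arrived
        deadline = arrival + detection_time_ms  # each packet restarts the timer
    return deadline if deadline <= observe_until_ms else None


# Packets stop arriving after t = 300 ms although the link remains up.
print(session_down_at([0, 100, 200, 300], 300, 1000))
# Packets keep arriving every 100 ms, so the timer never expires.
print(session_down_at(list(range(0, 1000, 100)), 300, 1000))
```

In the first case the failure is declared at 600 ms, that is, 300 ms after the last packet was received. In the second case no failure is declared.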

Figure 42 shows an example of BFD configuration. BFD can be enabled or disabled for all interfaces or per interface for use with OSPFv2 (that is, IPv4), OSPFv3 (that is, IPv6), and IS-IS, as shown in Figure 43, Figure 44, and Figure 45, respectively.
Figure 42. BFD configuration example
Figure 43. Enabling/disabling BFD for OSPFv2 for all interfaces (top) and per interface (bottom)
Figure 44. Enabling/disabling BFD for OSPFv3 for all interfaces (top) and per interface (bottom)
Figure 45. Enabling/disabling BFD for IS-IS for all interfaces (top) and per interface (bottom)
BFD for BGP4 supports single-hop and multi-hop BFD on Ethernet, POS, and Virtual Interfaces. BFD for BGP4 can be enabled or disabled at the global BGP router level, for each individual peer, or for a peer group, as shown in Figure 46, Figure 47, and Figure 48, respectively.
Figure 46. Enabling/disabling BFD globally for BGP4
Figure 47. Enabling/disabling BFD for a specific BGP4 peer
Figure 48. Enabling/disabling BFD for a BGP4 peer group
IP and VRF OAM Summary

Table 3 presents a summary of the IP and VRF OAM tools described in this section.
IP Ping, VRF Ping
  Intended application: Connectivity verification at the IP level
  Supports: Connectivity verification
  Generation: Manual
  Standard: Yes
IP Traceroute, VRF Traceroute
  Intended application: Identification of the path that IP packets take through a network on a hop-by-hop basis
  Supports: Connectivity troubleshooting, fault localization
  Generation: Manual
  Standard: Yes
BFD for OSPFv2, OSPFv3, IS-IS, BGP4
  Intended application: Fast data path failure detection
  Supports: Data path failure detection (link may be up, but data path is down)
  Generation: Automatic
  Standard: Yes
SUMMARY
This paper reviewed OAM tools available for MPLS, IP, and Ethernet networks at various layers of the stack and reviewed best practices for choosing the right OAM tool to use in a particular network deployment. These tools provide unparalleled power for an operator to proactively manage networks and customer Service Level Agreements (SLAs). These OAM tools address fault detection, fault verification, and fault isolation; enable proactive detection of service degradation; and provide service performance monitoring and SLA verification.
© 2010 Brocade Communications Systems, Inc. All Rights Reserved. 11/10 GS-BP-356-00
Brocade, the B-wing symbol, BigIron, DCFM, DCX, Fabric OS, FastIron, IronView, NetIron, SAN Health, ServerIron, TurboIron, and Wingspan are registered trademarks, and Brocade Assurance, Brocade NET Health, Brocade One, Extraordinary Networks, MyBrocade, VCS, and VDX are trademarks of Brocade Communications Systems, Inc., in the United States and/or in other countries. Other brands, products, or service names mentioned are or may be trademarks or service marks of their respective owners.
Notice: This document is for informational purposes only and does not set forth any warranty, expressed or implied, concerning any equipment, equipment feature, or service offered or to be offered by Brocade. Brocade reserves the right to make changes to this document at any time, without notice, and assumes no responsibility for its use. This informational document describes features that may not be currently available. Contact a Brocade sales office for information on feature and product availability. Export of technical data contained in this document may require an export license from the United States government.
