
Layer 2 Loop Troubleshooting

Issue 01

Date 2016-10-25

HUAWEI TECHNOLOGIES CO., LTD.


Copyright © Huawei Technologies Co., Ltd. 2016. All rights reserved.
No part of this document may be reproduced or transmitted in any form or by any means without prior
written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions


and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.

All other trademarks and trade names mentioned in this document are the property of their respective
holders.

Notice
The purchased products, services and features are stipulated by the contract made between Huawei and
the customer. All or part of the products, services and features described in this document may not be
within the purchase scope or the usage scope. Unless otherwise specified in the contract, all statements,
information, and recommendations in this document are provided "AS IS" without warranties, guarantees
or representations of any kind, either express or implied.
The information in this document is subject to change without notice. Every effort has been made in the
preparation of this document to ensure accuracy of the contents, but all statements, information, and
recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.


Address: Huawei Industrial Base
Bantian, Longgang
Shenzhen 518129
People's Republic of China

Website: http://www.huawei.com

Email: support@huawei.com

Issue 01 (2016-10-25) Huawei Proprietary and Confidential i



Contents

1 Layer 2 Loop Troubleshooting
1.1 Overview
1.2 How to Detect a Loop
1.2.1 General Method
1.2.2 Fault Diagnosis Process
1.2.3 Fault Locating Procedure
1.2.3.1 Checking Interface Traffic
1.2.3.2 Checking MAC Address Flapping
1.2.3.3 Checking Whether a Loop Exists After Loop Detection Is Configured
1.2.3.4 Checking the CPU Usage
1.3 How to Quickly Remove a Loop
1.4 How to Find Out the Root Cause
1.4.1 Checking Whether the Loop Is Caused by Recent Construction or Configuration Modification
1.4.2 Checking Whether a Typical Loop Issue Occurs
1.4.3 Collecting Information
1.5 Hardening and Optimizing the Network
1.6 Typical Loop Troubleshooting Cases
1.6.1 Interconnection Problem
1.6.1.1 MAC Address Flapping Occurs on a Non-Huawei Device
1.6.1.2 ATAEs Fail to Interwork with MSTP-enabled Switches Due to a Software Problem
1.6.1.3 RRPP Temporary Loop Occurs Because the Interface Up Time on the Switch and CX600 Is Different
1.6.2 Hardware Connection Problem
1.6.2.1 RRPP Does Not Take Effect on an S9300 Because a Board Is Loose
1.6.2.2 Incorrect Device Connections Cause Broadcast Storm
1.6.3 Networking and Configuration Change
1.6.3.1 Improper Server Networking Causes MAC Address Flapping
1.6.3.2 Incorrect Device Connection Triggers Root Protection
1.6.3.3 Network Construction Causes a Loop
1.6.3.4 SEP Deletion on a Faulty Port Causes the Switch to Be Out of Management
1.6.4 Misconfigurations
1.6.4.1 Services Are Interrupted When Ports Are Not Deleted from VLAN 1
1.6.4.2 Services Are Interrupted Because BPDU Is Not Enabled on Switch Ports
1.6.4.3 Switch Ports Connected to Terminals Are Not Configured as STP Edge Ports. When Booting from the Network Adapter, Some Terminals Cannot Obtain IP Addresses
1.6.4.4 STP Convergence Cannot Be Adjusted in Other MSTIs But Not MSTI 0 Because MST Region Configurations Are Different
1.6.4.5 Inconsistent MSTP Packet Formats Cause Ports to Be Down
1.6.4.6 RRPP Multi-Instance Causes a Temporary RRPP Loop
1.6.4.7 RRPP Master Node's Working Mode Is Different from That of Transit Nodes, Which Makes MAC Entries Fail to Be Updated
1.6.4.8 Users on a Transit Node and Downstream Nodes Connected to the Transit Node Cannot Go Online
1.6.4.9 An RRPP Loop Occurs Due to Original Multi-Instance Configuration
1.6.4.10 loopback internal Causes a Loop
1.6.4.11 Services Are Interrupted After Smart Link Master and Slave Interfaces Are Switched
1.6.5 Improper Configurations
1.6.5.1 A Large Number of TC BPDUs Cause an ARP Learning Error on a Modular Switch
1.6.5.2 Many TC BPDUs Cause a High CPU Usage
1.6.5.3 An MSTP Loop Causes a High CPU Usage
1.6.5.4 STP Convergence Is Abnormal When an S9300 Interface Processes BPDUs
1.6.5.5 STP Flapping Occurs Because the STP Timeout Interval on the ATAE Device Is Incorrectly Calculated
1.6.5.6 RSTP Cannot Provide Fast Convergence When the S6500 Port Connected to the S6500 Changes from Down to Up
1.6.5.7 Unicast Suppression Causes RRPP Flapping for One Hour
1.6.5.8 Unknown Unicast Suppression Causes RRPP Flapping
1.6.5.9 Services on the RRPP Ring Consisting of CX600 and S3300 Are Interrupted
1.6.5.10 ERPS Becomes Invalid When RTN Interconnects with an S Switch
1.6.6 Pseudo Loops
1.6.6.1 MAC Address Flapping Occurs But No Loop Is Detected
1.6.7 Others
1.6.7.1 The S2300SI Configured with Loopback Detection Cannot Detect Loops
1.6.7.2 The OSPF Neighbor Relationship Is Down Due to a Loop on the S Switch
1.6.7.3 Packet Loss Due to a Loop in Layer 2 Forwarding
1.7 FAQ
1.7.1 Can a Switch Transparently Transmit BPDUs?
1.7.2 What Are the Basis for STP Calculation? Will STP Topology Be Changed When Port Rate Is Changed?
1.7.3 Does a Switch Support MAC Address Flapping Detection?
1.7.4 After LDT or LBDT Detects a Loop on an Interface, the Interface Is Blocked. Can the Blocked Interface Continue to Send Protocol Packets?
1.7.5 Can Loopback Detection Be Used with VLAN Mapping on an Interface?
1.7.6 What Is the Destination MAC Address of SEP Packets?
1.7.7 How Many Modes Is Available to Block an Interface?
1.7.8 After the SEP Topology Changes, Which Ring Network Protocols Will Update Their Forwarding Tables?
1.7.9 What Is the Destination MAC Address of RRPP Packets?
1.7.10 What Are the Notes About Configuring RRPP?
1.7.11 How Does RRPP Implement Fast Switching?
1.7.12 Why Does the display Command Not Display Statistics About Health Packets on an RRPP Transit Node?
1.7.13 How Is Load Balancing Implemented When RRPP Is Deployed?
1.7.14 What Is the Maximum Number of Devices That Can Be Deployed in an RRPP Ring?
1.7.15 Do S Series Switches Support Sub-rings?
1.7.16 Can ERPS Be Used with Other Ring Network Protocols on the Same Network?
1.7.17 Does ERPS on S Series Switches Support Load Balancing?
1.7.18 Can ERPS Be Configured on an Eth-Trunk?


1 Layer 2 Loop Troubleshooting

1.1 Overview
1.2 How to Detect a Loop
1.3 How to Quickly Remove a Loop
1.4 How to Find Out the Root Cause
1.5 Hardening and Optimizing the Network
1.6 Typical Loop Troubleshooting Cases
1.7 FAQ

1.1 Overview
Definition
Device redundancy and link redundancy are commonly used to improve the reliability of an
Ethernet switching network. However, loops may still occur because of factors such as
networking adjustment, configuration modification, and upgrade or migration. In Figure 1.1,
loops exist if every two devices are connected to each other, and a broadcast storm will occur
if no loop prevention protocol is configured or the network configuration is modified.


Figure 1.1 Link redundancy on the Ethernet switching network

The major harm of a Layer 2 loop is a broadcast storm. On a loop-free Ethernet, broadcast
frames are flooded through the network so that every device receives them. With sufficient
bandwidth, each bridge forwards a received broadcast frame to all interfaces except the
interface on which the frame was received. If a loop exists, however, this flooding mechanism
affects the entire network.
During a broadcast storm, Ethernet frames are forwarded endlessly, and the forwarding rate on
an interface reaches or approaches line rate, consuming the link bandwidth. According to
Ethernet forwarding rules, these broadcast frames are copied to all interfaces, so the entire
network fills with broadcast frames. For example, on an Ethernet that uses GE connections,
every link carries broadcast frames at about 1000 Mbit/s, and other data packets cannot be
forwarded.
In a broadcast domain, a broadcast storm occurs when Layer 2 devices forward broadcast frames
repeatedly. The broadcast storm makes the MAC address table unstable, affects services,
degrades communication quality, and can even interrupt communication.
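The flooding behavior described above can be illustrated with a toy simulation. This is only a sketch, not a model of any real switch: the topologies and the `flood` function are hypothetical, and each switch simply forwards a broadcast frame out of every link except the one it arrived on.

```python
from collections import Counter

def flood(links, start, ticks):
    """Count the broadcast frame copies in flight after each forwarding step.

    links maps a switch name to the list of its neighbours (one link each).
    A frame copy is modelled as (current_switch, switch_it_arrived_from).
    """
    in_flight = Counter({(start, None): 1})  # one broadcast injected at `start`
    history = []
    for _ in range(ticks):
        next_round = Counter()
        for (switch, came_from), copies in in_flight.items():
            for neighbour in links[switch]:
                if neighbour != came_from:  # never send back out the ingress link
                    next_round[(neighbour, switch)] += copies
        in_flight = next_round
        history.append(sum(in_flight.values()))
    return history

# Loop-free chain: A - B - C
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
# Ring of three switches: A - B - C - A
triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
# Four switches, every two of them connected
full_mesh = {s: [n for n in "ABCD" if n != s] for s in "ABCD"}

print(flood(chain, "A", 5))      # [1, 1, 0, 0, 0]
print(flood(triangle, "A", 5))   # [2, 2, 2, 2, 2]
print(flood(full_mesh, "A", 5))  # [3, 6, 12, 24, 48]
```

In the chain the frame dies out once it reaches the last switch. In the ring the two copies circulate forever, and in the full mesh the number of copies doubles at every step, which is why the links saturate so quickly.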
To prevent loops and ensure network reliability, the following loop prevention protocols can
be configured on switches:
• STP/RSTP/MSTP
• RRPP
• SEP
• Smart Link
• ERPS
In addition, Huawei S series switches support the following loop detection functions:
• Loop Detection
• Loopback Detection
This document describes how to identify Layer 2 loops.

Purpose
This guide helps Huawei engineers remove Layer 2 loops. It helps:
• Frontline service engineers describe the fault symptom and determine the fault range
• GTAC engineers collect NE information, analyze NE abnormalities, and quickly locate the
faulty NE and service
• R&D engineers locate the fault

1.2 How to Detect a Loop


1.2.1 General Method
On a stably running network, the following factors may cause a fault:
• Network adjustment, such as network topology adjustment, configuration modification, and
upgrade or migration
• Network environment change, such as a network storm, a change in user online behavior
(holidays, promotion activities, use of smart terminals), a power or temperature change, fiber
disconnection, a change to daylight saving time, microwave transmission affected by weather
(rain or fog), and accidents (flood, fire, earthquake, or lightning)
• Network device failure, such as a software bug or hardware aging (board, fiber, or optical
module)
The abnormalities are reflected in the traps, logs, traffic statistics, or port status of the
affected NE. Therefore, to locate a fault, quickly determine when the fault occurred and how
far its impact extends, find out which operations have been performed and which NEs are
affected, and then identify the faulty NE to locate the root cause.
If one or more of the symptoms in Figure 1.1 appear, there is a high probability that a
Layer 2 loop has occurred.


Figure 1.1 Layer 2 loop symptoms

1.2.2 Fault Diagnosis Process


Check whether a Layer 2 loop has occurred. There are four methods: checking interface traffic,
checking MAC address flapping traps, configuring loop detection, and checking CPU usage. The
four methods can be used in any sequence. Choose one or more methods to accurately determine
the fault type.

1.2.3 Fault Locating Procedure


1.2.3.1 Checking Interface Traffic
Step 1 Check brief information about interface traffic.
Run the display interface brief command to check traffic on all interfaces. If the InUti and
OutUti values of an interface gradually increase to the interface rate limit, a loop is
occurring on the interface.
First query:
<HUAWEI> display interface brief | include up


PHY: Physical
*down: administratively down
^down: standby
~down: LDT down
#down: LBDT down
(l): loopback
(s): spoofing
(E): E-Trunk down
(b): BFD down
(e): ETHOAM down
(dl): DLDP down
(d): Dampening Suppressed
(ld): LDT block
(lb): LBDT block
InUti/OutUti: input utility/output utility
Interface                   PHY   Protocol InUti OutUti   inErrors  outErrors
Ethernet0/0/0/0             up    up       0.01%  0.01%          0          0
GigabitEthernet0/0/2        up    up       0.01%  0.01%          0          0
GigabitEthernet0/0/16       up    up       0.56%  0.56%          0          0
GigabitEthernet1/0/12       up    up       0.56%  0.56%          0          0

Last query:
<HUAWEI> display interface brief | include up

PHY: Physical
*down: administratively down
^down: standby
~down: LDT down
#down: LBDT down
(l): loopback
(s): spoofing
(E): E-Trunk down
(b): BFD down
(e): ETHOAM down
(dl): DLDP down
(d): Dampening Suppressed
(ld): LDT block
(lb): LBDT block
InUti/OutUti: input utility/output utility
Interface                   PHY   Protocol InUti OutUti   inErrors  outErrors
Ethernet0/0/0/0             up    up       0.01%  0.01%          0          0
GigabitEthernet0/0/2        up    up       0.01%  0.01%          0          0
GigabitEthernet0/0/16       up    up         76%    76%          0          0
GigabitEthernet1/0/12       up    up         76%    76%          0          0
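Comparing the two queries by eye is error-prone when a device has many interfaces. The following Python sketch compares two captured outputs and lists the interfaces whose InUti and OutUti have both risen above a threshold. The function names and the 50% threshold are assumptions chosen for illustration, and the sample rows are abbreviated from the outputs above.

```python
import re

# Interface name, PHY, Protocol, then InUti and OutUti as percentages.
ROW = re.compile(r"^(\S+)\s+\S+\s+\S+\s+([\d.]+)%\s+([\d.]+)%")

def parse_brief(text):
    """Return {interface: (InUti, OutUti)} from `display interface brief` output."""
    usage = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            name, in_uti, out_uti = match.groups()
            usage[name] = (float(in_uti), float(out_uti))
    return usage

def rising_interfaces(first, last, threshold=50.0):
    """List interfaces whose utilization rose in both directions and is now
    at or above `threshold` percent (an arbitrary value for this sketch)."""
    before, after = parse_brief(first), parse_brief(last)
    suspects = []
    for name, (in_uti, out_uti) in after.items():
        old_in, old_out = before.get(name, (0.0, 0.0))
        if (in_uti >= threshold and out_uti >= threshold
                and in_uti > old_in and out_uti > old_out):
            suspects.append(name)
    return sorted(suspects)

FIRST_QUERY = """\
Interface PHY Protocol InUti OutUti inErrors outErrors
GigabitEthernet0/0/2 up up 0.01% 0.01% 0 0
GigabitEthernet0/0/16 up up 0.56% 0.56% 0 0
GigabitEthernet1/0/12 up up 0.56% 0.56% 0 0
"""
LAST_QUERY = """\
Interface PHY Protocol InUti OutUti inErrors outErrors
GigabitEthernet0/0/2 up up 0.01% 0.01% 0 0
GigabitEthernet0/0/16 up up 76% 76% 0 0
GigabitEthernet1/0/12 up up 76% 76% 0 0
"""

print(rising_interfaces(FIRST_QUERY, LAST_QUERY))
```

The legend and header lines of the real output do not match the row pattern, so the complete command output can be pasted in unchanged.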

Compare the displayed network traffic with the service traffic collected when network
services are normal. You can obtain the normal service traffic bandwidth from the network
monitoring diagram on the NMS.
• Determine whether a loop has occurred:
− If the current network traffic volume is much higher than the normal service traffic
volume, a Layer 2 loop may be occurring.
− If the current traffic volume is normal and broadcast suppression is not configured (the
broadcast-suppression { percent-value | cir cir-value [ cbs cbs-value ] | packets
packets-per-second } command is not run on interfaces), no Layer 2 loop has occurred.
− If the current network traffic volume is higher than the normal service traffic volume
and broadcast suppression is deployed, go to 1.2.3.2 Checking MAC Address Flapping.
• Locate the loop based on the number of interfaces that carry a large amount of traffic and
on the direction of that traffic:
− If only one interface on a switch has a large amount of both inbound and outbound
traffic, there may be a loop on this interface (Loopback on a Single Port).
− If two interfaces on one switch have a large amount of traffic, there may be a loop
between these two interfaces (Dual-Port Loop Causes Protocol Flapping).
− If only one interface on a switch has a large amount of inbound or outbound traffic,
the loop may be on the upstream or downstream device (Loopback on the
Downstream Device).
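The three traffic patterns above can be written as a small decision function. This is a hedged sketch: `classify_loop` and its return strings are hypothetical names, and the result is only a hint about where to look next, not a diagnosis.

```python
def classify_loop(busy):
    """Map the traffic pattern on one switch to the likely loop location.

    busy maps an interface name to (inbound_high, outbound_high) for the
    interfaces that were flagged as carrying abnormally high traffic.
    """
    # Ignore entries that are not high in either direction.
    busy = {name: flags for name, flags in busy.items() if any(flags)}
    if len(busy) == 1:
        inbound_high, outbound_high = next(iter(busy.values()))
        if inbound_high and outbound_high:
            return "loopback on a single port"
        return "loop on an upstream or downstream device"
    if len(busy) == 2:
        return "loop between two ports on this switch"
    return "undetermined"

print(classify_loop({"GE0/0/16": (True, True)}))
print(classify_loop({"GE0/0/16": (True, True), "GE1/0/12": (True, True)}))
print(classify_loop({"GE0/0/16": (True, False)}))
```

Any other pattern is reported as undetermined, in which case the remaining methods in this section (MAC address flapping traps, loop detection, CPU usage) still apply.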
If you suspect that a loop has occurred, confirm it as follows:
Step 2 View details about the interface where traffic volume is abnormal.
To avoid the impact of historical packet statistics, run the reset counters interface
[ interface-type [ interface-number ] ] command in the user view to clear historical statistics
on the interface. Run this command only with the customer's permission.
Run the display interface [ interface-type [ interface-number ] ] command in any view, or run
the display this interface command in the interface view, to check the current running status
of the interface. View the Broadcast and Multicast fields to determine whether there are a
large number of broadcast and multicast packets in the inbound and outbound directions.
If an interface has many more broadcast and multicast packets than other interfaces, there is
a high probability that a loop has occurred. If not, go to 1.2.3.2 Checking MAC Address
Flapping.
<HUAWEI> display interface XGigabitEthernet 2/0/1

XGigabitEthernet2/0/1 current state : UP


Line protocol current state : UP
Description:
Switch Port, PVID : 1, TPID : 8100(Hex), The Maximum Frame Length is 9216
IP Sending Frames' Format is PKTFMT_ETHNT_2, Hardware address is 00e0-fc01-38f9
Last physical up time : 2010-12-15 14:24:47 UTC-03:00
Last physical down time : 2010-12-15 14:23:54 UTC-03:00
Current system time: 2010-12-15 19:13:33-03:00
Port Mode: COMMON FIBER, Transceiver: 10GBASE_SR_SFP
Speed : 10000, Loopback: NONE
Duplex: FULL, Negotiation: DISABLE
Mdi : NORMAL, Flow-control: DISABLE
Last 300 seconds input rate 1432 bits/sec, 1 packets/sec
Last 300 seconds output rate 1272 bits/sec, 1 packets/sec
Input peak rate 12390544 bits/sec, Record time: 2010-12-15 14:31:02
Output peak rate 11624 bits/sec, Record time: 2010-12-15 14:24:48

Input: 3804624 packets, 578247577 bytes


Unicast: 3784099, Multicast: 20524
Broadcast: 1, Jumbo: 0
Discard: 0, Pause: 0


Frames: 0

Total Error: 0
CRC: 0, Giants: 0
Jabbers: 0, Fragments: 0
Runts: 0, DropEvents: 0
Alignments: 0, Symbols: 0
Ignoreds: 0

Output: 19396 packets, 2796198 bytes


Unicast: 9, Multicast: 19386
Broadcast: 1, Jumbo: 0
Discard: 0, Pause: 0

Total Error: 0
Collisions: 0, ExcessiveCollisions: 0
Late Collisions: 0, Deferreds: 0
Buffers Purged: 0

Input bandwidth utilization threshold : 90.00%


Output bandwidth utilization threshold: 90.00%
Input bandwidth utilization : 0%
Output bandwidth utilization : 0%

----End
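When several interfaces have to be compared, the Broadcast and Multicast fields can be extracted from saved `display interface` output with a short script. The Python sketch below is illustrative only: the helper names are assumptions, and the sample text is an excerpt of the counters shown above.

```python
import re

COUNTER = re.compile(r"(Unicast|Multicast|Broadcast):\s*(\d+)")

def packet_counters(text):
    """Return {'input': {...}, 'output': {...}} with unicast, multicast and
    broadcast packet counts taken from `display interface` output."""
    counters = {}
    direction = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Input:"):
            direction = "input"
        elif line.startswith("Output:"):
            direction = "output"
        if direction:
            for name, value in COUNTER.findall(line):
                counters.setdefault(direction, {})[name.lower()] = int(value)
    return counters

def flood_share(counts):
    """Fraction of packets that are broadcast or multicast."""
    total = counts["unicast"] + counts["multicast"] + counts["broadcast"]
    if total == 0:
        return 0.0
    return (counts["multicast"] + counts["broadcast"]) / total

SAMPLE = """\
Input: 3804624 packets, 578247577 bytes
Unicast: 3784099, Multicast: 20524
Broadcast: 1, Jumbo: 0
Output: 19396 packets, 2796198 bytes
Unicast: 9, Multicast: 19386
Broadcast: 1, Jumbo: 0
"""

counters = packet_counters(SAMPLE)
for direction, counts in counters.items():
    print(direction, counts, round(flood_share(counts), 4))
```

A high share on its own does not prove a loop: in the sample, almost all outbound packets are multicast, yet the output rate is only about 1 packet per second. Compare both the share and the absolute counts with those of other interfaces, as described in Step 2.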

1.2.3.2 Checking MAC Address Flapping


MAC address flapping occurs when a MAC address is learned by two interfaces in the same
VLAN. The MAC address entry learned later overwrites the earlier one.
MAC address flapping may be caused by a network loop or by a network attack from
unauthorized users. Therefore, MAC address flapping does not necessarily indicate a loop, but
a loop always causes MAC address flapping.
In Figure 2.1, when Switch A simultaneously sends packets in two directions, two interfaces
on Switch B receive the packets. If MAC address flapping is occurring between the two
interfaces on Switch B, a loop may exist on the two interfaces.


Figure 2.1 MAC address flapping
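The definition of MAC address flapping given above can be expressed as a short Python sketch that replays a sequence of MAC learning events. It is an illustration of the concept, not the detection algorithm used by the switch. The function name, the event format, and the `min_moves` threshold are assumptions.

```python
def find_flapping(events, min_moves=2):
    """Return {(vlan, mac): moves} for addresses that changed port repeatedly.

    events is an iterable of (vlan, mac, port) tuples in learning order.
    A single move is ignored by default because it can be a legitimate
    host relocation (the threshold is an assumption of this sketch).
    """
    last_port = {}
    moves = {}
    for vlan, mac, port in events:
        key = (vlan, mac)
        if key in last_port and last_port[key] != port:
            moves[key] = moves.get(key, 0) + 1
        last_port[key] = port  # the newer entry overwrites the older one
    return {key: count for key, count in moves.items() if count >= min_moves}

events = [
    (1001, "0000-0000-002b", "GE0/0/23"),
    (1001, "0000-0000-002b", "GE0/0/24"),
    (1001, "0000-0000-002b", "GE0/0/23"),
    (1001, "00e0-fc00-4447", "GE0/0/1"),
    (1001, "00e0-fc00-4447", "GE0/0/1"),
    (10, "0025-9e6e-1c55", "GE0/0/2"),
    (10, "0025-9e6e-1c55", "GE0/0/3"),
]
print(find_flapping(events))
```

Only the first address is reported: it moved between GE0/0/23 and GE0/0/24 twice, whereas the second address never moved and the third moved once.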

All fixed and modular switches of all versions support MAC address flapping prevention
configurations, including alarm generation and interface blocking upon MAC address
flapping.
MAC address flapping detection commands and alarms differ between fixed and modular
switches and between versions.
1. Modular switches:
− On a switch running V100R002, global MAC address flapping detection takes effect
only on non-S series boards. In addition, when the switch detects MAC address
flapping, it can only send a trap. Run the following command to enable MAC address
flapping detection:
[HUAWEI] mac-flapping alarm enable
− In V100R003 and later versions, the switch supports VLAN-based MAC address
flapping detection and can perform actions when MAC address flapping is detected.
Run the following commands in the system or VLAN view to enable MAC address
flapping detection:
a. System view:
[HUAWEI] loop-detect eth-loop alarm-only
b. VLAN view:
<HUAWEI> system-view
[HUAWEI] vlan 10
[HUAWEI-vlan10] loop-detect eth-loop alarm-only
After enabling MAC address flapping detection, run the display trapbuffer command to
view MAC address flapping traps (OID: 1.3.6.1.4.1.2011.5.25.160.3.7 or OID:
1.3.6.1.4.1.2011.5.25.42.2.1.7.12). Table 1.1 describes MAC address flapping detection
traps in different versions.


Table 1.1 MAC address flapping detection traps on modular switches of different versions
Version | Detection Type | Trap
V100R002 | Global detection | L2IF/4/MAC_FLAPPING_ALARM:OID 1.3.6.1.4.1.2011.5.25.42.2.1.7.12 The mac-address has flap value . (BaseTrapSeverity=0, BaseTrapProbableCause=0, BaseTrapEventType=4, L2IfPort=549,entPhysicalIndex=1, MacAdd=0000-0000-002b,vlanid=1001, FormerIfDescName=Ethernet3/0/2,CurrentIfDescName=Ethernet3/0/3,DeviceName=S9306-169)
V100R002 | VLAN-based detection | Not supported
V100R003 | Global detection | L2IFPPI/4/MAC_FLAPPING_ALARM:OID 1.3.6.1.4.1.2011.5.25.42.2.1.7.12 The mac-address has flap value . (L2IfPort=0,entPhysicalIndex=0, BaseTrapSeverity=4, BaseTrapProbableCause=549, BaseTrapEventType=1, MacAdd=00e0-fc00-4447,vlanid=1001, FormerIfDescName=GigabitEthernet6/0/6,CurrentIfDescName=GigabitEthernet6/0/7,DeviceName=9306-222.159)
V100R003 | VLAN-based detection | L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 Loop exist in vlan 1001, for mac-flapping.
V100R006 and later versions | Global detection | L2IFPPI/4/MAC_FLAPPING_ALARM:OID 1.3.6.1.4.1.2011.5.25.42.2.1.7.12 The mac-address has flap value. (L2IfPort=0,entPhysicalIndex=0, BaseTrapSeverity=4, BaseTrapProbableCause=549, BaseTrapEventType=1, MacAdd=0025-9e6e-1c55,vlanid=1001, FormerIfDescName=GigabitEthernet2/1/23,CurrentIfDescName=GigabitEthernet2/1/22,DeviceName=9303-222.157)
V100R006 and later versions | VLAN-based detection | L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 Loop exists in vlan 1001, for flapping mac-address 0025-9e6e-1c55 between port GE2/1/23 and port GE2/1/22.

2. Fixed switches:
Fixed switches (excluding the S2300/S2700) of V100R003 and later versions do not
support global MAC address flapping detection. They support only VLAN-based MAC
address flapping detection and actions such as sending traps and blocking interfaces
when MAC address flapping is detected. Run the following commands to enable MAC
address flapping detection:
VLAN view:
<HUAWEI> system-view
[HUAWEI] vlan 10
[HUAWEI-vlan10] loop-detect eth-loop alarm-only
After enabling MAC address flapping detection, run the display trapbuffer command to
view MAC address flapping traps (OID: 1.3.6.1.4.1.2011.5.25.160.3.7 or OID:
1.3.6.1.4.1.2011.5.25.42.2.1.7.12). Table 1.2 describes MAC address flapping detection
traps in different versions.

Table 1.2 MAC address flapping detection traps on fixed switches of different versions
Version | Trap
V100R003 | L2IF/4/MFLPPORTRESUME:OID 1.3.6.1.4.1.2011.5.25.160.3.7 Loop exist in vlan for(hwMflpVlanId:"[1001]";hwMflpVlanCfgAlarmReason:"[for flapping mac-address 0000-0000-002b between port GE0/0/24 and port GE0/0/23]")
V100R005 | L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 Loop exists in vlan 1001, for flapping mac-address 0000-0000-002b between port GE0/0/24 and port GE0/0/23.
V100R006 | L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 Loop exists in vlan 1001, for flapping mac-address 0000-0000-002b between port GE0/0/24 and port GE0/0/23.
V200R001 | L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, VlanId = 1001, flapping mac-address 0000-0000-002b between port GE0/0/24 and port GE0/0/23.
V200R002 | L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, VlanId = 1001, flapping mac-address 0000-0000-002b between port GE0/0/24 and port GE0/0/23.
V200R003 and later versions | L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, VlanId = 1001, flapping mac-address 0000-0000-002b between port GE0/0/24 and port GE0/0/23.
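Because the trap wording changes between versions, a script that collects VLAN-based flapping traps has to accept several formats. The Python sketch below handles the three wordings quoted in the tables above. The regular expressions and the `parse_flap_trap` name are assumptions written for this example. Traps that carry no MAC address and port pair, such as the V100R003 modular VLAN-based trap and the global MAC_FLAPPING_ALARM format, are not handled and return None.

```python
import re

# VLAN ID as it appears in the three VLAN-based trap wordings.
VLAN = re.compile(r'(?:vlan |VlanId = |hwMflpVlanId:"\[)(\d+)')
# MAC address and the two ports between which it flaps.
FLAP = re.compile(
    r"flapping mac-address ([0-9a-fA-F]{4}(?:-[0-9a-fA-F]{4}){2}) "
    r"between port ([\w/]+) and port ([\w/]+)"
)

def parse_flap_trap(line):
    """Extract VLAN, MAC address and port pair from one trap line, or None."""
    vlan = VLAN.search(line)
    flap = FLAP.search(line)
    if not (vlan and flap):
        return None
    mac, port_a, port_b = flap.groups()
    return {"vlan": int(vlan.group(1)), "mac": mac.lower(), "ports": (port_a, port_b)}

TRAPS = [
    'L2IF/4/MFLPPORTRESUME:OID 1.3.6.1.4.1.2011.5.25.160.3.7 Loop exist in vlan '
    'for(hwMflpVlanId:"[1001]";hwMflpVlanCfgAlarmReason:"[for flapping mac-address '
    '0000-0000-002b between port GE0/0/24 and port GE0/0/23]")',
    "L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 Loop exists in vlan 1001, "
    "for flapping mac-address 0000-0000-002b between port GE0/0/24 and port GE0/0/23.",
    "L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, "
    "VlanId = 1001, flapping mac-address 0000-0000-002b between port GE0/0/24 and port GE0/0/23.",
]

for trap in TRAPS:
    print(parse_flap_trap(trap))
```

All three sample lines yield the same result, so traps from switches running different versions can be grouped by VLAN and port pair.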

1.2.3.3 Checking Whether a Loop Exists After Loop Detection Is Configured


Both loop detection and loopback detection periodically send dedicated detection packets
through interfaces and then check whether the packets are returned (the sending and
receiving interfaces may be different).
• If detection packets are sent and received by the same interface, a loopback has occurred
on the interface, or a loop has occurred on the network or device connected to the interface.
• If detection packets are received by another interface on the same device, a loop has
occurred on the network or device connected to the interface.
Modular and fixed switches differ in their support for loop detection:
• Loop Detection
Only modular switches support loop detection. When loop detection is configured on an
interface of a modular switch, the switch sends detection packets to detect loops in the
loop detection-enabled VLAN to which the interface belongs. If the switch receives the
detection packets that it sent, a loop exists on the network.
Enable loop detection:


[HUAWEI] loop-detection enable


[HUAWEI] loop-detection enable vlan { { vlan-id1 [ to vlan-id2 ] } & <1-10> | all }
After loop detection is enabled, run the display loop-detection command to check the loop
detection status.
<HUAWEI> display loop-detection
Loop Detection is enable.
Detection interval time is 5 seconds.
Following vlans enable loop-detection:
vlan 556
Following ports are blocked for loop:
NULL
Following ports are shutdown for loop:
NULL
Following ports are nolearning for loop:
NULL

Run the display loop-detection interface command to check the status of a specified port.
<Quidway> display loop-detection interface gigabitethernet 1/0/0
The port is enable.
The port's status list:
Status WorkMode Recovery-time EnabledVLAN
-----------------------------------------------------------------------
Normal Shutdown 200 556

Table 1.3 describes traps in different versions.

Table 1.3 Loop detection traps on modular switches


Version | Trap
V100R002 | LDT/4/DetectLoop:OID: 1.3.6.1.4.1.2011.5.25.174.3.1 InterfaceIndex: 12 InterfaceName: Ethernet3/0/1 VlanListLow: VlanListHigh:, The port detected loop!
V100R003 | LDT/4/DetectLoop:OID: 1.3.6.1.4.1.2011.5.25.174.3.1 InterfaceIndex: 7 InterfaceName: GigabitEthernet6/0/1 VlanListLow: 1000 VlanListHigh: none, The port detected loop!
V100R006 | LDT/4/DetectLoop:OID: 1.3.6.1.4.1.2011.5.25.174.3.1 The port detected loop. (InterfaceIndex: 14 InterfaceName: GigabitEthernet1/0/1 VlanListLow: 1000 VlanListHigh: none)
V200R001 and later versions | LDT/4/DETECTLOOP:OID 1.3.6.1.4.1.2011.5.25.174.3.1 The port detected loop. (InterfaceIndex: 87 InterfaceName: Ethernet1/0/10 VlanListLow: 10 VlanListHigh: none)

● Loopback Detection
Fixed switches of all versions support loopback detection, and modular switches of
V200R001 and later versions support loopback detection.
After loopback detection is configured on a port, the port starts to send detection packets.
In versions earlier than V200R003, the switch can detect a loop only when the detection
packets are sent and received by the same port. In V200R003 and later versions,
loopback detection allows the switch to detect a loop even if the detection packets are
sent and received by different ports.
Enable loopback detection:
[HUAWEI] loopback-detect enable
[HUAWEI] loopback-detect packet vlan { vlan-id1 [ to vlan-id2 ] } &<1-8>
After loopback detection is enabled, run the display loopback-detect command to view
the configuration and port status.
<Quidway> display loopback-detect
Loopback-detect is enabled in the system view
Loopback-detect interval: 30
Loopback-detect sending-packet interval: 5
Interface ProtocolID RecoverTime Action Status
--------------------------------------------------------------------------------
GigabitEthernet0/0/2 602 30 block NORMAL

The traps vary according to software versions. Table 1.4 provides loopback detection
trap messages in different versions.

Table 1.4 Loopback detection trap messages in different versions


Version | Trap Message
V100R003 | LDT/4/Porttrap:OID 1.3.6.1.4.1.2011.5.25.174.3.3 Loopback does exist on interface(27)GigabitEthernet0/0/22 ( VLAN 1000 ) , loopback detect status: 2.(1:normal; 2:block; 3:shutdown; 4:trap; 5:nolearn)
V100R006 | LDT/4/Porttrap:OID 1.3.6.1.4.1.2011.5.25.174.3.3 Loopback does exist on interface(27)GigabitEthernet0/0/22 ( VLAN 1000 ) , loopback detect status: 2.(1:normal; 2:block; 3:shutdown; 4:trap; 5:nolearn)
V200R001 and later versions | LBDT/4/PORTTRAP:OID 1.3.6.1.4.1.2011.5.25.174.3.3 Loopback does exist on interface(97)XGigabitEthernet1/0/44 ( VLAN 1000 ) , loopback detect status: 3.(1:normal; 2:block; 3:shutdown; 4:trap; 5:nolearn; 6:quitvlan)
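
The loopback detection traps in Table 1.4 report the action taken on the port as a numeric status code. The sketch below is one possible way to decode a trap line in Python. The function name parse_loopback_trap and the regular expression are illustrative assumptions based only on the three sample messages above, so treat it as a starting point rather than a complete parser.

```python
import re

# Status codes exactly as listed inside the trap text of Table 1.4.
# Code 6 (quitvlan) appears only in the V200R001-and-later sample.
STATUS_NAMES = {
    1: "normal",
    2: "block",
    3: "shutdown",
    4: "trap",
    5: "nolearn",
    6: "quitvlan",
}

TRAP_RE = re.compile(
    r"interface\(\d+\)(?P<ifname>\S+)\s*\(\s*VLAN\s+(?P<vlan>\d+)\s*\)\s*,\s*"
    r"loopback detect status:\s*(?P<status>\d+)"
)


def parse_loopback_trap(message):
    """Return interface, VLAN, and decoded action from a trap, or None."""
    match = TRAP_RE.search(message)
    if match is None:
        return None
    code = int(match.group("status"))
    return {
        "interface": match.group("ifname"),
        "vlan": int(match.group("vlan")),
        "status_code": code,
        "action": STATUS_NAMES.get(code, "unknown"),
    }


if __name__ == "__main__":
    sample = (
        "LBDT/4/PORTTRAP:OID 1.3.6.1.4.1.2011.5.25.174.3.3 Loopback does "
        "exist on interface(97)XGigabitEthernet1/0/44 ( VLAN 1000 ) , "
        "loopback detect status: 3.(1:normal; 2:block; 3:shutdown; 4:trap; "
        "5:nolearn; 6:quitvlan)"
    )
    print(parse_loopback_trap(sample))
```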

Do not configure loop detection or loopback detection on upstream ports: if an upstream port
is blocked, the switch can no longer be managed. Inform the customer of the risks of this
configuration and obtain the customer's permission before applying it.

1.2.3.4 Checking the CPU Usage


When a switch has high CPU usage, the CPU usage value in the display cpu-usage command
output is large (for example, above 70%), or the trap
basetrap_1.3.6.1.4.1.2011.5.25.129.2.4.1 hwCPUUtilizationRisingAlarm (the CPU usage
exceeds 90%) is reported.
Generally, high CPU usage caused by software tasks does not last long. Therefore, if high
CPU usage persists for five minutes, an attack or another abnormality may have occurred. In
this situation, view the running tasks and identify the task that consumes the most CPU
resources.
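
The check described above (overall usage above 70%, then find the busiest task) can be sketched in Python as follows. This is an illustration only: the function name summarize_cpu_usage, the default threshold argument, and the regular expressions are assumptions, and the patterns are written against the display cpu-usage layout shown in this section, which may differ on other versions.

```python
import re

# Overall usage line, for example: "CPU Usage : 91% Max: 96%"
TOTAL_RE = re.compile(r"^CPU Usage\s*:\s*(\d+)%")

# Task line, for example: "PPI 70% 0/ 512f8c PPI Product Process Interface"
# (task name, percentage, then the hexadecimal tick counters).
TASK_RE = re.compile(
    r"^(?P<task>\S+)\s+(?P<cpu>\d+)%\s+[0-9a-fA-F]+/\s*[0-9a-fA-F]+"
)


def summarize_cpu_usage(output, threshold=70):
    """Return overall usage, a high-usage flag, and the busiest task."""
    total = None
    tasks = {}
    for raw in output.splitlines():
        line = raw.strip()
        match = TOTAL_RE.match(line)
        if match:
            total = int(match.group(1))
            continue
        match = TASK_RE.match(line)
        if match:
            tasks[match.group("task")] = int(match.group("cpu"))
    busiest = max(tasks, key=tasks.get) if tasks else None
    return {
        "total": total,
        "high": total is not None and total > threshold,
        "busiest_task": busiest,
        "busiest_cpu": tasks.get(busiest),
    }
```

Run it on output captured at intervals over five minutes; a single high sample is not enough to conclude that something is wrong.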


To view CPU usage, run the display cpu-usage command in any view.
● If the PPI task has high CPU usage, a loop is very likely.
<HUAWEI> display cpu-usage
CPU Usage Stat. Cycle: 60 (Second)
CPU Usage : 91% Max: 96%
CPU Usage Stat. Time : 2015-12-15 22:01:30
CPU utilization for five seconds: 10%: one minute: 10%: five minutes: 10%
Max CPU Usage Stat. Time : 2015-12-15 14:24:08.
TaskName CPU Runtime(CPU Tick High/Tick Low) Task Explanation
VIDL 9% 8/cd0e39ff DOPRA IDLE
OS 9% 0/e14e38fe Operation System
bcmCNTR.0 1% 0/2d4b39e1 tS16
......
PPI 70% 0/ 512f8c PPI Product Process Interface
......

● If the CPU usage of the PPI task is normal, run the display cpu-defend statistics [
packet-type packet-type ] { all | slot slot-id | mcu } command to check whether protocol
packets are discarded by CPCAR. If so, a loop may have occurred; otherwise, look for
another cause.
For example, to view VRRP packet statistics, run the display cpu-defend vrrp statistics
all command. If information similar to the following is displayed, VRRP packets have been
lost because of a loop.
<HUAWEI> display cpu-defend vrrp statistics all
Statistics on mainboard:
-------------------------------------------------------------------------------
Packet Type Pass(Bytes) Drop(Bytes) Pass(Packets) Drop(Packets)
-------------------------------------------------------------------------------
vrrp 0 0 0 0
-------------------------------------------------------------------------------
Statistics on slot 1:
-------------------------------------------------------------------------------
Packet Type Pass(Bytes) Drop(Bytes) Pass(Packets) Drop(Packets)
-------------------------------------------------------------------------------
vrrp 0 0 0 0
-------------------------------------------------------------------------------
Statistics on slot 4:
-------------------------------------------------------------------------------
Packet Type Pass(Bytes) Drop(Bytes) Pass(Packets) Drop(Packets)
-------------------------------------------------------------------------------


vrrp 79880066214 2581617736 1174644777 37950869


-------------------------------------------------------------------------------
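
To check the CPCAR statistics on many slots or devices, the Drop(Packets) column can be extracted programmatically. The following Python sketch assumes the output layout shown above (a "Statistics on ..." heading per board, followed by rows of four counters). The function name dropped_packets and the regular expressions are hypothetical helpers, not part of any Huawei software.

```python
import re

# Section heading, for example: "Statistics on slot 4:"
SECTION_RE = re.compile(r"^Statistics on (?P<where>.+?):\s*$")

# Data row: packet type, Pass(Bytes), Drop(Bytes), Pass(Packets), Drop(Packets)
ROW_RE = re.compile(
    r"^(?P<ptype>\S+)\s+(?P<pass_bytes>\d+)\s+(?P<drop_bytes>\d+)\s+"
    r"(?P<pass_pkts>\d+)\s+(?P<drop_pkts>\d+)\s*$"
)


def dropped_packets(output):
    """Map (board, packet type) to the Drop(Packets) counter."""
    drops = {}
    where = None
    for raw in output.splitlines():
        line = raw.strip()
        section = SECTION_RE.match(line)
        if section:
            where = section.group("where")
            continue
        row = ROW_RE.match(line)
        if row and where is not None:
            drops[(where, row.group("ptype"))] = int(row.group("drop_pkts"))
    return drops


def boards_with_drops(output):
    """Return the (board, packet type) pairs whose drop counter is non-zero."""
    return [key for key, count in dropped_packets(output).items() if count > 0]
```

A non-zero counter shows only that packets were dropped at some point since the statistics were last cleared. Compare two samples taken some time apart to confirm that drops are still increasing.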

1.3 How to Quickly Remove a Loop


A loop on an Ethernet network leads to a data storm within a short time. When the traffic
volume on an interface reaches the maximum load, link congestion may occur, affecting
services on the Ethernet network. Therefore, once you determine that a loop exists on the
live network, perform the following steps immediately to recover data services:
Step 1 Obtain the network topology and determine whether a loop has occurred.
A ring network topology is complex. Obtain the overall network topology, VLAN plan,
device name, system MAC address, management IP address, local interface name, and remote
interface name.
Complete topology information helps remove loops. If no topology is available, manually
draw a complete topology by starting from the device where the loop is detected and
recording device, interface, and VLAN information of each hop.
For details about how to locate a loop, see 1.2 How to Detect a Loop.
Step 2 Remove the loop manually.

Do not change the intermediate devices, ports, or VLANs that remote login depends on;
otherwise, the switch may become unmanageable or inaccessible.

Manual loop removal is required when a network storm seriously affects services and services
need to be restored as soon as possible. Three manual loop removal methods are available:
● Remove a port from the VLAN where the loop is detected.
This method has little impact on the network. Table 1.1 describes the commands to be
executed on ports of different types.

Table 1.1 Commands used to remove ports from looping VLANs

Port Type | Command | Remarks
Access | undo port default vlan | This command may affect services on the downstream device. Use it with caution.
Trunk | undo port trunk allow-pass vlan id | None.
Hybrid | undo port hybrid vlan id | After this command is executed, the port treats tagged and untagged packets in the same way.

This method is used in the following cases:


1.6.4.1 Services Are Interrupted When Ports Are Not Deleted from VLAN 1
1.6.4.6 RRPP Multi-Instance Causes a Temporary RRPP Loop
● Shut down the port where the loop is generated.
This method can be used to remove a loop.
Before running the shutdown command in the interface view, ensure that data service
will not be affected. That is, the devices can communicate with each other in all VLANs.
This method is used in the following cases:
1.6.2.2 Incorrect Device Connections Cause Broadcast Storm
1.6.3.3 Network Construction Causes a Loop
● Remove the optical fiber from the port where the loop is occurring.
This method can be used to remove a loop.
This method is similar to shutting down the port where the loop is occurring, and is used
only when you cannot log in to the switch.
This method is used in the following case:
1.6.2.1 RRPP Does Not Take Effect on an S9300 Because a Board Is Loose
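
For the first method, the command to run depends on the port type, as listed in the table of commands used to remove ports from looping VLANs. The Python sketch below builds the command sequence for one port. The function name removal_commands and the surrounding system-view, interface, and quit lines are assumptions about how the commands would be entered; review the generated lines before pasting them into a live device.

```python
# Command templates taken from the table of commands used to remove ports
# from looping VLANs. The access-port command takes no VLAN ID.
REMOVE_TEMPLATES = {
    "access": "undo port default vlan",
    "trunk": "undo port trunk allow-pass vlan {vlan}",
    "hybrid": "undo port hybrid vlan {vlan}",
}


def removal_commands(interface, port_type, vlan):
    """Return the CLI lines that remove one port from the looping VLAN."""
    if not 1 <= vlan <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan}")
    try:
        template = REMOVE_TEMPLATES[port_type.lower()]
    except KeyError:
        raise ValueError(f"unsupported port type: {port_type}") from None
    return [
        "system-view",
        f"interface {interface}",
        template.format(vlan=vlan),
        "quit",
    ]


if __name__ == "__main__":
    for line in removal_commands("GigabitEthernet0/0/24", "trunk", 1001):
        print(line)
```

Generating the lines per port makes it easier to keep a record of exactly what was changed, which is also needed later for the operation record.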
Step 3 Check whether services are recovered.
Run tests such as ping to check communication quality and confirm that services have
recovered.
Generally, there are redundant links and configurations in a ring topology; therefore, services
can automatically recover after the loop is eliminated.
----End

1.4 How to Find Out the Root Cause


1.4.1 Checking Whether the Loop Is Caused by Recent Construction or Configuration Modification
Step 1 Check whether loops are caused by recent construction.
If loops are caused by construction, confirm with the construction personnel and learn about
the construction process, especially new lines. Determine the physical loop based on the
topology.
In the following cases, loops are caused by improper networking or operation:
1.6.3.1 Improper Server Networking Causes MAC Address Flapping
1.6.3.3 Network Construction Causes a Loop
1.6.3.2 Incorrect Device Connection Triggers Root Protection
Step 2 Check whether loops are caused by recent configuration modification.
Table 1.1 describes the commands used in configuration modification that cause loops.


Table 1.1 Commands used in configuration modification that cause loops


Feature | Command | Cause | Solution
Interface management | undo shutdown | A port enters the forwarding state. | Run the shutdown command in the interface view to shut down the port or configure a loop prevention protocol.
STP | bpdu enable | The bpdu enable command is not used on a fixed switch. A fixed switch can receive and process STP BPDUs only when the bpdu enable command is configured on its interface. | Configure the bpdu enable command. In V100R006 and later versions, the bpdu enable command is used by default.
STP | bpdu disable | The bpdu disable command is not run on a modular switch. A modular switch will transparently transmit STP BPDUs if this command is not used. | Run the bpdu disable command.
STP | bpdu bridge enable | This command configures a switch to transparently transmit STP BPDUs. | Run the undo bpdu bridge enable command.
STP | bpdu-tunnel stp bridge role provider | This command disables a switch from processing STP BPDUs. | Undo the command.
RRPP | rrpp enable | If the rrpp enable command is not used globally, the blocked port cannot be calculated. | Run the rrpp enable command.
SmartLink | smartlink enable | If the smartlink enable command is not used, the blocked interface cannot be calculated. | Run the smartlink enable command.
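
When many devices must be reviewed after a configuration change, a saved configuration can be scanned for the commands in the table above that stop a switch from processing STP BPDUs. The following Python sketch is illustrative only. It assumes the configuration text uses "interface <name>" lines to open an interface section and "#" lines to close it; the function name find_risky_commands and the RISKY mapping are hypothetical.

```python
# Commands from the table above that make a switch pass STP BPDUs through
# instead of processing them. The descriptions paraphrase the Cause column.
RISKY = {
    "bpdu bridge enable": "switch transparently transmits STP BPDUs",
    "bpdu-tunnel stp bridge role provider": "switch does not process STP BPDUs",
}


def find_risky_commands(config_text):
    """Return (context, command, reason) for each risky line found."""
    findings = []
    context = "system"
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "#":
            # Assumed section separator: return to the system context.
            context = "system"
            continue
        if line.startswith("interface "):
            context = line.split(None, 1)[1]
            continue
        # Exact match only, so "undo bpdu bridge enable" is not flagged.
        if line in RISKY:
            findings.append((context, line, RISKY[line]))
    return findings
```

A finding is a prompt to check with the person who made the change, not proof of a fault: these commands are legitimate in some designs, for example where BPDUs must be tunneled.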


The loops in the following cases are caused by incorrect configurations.


1.6.4.10 loopback internal Causes a Loop
1.6.4.11 Services Are Interrupted After Smart Link Master and Slave Interfaces Are Switched
1.6.3.3 Network Construction Causes a Loop
1.6.5.4 STP Convergence Is Abnormal When an S9300 Interface Processes BPDUs
1.6.4.2 Services Are Interrupted Because BPDU Is Not Enabled on Switch Ports
1.6.4.7 RRPP Master Node's Working Mode Is Different from That of Transit Nodes, Which Makes MAC Entries Fail to Be Updated
1.6.5.7 Unicast Suppression Causes RRPP Flapping for One Hour
----End

1.4.2 Checking Whether a Typical Loop Issue Occurs


Loopback on a Single Port
During network deployment, a loopback usually occurs on a Tx-Rx interface because optical
fibers are connected incorrectly or the interface is damaged by high voltage. In Figure 1.1, a
self-loop occurs on an interface of the switch. As a result, packets sent from this interface are
looped back to the interface, which may cause traffic forwarding errors or MAC address
flapping on the interface.

Figure 1.1 A self-loop occurs on a switch

Prerequisite: No loop prevention protocol such as STP or LDT is configured on a switch.


Symptom: Traffic volume increases continuously in the inbound and outbound directions of
an interface.
Cause: A loopback occurs on an interface or a link.
Handling Method:
a. Disable internal loopback on the interface.
b. Check the link. A link loop usually takes one of two forms: packets sent by an interface
are received by the same interface, or a loop forms between two interfaces of a device.
Such loops are often caused by incorrect connections of fibers or network cables. To solve
the problem, connect the fibers or network cables correctly.
In the following cases, loopback on a single interface occurs:
1.6.4.10 loopback internal Causes a Loop
1.6.5.9 Services on the RRPP Ring Consisting of CX600 and S3300 Are Interrupted

Loopback on the Downstream Device


In Figure 1.1, a loop occurs on the network or device connected to the switch. Packets sent
from Interface 1 are sent back through the downstream network or device.

Figure 1.1 Loopback occurs on the downstream device

Prerequisite: No loop prevention protocol such as STP or LDT is configured on a switch, and
the local device does not have a loop.
Symptom: Traffic volume increases continuously in the inbound and outbound directions of
an interface, and a loop occurs on the downstream device.
Cause: A loopback occurs in the downlink or a self-loop occurs.
Handling Method:
a. Search hop by hop for the link where the loop occurs.
b. Disable internal loopback on the interface of the downstream device.
c. Check for link loops on the downstream device. A link loop usually takes one of two
forms: packets sent by an interface on the downstream device are received by the same
interface, or a loop forms between two interfaces of the downstream device. Such loops are
often caused by incorrect connections of fibers or network cables. To solve the problem,
connect the fibers or network cables correctly.
In the following cases, loopback occurs on the downstream device:
1.6.1.1 MAC Address Flapping Occurs on a Non-Huawei Device
1.6.5.5 STP Flapping Occurs Because the STP Timeout Interval on the ATAE Device Is Incorrectly Calculated

Dual-Port Loop Causes Protocol Flapping


In Figure 1.1, a loop occurs on the network where the switch is located or between two
interfaces of the switch. Packets sent from Interface 1 are looped back to Interface 2.

Figure 1.1 Dual-port loop occurs

Prerequisite: A loop prevention protocol such as STP, RRPP, SEP, or Smart Link (SMLK) is
configured.
Symptom: Network convergence temporarily fails, or flapping persists.
Cause: Link flapping occurs, so protocol packets fail to be forwarded and the protocol flaps
frequently because of timeouts. For example:
● Packets are lost or error packets occur on the link, so protocol packets are discarded.
● Protocol packets are discarded due to unknown unicast suppression or improper QoS
configuration.
Handling Method:
● If error packets or packet loss occurs, replace the faulty network cable, fiber, or
optical module.
● If packets are discarded due to the suppression function, modify the suppression and QoS
configurations.
● Check whether network congestion causes protocol packet loss, so that a blocked interface
is unblocked after a protocol timeout and forms a loop. If so, optimize the network.
In the following cases, a loop occurs between two ports, causing protocol flapping:
1.6.3.3 Network Construction Causes a Loop
1.6.7.2 The OSPF Neighbor Relationship Is Down Due to a Loop on the S Switch
1.6.5.1 A Large Number of TC BPDUs Cause an ARP Learning Error on a Modular Switch
1.6.5.8 Unknown Unicast Suppression Causes RRPP Flapping
1.6.5.7 Unicast Suppression Causes RRPP Flapping for One Hour


Abnormal Packet Forwarding on the Downstream Device Causes Pseudo Loop


The preceding loops are caused by improper network structure or configurations. On the live
network, loop-like symptoms such as traffic bursts, MAC flapping alarms, and protocol packet
congestion may also be caused by the special packet forwarding process of a single product
or by interconnection with third-party products. These are called pseudo loops. Figure 1.1
shows an example.

Figure 1.1 A loop occurs when a switch connects to a third-party device

Prerequisite: Layer 2 network convergence is normal, and blocked port status is correctly
delivered.
Symptom: MAC flapping alarms are frequently generated on LSW 3, so a loop is suspected.
Cause: Layer 2 edge devices from some vendors, such as STBs, send back packets that they
cannot process.
Handling Method: Replace the edge devices.
In the following cases, the pseudo loop is caused by abnormal packet forwarding on the
downstream device:
1.6.1.2 ATAEs Fail to Interwork with MSTP-enabled Switches Due to a Software Problem
1.6.5.10 ERPS Becomes Invalid When RTN Interconnects with an S Switch

1.4.3 Collecting Information


If the problem persists after the operations in 1.4.1 Checking Whether the Loop Is Caused by
Recent Construction or Configuration Modification and 1.4.2 Checking Whether a Typical
Loop Issue Occurs are performed, the switch may have a software or hardware failure. Collect
the related information and provide it to R&D engineers.
Compared with a single-device failure, a loop may affect multiple devices or even the entire
network. Table 1.1 lists the reference information, collection method, and requirements.


Table 1.1 Information collection


No. | Loop Related | Mandatory? | Value | How to Collect | Content
1 | Network topology | Yes | Help R&D engineers understand the network services and confirm network topology. | Ask the customer for the network diagram or draw one. The devices at each layer must be clear. | Device names, MAC addresses of devices and interfaces, interface connections, and VLAN plan
2 | Login method | Suggested | Help R&D engineers remotely log in to the switch to check running status. | Ask the customer for the login method or make a table. | Device names, IP addresses, user names and passwords, and roles on the network
3 | Initial conclusion | Suggested | Show the frontline analysis progress and determine the problem locating direction. | Provide the information. | Suspected module, evidence, and initial verification record
4 | All configurations | Yes | Help R&D engineers be familiar with entire network configuration and simulate the network environment in the lab. | Collect information from the devices one by one. | All configurations on devices
5 | Operation record | Yes | Allow R&D engineers to check whether the problem is relevant to the factors such as operation procedure, commands, and feature application. | Use software to record online operations. | Operation start time recorded in file names
6 | Log | Suggested | Help R&D engineers check whether the problem is caused by unknown reasons. | Obtaining files through FTP or TFTP | Collect the logs recorded from 24 hours before the problem occurs till now.
7 | Diagnostic log | Suggested | Help R&D engineers check whether the problem is caused by unknown reasons. | Obtaining files through FTP or TFTP | Collect the diagnostic logs recorded from 24 hours before the problem occurs till now.
8 | STP calculation historical records (if enabled) | STP issue, optional | Allow R&D engineers to analyze protocol calculation process. | Collecting information using commands on each device | Run the display stp history command in the hidden or diagnostic view on each device.
9 | display diagnostic-information (optional, about 3 minutes per device) | Suggested | The device-level diagnostic information allows R&D engineers to exclude unknown reasons and issues. | Collecting information using commands on each device | Complete information on each device

1.5 Hardening and Optimizing the Network


Step 1 Configure a loop prevention protocol.
If the loop is caused by a physical loop, configure a loop prevention protocol. Commonly used
loop prevention protocols include STP/RSTP/MSTP/VBST, RRPP, SEP, and ERPS. For
details, see the Configuration Guide.
Step 2 Improve link quality and reliability.


If physical link quality is poor, a loss of protocol packets will cause a temporary loop. Check
the link and replace the fiber or optical module.
If protocol packets are discarded due to insufficient bandwidth, expand bandwidth or
configure link aggregation to improve link reliability.
Step 3 Configure broadcast suppression to improve network robustness.
To limit the impact if a loop occurs again, configure broadcast suppression on the ports on
the ring. Based on experience, setting the broadcast suppression rate to 5% can effectively
prevent broadcast storms. You can also set the suppression rate according to the concurrent
broadcast traffic volume on the live network.
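
As a worked example of the arithmetic, the Python sketch below converts a suppression percentage into an absolute rate and derives a percentage from a measured broadcast peak. Both function names and the headroom factor of 2 are assumptions chosen for illustration; they do not come from a Huawei recommendation.

```python
import math


def suppression_limit_mbps(port_speed_mbps, percent=5):
    """Broadcast rate allowed by a percentage-based suppression setting."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return port_speed_mbps * percent / 100


def percent_for_peak(peak_broadcast_mbps, port_speed_mbps, headroom=2.0):
    """Smallest whole percentage that covers the measured peak with headroom.

    headroom is a hypothetical safety factor so that legitimate broadcast
    bursts are not suppressed.
    """
    if port_speed_mbps <= 0:
        raise ValueError("port speed must be positive")
    raw = peak_broadcast_mbps * headroom * 100 / port_speed_mbps
    return min(100, math.ceil(raw))


if __name__ == "__main__":
    # 5% of a GE port (1000 Mbit/s) allows 50 Mbit/s of broadcast traffic.
    print(suppression_limit_mbps(1000))
    # A measured 12 Mbit/s peak with 2x headroom needs 3% on a GE port.
    print(percent_for_peak(12, 1000))
```

On a 10GE port the same 5% setting allows 500 Mbit/s of broadcast traffic, which is why a percentage derived from measured traffic can be a better fit on fast links.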
Step 4 Configure QoS to ensure preferential forwarding of protocol packets.
If protocol packets cannot be promptly forwarded due to network congestion, configure QoS
to ensure a high priority of protocol packets.
Case:
1.6.5.9 Services on the RRPP Ring Consisting of CX600 and S3300 Are Interrupted
Step 5 Optimize the network structure.
Plan the access layer and aggregation layer properly.
If too many devices are located at one layer, allocate them into different domains according to
logical organization and physical locations.
Cases:
1.6.2.2 Incorrect Device Connections Cause Broadcast Storm
1.6.3.1 Improper Server Networking Causes MAC Address Flapping
----End

1.6 Typical Loop Troubleshooting Cases


This section classifies loop faults according to causes.

1.6.1 Interconnection Problem


1.6.1.1 MAC Address Flapping Occurs on a Non-Huawei Device

Involved Products and Versions


S switches of V200R002 and earlier versions

Network Diagram
In Figure 1.1, a firewall connects to three switches.


Figure 1.1 MAC address flapping occurs on a non-Huawei device

Symptom
MAC address 00e0-fc09-bcf9 flaps on the firewall, affecting service forwarding.

Cause Analysis
On Huawei switches, only NDP uses the MAC address 00e0-fc09-bcf9 as the source MAC
address of protocol packets, and NDP is enabled by default. Because each switch connected to
the firewall sends NDP packets with this same source MAC address, the firewall learns the
address on several interfaces and reports MAC address flapping, which affects service
forwarding on the firewall. Usually, such MAC address flapping does not affect services
(unless an action is configured for MAC address flapping on the device).
NDP packets are BPDUs. In the latest version, neither the switch nor the firewall learns
MAC addresses from BPDUs.

Handling Procedure
Run the ndp disable command to disable NDP globally.

Conclusion and Suggestion


None.

1.6.1.2 ATAEs Fail to Interwork with MSTP-enabled Switches Due to a Software Problem

Involved Products and Versions


S switches of all versions

Network Diagram
In Figure 1.1, ATAE devices, Switch-1, and Switch-2 form a square-shaped loop.


Figure 1.1 ATAEs fail to interwork with MSTP-enabled switches

Symptom
After STP is enabled, STP convergence is abnormal. Both Switch-1 and ATAE-SW-8 are root
bridges. The ports connecting the switches to each other and the ports connecting the ATAE
devices to each other converge normally, but the ports connecting Switch-1 and Switch-2 to
the ATAE devices do not.

Cause Analysis
Switch-1 is the root bridge, and its system MAC address is 4c1f-cc82-d659. The software
version of ATAE devices is V200R013SPC005, and this version has a software problem: it
cannot normally process STP packets whose root bridge MAC address ends with 59.

Handling Procedure
1. Check STP convergence on each port. The result shows that two root bridges exist.
Both Switch-1 and ATAE-SW-8 are STP root bridges.
<ATAE-SW-8> display stp brief
MSTID Port Role STP State Protection
0 GigabitEthernet0/7 DESI FORWARDING BPDU
0 GigabitEthernet0/15 DESI FORWARDING NONE //ATAE interconnection
0 GigabitEthernet0/18 DESI FORWARDING NONE //Connecting to Switch-2

2. Check STP and packet sending information on GigabitEthernet0/18 of ATAE-SW-8.
No service is deployed on ATAE-SW-8. The number of incoming multicast packets
increases on GigabitEthernet0/18. However, the output of the display stp command shows
that the number of MSTP packets received by the port is 0.
Input(total): 818962 packets, 114519592 bytes
757300 broadcasts, 24 multicasts

----[Port18(GigabitEthernet0/18)][FORWARDING]----
Port Protocol :enabled
Port Role :CIST Designated Port
Port Priority :128


Port Cost(Dot1T ) :Config=auto / Active=10000


Desg. Bridge/Port :32768.80fb-06ad-6d07 / 128.18
Port Edged :Config=disabled / Active=disabled
Point-to-point :Config=auto / Active=true
Transit Limit :3 packets/hello-time
Protection Type :None
Port Stp Mode :Stp
Port Protocol Type :Config=auto / Active=legacy
PortTimes :Hello 2s MaxAge 20s FwDly 15s RemHop 20
BPDU Sent :82117
TCN: 0, Config: 3391, RST: 0, MST: 78726
BPDU Received :0
TCN: 0, Config: 0, RST: 0, MST: 0

3. Configure port mirroring on GigabitEthernet0/18 of ATAE-SW-8. The result shows that
ATAE-SW-8 has received STP packets from Switch-1.
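
The check in step 2 (a port that sends BPDUs but reports none received) can be applied to every port in a captured display stp output. The Python sketch below is a hedged illustration written against the per-port layout shown above; the function name ports_without_received_bpdus and the regular expressions are assumptions, and other devices may format the counters differently.

```python
import re

# Port heading, for example: "----[Port18(GigabitEthernet0/18)][FORWARDING]----"
PORT_RE = re.compile(
    r"^-+\[Port\d+\((?P<port>[^)]+)\)\]\[(?P<state>[^\]]+)\]-+$"
)
SENT_RE = re.compile(r"^BPDU Sent\s*:\s*(?P<count>\d+)")
RECV_RE = re.compile(r"^BPDU Received\s*:\s*(?P<count>\d+)")


def ports_without_received_bpdus(output):
    """Return ports that have sent BPDUs but show zero BPDUs received."""
    silent = []
    port = None
    sent = None
    for raw in output.splitlines():
        line = raw.strip()
        heading = PORT_RE.match(line)
        if heading:
            # A new port section starts: reset the per-port counter.
            port, sent = heading.group("port"), None
            continue
        match = SENT_RE.match(line)
        if match:
            sent = int(match.group("count"))
            continue
        match = RECV_RE.match(line)
        if match and port is not None and sent:
            if int(match.group("count")) == 0:
                silent.append(port)
    return silent
```

A port listed here is not necessarily faulty: an edge port facing a host also receives no BPDUs. The result is meaningful only for ports that face another STP-enabled device, such as GigabitEthernet0/18 in this case.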

R&D engineers of ATAE confirm that the faulty ATAE uses a switching unit running the
V200R013SPC005 version. This version has a known software problem: it cannot normally
process STP packets whose root bridge MAC address ends with 59. This problem is solved in
V200R013SPC006 and later versions. After the root bridge is switched to Switch-2, MSTP
convergence becomes normal.
<ATAE-SW-8> display version
VRP (R) Software, Version 3.10, RELEASE 0010
Copyright (c) 2000-2008 HUAWEI TECH CO., LTD.
uptime is 0 week,0 day,2 hours,38 minutes

OSTA 2.0 V200R013 CN21XCBA switch system
OSTA 2.0 V200R013 CN21XCBA switch version: V200R013SPC005

128M bytes SDRAM
16384K bytes Flash Memory
Config Register points to FLASH

Hardware Version is VER.A
Release Logic Version is 0x03
Back Board Hardware Version is VER.A
Back Board Logic Version is 0x02
Back Board Type is CN21XCRA

Upgrade the switching unit version of ATAE to the latest version V200R013SPC007.

Conclusion and Suggestion


Plan the interconnection between devices of different models during the network deployment
stage.
If an STP-related fault occurs during interconnection between switches and other devices,
check the configuration and packet forwarding first.

1.6.1.3 RRPP Temporary Loop Occurs Because the Interface Up Time on the Switch and CX600 Is Different

Involved Products and Versions


S switches of all versions

Network Diagram
In Figure 1.1 and , the S5700s have RRPP enabled; S5700_1 and S5700_2 are the master
nodes of RRPP domains 1 and 2, respectively; the other S5700s function as transit nodes;
the CX600s do not have RRPP enabled and use different VPLS VSIs to transparently
transmit RRPP packets and data packets.

Figure 1.1 RRPP temporary loop occurs because the interface Up time on the switch and router is different


Symptom
When the LPU in slot 1 of CX600_1 fails and CX600_1 restarts, GE1/1/1 on CX600_1 goes Up
at least 8s, and sometimes as much as 1 minute, later than GE0/0/1 on S5700_1. After the
faulty LPU restarts, a temporary loop occurs for several seconds, which may cause service
exceptions.

Cause Analysis
1. After a board on the CX600 restarts, the bottom-layer physical status becomes Up first,
regardless of whether the interfaces work in forced or auto-negotiation mode. If the
system finds that board configuration restoration is not finished, it does not report the
physical Up status to the software layer. The router interface goes Up about one minute
later. Therefore, the router interface takes longer to go Up than the switch interface.
2. The switch interface goes Up first. RRPP unblocks the interface 6s after the interface
goes Up. At this time, the router has not reported Up to the software layer. When the
software layer of the router reports Up, some data VSIs start to transparently transmit
data packets. The RRPP VSI of the router may be enabled later, or may be unable to
transparently transmit packets for a short time after it is enabled. While the LPU of the
CX600 is busy and the RRPP VSI is not yet working, a temporary loop occurs. Depending on
the service configuration on the LPU of the CX600, the temporary loop may last for about
10s. If the intermediate switch receives many packets, eliminating the loop may take
longer.
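
The timing relationship described above can be expressed as simple arithmetic. The following Python sketch is a simplified model, not a description of the device implementation: it assumes that the loop window opens once both the switch interface and the data VSIs forward traffic, and closes once the RRPP VSI relays protocol packets. The function name and the 70s value are hypothetical.

```python
def temporary_loop_window(switch_unblock_s, data_vsi_ready_s, rrpp_vsi_ready_s):
    """Seconds during which data can circulate while RRPP packets are not yet relayed.

    All arguments are offsets, in seconds, from the moment the switch interface goes Up.
    """
    # Data can cross the link only once both ends forward it.
    data_path_open_s = max(switch_unblock_s, data_vsi_ready_s)
    # The window closes when the RRPP VSI starts relaying protocol packets.
    return max(0.0, float(rrpp_vsi_ready_s - data_path_open_s))

# 6 s and 60 s come from the text; 70 s is an assumed value for illustration.
print(temporary_loop_window(6, 60, 70))  # 10.0
```

If the RRPP VSI is ready no later than the data VSIs, the function returns 0.0, which corresponds to the goal of the handling procedure: report the Up event and enable the RRPP VSI without delay.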

Handling Procedure
Optimize the CX600 so that the CX600 can rapidly report the Up event.

Conclusion and Suggestion


None.

1.6.2 Hardware Connection Problem


1.6.2.1 RRPP Does Not Take Effect on an S9300 Because a Board Is Loose

Involved Products and Versions


Modular switches

Network Diagram
In Figure 1.1, four S9300 switches form an RRPP ring. The slave port on the master node is
not blocked.


Figure 1.1 RRPP ring network

Symptom
The slave port of the master node on the RRPP ring network is not blocked.

Cause Analysis
The HG port on the MPU does not forward RRPP packets because the board is loose.

Handling Procedure
1. The initial suspicion is that RRPP delivery is abnormal.
2. Run the display diagnostic-information command to collect device information. The
command output shows that the HG port is not in the control VLAN. Packets may be
discarded because the channel is unstable.
3. If the channel is unstable, remove and reinstall the board. If this fixes the problem, the
fault was caused by improper board connection.
4. After the board is reinstalled, packet forwarding is normal and the fault is rectified.

Conclusion and Suggestion


When problems such as protocol delivery failure or traffic interruption occur, check the
fibers, optical modules, and board connections. Replace hardware components such as
optical modules and boards if spare components are available.

1.6.2.2 Incorrect Device Connections Cause Broadcast Storm

Involved Products and Versions


S switches of all versions

Fault Symptom and Network Diagram


In Figure 1.1, during the network deployment of a carrier, the planning is improper and
connections are complex. Incorrect connections affect network services.


Figure 1.1 Incorrect connections cause broadcast storm

Root Cause
Connected interfaces between switches are often access interfaces, VLAN planning and
assignment are improper, and connections are complex. In this situation, incorrect connections
cause loops and the upper-layer core device is affected.

Identification Method
Focus on the solution or workaround.

Solution
1. Provide a proper network plan and VLAN assignment, reduce unnecessary connections,
and enable storm control.
2. Review the network plan if the networking is complex.
3. During network deployment and commissioning, shut down all interfaces connected to
the live network.
4. When the interface connected to the live network is restored, check whether there is
unexpected broadcast or multicast traffic on the interface for at least 20 minutes. If an
exception is detected, shut down the uplink interface.
5. If the indicator of a switch interface blinks fast or is steady on, heavy traffic may be
transmitted on the interface. Check whether there are loops.

Conclusion
None.


1.6.3 Networking and Configuration Change


1.6.3.1 Improper Server Networking Causes MAC Address Flapping

Involved Products and Versions


S switches of all versions

Network Diagram
In Figure 1.1, two servers have their NICs bound together and forward packets in load
balancing mode. The two NICs share the same IP address and MAC address.

Figure 1.1 MAC address flapping and ARP flapping on modular switches cause service
interruption

Symptom
MAC address flapping persists on a switch. ARP entries of the servers are learned on the
ports that interconnect the two switches. As a result, external access to the servers is
intermittently interrupted.

Cause Analysis
1. The switch ports connected to the servers alternate between Up and Down, and the
servers' MAC addresses flap. Both the ports between the two switches and the ports
connected to the servers have learned the servers' MAC addresses.
2. When a user requests to access a server through Switch-1, Switch-1 searches for the
outbound interface according to MAC address entries. Due to MAC address flapping,
there are two possible outbound interfaces (downstream interface GE4/0/9y connected to
the server and Eth-Trunk1 connected to the other switch). If Eth-Trunk1 is selected, the
packets are sent to Switch-2. Switch-2 has learned the server's MAC address on the
interface connected to Switch-1, so Switch-2 discards the packets (the Layer 2 loop
prevention mechanism does not send a packet back out of the interface on which it was
received).
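
The flapping behavior described above can be illustrated with a small model of MAC learning. The following Python sketch is an illustration under stated assumptions, not switch code: the function name, the MAC addresses, and the port label "server-port" are hypothetical, and a "move" is counted whenever a MAC address is learned on a port different from the previous one.

```python
from collections import defaultdict

def detect_mac_flapping(events, threshold=2):
    """Return {mac: move_count} for MACs that changed port at least `threshold` times.

    `events` is an iterable of (mac, port) learning events in time order.
    """
    last_port = {}
    moves = defaultdict(int)
    for mac, port in events:
        if mac in last_port and last_port[mac] != port:
            moves[mac] += 1
        last_port[mac] = port
    return {mac: count for mac, count in moves.items() if count >= threshold}

# Hypothetical learning events seen by Switch-1 for a server with bonded NICs.
events = [
    ("0000-5e00-5301", "server-port"),
    ("0000-5e00-5301", "Eth-Trunk1"),
    ("0000-5e00-5301", "server-port"),
    ("0000-5e00-5301", "Eth-Trunk1"),
    ("0000-5e00-5302", "server-port"),  # a stable MAC that never moves
]
print(detect_mac_flapping(events))  # {'0000-5e00-5301': 3}
```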


Handling Procedure
1. The two servers are bound together in load balancing mode but connected to two
independent switches. The network is not symmetrical. It is recommended that the two
servers be configured to work in active/standby mode. This configuration will solve the
problem of MAC address flapping.
2. If load balancing between servers and cross-device networking are required, you are
advised to configure CSS on the switches and load balancing on CSS links.

Conclusion and Suggestion


When planning server networking, consider the possibility of loops and plan workaround
measures in advance.

1.6.3.2 Incorrect Device Connection Triggers Root Protection

Involved Products and Versions


S switches of all versions

Network Diagram
In Figure 1.1, two S series switches and the ATAE switching boards form an STP ring. The
two ATAE switching boards can be considered as two switches that are connected through
GE0/15 ports. Switch-1 is the root bridge and Switch-2 is the backup root bridge. Eth-Trunk 0
is created between Switch-1 and Switch-2. In normal situations, GE0/19 of ATAE slot8 is the
blocked port. Switch-1 and Switch-2 have VRRP enabled and function as gateways of the
ATAE switching boards.

Figure 1.1 Root protection interrupts services

Symptom
When a network fault occurs, service traffic sent by the ATAE switching boards is interrupted.
Services are temporarily recovered after Switch-1 is powered off.


Cause Analysis
On Switch-1, root protection is enabled on the ports connected to Switch-2 and ATAE slot7.
After an O&M switch with a higher priority is incorrectly connected to the network, root
protection takes effect. All ports that have root protection enabled are blocked and services are
interrupted.

Handling Procedure
After the fault occurs, check the VRRP state on Switch-1 and Switch-2. Both switches are in
Master state, indicating that VRRP heartbeat packets are not being forwarded between them.
Normally, VRRP heartbeat packets are forwarded through the Eth-Trunk between the two
switches. If the Eth-Trunk negotiation fails after the fault occurs, STP reconverges and
heartbeat packets are forwarded through the ATAE switch board.
Power on Switch-1 but do not connect it to the network. Check the configuration file of
Switch-1. The configuration file shows that STP root protection (stp root-protection) is
enabled on all ports in Up state. After receiving STP BPDUs with a higher priority, the ports
enter the Discarding state and stop forwarding packets. Because Switch-1 is restarted, it is
unknown whether Switch-1 receives packets with a higher priority when the fault occurs.
Analyze the STP history calculation information of the ATAE switching board.
According to the STP history calculation information, GE0/17 of ATAE slot8 receives STP
BPDUs from a device whose MAC address is 000f-e2f6-1d18 and whose priority is 0,
triggering STP recalculation.
GigabitEthernet0/19 Alte->Desi at 2011/10/29 04:38:06
{0.5489-98f5-26bf 18 4096.5489-98f5-834d 0 4096.5489-98f5-834d 128.18}
GigabitEthernet0/17 Desi->Root at 2011/10/29 04:38:06
{0.000f-e2f6-1d18 0 0.000f-e2f6-1d18 0 0.000f-e2f6-1d18 128.16}
GigabitEthernet0/15 Root->Desi at 2011/10/29 04:38:06
{0.5489-98f5-26bf 20000 32768.0018-8200-5428 0 32768.0018-8200-5428 128.14}

STP selects the root bridge according to the bridge ID (the bridge priority and MAC address).
When two devices have the same bridge priority, the device with the smaller system MAC
address has the smaller bridge ID and therefore the higher priority. When the fault occurs,
ATAE slot8 receives STP BPDUs with a higher priority (0.000f-e2f6-1d18) than that of the
original root bridge Switch-1 (0.5489-98f5-26bf). As a result, the ports configured with STP
root protection on Switch-1 are blocked. VRRP heartbeat packets cannot be forwarded
between Switch-1 and Switch-2. Both switches become the VRRP master and services are
interrupted.
It is found that 000f-e2f6-1d18 is the system MAC address of the O&M switch connected to
GE0/17. The switch is incorrectly connected to the network when the fault occurs.
Disable STP on ports that are not added to the STP ring on the ATAE switching board.
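
The bridge ID comparison used in this analysis can be reproduced offline. The following Python sketch is an illustration only: the function names are hypothetical, and it assumes that bridge IDs are written as priority.MAC, as in the STP history output above. The numerically smaller bridge ID wins the root election.

```python
def parse_bridge_id(text: str):
    """Split 'priority.mac' (for example '0.000f-e2f6-1d18') into a comparable tuple."""
    priority, mac = text.split(".", 1)
    return int(priority), int(mac.replace("-", ""), 16)

def better_bridge(a: str, b: str) -> str:
    """Return the bridge ID that wins root election (the numerically smaller one)."""
    return a if parse_bridge_id(a) < parse_bridge_id(b) else b

original_root = "0.5489-98f5-26bf"   # root bridge ID recorded in the history output
intruder = "0.000f-e2f6-1d18"        # bridge ID advertised by the O&M switch
print(better_bridge(original_root, intruder))  # 0.000f-e2f6-1d18
```

Both bridge IDs have priority 0, so the comparison falls through to the MAC address, and the O&M switch wins because 000f-e2f6-1d18 is smaller than 5489-98f5-26bf.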

Conclusion and Suggestion


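If a device with a higher priority is connected and sends BPDUs to preempt the root bridge
role, the ports with root protection enabled are blocked and services may be interrupted.
When configuring root protection to protect the root bridge, take this situation into account
and prevent it, for example by disabling STP on ports that do not belong to the STP ring.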


1.6.3.3 Network Construction Causes a Loop

Involved Products and Versions


S9300 V100R003C00SPC200

Network Diagram
After network (in Figure 1.1) restructuring and migration, the original core devices (Layer 3
devices) are re-deployed as access devices AS (Layer 2 devices). VRRP is configured on
DS_01 and DS_02.

Figure 1.1 A loop causes intermittent service interruption

Symptom
Ping the management IP address of the AS on the Layer 3 device DS. The command output
shows that the ping fails and the VRRP group status of the DS frequently alternates between
master and backup.
The following traps are reported on DS_02:
Sep 17 2013 21:46:11+08:00 DS_02 VRRP/3/VRRPMASTERDOWN:OID
1.3.6.1.4.1.2011.5.25.127.2.30.1 The state of VRRP changed from master to other
state.(VrrpIfIndex=143, VrId=48, IfIndex=143, IPAddress=11.91.127.239,
NodeName=DS_02, IfName=Vlanif948, CurrentState=2, ChangeReason=priority
calculation)
Sep 17 2013 21:46:11+08:00 DS_02 %%01VRRP/4/STATEWARNINGMEV1R3(l):Virtual Router
state BACKUP changed to MASTER, because of protocol timer expired.
(Interface=Vlanif948, VrId=48).
Sep 17 2013 21:46:11+08:00 DS_02 %%01VRRP/4/STATEWARNINGMEV1R3(l):Virtual Router
state MASTER changed to BACKUP, because of priority calculation.
(Interface=Vlanif948, VrId=48).

The VRRP group status frequently alternates. Check the VRRP group status after the
switchover. All VRRP groups are in Backup state.
<DS_02> display vrrp brief
VRID State Interface Type Virtual IP
--------------------------------------------------------
3 Backup Vlanif903 Normal 10.93.4.30
5 Backup Vlanif599 Normal 11.91.127.94
14 Backup Vlanif914 Normal 10.93.41.126
24 Backup Vlanif924 Normal 10.93.32.126
25 Backup Vlanif925 Normal 10.93.32.254
…………

Cause Analysis
A loop exists on the network.

Handling Procedure
1. Run the display cpu-defend vrrp statistics all command to check statistics of the VRRP
packets. The command output shows a large number of packets are dropped on DS_02.
[DS_02] display cpu-defend vrrp statistics all
Statistics on mainboard:
-------------------------------------------------------------------------------
Packet Type         Pass(Bytes)  Drop(Bytes)  Pass(Packets)  Drop(Packets)
-------------------------------------------------------------------------------
vrrp                          0            0              0              0
-------------------------------------------------------------------------------
Statistics on slot 1:
-------------------------------------------------------------------------------
Packet Type         Pass(Bytes)  Drop(Bytes)  Pass(Packets)  Drop(Packets)
-------------------------------------------------------------------------------
vrrp                          0            0              0              0
-------------------------------------------------------------------------------
Statistics on slot 4:
-------------------------------------------------------------------------------
Packet Type         Pass(Bytes)  Drop(Bytes)  Pass(Packets)  Drop(Packets)
-------------------------------------------------------------------------------
vrrp                79880066214   2581617736     1174644777       37950869
-------------------------------------------------------------------------------

2. Run the display interface brief command to check interface bandwidth usage.
[DS_02] display interface brief
…………
Interface PHY Protocol InUti OutUti inErrors outErrors
Eth-Trunk1 up up 31% 31% 0 0
GigabitEthernet4/0/22 up up 0.72% 81% 0 0
GigabitEthernet4/0/23 up up 81% 0.73% 2 0
Ethernet0/0/0 down down 0% 0% 0 0
…………
GigabitEthernet4/0/0 up up 0% 81% 0 0
GigabitEthernet4/0/1 up up 0% 81% 0 0
GigabitEthernet4/0/2 up up 0% 81% 2 0
GigabitEthernet4/0/3 up up 0% 81% 0 0
GigabitEthernet4/0/4 up up 0% 81% 0 0
GigabitEthernet4/0/5 up up 0% 81% 0 0
GigabitEthernet4/0/6 up up 0% 81% 0 0
GigabitEthernet4/0/7 up up 0% 81% 0 0
GigabitEthernet4/0/8 up up 0% 82% 0 0
GigabitEthernet4/0/9 up up 0% 82% 0 0
GigabitEthernet4/0/10 up up 0% 82% 0 0
GigabitEthernet4/0/11 down down 0% 0% 0 0
GigabitEthernet4/0/12 up up 0% 82% 0 0
GigabitEthernet4/0/13 up up 0% 82% 0 0
GigabitEthernet4/0/14 up up 0% 82% 0 0
GigabitEthernet4/0/15 up up 0% 82% 0 0
GigabitEthernet4/0/16 up up 0% 82% 0 0
GigabitEthernet4/0/17 up up 0.01% 82% 0 0
GigabitEthernet4/0/18 up up 82% 0% 0 0
GigabitEthernet4/0/19 up up 87% 82% 0 0
GigabitEthernet4/0/20 down down 0% 0% 0 0
GigabitEthernet4/0/21 up up 0.01% 0.01% 0 0
LoopBack500 up up(s) 0% 0% 0 0
NULL0 up up(s) 0% 0% 0 0
Vlanif599 up up -- -- 0 0
…………

As shown in the preceding information, the outbound traffic on the interfaces connecting
to the AS devices reaches about 80%, which indicates that a loop has occurred. The
inbound traffic on GigabitEthernet4/0/18 and GigabitEthernet4/0/19 also exceeds 80%,
which indicates that the loop is on the AS devices connected to these two interfaces.
Manually shut down the two interfaces, and then check the CPU-defend statistics and ping
the management IP address of another AS. The number of dropped VRRP packets stops
increasing and the ping succeeds.
3. Interfaces GigabitEthernet4/0/18 and GigabitEthernet4/0/19 connect to AS_03 and
AS_05 respectively. Both are non-Huawei devices that were previously deployed as
Layer 3 devices with STP disabled. When the two devices were re-deployed as Layer 2
devices, STP was not enabled on them, resulting in the loop.
Enable STP on the two devices, and then re-enable GigabitEthernet4/0/18 and
GigabitEthernet4/0/19 on the DS. Check the STP status and the traffic on the interfaces
to confirm that services are recovered.
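
The interface check in step 2 can be scripted when many interfaces are involved. The following Python sketch is an illustration only: the function names and the 80% threshold are assumptions, and the parser expects the seven-column layout of the display interface brief output shown above. The sample rows are a subset copied from that output.

```python
def high_utilization(rows, threshold=80.0):
    """Return [(name, in_pct, out_pct)] for rows whose InUti or OutUti reaches the threshold."""
    flagged = []
    for line in rows:
        fields = line.split()
        # Skip headers, separators, and rows such as Vlanif that show "--".
        if len(fields) != 7 or not fields[3].endswith("%"):
            continue
        in_pct = float(fields[3].rstrip("%"))
        out_pct = float(fields[4].rstrip("%"))
        if in_pct >= threshold or out_pct >= threshold:
            flagged.append((fields[0], in_pct, out_pct))
    return flagged

def inbound_suspects(rows, threshold=80.0):
    """Interfaces with heavy inbound traffic are the likely direction of the loop."""
    return [name for name, in_pct, _ in high_utilization(rows, threshold)
            if in_pct >= threshold]

SAMPLE = [
    "Interface PHY Protocol InUti OutUti inErrors outErrors",
    "Eth-Trunk1 up up 31% 31% 0 0",
    "GigabitEthernet4/0/17 up up 0.01% 82% 0 0",
    "GigabitEthernet4/0/18 up up 82% 0% 0 0",
    "GigabitEthernet4/0/19 up up 87% 82% 0 0",
    "Vlanif599 up up -- -- 0 0",
]
print(inbound_suspects(SAMPLE))
```

Interfaces that only show high outbound utilization are receiving flooded traffic; interfaces with high inbound utilization point toward the devices where the loop exists.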


Conclusion and Suggestion


When the network traffic is unstable, check the traffic on interfaces to determine whether
loops occur. If a loop occurs, locate the source based on the information about packets
received and sent by the interfaces. Shut down related interfaces temporarily. Find out the root
cause and resolve the problem accordingly.

1.6.3.4 SEP Deletion on a Faulty Port Causes the Switch to Be Out of
Management

Involved Products and Versions


S switches of all versions

Network Diagram
In Figure 1.1, SwitchA, SwitchB, SwitchC, SwitchG, SwitchF, and SwitchE form SEP
Segment 1, while SwitchC, SwitchD, and SwitchE form SEP Segment 2.

Figure 1.1 SEP deletion on a faulty port causes the switch to be out of management

Symptom
The link between SwitchC and SwitchD is faulty. After the SEP configuration is deleted on
the faulty port of SwitchD, SwitchD is out of management.

Cause Analysis
When the link between SwitchC and SwitchD is faulty, the blocked port in SEP Segment 2 is
unblocked. The two faulty ports are in Discarding state. After the SEP configuration is
deleted on the faulty port of SwitchD, SEP Segment 2 selects a new blocked port from the two
connected ports of SwitchD and SwitchE. Both links connecting SwitchD to SwitchC and
SwitchE fail. As a result, SwitchD cannot be managed.


Handling Procedure
Run the display sep topology segment segment-id command to view the current topology
information and locate the faulty port.
<SwitchD> display sep topology segment 2
SEP segment 2
SEP detects a segment failure that may be caused by an incomplete topology
-----------------------------------------------------------------
System Name Port Name Port Role Port Status
-----------------------------------------------------------------
SwitchE GE0/0/3 secondary forwarding
SwitchC GE0/0/1 common forwarding
SwitchD GE0/0/2 common discarding

When deleting an SEP configuration in an open ring scenario, you are advised to delete the
configuration from one end of the open ring. When only one SEP-enabled port is left, shut
down the port and then delete the SEP configuration on the port.
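
Before deleting an SEP configuration, it can help to list the ports that are currently in Discarding state. The following Python sketch is an illustration only: the function name is hypothetical, and the parser assumes the four-column layout of the display sep topology output shown above.

```python
def discarding_ports(output: str):
    """Return [(system, port, role)] for ports whose status is 'discarding'."""
    result = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have exactly four fields; the header has more.
        if len(fields) == 4 and fields[3] == "discarding":
            result.append((fields[0], fields[1], fields[2]))
    return result

SAMPLE = """\
System Name Port Name Port Role Port Status
SwitchE GE0/0/3 secondary forwarding
SwitchC GE0/0/1 common forwarding
SwitchD GE0/0/2 common discarding
"""
print(discarding_ports(SAMPLE))  # [('SwitchD', 'GE0/0/2', 'common')]
```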

Conclusion and Suggestion


When deleting an SEP configuration, you need to consider the deployment of service VLANs
in the SEP segment. Otherwise, a switch may be out of management or services may be
interrupted because of multiple blocked ports.

1.6.4 Misconfigurations
1.6.4.1 Services Are Interrupted When Ports Are Not Deleted from VLAN 1

Involved Products and Versions


S switches of all versions

Network Diagram
In Figure 1.1, the switch connects to routers through dual links and connects to downlink
access devices.


Figure 1.1 Services are interrupted due to a Layer 2 loop

Symptom
All services on the dual links of the switch are interrupted. After the switch restarts, services
are restored for a short time. Then the problem occurs again.

Cause Analysis
A loop occurs on the access network, causing a broadcast storm. As a result, the bandwidth
of the uplink ports on the switch is exhausted and the OSPF neighbor relationship goes
Down. After the switch restarts, the broadcast storm is eliminated and services are
restored. When the broadcast storm recurs, the fault occurs again.

Handling Procedure
1. Check the log file. The log file shows that the OSPF neighbor relationship is Down
because the remote device does not receive OSPF Hello packets in a timely manner.
NBR_CHG_DOWN(l): Neighbor event:neighbor state changed to Down. (ProcessId=88,
NeighborAddress=x.x.x.x, NeighborEvent=KillNbr, NeighborPreviousState=Loading,
NeighborCurrentState=Down)
NBR_DOWN_REASON(l): Neighbor state leaves full or changed to Down.
(ProcessId=88, NeighborRouterId=x.x.x.x, NeighborAreaId=0,
NeighborInterface=Vlanif4, NeighborDownImmediate reason=Neighbor Down Due to
Kill Neighbor, NeighborDownPrimeReason=Physical Interface State Change,
NeighborChangeTime=

2. Check the diagnostic log file. The file shows that there is abnormal traffic on the
interfaces. There are alarms about outgoing traffic on GE1/0/0 and GE1/0/1, and alarms
about incoming traffic on GE1/0/3 and GE1/0/4.


Interface GigabitEthernet1/0/0's flow is abnormal. (Speed=1000Mbps,
CurrentInSpeed=0Mbps, CurrentOutSpeed=849Mbps, File=IFPDT_FUNC_C, Line=13072)
Interface GigabitEthernet1/0/1's flow is abnormal. (Speed=1000Mbps,
CurrentInSpeed=3Mbps, CurrentOutSpeed=850Mbps, File=IFPDT_FUNC_C, Line=13072)
Interface GigabitEthernet1/0/3's flow is abnormal. (Speed=1000Mbps,
CurrentInSpeed=847Mbps, CurrentOutSpeed=846Mbps, File=IFPDT_FUNC_C, Line=13072)
Interface GigabitEthernet1/0/4's flow is abnormal. (Speed=1000Mbps,
CurrentInSpeed=849Mbps, CurrentOutSpeed=849Mbps, File=IFPDT_FUNC_C, Line=13072)
Interface GigabitEthernet1/0/6's flow is abnormal. (Speed=1000Mbps,
CurrentInSpeed=0Mbps, CurrentOutSpeed=849Mbps, File=IFPDT_FUNC_C, Line=13072)
Interface GigabitEthernet1/0/10's flow is abnormal. (Speed=1000Mbps,
CurrentInSpeed=0Mbps, CurrentOutSpeed=849Mbps, File=IFPDT_FUNC_C, Line=13072)
Interface GigabitEthernet1/0/11's flow is abnormal. (Speed=1000Mbps,
CurrentInSpeed=0Mbps, CurrentOutSpeed=849Mbps, File=IFPDT_FUNC_C, Line=13072)

3. Analyze the configurations of the interfaces that report abnormal-traffic alarms.
These interfaces all belong to VLAN 1. Traffic from VLAN 1 on GE1/0/3 and GE1/0/4 is
broadcast to all other interfaces. As a result, the outgoing traffic on uplink interfaces is
abnormal and OSPF Hello packets are discarded. A loop has occurred in VLAN 1. After
GE1/0/3 and GE1/0/4 are deleted from VLAN 1, the fault is rectified.
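
The alarm analysis in step 2 can be automated for long diagnostic logs. The following Python sketch is an illustration only: the function name and the 80% ratio are assumptions, and the regular expression matches the alarm format shown in the log excerpt above. Interfaces with heavy inbound traffic are reported as likely loop sources; interfaces with heavy outbound traffic only are reported as flooded.

```python
import re

PATTERN = re.compile(
    r"Interface (\S+)'s flow is abnormal\. \(Speed=(\d+)Mbps,\s*"
    r"CurrentInSpeed=(\d+)Mbps, CurrentOutSpeed=(\d+)Mbps"
)

def classify(log_text: str, ratio: float = 0.8):
    """Return (sources, flooded) interface name lists from abnormal-flow alarms."""
    sources, flooded = [], []
    for name, speed, rx, tx in PATTERN.findall(log_text):
        limit = int(speed) * ratio
        if int(rx) >= limit:
            sources.append(name)      # storm traffic enters the switch here
        elif int(tx) >= limit:
            flooded.append(name)      # storm traffic is only sent out here
    return sources, flooded

LOG = """\
Interface GigabitEthernet1/0/0's flow is abnormal. (Speed=1000Mbps,
CurrentInSpeed=0Mbps, CurrentOutSpeed=849Mbps, File=IFPDT_FUNC_C, Line=13072)
Interface GigabitEthernet1/0/3's flow is abnormal. (Speed=1000Mbps,
CurrentInSpeed=847Mbps, CurrentOutSpeed=846Mbps, File=IFPDT_FUNC_C, Line=13072)
Interface GigabitEthernet1/0/4's flow is abnormal. (Speed=1000Mbps,
CurrentInSpeed=849Mbps, CurrentOutSpeed=849Mbps, File=IFPDT_FUNC_C, Line=13072)
"""
print(classify(LOG))
```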

Conclusion
Loops often occur in VLAN 1. If traffic on interfaces is abnormal, check the configurations of
the interfaces and check whether the interfaces are in VLAN 1. In addition, pay attention to
the number of broadcast packets on the interface.

1.6.4.2 Services Are Interrupted Because BPDU Is Not Enabled on Switch
Ports

Involved Products and Versions


S2700&S3700&S5700 V100R005

Network Diagram
In Figure 1.1, the switches run V100R005C01SPC100 and have global STP enabled. The
switches connect to multiple Cisco switches to constitute multiple STP rings.


Figure 1.1 Services are interrupted because switch ports are not configured with BPDU.

Symptom
When services are interrupted, log in to the switches. There are many broadcast packets on
interconnected ports and loops occur.

Root Cause
Global STP is enabled on the two switches, but as the following interface configuration
shows, bpdu enable is not configured on the interconnected ports.
#
interface GigabitEthernet0/0/4
port link-type access
port default vlan 10
loopback-detect enable
undo ntdp enable
undo ndp enable
#

On the switch ports enabled with Layer 2 protocols such as STP and LACP, the bpdu enable
command needs to be configured so that received protocol packets can be sent to the CPU for
processing. Otherwise, protocol packets are discarded and protocol negotiation cannot be
implemented.

Handling Procedure
There are loops on the network. First check whether STP convergence is normal. When there
is no blocked port in the STP ring, run the display stp interface command to check the role
of the port in the spanning tree and check whether STP BPDUs are received and sent
normally. For example:
Port Role :Designated Port
Port Priority :128
Port Cost(Dot1T ) :Config=auto / Active=20000 //Path cost of the port
Designated Bridge/Port :4096.5489-98f5-a433 / 128.34 //Specified bridge ID
BPDU Sent :726
TCN: 0, Config: 0, RST: 0, MST: 726
BPDU Received :0
TCN: 0, Config: 0, RST: 0, MST: 0

If the interconnected STP-enabled ports on both ends are designated ports, STP negotiation
has failed. Check whether bpdu enable is configured on the ports. If it is not, configure
bpdu enable on the ports that participate in STP calculation.
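
The check described above (a designated port that sends BPDUs but receives none) can be applied to saved command output. The following Python sketch is an illustration only: the function name is hypothetical, and the field labels are taken from the display stp interface excerpt shown above.

```python
import re

def bpdu_not_received(output: str) -> bool:
    """True if the port is a designated port that sends BPDUs but has received none."""
    role = re.search(r"Port Role\s*:\s*(.+)", output)
    sent = re.search(r"BPDU Sent\s*:\s*(\d+)", output)
    received = re.search(r"BPDU Received\s*:\s*(\d+)", output)
    if not (role and sent and received):
        raise ValueError("required fields not found in the output")
    return (role.group(1).strip().startswith("Designated")
            and int(sent.group(1)) > 0
            and int(received.group(1)) == 0)

SAMPLE = """\
Port Role :Designated Port
Port Priority :128
BPDU Sent :726
TCN: 0, Config: 0, RST: 0, MST: 726
BPDU Received :0
TCN: 0, Config: 0, RST: 0, MST: 0
"""
print(bpdu_not_received(SAMPLE))  # True
```

A result of True on both ends of a link is a hint that received BPDUs are not reaching the CPU, which is the situation that bpdu enable addresses in this case.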

Conclusion
For modular switches (including the X7 series), bpdu enable does not need to be configured
on the ports that participate in STP calculation. bpdu disable or bpdu bridge disable is
configured by default.
For fixed switches of versions earlier than V100R006, bpdu enable needs to be configured on
the ports that participate in STP calculation. Otherwise, the switch does not process received
STP BPDUs. For fixed switches of V100R006 and later versions, bpdu enable is configured
on ports by default.

1.6.4.3 Switch Ports Connected to Terminals Are Not Configured as STP
Edge Ports. When Booting from the Network Adapter, Some Terminals
Cannot Obtain IP Addresses

Involved Products and Versions


S switches of all versions

Network Diagram
In Figure 1.1, PCs connect to switches and obtain IP addresses through DHCP.


Figure 1.1 PCs booting from network adapters fail to obtain IP addresses

Symptom
When some types of terminals (such as Lenovo PCs) start, they cannot obtain IP addresses
from the DHCP server and cannot go online.

Cause Analysis
The switches connected to the terminals run STP, but the ports connected to the terminals
are not configured as STP edge ports.
When the problematic terminals boot from their network adapters, the corresponding switch
ports alternate between Up and Down. The terminals then send four messages to request IP
addresses. Because the ports are not configured as STP edge ports, each port state change
triggers STP to recalculate the network topology. Convergence takes about 30s, during
which the ports cannot forward traffic and therefore discard the request messages from the
terminals. The terminals send only four request messages and treat the absence of a
response to all four as a failure to obtain an IP address. Therefore, the terminals cannot
obtain IP addresses.
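
The timing conflict described above can be checked with simple arithmetic. The following Python sketch is an illustration only: the function name is hypothetical, and the send times of the four requests (0s, 4s, 12s, and 28s after link-up) are an assumed retry schedule, not measured values from the terminals in this case.

```python
def delivered_requests(send_times_s, forwarding_after_s=30.0):
    """Return the send times of requests that arrive after the port starts forwarding."""
    return [t for t in send_times_s if t >= forwarding_after_s]

# Assumed send times of the four requests, in seconds after the port goes Up.
ATTEMPTS = [0, 4, 12, 28]

# Non-edge port: forwarding starts after about 30 s of STP convergence.
print(delivered_requests(ATTEMPTS))        # []
# Edge port: the port forwards immediately after going Up.
print(delivered_requests(ATTEMPTS, 0.0))   # [0, 4, 12, 28]
```

Under this assumed schedule, all four requests are sent before the port starts forwarding, which matches the symptom; on an edge port every request is forwarded.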

Handling Procedure
1. STP is enabled on the switch and switch ports connected to the terminals are not
configured as STP edge ports.
2. The switch ports connected to the terminals alternate between Up and Down when the
terminals boot from the network adapter.
3. Run the stp edged-port enable command to configure switch ports connected to the
terminals as edge ports.


Conclusion and Suggestion


If STP is enabled on a switch, the ports that connect the switch to terminals should be all
configured as edge ports. In V200R001 and later versions, the switch can automatically probe
and set the ports connected to terminals as edge ports.

1.6.4.4 STP Convergence Cannot Be Adjusted in Other MSTIs But Not
MSTI 0 Because MST Region Configurations Are Different

Involved Products and Versions


S7700 V100R003/V100R006/V200R001/V200R002/V200R003/V200R005
S9700 V200R001/V200R002/V200R003/V200R005
S12700 V200R005
S2700&S3700&S5700 V100R005/V100R006
S3700&S5700&S6700 V100R006/V200R001/V200R002/V200R003/V200R005

Network Diagram
In Figure 1.1, Switch-1 and Switch-2 are connected through GE0/0/20, GE0/0/23, and
GE0/0/24. GE0/0/20 is added to VLAN 99 and VLAN 101, and GE0/0/23 is added to only
VLAN 99, and GE0/0/24 is added to only VLAN 101. VLAN 99 maps MSTI 1, and VLAN
101 maps MSTI 2.

Figure 1.1 STP convergence cannot be adjusted

Symptom
The STP convergence result on two switches is as follows:
<Switch-1> display stp brief
MSTID Port Role STP State Protection
0 GigabitEthernet0/0/20 DESI FORWARDING NONE
0 GigabitEthernet0/0/23 DESI FORWARDING NONE
0 GigabitEthernet0/0/24 DESI FORWARDING NONE
1 GigabitEthernet0/0/20 DESI FORWARDING NONE
1 GigabitEthernet0/0/23 DESI FORWARDING NONE
2 GigabitEthernet0/0/20 DESI FORWARDING NONE
2 GigabitEthernet0/0/24 DESI FORWARDING NONE
<Switch-2> display stp brief
MSTID Port Role STP State Protection
0 GigabitEthernet0/0/20 ROOT FORWARDING NONE
0 GigabitEthernet0/0/23 ALTE DISCARDING NONE
0 GigabitEthernet0/0/24 ALTE DISCARDING NONE
1 GigabitEthernet0/0/20 MAST FORWARDING NONE
1 GigabitEthernet0/0/23 ALTE DISCARDING NONE
2 GigabitEthernet0/0/20 MAST FORWARDING NONE
2 GigabitEthernet0/0/24 ALTE DISCARDING NONE

GE0/0/20 is in Forwarding state in both MSTI 1 and MSTI 2. The customer requires GE0/0/20
to have different STP states in different MSTIs. After the costs of GE0/0/20 in the MSTIs
are adjusted, the states remain unchanged.

Cause Analysis
On the two switches, the MST region names are different. That is, the two switches belong to
different regions. STP or RSTP is used between different regions for them to converge. The
convergence result of MSTI 0 takes effect for all MSTIs.

Handling Procedure
After MSTP multi-instance is configured on two switches, each MSTI is expected to converge
independently. Here, the multi-instance configuration itself is correct, but the
convergence results in MSTIs 1 and 2 are the same as the convergence result in MSTI 0.
Check whether the two switches are in the same MST region.
The MST region configurations on the two switches are as follows:
Switch-1:
stp region-configuration
region-name vlan101
instance 1 vlan 101
instance 2 vlan 99
active region-configuration

Switch-2:
stp region-configuration
region-name vlan99
instance 1 vlan 101
instance 2 vlan 99
active region-configuration

According to the preceding configurations, the MST region names are different. Two devices
belong to the same MST region only when they have the same MST region name, mapping
between MSTIs and VLANs, format selector, and revision level.
Convergence can be performed independently in MSTIs of the same MST region. Configure
the same MST region name for two switches and adjust the costs of GE0/0/20 in different
MSTIs so that the STP status of GE0/0/20 in different MSTIs is different.
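
The same-region rule above can be checked mechanically. The following Python sketch is illustrative only and is not a Huawei tool: region_mismatches is a hypothetical helper, the region names and VLAN mappings are copied from the configurations in this case, and the format selector and revision level are assumed to be 0 on both switches because the configurations shown do not set them.

```python
def region_mismatches(a: dict, b: dict) -> list:
    """Return the names of the MST region attributes that differ between two devices."""
    keys = ("region_name", "instance_vlans", "format_selector", "revision_level")
    return [key for key in keys if a[key] != b[key]]


# Region names and VLAN mappings are taken from the configurations in this case.
# Format selector and revision level are ASSUMED to be 0 (not shown in the source).
switch_1 = {
    "region_name": "vlan101",
    "instance_vlans": {1: {101}, 2: {99}},
    "format_selector": 0,
    "revision_level": 0,
}
switch_2 = dict(switch_1, region_name="vlan99")

mismatches = region_mismatches(switch_1, switch_2)
print(mismatches or "same MST region")
```

An empty result means that the two devices can be in the same MST region; any listed attribute has to be aligned before per-MSTI cost adjustments can take effect.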

Conclusion
By default, S series switches use the system MAC address as the region name, for example,
00d0d0c7ec77.
When a switch works in MSTP mode and several MSTIs are configured in an MST region,
check the MST region configuration and the MSTI to which each VLAN of the interface is
mapped.

1.6.4.5 Inconsistent MSTP Packet Formats Cause Ports to Be Down

Involved Products and Versions


S switches of all versions

Network Diagram
In Figure 1.1, Switch-1, Switch-2, and two H3C S6500s form an MSTP ring.

Figure 1.1 Inconsistent MSTP packet formats on two ends cause ports to be Down

Symptom
After Switch-1 restarts and goes online again, GE0/0/4 of S6500-1 automatically goes down.
You need to run the undo shutdown command to manually recover the port. The following
alarm message is printed:
%Jul 5 08:13:33 2011 S6500-1 L2INF/5/PORT LINK STATUS CHANGE:
GigabitEthernet0/0/4: is UP
%Jul 5 08:13:42 2011 S6500-1 MSTP/3/BPDUFORMATERROR:Port GigabitEthernet0/0/4
received different format of MSTP BPDU packets continually! Shut down it in order
to voiding broadcast
%Jul 5 08:13:43 2011 S6500-1 L2INF/5/PORT LINK STATUS CHANGE:
GigabitEthernet0/0/4: is DOWN

Cause Analysis
The ports that interconnect the Huawei switches and the S6500s retain their default MSTP
packet formats, and these default formats are different. As a result, the ports on the
S6500s are shut down.
The default stp compliance mode on ports of Huawei switches is auto, in which the ports
send packets in dot1s format. The H3C S6500s, however, send packets in legacy format by
default. After going Up, the port of an S6500 sends three consecutive legacy packets, and
the port of a Huawei switch sends one dot1s packet. The S6500 replies with a dot1s packet,
and the Huawei switch replies to the legacy packets of the S6500. After that, the S6500
and the Huawei switch exchange dot1s packets.
The S6500 uses a special mechanism to check the packet format: if a port receives three or
more packets in a mix of legacy and dot1s formats within 10 seconds, the S6500 shuts down
the port.
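
The check described above can be modeled as a sliding window. The following Python sketch is a simplified model written for this document, not the S6500 implementation: port_would_shut_down is a hypothetical function, and it assumes that the port is shut down once a 10-second window holds at least three BPDUs and both formats appear in that window.

```python
from collections import deque

WINDOW_SECONDS = 10   # window length from the description above
THRESHOLD = 3         # packet count from the description above


def port_would_shut_down(bpdus):
    """bpdus: (timestamp_seconds, fmt) pairs sorted by time, fmt is "legacy" or "dot1s".

    ASSUMPTION: shutdown is triggered when one 10-second window contains at
    least three BPDUs and more than one format.
    """
    window = deque()
    for ts, fmt in bpdus:
        window.append((ts, fmt))
        # Drop BPDUs that have fallen out of the sliding window.
        while ts - window[0][0] >= WINDOW_SECONDS:
            window.popleft()
        formats = {f for _, f in window}
        if len(window) >= THRESHOLD and len(formats) > 1:
            return True
    return False


# Mixed formats right after the link goes Up, as in this case.
print(port_would_shut_down([(0, "legacy"), (1, "legacy"), (2, "dot1s")]))
# A single format only, as expected once both ends use the same format.
print(port_would_shut_down([(0, "legacy"), (2, "legacy"), (4, "legacy")]))
```

Under this model, a port that only ever sees one format never meets the shutdown condition, which is why aligning the format on both ends removes the symptom.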

Handling Procedure
1. When the ports go Up after a reboot of the Huawei switches, run the display stp
interface command on the S6500s to check information about the interconnected ports.
In the command output, MSTP BPDU format displays legacy.
[S6500] display stp interface
......
----[Port4(GigabitEthernet0/0/4)][FORWARDING]----
Port Protocol :enabled
Port Role :CIST Designated Port
Port Priority :128
Port Cost(Legacy) :Config=auto / Active=20
Desg. Bridge/Port :32768.000f-e2e0-5501 / 128.4
Port Edged :Config=disabled / Active=disabled
Point-to-point :Config=auto / Active=true
Transit Limit :3 packets/hello-time
Protection Type :None
Receive/Send
MSTP BPDU format :legacy
Port Config
Digest Snooping :disabled
Num of Vlans Mapped :0
PortTimes :Hello 2s MaxAge 20s FwDly 15s RemHop 0
BPDU Sent :56143
TCN: 0, Config: 0, RST: 0, MST: 56143
BPDU Received :56734
TCN: 0, Config: 0, RST: 0, MST: 56734
......

2. Run the stp compliance legacy command on the Huawei switch ports connected to the
S6500s to set the MSTP packet format on these ports to legacy.

Conclusion
When Huawei switches connect to non-Huawei devices, check whether the MSTP packet
format on the remote port is auto and whether a special check mechanism is used.
If the STP compliance mode of a Huawei switch's port is not auto, and the format of the
packets received by the port differs from the configured format, the switch prints the
following log:
MSTP/3/PACKET_ERR_COMPLIAN:The port compliance protocol type of the the packet
received by MSTP from the port [port-name] is invalid.

If the preceding log is generated, handle the problem as follows:
1. Use a packet capture tool to record the received error packets.
2. Record the port information on the remote end, including the manufacturer, version, and
configuration.
If a Huawei device is used, run the display version, display interface, or display
current-configuration command to check the device version and configuration.
If a non-Huawei device is used, run the related commands provided by the manufacturer
to obtain information on the device.
3. If MSTP sets the STP status of a port incorrectly after receiving invalid packets, loops
may occur on the Layer 2 network. It is recommended that the port be shut down to
avoid a broadcast storm. To view the STP status and check whether loops occur, run the
display stp brief command. If no loops exist or the loops have been removed, run the
undo shutdown command to re-enable the port.

1.6.4.6 RRPP Multi-Instance Causes a Temporary RRPP Loop

Involved Products and Versions


S switches of all versions

Network Diagram
In Figure 1.1, SwitchA, SwitchB, SwitchC, and SwitchD form an RRPP ring. According to
the data plan, the RRPP ring protects the data of VLAN 10 and VLAN 20. Both VLANs are
mapped to instance 1, and the RRPP domain references instance 1 for its protected VLANs.

Figure 1.1 RRPP multi-instance causes a temporary loop

Symptom
On the preceding network, a loop occurs in VLAN 1.

Cause Analysis
1. Run the display current-configuration interface GigabitEthernet 1/0/1 command to
check the configuration on ports of the RRPP ring. If the command output does not contain
undo port trunk allow-pass vlan 1, the port still belongs to VLAN 1, because the ports
join VLAN 1 by default.
[SwitchA] display current-configuration interface GigabitEthernet 1/0/0
#
interface GigabitEthernet1/0/0
port link-type trunk
port trunk allow-pass vlan 10 20
stp disable
#
return

2. Run the display stp region-configuration command to check the multi-instance
configuration.
[SwitchA] display stp region-configuration
Oper configuration
Format selector :0
Region name :00e084701700
Revision level :0

Instance   VLANs Mapped
   0       1 to 9, 11 to 19, 21 to 4094
   1       10, 20

3. Run the display current-configuration configuration rrpp-domain-region command
to check the RRPP configuration. The RRPP domain protects only the VLANs in instance 1.
VLAN 1 is not mapped to instance 1, so the RRPP ring cannot protect the data of VLAN 1.
As a result, a loop occurs in VLAN 1 on the RRPP ring.
[SwitchA] display current-configuration configuration rrpp-domain-region
#
rrpp domain 1
control-vlan 1025
protected-vlan reference-instance 1
ring 1 node-mode transit primary-port GigabitEthernet1/0/1 secondary-port
GigabitEthernet1/0/2 level 0
ring 1 enable
#
return
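
The three checks above amount to a set difference: the VLANs carried on a ring port minus the VLANs in the instances that the RRPP domain protects. The following Python sketch illustrates this under stated assumptions: unprotected_vlans is a hypothetical helper, the VLAN values are taken from the outputs in this case, and the control VLAN is left out of the model.

```python
def unprotected_vlans(port_vlans, instance_vlans, protected_instances):
    """Return the VLANs carried on a ring port that no protected instance covers."""
    protected = set()
    for instance in protected_instances:
        protected |= instance_vlans.get(instance, set())
    return sorted(set(port_vlans) - protected)


# From this case: the trunk port allows VLANs 10 and 20 and, by default, VLAN 1.
ring_port_vlans = {1, 10, 20}
# Only instance 1 is listed here; instance 0 holds all other VLANs.
instance_vlans = {1: {10, 20}}

print(unprotected_vlans(ring_port_vlans, instance_vlans, protected_instances=[1]))
```

Any VLAN that this check returns is forwarded around the ring without RRPP blocking it, which matches the loop observed in VLAN 1.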

Handling Procedure
You can use either of the following two methods to eliminate the loop in VLAN 1.
Method 1: Add VLAN 1 to instance 1 on SwitchA, SwitchB, SwitchC, and SwitchD. SwitchA
is used as an example.
[SwitchA] stp region-configuration
Info: Please activate the stp region-configuration after it is modified.
[SwitchA-mst-region] instance 1 vlan 1 10 20
[SwitchA-mst-region] active region-configuration
Info: This operation may take a few seconds. Please wait for a moment...done.
[SwitchA-mst-region] quit
[SwitchA] display stp region-configuration
Oper configuration
Format selector :0
Region name :00e084701700
Revision level :0

Instance   VLANs Mapped
   0       2 to 9, 11 to 19, 21 to 4094
   1       1, 10, 20

Method 2: Delete VLAN 1 on ports connected to the RRPP ring if VLAN 1 is not required.
SwitchA is used as an example.
[SwitchA] interface GigabitEthernet1/0/1
[SwitchA-GigabitEthernet1/0/1] undo port trunk allow-pass vlan 1

Conclusion and Suggestion


When planning protected VLANs for an RRPP ring, note that the ports on the RRPP ring join
VLAN 1 by default. To prevent loops, configure VLAN 1 as a protected VLAN, or remove
VLAN 1 from the ring ports if it is not required.

1.6.4.7 RRPP Master Node's Working Mode Is Different from That of Transit Nodes, Which Makes MAC Entries Fail to Be Updated

Involved Products and Versions


S switches of all versions

Network Diagram
In Figure 1.1, SwitchA, SwitchB, SwitchC, and SwitchD form an RRPP ring. SwitchA is the
master node. SwitchB, SwitchC, and SwitchD are transit nodes.

Figure 1.1 RRPP master node has a different mode than other transit nodes

Symptom
When links among SwitchB, SwitchC, and SwitchD fail and recover, MAC and ARP entries
on transit nodes are not updated. As a result, traffic forwarding is affected.

Cause Analysis
On the RRPP master node SwitchA, the configured RRPP working mode is the one defined by
international standards, while SwitchB, SwitchC, and SwitchD use the default working mode
defined by Huawei's standard. When a transit node fails, the transit nodes do not process
the common or complete Flush packets sent by the RRPP master node SwitchA. As a result,
the MAC and ARP entries are not updated, and traffic forwarding is affected.

Handling Procedure
1. Check whether the RRPP master node is SwitchA.
<SwitchA> display rrpp verbose domain 1
Domain Index : 1
Control VLAN : major 2 sub 3
Protected VLAN : Reference Instance 1
Hello Timer : 1 sec(default is 1 sec) Fail Timer : 6 sec(default is 6 sec)
RRPP Ring : 2
Ring Level : 1
Node Mode : Master //RRPP master node

2. Check the configuration of the RRPP master node SwitchA.
rrpp working-mode GB //The configured RRPP mode is defined by international standards.
rrpp enable

Check the configuration of transit nodes.
rrpp enable //By default, the working mode is defined by Huawei's standard.

The preceding configurations show that the RRPP mode configured globally on SwitchA is
defined by international standards (rrpp working-mode GB), while the transit nodes use
the default working mode defined by Huawei's standard.

Conclusion
All nodes on the RRPP ring must be configured with the same working mode: either the
working mode defined by international standards or that defined by Huawei's standard.

1.6.4.8 Users on a Transit Node and Downstream Nodes Connected to the Transit Node Cannot Go Online

Involved Products and Versions


S switches of all versions

Network Diagram
In Figure 1.1, SwitchA functions as the master node of the RRPP ring. Normally, GE1/0/0 is
the primary interface, and GE2/0/0 is the secondary interface (blocked interface).

Figure 1.1 Users fail to go online on a transit node and downstream nodes connected to the transit
node

Symptom
When the primary interface on a transit node becomes Down and recovers, users on the transit
node and other downstream nodes connected to the transit node cannot go online. The fault is
rectified several minutes later.

Cause Analysis
The master and transit nodes use different RRPP working modes. The master node works in
GB mode and the transit node works in HW mode. As a result, the transit node cannot process
Flush packets of the master node.

Handling Procedure
1. Check whether the device works in GB or HW mode, that is, check whether rrpp
working-mode gb or rrpp working-mode hw is configured.
2. Run the display rrpp brief command to check the RRPP Working Mode field. Check
whether the value of the RRPP Working Mode field on nodes of the RRPP network is
the same.
<Quidway> display rrpp brief
Abbreviations for Switch Node Mode :
M - Master , T - Transit , E - Edge , A - Assistant-Edge
RRPP Protocol Status: Disable
RRPP Working Mode: HW
RRPP Linkup Delay Timer: 1 sec (0 sec default)
Number of RRPP Domains: 1
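
When the ring has many nodes, the comparison in step 2 can be scripted. The following Python sketch is illustrative only: it assumes that the display rrpp brief output of each node has already been collected as text, that the field is printed as shown above, and that a node in GB mode prints GB in that field. The function names are hypothetical.

```python
import re


def rrpp_working_mode(brief_output: str):
    """Extract the value of the 'RRPP Working Mode' field, or None if it is absent."""
    match = re.search(r"^\s*RRPP Working Mode\s*:\s*(\S+)", brief_output, re.MULTILINE)
    return match.group(1).upper() if match else None


def modes_consistent(outputs_by_node: dict) -> bool:
    """Return True if every node reports the same working mode."""
    modes = {rrpp_working_mode(text) for text in outputs_by_node.values()}
    return len(modes) == 1 and None not in modes


# Sample text copied from the command output above.
sample = """RRPP Protocol Status: Disable
RRPP Working Mode: HW
RRPP Linkup Delay Timer: 1 sec (0 sec default)
Number of RRPP Domains: 1"""

print(rrpp_working_mode(sample))
# ASSUMPTION: a master node in GB mode prints "GB" in the same field.
print(modes_consistent({"master": sample.replace("HW", "GB"), "transit": sample}))
```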

Conclusion
If this problem occurs, check whether the MAC address entries and ARP entries on the device
are updated. If not, check whether the RRPP working mode is the same on all nodes of the
ring.

1.6.4.9 An RRPP Loop Occurs Due to Original Multi-Instance Configuration

Involved Products and Versions


S switches of all versions

Network Diagram
In Figure 1.1, SwitchA, SwitchB, and SwitchC constitute an RRPP ring. SwitchB is the
master node while SwitchC is the transit node. GE2/0/4 and GE1/0/5 of SwitchA allow
packets from control VLAN 2515 of the RRPP ring to pass through. VLANs in instance 0 on
SwitchB and SwitchC are configured as protected VLANs.

Figure 1.1 Original multi-instance configuration causes an RRPP loop

Symptom
The original multi-instance configuration on the master node SwitchB causes a loop in a
VLAN that is not in instance 0. As a result, many access devices cannot be managed.

Cause Analysis
1. Check the RRPP configuration on SwitchB.
2. Run the display current-configuration configuration rrpp-domain-region command
to check RRPP domain configuration.
[SwitchB] display current-configuration configuration rrpp-domain-region
#
rrpp domain 1
control-vlan 2515
protected-vlan reference-instance 0
ring 1 node-mode master primary-port GigabitEthernet0/0/1 secondary-port
GigabitEthernet0/0/2 level 0
ring 1 enable
#
return

3. Check the multi-instance configuration.
Run the display stp region-configuration command to check the multi-instance
configuration on SwitchB.
[SwitchB] display stp region-configuration
Oper configuration
Format selector :0
Region name :00259e5cec21
Revision level :0

Instance   Vlans Mapped
   0       1 to 2499, 2501 to 2542, 2544 to 2572, 2574 to 4094
   1       2500, 2543, 2573

4. Check the VLAN configuration.
Run the display vlan command to check ports in VLANs of instance 1.
Configuration of SwitchB:
[SwitchB] display vlan 2500
VLAN ID Type Status MAC Learning
----------------------------------------------------------
2500 common enable enable
----------------
Tagged Port: GigabitEthernet0/0/1 GigabitEthernet0/0/2

----------------
Interface Physical
GigabitEthernet0/0/1 UP
GigabitEthernet0/0/2 DOWN

Configuration of SwitchC:
[SwitchC] display vlan 2500
VLAN ID Type Status MAC Learning
----------------------------------------------------------
2500 common enable enable
----------------
Tagged Port: GigabitEthernet0/0/1 GigabitEthernet0/0/2

----------------
Interface Physical
GigabitEthernet0/1/1 UP
GigabitEthernet0/1/2 UP

Configuration of SwitchA:
[SwitchA] display vlan 2500
VLAN ID Type Status MAC Learning Broadcast/Multicast/Unicast Property
--------------------------------------------------------------------------------
2500 common enable enable forward forward forward default
----------------
Tagged Port: GigabitEthernet2/0/0 GigabitEthernet2/0/1
GigabitEthernet2/0/2 GigabitEthernet2/0/4
GigabitEthernet2/0/5 GigabitEthernet2/0/6

----------------
Interface Physical
GigabitEthernet2/0/0 UP
GigabitEthernet2/0/1 UP
GigabitEthernet2/0/2 UP
GigabitEthernet2/0/4 UP
GigabitEthernet2/0/5 DOWN
GigabitEthernet2/0/6 UP

Besides all ports on the ring, some ports that are not on the ring also allow packets from
VLAN 2500 to pass through. VLAN 2500 is in instance 1, but the RRPP ring protects only the
VLANs of instance 0. As a result, a loop occurs in VLAN 2500.

Handling Procedure
In this case, the RRPP ring is deployed to protect all VLANs, so instance 1 can be deleted.
SwitchB is used as an example.
[SwitchB] stp region-configuration
Info: Please activate the stp region-configuration after it is modified.
[SwitchB-mst-region] undo instance 1
[SwitchB-mst-region] active region-configuration
Info: This operation may take a few seconds. Please wait for a moment...done.
[SwitchB-mst-region] quit
[SwitchB] display stp region-configuration
Oper configuration
Format selector :0
Region name :00259e5cec21
Revision level :0

Instance Vlans Mapped


0 1 to 4094

Conclusion and Suggestion


When deploying an RRPP ring, ensure that original multi-instance configuration does not
affect the RRPP ring deployment.

1.6.4.10 loopback internal Causes a Loop

Involved Products and Versions


S switches of all versions

Network Diagram
In Figure 1.1, a PC connects to a switch through an L2 switch and requests access to the
internal server.

Figure 1.1 loopback internal causes a loop

Symptom
When a PC connected to the switch accesses an intranet server, severe packet loss occurs and
services are interrupted.

Cause Analysis
loopback internal is configured on a switch, causing MAC address flapping.

Handling Procedure
Delete the loopback internal configuration on the L2 switch. The fault is located as
follows:
1. Run the loop-detect eth-loop alarm-only command in the VLAN view to enable MAC
address flapping detection.
2. Run the display trapbuffer command to view alarm information and check whether a MAC
address flapping alarm has been reported. The alarm information shows that MAC address
flapping has occurred on GE0/0/1. Check the configuration of the downstream device.
3. Run the display current-configuration command on the L2 switch to display the interface
configuration. loopback internal has been configured on an interface. Therefore, the
server's MAC address is also learned on the port between the switch and the L2 switch.
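
MAC address flapping means that the same MAC address in the same VLAN is learned on different ports in turn. The following Python sketch shows the idea on a list of learning events. It is a simplified illustration, not the switch's detection algorithm: flapping_macs is a hypothetical function, the min_moves threshold is arbitrary, and the MAC addresses, the VLAN ID, and GE0/0/2 are made-up sample values (only GE0/0/1 comes from this case).

```python
from collections import defaultdict


def flapping_macs(learn_events, min_moves=2):
    """learn_events: (mac, vlan, port) tuples in the order they were learned.

    Returns {(mac, vlan): [ports seen]} for entries that moved between ports
    at least min_moves times. The threshold is an arbitrary illustration value.
    """
    last_port = {}
    moves = defaultdict(int)
    ports_seen = defaultdict(list)
    for mac, vlan, port in learn_events:
        key = (mac, vlan)
        if key in last_port and last_port[key] != port:
            moves[key] += 1
        last_port[key] = port
        if port not in ports_seen[key]:
            ports_seen[key].append(port)
    return {key: ports_seen[key] for key, count in moves.items() if count >= min_moves}


# Hypothetical sample: the server's MAC address alternates between its real port
# (GE0/0/2, made up) and GE0/0/1, where looped-back frames arrive.
events = [
    ("0000-0000-0001", 10, "GE0/0/2"),
    ("0000-0000-0001", 10, "GE0/0/1"),
    ("0000-0000-0001", 10, "GE0/0/2"),
    ("0000-0000-0002", 10, "GE0/0/3"),
]
print(flapping_macs(events))
```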

Conclusion and Suggestion


MAC address flapping is one of the most common causes for packet loss during Layer 2
forwarding. If such a problem occurs, check whether MAC address flapping occurs.

1.6.4.11 Services Are Interrupted After Smart Link Master and Slave
Interfaces Are Switched

Involved Products and Versions


S switches of all versions

Network Diagram
In Figure 1.1, SwitchA is configured with Smart Link. GE1/0/2 is the master interface, and
GE1/0/3 is the slave interface.

Figure 1.1 Services are interrupted after master and slave interfaces are switched

Symptom
When GE1/0/2 on SwitchA fails, services are switched to the slave interface but are then
interrupted. Services can be recovered only after MAC address entries and ARP entries are
manually updated.

Cause Analysis
When a link failover occurs in the Smart Link group, the original forwarding entries are
no longer applicable to the new topology. MAC address entries and ARP entries on the entire
network need to be updated. The Smart Link group sends Flush packets to instruct upstream
devices to update their MAC address entries and ARP entries. Upstream devices can update
these entries only when they are enabled to receive Flush packets. If a device rejects
Flush packets, it cannot forward packets correctly after a link failover occurs in the
Smart Link group.
The interfaces on SwitchD are not configured to receive Flush packets. When the Flush
packets that SwitchA sends during the failover reach SwitchD, SwitchD does not update its
ARP entries (GE1/0/2 on SwitchA needs to be changed to GE1/0/3). In this case, traffic
passing through SwitchD is still sent to the original link, which has been blocked. As a
result, the packets cannot pass.
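
The effect of a missing Flush receive configuration can be shown with a small model. The following Python sketch is a simplification written for this document, not the switch implementation: handle_flush and the device dictionary are hypothetical, the control VLAN ID, MAC address, and IP address are made-up values, and the model clears all learned entries when a Flush packet is accepted.

```python
def handle_flush(device, in_interface, control_vlan):
    """Clear learned entries only if the receiving interface accepts Flush packets.

    device["flush_receive"] maps an interface name to the control VLAN configured
    for receiving Flush packets on that interface (hypothetical representation).
    """
    accepted = device["flush_receive"].get(in_interface) == control_vlan
    if accepted:
        device["mac_table"].clear()
        device["arp_table"].clear()
    return accepted


CONTROL_VLAN = 10  # hypothetical control VLAN ID

switch_d = {
    "flush_receive": {},  # Flush receiving is not configured, as in this case
    "mac_table": {"0000-0000-0001": "GE1/0/4"},   # made-up entry
    "arp_table": {"192.0.2.1": "GE1/0/4"},        # made-up entry
}

# The Flush packet is ignored, so the stale entries stay in place.
print(handle_flush(switch_d, "GE1/0/4", CONTROL_VLAN), switch_d["mac_table"])

# After Flush receiving is configured on both interfaces, the entries are cleared.
switch_d["flush_receive"] = {"GE1/0/4": CONTROL_VLAN, "GE1/0/5": CONTROL_VLAN}
print(handle_flush(switch_d, "GE1/0/5", CONTROL_VLAN), switch_d["mac_table"])
```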

Handling Procedure
Check whether the smart-link flush receive control-vlan vlan-id command is configured on
interfaces (GE1/0/2 and GE1/0/3 on SwitchB, GE1/0/3 and GE1/0/4 on SwitchC, and
GE1/0/4 and GE1/0/5 on SwitchD) of active and standby links of SwitchB, SwitchC, and
SwitchD.
In this case, the smart-link flush receive control-vlan vlan-id command is not configured
on these interfaces. Run the smart-link flush receive control-vlan vlan-id command on the
interfaces of the active and standby links of SwitchB, SwitchC, and SwitchD. Ensure that
the control VLAN ID and password in the Flush packet are the same as those configured on
SwitchA.

Conclusion and Suggestion


An interface can receive Flush packets only when the interface is configured with the control
VLAN ID and is added to the control VLAN.

You only need to configure interfaces on active and standby links between Smart Link devices
and destination devices to receive Flush packets from a specified control VLAN.

1.6.5 Improper Configurations


1.6.5.1 A Large Number of TC BPDUs Cause an ARP Learning Error on a
Modular Switch

Involved Products and Versions


S switches of all versions

Network Diagram
In Figure 1.1, Switch-A and Switch-B are directly connected through an Eth-Trunk, and
VRRP is run on them. Switch-A is the VRRP master and Switch-B is the VRRP backup.
Switch-A and Switch-B function as a Layer 3 gateway to connect to access switches. STP is
enabled on these devices. The Layer 2 access switches directly connect to users.

Figure 1.1 A large number of TC BPDUs cause an ARP learning error on a modular switch

Symptom
An ARP learning error occurs on Switch-A. Many incomplete ARP entries exist on the switch.
Switch-A cannot learn the ARP entries of users sometimes, affecting service stability.

Cause Analysis
On the Layer 2 access switches, the stp edged-port enable command is not run on STP edge
ports. Each time the status of an edge port changes, a TC BPDU is sent to the VRRP group.
The VRRP group then performs STP convergence and clears ARP entries or probes the aged ARP
entries. Because the VRRP group holds a large number of ARP entries, it sends many ARP
request packets for probing and receives many ARP reply packets. The rate of ARP packets
exceeds the CIR value. As a result, some ARP reply packets are discarded and the
corresponding ARP entries are aged out, so services of the affected users become abnormal.
When the VRRP group frequently receives such TC BPDUs, services are unstable.

Handling Procedure
1. Log in to Switch-A and view the ARP entries on VLANIF 27. This VLANIF interface
connects to servers whose users stay online for a long time. Observe the entries over an
extended period. The total number of ARP entries on the interface alternates between 50
and 20. Some ARP entries are in the Incomplete state, and the affected IP addresses change
frequently. The aging time of learned ARP entries sometimes becomes 0.
<Switch-A> display arp interface vlanif 27
IP ADDRESS MAC ADDRESS EXPIRE(M) TYPE INTERFACE VPN-INSTANCE
VLAN/CEVLAN
------------------------------------------------------------------------------
132.212.4.3 0025-9e7f-fd01 I - Vlanif27
132.212.4.129 0014-38b9-73c3 0 D-0 Eth4/0/42
27/-
132.212.4.133 00e0-fc94-cddd 0 D-0 Eth4/0/42
27/-
132.212.4.203 0018-7172-5901 0 D-0 Eth4/0/42
27/-
132.212.4.107 0011-43a3-388f 0 D-0 Eth4/0/42

The switch has received TC BPDUs, and aged out ARP entries.
2. Run the display stp tc command to view the TC BPDUs received by the interface.
[Switch-A-hidecmd] display stp tc
---------- Stp Instance 0 tc or tcn count ----------
Port GigabitEthernet1/0/0 0
Port GigabitEthernet1/0/1 0
Port GigabitEthernet1/0/2 0
Port GigabitEthernet1/0/3 0
Port GigabitEthernet1/0/4 87
Port GigabitEthernet1/0/5 123
Port GigabitEthernet1/0/6 99
Port GigabitEthernet1/0/8 71
Port GigabitEthernet1/0/9 173
Port GigabitEthernet1/0/10 146
Port GigabitEthernet1/0/13 8
Port GigabitEthernet1/0/21 0

3. Analyze the log. The log shows the received TC BPDUs and ARP entry aging records.
Apr 19 2011 09:59:58 DCN_S9306_A %%01MSTP/6/RECEIVE_MSTITC(l): MSTP received
BPDU with TC, MSTP process 0 instance 0, port name is Ethernet4/0/46.

The log also shows that ARP reply packets have been discarded because the CPCAR limit
was exceeded.
Apr 19 2011 09:28:13 DCN_S9306_A %%01QOSE/4/CPCAR_DROP_LPU(l): Some packets are
dropped by cpcar on the LPU in slot 1. (Protocol=arp-reply, Drop-Count=061)

The preceding information indicates that the switch has frequently received TC BPDUs and
aged out ARP entries. The device needs to send a large number of ARP probe packets, and
user terminals return many ARP reply packets, whose rate exceeds the CIR. Therefore, most
ARP reply packets are discarded. The ARP entries are aged out and deleted, affecting services.
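
To see quickly which ports receive the most TC BPDUs, the counter output from step 2 can be parsed and ranked. The following Python sketch is illustrative only: it assumes that the output has been saved as text in the "Port name count" layout shown above, and tc_counts and busiest_ports are hypothetical helper names.

```python
import re


def tc_counts(output: str) -> dict:
    """Map port name to the TC/TCN count parsed from 'Port <name> <count>' lines."""
    counts = {}
    for line in output.splitlines():
        match = re.match(r"\s*Port\s+(\S+)\s+(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def busiest_ports(output: str, top: int = 3):
    """Return up to 'top' ports with the highest non-zero counts, largest first."""
    ranked = sorted(tc_counts(output).items(), key=lambda item: item[1], reverse=True)
    return [(port, count) for port, count in ranked[:top] if count > 0]


# Lines copied from the command output in step 2.
sample = """---------- Stp Instance 0 tc or tcn count ----------
Port GigabitEthernet1/0/3 0
Port GigabitEthernet1/0/5 123
Port GigabitEthernet1/0/9 173
Port GigabitEthernet1/0/10 146
Port GigabitEthernet1/0/13 8"""

print(busiest_ports(sample))
```

Ports with counters that keep growing between two samples point to the downstream access switches whose terminal-facing ports should be checked first.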

The TC BPDUs received by the switch are sent from downstream access switches. The access
switches are directly connected to PCs, and STP is enabled on their interfaces, but the
stp edged-port enable command is not run. When PCs are powered on and off, many edge
ports alternate between Up and Down, and the access switches repeatedly send TC BPDUs.
After the stp edged-port enable command is run on the edge ports, the problem does not
recur within several days, and user services are normal.

Conclusion and Suggestion


This problem has occurred many times in the following scenario: the switch functions as a
gateway, the Layer 2 switches are connected to PCs, and STP runs on the network to prevent
loops. Typically, the stp edged-port enable command has not been run on the STP edge ports
of the Layer 2 switches. When PCs are powered on and off, the edge ports alternate between
Up and Down, and the Layer 2 switch sends TC BPDUs through its STP root port. The gateway
then frequently performs STP convergence and deletes ARP entries, causing errors in ARP
learning.
In this scenario, the following configurations are recommended:
1. Run the stp converge normal command on the switch. After receiving a TC BPDU, the
switch does not delete the ARP entry immediately, but initiates an ARP probe. If the ARP
probe fails, the switch deletes the ARP entry. The command reduces impact on traffic
forwarding.
2. Run the stp edged-port enable command on the STP edge ports of Layer 2 switches, so
that the edge port status change will not cause repeated STP convergence.

1.6.5.2 Many TC BPDUs Cause a High CPU Usage

Involved Products and Versions


S switches of all versions

Network Diagram
None.

Symptom
1. In Figure 1.1, the CPU usage of a switch displayed on the network management system
(NMS) is high.

Figure 1.1 CPU usage on the NMS

2. Logs indicating a high CPU usage are generated on the switch.
Switch %%01VOSCPU/4/CPU_USAGE_HIGH(l)[31]:The CPU is overloaded(CpuUsage=96%,
Threshold=95%), and the tasks with top three CPU occupancy are:
FTS total : 18%
SRMT total : 11%
SOCK total : 8%
Switch %%01VOSCPU/4/CPU_USAGE_HIGH(l)[60]:The CPU is overloaded(CpuUsage=100%,
Threshold=95%), and the tasks with top three CPU occupancy are:
PPI total : 41%
SRMT total : 10%
FTS total : 8%

3. There are also logs indicating that a large number of ARP packets are discarded because
the CPCAR limit is exceeded.
Switch %%01DEFD/4/CPCAR_DROP_MPU(l)[56]:Rate of packets to cpu exceeded the
CPCAR limit on the MPU. (Protocol=arp-miss, ExceededPacketCount=016956)
Switch %%01DEFD/4/CPCAR_DROP_MPU(l)[57]:Rate of packets to cpu exceeded the
CPCAR limit on the MPU. (Protocol=arp-reply, ExceededPacketCount=020699)
Switch %%01DEFD/4/CPCAR_DROP_MPU(l)[58]:Rate of packets to cpu exceeded the
CPCAR limit on the MPU. (Protocol=arp-request, ExceededPacketCount=0574

4. Collect statistics about transmitted and received TC BPDUs on interfaces.
In Figure 1.2, the number of received TC BPDUs increases on all STP-enabled ports.

Figure 1.2 Increase in the number of TC BPDUs on ports

Root Cause
Based on statistics about TC BPDUs, the number of TC BPDUs received on STP-enabled
ports is large and keeps increasing. After receiving TC BPDUs, the switch deletes MAC
address entries and updates ARP entries. The switch has to process a large number of ARP
Miss, ARP Request, and ARP Reply packets, leading to a high CPU usage. OSPF Hello
packets and VRRP heartbeat packets cannot be processed in a timely manner, resulting in
protocol flapping.
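
To confirm that ARP traffic dominates the CPCAR drops, the drop logs shown in the symptom can be tallied per protocol. The following Python sketch is illustrative only: it assumes that the logs have been exported as text in the format shown above, and drops_by_protocol is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches the "(Protocol=..., ExceededPacketCount=...)" part of the CPCAR drop log.
CPCAR_RE = re.compile(r"Protocol=([\w-]+),\s*ExceededPacketCount=(\d+)")


def drops_by_protocol(log_text: str) -> Counter:
    """Sum the ExceededPacketCount values per protocol."""
    totals = Counter()
    for protocol, count in CPCAR_RE.findall(log_text):
        totals[protocol] += int(count)
    return totals


# Two complete log entries copied from the symptom description.
sample = """Switch %%01DEFD/4/CPCAR_DROP_MPU(l)[56]:Rate of packets to cpu exceeded the
CPCAR limit on the MPU. (Protocol=arp-miss, ExceededPacketCount=016956)
Switch %%01DEFD/4/CPCAR_DROP_MPU(l)[57]:Rate of packets to cpu exceeded the
CPCAR limit on the MPU. (Protocol=arp-reply, ExceededPacketCount=020699)"""

for protocol, dropped in drops_by_protocol(sample).most_common():
    print(protocol, dropped)
```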

Identification Method
1. Run the stp tc-protection command in the system view.
This command ensures that the switch updates entries once every 2 seconds when
receiving a large number of TC BPDUs. This configuration prevents a high CPU usage
caused by frequent updates of MAC address entries and ARP entries.
2. Run the arp topology-change disable and mac-address update arp commands in the
system view.
By default, the switch deletes the MAC address entries and ages out ARP entries after
receiving TC BPDUs. If there are many ARP entries on the switch, ARP entry relearning
triggers a large number of ARP packets on the network. After the arp topology-change
disable and mac-address update arp commands are configured, the switch updates the
outbound interfaces in ARP entries based on the changed outbound interfaces in the
MAC address entries upon network topology changes. The commands prevent
unnecessary ARP entry updates.

The mac-address update arp command has been available since V100R006, and the arp
topology-change disable command has been available since V200R001.
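
The behavior described in step 2 can be pictured as follows: instead of deleting ARP entries and relearning them, the switch rewrites the outbound interface of each ARP entry from the MAC address table. The following Python sketch models only this idea and is not the switch implementation: update_arp_from_mac and the table layouts are hypothetical, the IP address, MAC address, and Eth4/0/42 are borrowed from the ARP output in the previous case, and Eth4/0/46 is used as an assumed new outbound interface.

```python
def update_arp_from_mac(arp_table, mac_table):
    """Point each ARP entry at the port where its MAC address is currently learned.

    arp_table: {ip: {"mac": str, "port": str}}
    mac_table: {mac: port}
    Returns the IP addresses whose outbound interface changed.
    """
    changed = []
    for ip, entry in arp_table.items():
        new_port = mac_table.get(entry["mac"])
        if new_port is not None and new_port != entry["port"]:
            entry["port"] = new_port
            changed.append(ip)
    return changed


arp_table = {"132.212.4.129": {"mac": "0014-38b9-73c3", "port": "Eth4/0/42"}}
# ASSUMPTION: after the topology change, the MAC address is learned on Eth4/0/46.
mac_table = {"0014-38b9-73c3": "Eth4/0/46"}

print(update_arp_from_mac(arp_table, mac_table))
print(arp_table)
```

In this model no ARP request or reply is needed to repair the entry, which is why the two commands reduce the ARP load after topology changes.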

Conclusion
When deploying STP, you are advised to enable TC protection and configure all ports
connected to terminals as edge ports. These measures prevent status change of an interface
from causing flapping and re-convergence of the entire STP network. When this problem
occurs, pay attention to packet loss caused by CPCAR.

1.6.5.3 An MSTP Loop Causes a High CPU Usage

Involved Products and Versions


S5700 V200R001/V200R002/V200R003

Network Diagram
None.

Symptom
On an MSTP network, an S5700 has a high CPU usage.


Cause Analysis
When network topology recalculation occurs on an MSTP network, a large number of BPDUs
indicating topology changes will be advertised. The switch will then recalculate the topology,
causing a high CPU usage.

Handling Procedure
1. Run the display interface brief command to check interface bandwidth usage.
<HUAWEI> display interface brief
…………
Interface PHY Protocol InUti OutUti inErrors outErrors
GigabitEthernet4/0/1 up up 0.72% 81% 0 0
GigabitEthernet4/0/2 up up 81% 0.73% 2 0

2. Run the display stp tc-bpdu statistics command to check the number of TC and TCN
BPDUs received and sent by each interface. The command output shows that a large
number of TC BPDUs are received.
<HUAWEI> display stp tc-bpdu statistics
-------------------------- STP TC/TCN information --------------------------
MSTID Port TC(Send/Receive) TCN(Send/Receive)
0 GigabitEthernet4/0/1 3/2 0/0
0 GigabitEthernet1/0/10 14/9 0/0

It is difficult to locate the fault that causes topology changes. To resolve the high CPU
usage problem, perform the following operations:
− Run the arp topology-change disable command, so that ARP entries will not be
aged out or deleted when the network topology changes.
− Run the mac-address update arp command, so that the switch will update
outbound interfaces in ARP entries when outbound interfaces in MAC address
entries change.

The mac-address update arp command has been available since V100R006, and the arp topology-
change disable command has been available since V200R001.
After the preceding operations are performed, there is a noticeable decrease in the CPU
usage.
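
To confirm that TC BPDUs keep arriving, the display stp tc-bpdu statistics output can be collected twice and compared. The following Python sketch assumes the column layout shown above. The first snapshot is copied from the output above; the second snapshot and all function names are hypothetical.

```python
import re

# Matches data rows such as: "0 GigabitEthernet4/0/1 3/2 0/0"
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d+)/(\d+)\s+(\d+)/(\d+)\s*$")


def parse_tc_stats(output):
    """Return {(mstid, port): number of received TC BPDUs}."""
    stats = {}
    for line in output.splitlines():
        m = ROW.match(line)
        if m:
            mstid, port, _tc_tx, tc_rx, _tcn_tx, _tcn_rx = m.groups()
            stats[(int(mstid), port)] = int(tc_rx)
    return stats


def growing_ports(before, after, min_delta=1):
    """Return ports whose received TC count grew by at least min_delta."""
    return sorted(
        key for key in after if after[key] - before.get(key, 0) >= min_delta
    )


SNAPSHOT_1 = """\
MSTID Port TC(Send/Receive) TCN(Send/Receive)
0 GigabitEthernet4/0/1 3/2 0/0
0 GigabitEthernet1/0/10 14/9 0/0
"""

# Hypothetical second collection taken a few seconds later.
SNAPSHOT_2 = """\
MSTID Port TC(Send/Receive) TCN(Send/Receive)
0 GigabitEthernet4/0/1 3/2 0/0
0 GigabitEthernet1/0/10 20/57 0/0
"""

print(growing_ports(parse_tc_stats(SNAPSHOT_1), parse_tc_stats(SNAPSHOT_2)))
```

Ports reported by growing_ports are the ones on which TC BPDUs are still being received and are the first places to look for the source of topology changes.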

Conclusion and Suggestion


When a high CPU usage is detected on a switch on an MSTP network, check whether the
switch receives a large number of TC BPDUs. If so, configure the switch not to age out or
delete ARP entries upon network topology changes, and enable the switch to update
outbound interfaces in ARP entries when outbound interfaces in MAC address entries
change.

1.6.5.4 STP Convergence Is Abnormal When an S9300 Interface Processes
BPDUs

Involved Products and Versions


All versions of modular switches


Fault Symptom and Network Diagram


In Figure 1.1, the S9300 (V100R003C00SPC200) and S8500 constitute an RRPP ring, the
S9300 is the master node, and the ports marked in red on the S9300 are the blocked ports. The
S3328 and S5624 have STP enabled, and the S9300 and S8500 transparently transmit STP
BPDUs. STP convergence can be performed on the S3328 (the ports marked in red are the
blocked ports), and STP flapping occurs on the S5624.

Figure 1.1 Abnormal STP convergence when the S9300 ports process BPDUs

Root Cause
The S5624 has the Neighbor Discovery Protocol (NDP) enabled in the system and on all
ports. The bpdu bridge enable command is configured on the S9300 ports connected to the
S8500 and S5600. As a result, NDP packets sent by the S5624 are looped and sent to the CPU
of the S5624. In this case, STP BPDUs cannot be processed normally.

Identification Method
Analyze the configuration of the ports on the S9300 and S8500 that connect to the S5624 and
S3328. The configurations are the same except the allowed VLANs.
The configurations of ports on the S9300 are as follows:
#
interface GigabitEthernet1/0/2
description S5600 1/0/2
port link-type trunk
port trunk pvid vlan 4080
port trunk allow-pass vlan 10 4080
stp disable
l2protocol-tunnel stp enable
bpdu bridge enable
trust upstream default
#
interface GigabitEthernet1/0/14 //RRPP secondary port
description S8500A 3/1/11
port link-type trunk
undo port trunk allow-pass vlan 1
port trunk allow-pass vlan 10 42 4060 4070 4080 4093 to 4094
stp disable
bpdu bridge enable
trust upstream default
#

Reproduce the environment in the lab. After the interconnected ports of the S9300, S8500,
and S5600 become Up, the CPU usage of the S5600 rapidly reaches 100%. Obtain packets on
the S5600 ports. Many NDP packets with the destination MAC address of 0180-c200-000a
are received on the S5600 ports.
NDP is enabled on the S5600 in the system and on ports by default, so ports periodically send
NDP packets. Because the bpdu bridge enable command is configured on the primary and
secondary ports of the S9300, NDP packets that reach the S9300 are forwarded even though
the secondary port is blocked. As a result, NDP packets are looped between the S8500 and
S9300 and forwarded to the S5600 through GE1/0/2. The S5600 receives so many NDP
packets that it cannot process STP BPDUs. NDP is not enabled on the S3328 in the system or
on ports by default. The S3328 discards received NDP packets, so its STP convergence is
normal.
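
When analyzing captured packets, it helps to check whether a destination MAC address falls in the range that IEEE 802.1D reserves for bridge protocols (01-80-C2-00-00-00 to 01-80-C2-00-00-0F), because a standard bridge does not forward such frames. The following Python sketch performs this check on addresses written in the hhhh-hhhh-hhhh notation used in this document. The function names are hypothetical, and the set of addresses that a given switch treats as BPDU MAC addresses should still be confirmed with the display bpdu mac-address command.

```python
def parse_huawei_mac(text):
    """Convert a MAC address in hhhh-hhhh-hhhh notation to 6 raw bytes."""
    digits = text.replace("-", "")
    if len(digits) != 12:
        raise ValueError("expected 12 hexadecimal digits: %r" % text)
    return bytes.fromhex(digits)


def is_reserved_bridge_address(mac_text):
    """Return True for 01-80-C2-00-00-00 through 01-80-C2-00-00-0F."""
    mac = parse_huawei_mac(mac_text)
    return mac[:5] == bytes.fromhex("0180c20000") and mac[5] <= 0x0F


print(is_reserved_bridge_address("0180-c200-000a"))  # NDP destination seen in the capture
print(is_reserved_bridge_address("0025-9e03-02f1"))  # an ordinary unicast address
```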

Solution
Delete the bpdu bridge enable command configuration on the S9300 ports that are connected
to the S8500, S3328, and S5600.

Conclusion
The bpdu enable command on the S9300 V100R002 and the bpdu bridge enable command
in V100R003 and V100R006 enable ports to forward BPDUs. Packets whose destination MAC
address is a BPDU MAC address are neither discarded nor sent to the CPU for processing.
Instead, they are forwarded by the hardware. The bpdu enable or bpdu bridge enable
command is not required to implement Layer 2 protocol transparent transmission. You can
run the display bpdu mac-address command to check the BPDU MAC address.
On the S2300 or S3300&5300 running V100R006 or an earlier version, the bpdu enable
command must be run on ports enabled with Layer 2 transparent transmission, except the
ports with bpdu-tunnel enable or l2protocol-tunnel enable configured; otherwise, the
related Layer 2 packets cannot be sent to the CPU.

1.6.5.5 STP Flapping Occurs Because the STP Timeout Interval on the
ATAE Device Is Incorrectly Calculated

Involved Products and Versions


S switches of all versions

Network Diagram
None.


Symptom
The switch connects to an ATAE device of an earlier version using STP. On the switch used
as the root bridge, the stp timer hello command is used to set the Hello time to 1s. When the
switch is briefly busy or a few packets are discarded, STP flapping occurs on the ATAE
device.

Cause Analysis
The timeout interval on an ATAE device of an earlier version is three times the Hello time
and does not take the timer factor into account. When the Hello time on the root bridge is set
to 1s, the timeout interval on the ATAE device is 3s. When the switch is briefly busy or a few
packets are discarded, STP flapping easily occurs on the ATAE device.
The timeout interval on the ATAE device of the latest version is changed to be the same as
that on the switch. The timeout interval is calculated using the following formula: Hello time
x Timer factor x 3. The default Hello time is 2s and the default timer factor is 3, so the default
timeout interval is 18s.
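
The two timeout calculations can be compared with a short worked example. The following Python sketch only restates the arithmetic described above; the function name and the legacy_atae parameter are hypothetical. The Hello time is given in centiseconds, as in the stp timer hello command.

```python
def stp_timeout_seconds(hello_centiseconds, timer_factor=3, legacy_atae=False):
    """Return the BPDU timeout interval in seconds.

    legacy_atae=True models the earlier ATAE behavior, which ignores the
    timer factor and always uses three times the Hello time.
    """
    hello = hello_centiseconds / 100
    if legacy_atae:
        return 3 * hello
    return hello * timer_factor * 3


print(stp_timeout_seconds(200))                    # switch defaults
print(stp_timeout_seconds(100, legacy_atae=True))  # Hello time 1s on the earlier ATAE
print(stp_timeout_seconds(300, legacy_atae=True))  # Hello time 3s on the earlier ATAE
```

With a Hello time of 1s, the earlier ATAE version tolerates only 3s without BPDUs, which is why brief congestion on the switch is enough to trigger flapping.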

Handling Procedure
1. Check whether the stp timer-factor command is configured on the ATAE device of an
earlier version.
2. Check whether the Hello time on the root bridge is 1s. That is, check whether the stp
timer hello 100 command is used. Here, the value 100 refers to 100 centiseconds.
3. During STP flapping, check whether the ATAE device first sends STP BPDUs with the
source MAC address of 00e0-fc09-bcf9.
Two solutions are available:
Solution 1: Upgrade the version of the ATAE device to the latest version that supports the
time factor.
Solution 2: The ATAE device still uses STP, and the stp timer hello 300 command is used on
the root bridge and secondary root bridge so that the timeout interval of the ATAE device
reaches 9s.

Conclusion
If a switch does not receive any BPDU from the upstream device within the timeout interval,
the switch considers that the upstream device fails and recalculates the spanning tree.
Sometimes, the switch cannot receive BPDUs from the upstream device for a long time
because the upstream device is busy. In this case, the switch should not recalculate the
spanning tree. Therefore, you can set a longer timeout interval on a stable network to save
network resources.
The recommended timer factor is 5 to 7 on a stable network.

1.6.5.6 RSTP Cannot Provide Fast Convergence When the S6500 Port
Connected to the S6500 Changes from Down to Up

Involved Products and Versions


S switches of all versions


Network Diagram
In Figure 1.1, two S6500s and the switch constitute an RSTP ring. When the network is
stable, the port connecting the switch and S6500-2 is the blocked port.

Figure 1.1 RSTP cannot provide fast convergence after the connected port between the switch and
S6500 changes from Down to Up

Symptom
Shut down the port on S6500-1 connected to the switch and restore the port to check the
RSTP fast convergence mechanism. After the link between S6500-1 and the switch recovers,
the port on S6500-1 remains in Discarding state and changes to Forwarding state after 30s.

Cause Analysis
Run the debugging stp all command to check whether there is the Agreement flag in the
Flags field. The following information shows that there is only the Proposal flag.
Port50(GigabitEthernet0/0/8) Rcvd Packet(Length: 43)
ProtocolVersionID : 02
BPDUType : 02( RST BPDU )
Flags : 0e( Proposal DESIGNATED )
Root Identifier : 0.000f-e2e0-7425
Root Path Cost : 0
Bridge Identifier : 0.000f-e2e0-7425
Port Identifier : 128.206
Message Age : 0
Max Age : 20
Hello Time : 2
Forward Delay : 15
Version 1 Length : 0

After the port between S6500-1 and the switch goes Up, the Proposal BPDUs sent by S6500-1
do not carry the Agreement flag. As a result, the port cannot implement fast transition. That
is, the Proposal/Agreement mechanism does not take effect.
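
The Flags value in the debugging output can be decoded bit by bit. The following Python sketch uses the RST BPDU flag layout defined in IEEE 802.1D-2004 (bit 0 is Topology Change, bit 1 is Proposal, bits 2 and 3 are the port role, bit 4 is Learning, bit 5 is Forwarding, bit 6 is Agreement, and bit 7 is Topology Change Acknowledgment). The function name is hypothetical.

```python
ROLE_NAMES = {0: "Unknown", 1: "Alternate/Backup", 2: "Root", 3: "Designated"}


def decode_rst_flags(flags):
    """Decode the one-byte Flags field of an RST BPDU."""
    return {
        "topology_change": bool(flags & 0x01),
        "proposal": bool(flags & 0x02),
        "port_role": ROLE_NAMES[(flags >> 2) & 0x03],
        "learning": bool(flags & 0x10),
        "forwarding": bool(flags & 0x20),
        "agreement": bool(flags & 0x40),
        "topology_change_ack": bool(flags & 0x80),
    }


decoded = decode_rst_flags(0x0E)  # value taken from the debugging output above
print(decoded["proposal"], decoded["port_role"], decoded["agreement"])
```

For 0x0e, the Proposal bit is set, the port role is Designated, and the Agreement bit is clear, which matches the "Proposal DESIGNATED" text in the debugging output.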


Handling Procedure
Run the stp no-agreement-check command on the port between the switch and S6500.

Conclusion and Suggestion


When a switch connects to a non-Huawei device, run the stp no-agreement-check command
on the switch to configure enhanced or common fast transition based on the
Proposal/Agreement mechanism of the non-Huawei device.

1.6.5.7 Unicast Suppression Causes RRPP Flapping for One Hour

Involved Products and Versions


S switches of all versions

Network Diagram
In Figure 1.1, after the loop on the RRPP master node is removed, the loop is generated again.
This process repeats.

Figure 1.1 RRPP ring network

Symptom
RRPP flapping persists for more than one hour. No error (such as interface flapping) is
recorded in the log, and the interfaces on the RRPP ring show no FCS error counts.

Cause Analysis
The test result shows that RRPP hello packets are discarded when unknown unicast traffic
volume on an interface increases. The RRPP ring status turns to Failed after three consecutive
packets are discarded, and is recovered when the next hello packet is received. The RRPP ring
status alternates between Failed and Complete.
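
The alternation between Failed and Complete can be reproduced with a simple timer model. The following Python sketch is a simplification and not the RRPP state machine itself: it assumes one Hello slot per Hello interval, a ring that turns Failed once no Hello packet has arrived for the fail timer, and recovery on the next received Hello packet. The default values of 1s and 3s match the lab configuration shown in this case; the function name and the loss pattern are hypothetical.

```python
def ring_states(hello_received, hello_interval=1, fail_timer=3):
    """Return the ring state after each Hello slot.

    hello_received: one boolean per Hello interval, True if the master
    node received its own Hello packet on the secondary port.
    """
    states = []
    since_last_hello = 0
    for received in hello_received:
        if received:
            since_last_hello = 0
        else:
            since_last_hello += hello_interval
        states.append("Failed" if since_last_hello >= fail_timer else "Complete")
    return states


# Hypothetical pattern: bursts of unknown unicast traffic drop three Hellos in a row.
pattern = [True, False, False, False, True, False, False, False, True]
print(ring_states(pattern))
```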


Handling Procedure
1. Simulate the live network in the lab. The RRPP status on S3328 is normal.
[119-S3328TP-01]display rrpp verbose domain 1
Domain Index : 1
Control VLAN : major 4091 sub 4092
Protected VLAN : Reference Instance 0
Hello Timer : 1 sec(default is 1 sec) Fail Timer : 3 sec(default is 6 sec)
RRPP Ring :1
Ring Level :0
Node Mode : Master
Ring State : Complete
Is Enabled : Enable Is Actived : Yes
Primary port : GigabitEthernet0/0/1 Port status: UP
Secondary port : GigabitEthernet0/0/2 Port status: BLOCKED
2. Send unknown unicast traffic carrying RRPP control VLAN ID from the tester to S3328.

3. RRPP flapping occurs and the recovery interval is the same as that on the live network.
Jan 2 2008 20:02:48 119-S3328TP-01 %%01RRPP/4/PFWD(l): Domain 1 Ring 1 Port
GigabitEthernet0/0/2 has been set to forwarding state.
#Jan 2 20:02:50 2008 119-S3328TP-01 RRPP/4/RNGDN:1.3.6.1.4.1.2011.5.25.113.4.2:
Domain 1 ring 1 is failed.
Jan 2 2008 20:02:50 119-S3328TP-01 %%01RRPP/3/FAIL(l): Domain 1 Ring 1 failed.
Jan 2 2008 20:02:50 119-S3328TP-01 %%01RRPP/4/PBLK(l): Domain 1 Ring 1 Port
GigabitEthernet0/0/2 has been set to block state.
#Jan 2 20:02:53 2008 119-S3328TP-01 RRPP/6/RNGUP:1.3.6.1.4.1.2011.5.25.113.4.1:
Domain 1 ring 1 is restored.
#Jan 2 20:03:08 2008 119-S3328TP-01 RRPP/4/RNGDN:1.3.6.1.4.1.2011.5.25.113.4.2:
Domain 1 ring 1 is failed.
Jan 2 2008 20:03:08 119-S3328TP-01 %%01RRPP/4/PFWD(l): Domain 1 Ring 1 Port
GigabitEthernet0/0/2 has been set to forwarding state.
Jan 2 2008 20:03:08 119-S3328TP-01 %%01RRPP/3/FAIL(l): Domain 1 Ring 1 failed.
Jan 2 2008 20:03:08 119-S3328TP-01 %%01RRPP/4/PBLK(l): Domain 1 Ring 1 Port
GigabitEthernet0/0/2 has been set to block state.
4. The configuration of unknown unicast suppression causes RRPP flapping. Run the undo
unicast-suppression command to undo unicast suppression.
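
To compare the flapping interval in the lab with that on the live network, the RNGDN and RNGUP trap lines can be extracted from the log and their timestamps compared. The following Python sketch assumes the trap line format shown in step 3 and uses three trap lines copied from that output. The function names are hypothetical.

```python
import re
from datetime import datetime

# Matches trap lines such as:
# "#Jan 2 20:02:50 2008 119-S3328TP-01 RRPP/4/RNGDN:1.3.6.1..."
TRAP = re.compile(
    r"^#(\w{3}\s+\d+ \d{2}:\d{2}:\d{2} \d{4}) \S+ RRPP/\d/(RNGDN|RNGUP):"
)


def ring_events(log_text):
    """Return a list of (timestamp, 'RNGDN' or 'RNGUP') tuples in log order."""
    events = []
    for line in log_text.splitlines():
        m = TRAP.match(line)
        if m:
            stamp_text = " ".join(m.group(1).split())  # normalize spacing
            stamp = datetime.strptime(stamp_text, "%b %d %H:%M:%S %Y")
            events.append((stamp, m.group(2)))
    return events


def seconds_between_failures(events):
    """Return the gaps, in seconds, between consecutive RNGDN traps."""
    downs = [stamp for stamp, kind in events if kind == "RNGDN"]
    return [int((b - a).total_seconds()) for a, b in zip(downs, downs[1:])]


LOG = """\
#Jan 2 20:02:50 2008 119-S3328TP-01 RRPP/4/RNGDN:1.3.6.1.4.1.2011.5.25.113.4.2:
#Jan 2 20:02:53 2008 119-S3328TP-01 RRPP/6/RNGUP:1.3.6.1.4.1.2011.5.25.113.4.1:
#Jan 2 20:03:08 2008 119-S3328TP-01 RRPP/4/RNGDN:1.3.6.1.4.1.2011.5.25.113.4.2:
"""

events = ring_events(LOG)
print([kind for _stamp, kind in events])
print(seconds_between_failures(events))
```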

Conclusion and Suggestion


Do not configure unknown unicast suppression on the interfaces on an RRPP ring. Otherwise,
RRPP hello packets may be discarded when unknown unicast traffic volume increases. In this
situation, RRPP status is unstable and protocol flapping occurs.

1.6.5.8 Unknown Unicast Suppression Causes RRPP Flapping

Involved Products and Versions


S switches of all versions

Network Diagram
In Figure 1.1, SwitchA functions as the master node of the RRPP ring. Normally, GE1/0/0 is
the primary interface, and GE2/0/0 is the secondary interface (blocked interface).


Figure 1.1 RRPP ring network

Symptom
After the loop on the RRPP master node is removed, the loop is generated again. This process
repeats.

Cause Analysis
Unknown unicast suppression is configured on a switch, and the destination MAC addresses
of RRPP packets are unknown unicast MAC addresses. When the volume of unknown unicast
traffic on an interface increases, RRPP packets are suppressed. As a result, the switch
considers that a fault occurs on the RRPP ring and unblocks the interface to form a loop.

Handling Procedure
1. Run the display rrpp statistics command. The command output shows that the switch
frequently sends and receives Link Down packets and the numbers of Send and Rcv
packets on the master and slave interfaces are different.
2. Check the configurations on the switch. The unicast-suppression command has been
run on the switch to suppress unknown unicast packets.
3. Run the undo unicast-suppression command in the interface view to undo unknown
unicast suppression. The fault is recovered.

Conclusion and Suggestion


If RRPP, SEP, or ERPS has been configured and the destination MAC address of its protocol
packets is a unicast MAC address, do not configure unknown unicast suppression.


1.6.5.9 Services on the RRPP Ring Consisting of CX600 and S3300 Are
Interrupted

Involved Products and Versions


S switches of all versions

Network Diagram
In Figure 1.1, a CX600 and two S3300 switches form an RRPP ring. Services on the ring are
interrupted.

Figure 1.1 CX600 and S3300s form an RRPP ring

Symptom
The port on CX600 connected to S3328 alternates between Up and Down, and a loop occurs
on the RRPP ring where the CX600 resides. In addition, the loop also occurs on other RRPP
rings connected to the CX600, causing service interruption. After the problematic RRPP rings
are manually broken, services are recovered.

Cause Analysis
Due to poor link quality, the two ports on the CX600 and S3328 go Up at different times
after the fiber is removed and reinstalled. This causes a loss of RRPP packets. Packets are
duplicated into multiple copies and loops are quickly formed. Within a short period (1s),
traffic volume exceeds the chip scheduling capability, and the excess traffic occupies
bandwidth for service and protocol packets. The RRPP ring cannot be recovered, affecting
all RRPP rings connected to the same data VSI. As a result, services on S3300s are
abnormal.

Handling Procedure
1. No priority scheduling configuration is performed on the subinterface transparently
transmitting RRPP packets on CX600. Therefore, the CX600 treats RRPP packets as
common data packets. If traffic volume exceeds the bandwidth, RRPP packets may be
discarded randomly.
The counter information of the traffic scheduling chip TM shows that timeout and error
events have occurred, indicating that the traffic volume was huge and exceeded the chip
processing capability. In this situation, many packets are discarded.
[A83101-CX600X8-01-diagnose]
……
SD587V_ReadReg:(0xc4420414)=0x0000b997( 32318) RBE_TIMEOUT_MCELL_CTR
SD587V_ReadReg:(0xc442046c)=0x0000712e( 43261) RBE TX CHECK ERROR CTR
……

2. When the port of CX600 connected to an RRPP ring goes Down, the master node on the
ring unblocks the secondary port. However, due to the time difference, there is a high
probability that a loop is generated within 1s. The secondary port is blocked again as soon
as the next RRPP packet is received. However, many RRPP packets are discarded.
Jun 7 2010 15:38:16 A83101-CX600X8-01 %%01PHY/4/PHY_STATUS_UP(l)
[25555]:Slot=1;GigabitEthernet1/1/6 change status to up.

3. When a loop is detected, the related subinterface is blocked and a trap is reported. Loop
detection is a Huawei proprietary protocol whose packets do not carry priority
information, so the packets may be discarded. Therefore, not all loops have block information.
Jun 7 2010 15:38:28 A83101-CX600X8-01 FLD/4/TRAP:Slot=1;
1.3.6.1.4.1.2011.25.180.2.3 This interface is blocked.(PortIndex = 191,PortName =
GigabitEthernet1/1/3.99)
Jun 7 2010 15:38:29 A83101-CX600X8-01 FLD/4/TRAP:Slot=1;
1.3.6.1.4.1.2011.25.180.2.3 This interface is blocked.(PortIndex = 122,PortName =
GigabitEthernet1/1/9.99)
Jun 7 2010 15:38:30 A83101-CX600X8-01 FLD/4/TRAP:Slot=1;
1.3.6.1.4.1.2011.25.180.2.3 This interface is blocked.(PortIndex = 156,PortName =
GigabitEthernet1/1/5.99)

Conclusion and Suggestion


1. The trust 8021p and qos wrr queue-index 7 weight 0 commands have been run on the
S3328s' ports on the live network to improve the priority of RRPP packets. Run the trust
upstream default and trust 8021p commands on the CX600's port to ensure preferred
forwarding of RRPP packets.
2. Configure attack defense on CX600 so that the routing protocols can still run normally if
a loop occurs.

1.6.5.10 ERPS Becomes Invalid When RTN Interconnects with an S Switch

Involved Products and Versions


S switches of all versions


Network Diagram
In Figure 1.1, to improve network reliability, SwitchA and SwitchB are deployed on an RTN
network. The original chain network between RTNA and RTNB changes to a ring network.
ERPS is enabled on the RTN devices and switches.

Figure 1.1 ERPS becomes invalid when RTN interconnects with an S switch

Symptom
The RPL owner interface on the RTN side and the switch interface connected to the RTN
device are both blocked.

Cause Analysis
ERPS packets of the RTN device and switch are different. EtherType is set to 0x8809 in
ERPS packets sent by the RTN device, which does not comply with protocol standards (the
standard value of EtherType is 0x8902). After receiving the ERPS packets, the switch cannot
forward them to upper-layer devices. As a result, the switch-side interface is also blocked.

Handling Procedure
1. Run the display erps verbose command to check the ERPS interface state.
<HUAWEI> display erps verbose
Ring ID : 1
Description : Ring 1
Control Vlan : 4094
Protected Instance : 1
WTR Timer Setting (min) : 5 Running (s) : 0
Guard Timer Setting (csec) : 50 Running (csec) : 0
Holdoff Timer Setting (deciseconds) : 0 Running (deciseconds) : 0
Ring State : Pending
RAPS_MEL : 7
Time since last topology change : 0 days 0h:31m:36s
--------------------------------------------------------------------------------
Port                Port Role     Port Status     Signal Status
--------------------------------------------------------------------------------
Eth-Trunk1          Common        Forwarding      Non-failed
GE2/0/36            Common        Discarding      Non-failed

The RPL owner interface on the RTN side is blocked. GE2/0/36 of SwitchB is also
blocked. The ERPS port states are abnormal.
2. Run the display erps statistics command to view ERPS packet statistics, checking
whether the switch has received packets from RTN.
<SwitchB> display erps statistics
--------------------------------------------------------------------------------
Ring Port            Directtion   SF       NR       NRRB
--------------------------------------------------------------------------------
1    Eth-Trunk1      RX           0        80       0
1    Eth-Trunk1      TX           0        16       0
1    GE2/0/36        RX           0        0        0
1    GE2/0/36        TX           0        11       0

The statistics show that GE2/0/36 has not received ERPS packets from RTNB. Capture
packets on GE2/0/36 and analyze whether the RTN device and switch send different
ERPS packets.
The analysis result shows that EtherType is 0x8809 in ERPS packets sent by the RTN
device and 0x8902 in ERPS packets sent by the switch. The standard value of
EtherType is 0x8902. ERPS implementation of the RTN device does not
comply with standards, and the switch cannot forward ERPS packets sent from the RTN
device, causing a fault.
If permitted by the customer, deploy another loop prevention technology, such as STP.
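
The check in step 2 can be automated by scanning the display erps statistics output for ports whose RX counters are all zero. The following Python sketch assumes the six-column layout shown in step 2 and uses the data rows from that output. The function name is hypothetical.

```python
def silent_rx_ports(output):
    """Return (ring, port) pairs whose RX row shows zero SF, NR, and NRRB."""
    silent = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have six columns: ring, port, direction, SF, NR, NRRB.
        if len(fields) == 6 and fields[2] == "RX":
            ring, port, _direction, sf, nr, nrrb = fields
            if int(sf) == 0 and int(nr) == 0 and int(nrrb) == 0:
                silent.append((int(ring), port))
    return silent


STATISTICS = """\
1 Eth-Trunk1 RX 0 80 0
1 Eth-Trunk1 TX 0 16 0
1 GE2/0/36 RX 0 0 0
1 GE2/0/36 TX 0 11 0
"""

print(silent_rx_ports(STATISTICS))
```

A port that sends ERPS packets but receives none, such as GE2/0/36 here, is the place to capture packets.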

Conclusion and Suggestion


When designing a solution involving ERPS interconnection, analyze whether ERPS packets
of devices comply with standards to prevent such a fault. When faults related to ERPS
interconnection occur, analyze the ERPS packets first.
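
When analyzing captured ERPS packets, the EtherType can be read directly from the raw frame. The following Python sketch extracts the EtherType of an Ethernet II frame and skips any 802.1Q (0x8100) or 802.1ad (0x88A8) VLAN tags. The function name is hypothetical, and the MAC addresses in the sample frames are placeholder values rather than values taken from the capture in this case. The control VLAN 4094 and the EtherType 0x8809 are the values reported above.

```python
import struct

VLAN_TAG_TYPES = (0x8100, 0x88A8)


def inner_ethertype(frame):
    """Return the EtherType of an Ethernet II frame, skipping VLAN tags."""
    offset = 12  # skip the destination and source MAC addresses
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    while ethertype in VLAN_TAG_TYPES:
        offset += 4  # skip the 2-byte tag type and the 2-byte tag control field
        (ethertype,) = struct.unpack_from("!H", frame, offset)
    return ethertype


dst = bytes.fromhex("0119a7000001")  # placeholder destination MAC
src = bytes.fromhex("00e0fc000001")  # placeholder source MAC
tagged = dst + src + struct.pack("!HHH", 0x8100, 4094, 0x8809) + bytes(46)
untagged = dst + src + struct.pack("!H", 0x8809) + bytes(46)

print(hex(inner_ethertype(tagged)))
print(hex(inner_ethertype(untagged)))
```

Comparing the value returned for packets from each vendor's device shows quickly whether the two implementations use the same encapsulation.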

1.6.6 Pseudo Loops


1.6.6.1 MAC Address Flapping Occurs But No Loop Is Detected

Involved Products and Versions


S switches of all versions

Network Diagram
None.

Symptom
The switch generates a MAC address flapping alarm. Efforts are made to check for loops, but
the interface where the loop occurs fails to be located. The MAC address flapping problem
cannot be rectified.


The alarm information is as follows:


Nov 21 2013 19:29:33 Quidway L2IFPPI/4/MAC_FLAPPING_ALARM:OID
1.3.6.1.4.1.2011.5.25.42.2.1.7.12The mac-address has flap value.
(L2IfPort=0,entPhysicalIndex=0, BaseTrapSeverity=4, BaseTrapProbableCause=549,
BaseTrapEventType=1, MacAdd=5654-4c83-05c0,vlanid=712,
FormerIfDescName=GigabitEthernet1/1/16,CurrentIfDescName=GigabitEthernet1/1/12,Dev
iceName=Quidway)

Alarm information differs for fixed and modular switches based on versions. The preceding
alarm information is only used as an example.

Cause Analysis
1. A loop exists on the network.
2. There are multiple terminals with the same MAC address.

Handling Procedure
The preceding alarm information shows that the device can learn the same MAC address from
multiple interfaces. There are two possible causes: a loop exists, or there are multiple
terminals using the same MAC address. The second cause can be further divided into two
scenarios: multiple Layer 2 devices using the same MAC address, or multiple user terminals
using the same MAC address.
If there is a loop on the network, the alarm usually involves many MAC addresses. In
addition, traffic is heavy on some interfaces, with a large number of broadcast packets. If you
disable one interface where the alarm is generated, the alarm is cleared. MAC address
flapping occurs regardless of the service traffic volume.
If multiple terminals use the same MAC address, the alarm usually involves only one MAC
address or a small number of MAC addresses, and the statistics show that the number of
received and sent packets is within a normal range. Change the MAC address learning priority
for an interface. If traffic of users connected to this interface becomes abnormal, multiple user
terminals are using the same MAC address. In this case, change the MAC addresses of the
user terminals. If user traffic remains normal, some Layer 2 devices are using the same MAC
address. In this case, check the configuration of the Layer 2 devices and change their MAC
addresses.
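
The first triage step described above, counting how many MAC addresses are flapping, can be sketched in Python. The following example assumes the alarm format shown in this case, with each alarm joined into a single line. The function name and the threshold of 10 MAC addresses are hypothetical; the document only distinguishes "many" from "one or a small number".

```python
import re

ALARM = re.compile(
    r"MacAdd=(?P<mac>[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}),\s*vlanid=(?P<vlan>\d+),\s*"
    r"FormerIfDescName=(?P<former>[^,]+),\s*CurrentIfDescName=(?P<current>[^,]+)",
    re.IGNORECASE,
)


def triage(alarms, many_macs=10):
    """Suggest a likely cause from a list of MAC address flapping alarms."""
    flapping = set()
    for text in alarms:
        m = ALARM.search(text)
        if m:
            flapping.add((m.group("mac").lower(), int(m.group("vlan"))))
    if not flapping:
        return "no flapping alarms parsed"
    if len(flapping) >= many_macs:
        return "many MAC addresses flapping: suspect a loop"
    return "few MAC addresses flapping: suspect duplicate MAC addresses"


SAMPLE = (
    "Nov 21 2013 19:29:33 Quidway L2IFPPI/4/MAC_FLAPPING_ALARM:OID "
    "1.3.6.1.4.1.2011.5.25.42.2.1.7.12The mac-address has flap value. "
    "(L2IfPort=0,entPhysicalIndex=0, BaseTrapSeverity=4, BaseTrapProbableCause=549, "
    "BaseTrapEventType=1, MacAdd=5654-4c83-05c0,vlanid=712, "
    "FormerIfDescName=GigabitEthernet1/1/16,CurrentIfDescName=GigabitEthernet1/1/12,"
    "DeviceName=Quidway)"
)

print(triage([SAMPLE]))
```

The result is only a hint. Interface traffic statistics and the MAC address learning priority test described above are still needed to confirm the cause.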

Conclusion and Suggestion


MAC address flapping does not necessarily result from loops, although loops always result
in MAC address flapping. When MAC address flapping occurs, troubleshoot the fault based
on a detailed analysis of the symptoms.


1.6.7 Others
1.6.7.1 The S2300SI Configured with Loopback Detection Cannot Detect
Loops

Involved Products and Versions


S2300SI V100R006

Network Diagram
As shown in Figure 1.1, the S2300SI connects to the office network through the S2300EI.

Figure 1.1 Loops cannot be detected by the S2300SI configured with loopback detection

Symptom
The incoming traffic on the interface of the S2300SI connected to the S2300EI increases
continuously. There are loops on the office network, but the S2300SI configured with
loopback detection cannot detect loops.

Cause Analysis
The S2300SI has supported loopback detection since V100R006. It sends loopback detection
(LBDT) packets, which are BPDUs, in untagged mode, so it can detect loops only if the
downstream device transparently transmits BPDUs. The S2300EI terminates BPDUs or sends
them to the CPU for processing. That is, LBDT packets sent by an interface of the S2300SI
cannot be forwarded by the S2300EI. As a result, the S2300SI cannot detect loops.

Handling Procedure
The S2300SI is configured with loopback detection. Check whether the downstream device
connected to the S2300SI can transparently transmit BPDUs.

Conclusion and Suggestion


The S2300SI cannot send LBDT packets in tagged mode, so it is recommended that LBDT be
configured on the S2300EI and that the S2300EI be configured to send LBDT packets in
tagged mode.


1.6.7.2 The OSPF Neighbor Relationship Is Down Due to a Loop on the S
Switch

Involved Products and Versions


S switches of all versions

Network Diagram
None.

Symptom
When a switch establishes an OSPF neighbor relationship with another device, the OSPF
neighbor relationship frequently goes Down.

Cause Analysis
The rates of incoming and outgoing traffic on all Up interfaces of the switch are high,
indicating that a loop has occurred on the link. As a result, OSPF packets are discarded and
the OSPF neighbor relationship goes Down.
%%01IFPDT/4/ABNORMAL_FLOW(D): Interface GigabitEthernet12/0/7's flow is abnormal.
(Speed=1000Mbps, CurrentInSpeed=814Mbps, CurrentOutSpeed=813Mbps,
File=IFPDT_FUNC_C, Line=13072)
%%01IFPDT/4/ABNORMAL_FLOW(D): Interface GigabitEthernet12/0/12's flow is abnormal.
(Speed=1000Mbps, CurrentInSpeed=624Mbps, CurrentOutSpeed=813Mbps,
File=IFPDT_FUNC_C, Line=13072)
%%01IFPDT/4/ABNORMAL_FLOW(D): Interface GigabitEthernet12/0/13's flow is abnormal.
(Speed=1000Mbps, CurrentInSpeed=625Mbps, CurrentOutSpeed=813Mbps,
File=IFPDT_FUNC_C, Line=13072)

Handling Procedure
1. Check the interface traffic. The maximum rates of incoming and outgoing traffic on
interfaces are reached during the problem occurrence.
2. Analyze logs and check whether the traffic rate is abnormal and exceeds the threshold.
3. Check the network configuration and eliminate the loop.
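
The log analysis in step 2 can be sketched in Python. The following example assumes the ABNORMAL_FLOW log format shown in the cause analysis and lists interfaces whose incoming and outgoing rates both reach a given share of the interface speed. The function name and the 60% default ratio are hypothetical and should be adapted to the traffic baseline of the network.

```python
import re

FLOW = re.compile(
    r"Interface (?P<ifname>\S+)'s flow is abnormal\.\s*"
    r"\(Speed=(?P<speed>\d+)Mbps,\s*CurrentInSpeed=(?P<rx>\d+)Mbps,\s*"
    r"CurrentOutSpeed=(?P<tx>\d+)Mbps"
)


def saturated_interfaces(log_text, ratio=0.6):
    """Return interfaces whose in and out rates both reach ratio x speed."""
    hits = []
    for m in FLOW.finditer(log_text):
        speed = int(m.group("speed"))
        rx = int(m.group("rx"))
        tx = int(m.group("tx"))
        if rx >= ratio * speed and tx >= ratio * speed:
            hits.append(m.group("ifname"))
    return hits


LOG = """\
%%01IFPDT/4/ABNORMAL_FLOW(D): Interface GigabitEthernet12/0/7's flow is abnormal.
(Speed=1000Mbps, CurrentInSpeed=814Mbps, CurrentOutSpeed=813Mbps,
File=IFPDT_FUNC_C, Line=13072)
%%01IFPDT/4/ABNORMAL_FLOW(D): Interface GigabitEthernet12/0/12's flow is abnormal.
(Speed=1000Mbps, CurrentInSpeed=624Mbps, CurrentOutSpeed=813Mbps,
File=IFPDT_FUNC_C, Line=13072)
%%01IFPDT/4/ABNORMAL_FLOW(D): Interface GigabitEthernet12/0/13's flow is abnormal.
(Speed=1000Mbps, CurrentInSpeed=625Mbps, CurrentOutSpeed=813Mbps,
File=IFPDT_FUNC_C, Line=13072)
"""

print(saturated_interfaces(LOG))
print(saturated_interfaces(LOG, ratio=0.8))
```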

Conclusion
When a loop occurs on the network, many packets are looped back and traffic on many
interfaces is abnormal. If there is heavy incoming and outgoing traffic on some interfaces,
there is a high probability that loops occur.

1.6.7.3 Packet Loss Due to a Loop in Layer 2 Forwarding

Involved Products and Versions


S switches of all versions


Network Diagram
As shown in Figure 1.1, the switch is connected to an enterprise customer through a leased
line. The switch functions as a Layer 2 aggregation switch, and the NE80 functions as the
gateway.

Figure 1.1 Network where layer 2 packet loss occurs due to loops

Symptom
The enterprise customer complains about the slow service response problem. When you ping
an enterprise terminal from the NE80, packet loss occurs.

Cause Analysis
A loop exists on the downstream network attached to GE10/0/6. As a result, the MAC address
of the NE80 flaps between GE10/0/6 and GE12/0/0 of the switch. When GE10/0/6 learns the
MAC address of the NE80, user packets cannot be forwarded to the gateway.

Handling Procedure
1. Enable MAC address flapping detection on the switch and check alarms.

Alarm information differs for fixed and modular switches based on versions. The following alarm
information is only used as an example.
#Jul 28 09:59:34 2012 Switch L2IF/4/mac_flapping_alarm:OID
1.3.6.1.4.1.2011.5.25.42.2.1.7.12The mac-address has flap value .
(BaseTrapSeverity=0, BaseTrapProbableCause=0, BaseTrapEventType=4,
L2IfPort=549,entPhysicalIndex=1, MacAdd=0025-9e03-02f1,vlanid=107,
FormerIfDescName=GigabitEthernet12/0/0,CurrentIfDescName=GigabitEthernet10/0/6,
DeviceName= Switch)

The preceding alarm information indicates that MAC address flapping has occurred.
2. Set the NE80 MAC address to a static MAC address on GE12/0/0.
3. Eliminate the loop on the downstream network connected to GE10/0/6.
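
The fields that matter in the alarm from step 1 are the MAC address, the VLAN, and the former and current interfaces. The following Python sketch extracts them from the example alarm text. The function name and the returned dictionary keys are hypothetical, and because alarm formats differ between switch models and versions, the pattern may need adjusting.

```python
import re

# Alarm text copied from the example above, joined onto one line.
ALARM = (
    "#Jul 28 09:59:34 2012 Switch L2IF/4/mac_flapping_alarm:OID "
    "1.3.6.1.4.1.2011.5.25.42.2.1.7.12The mac-address has flap value . "
    "(BaseTrapSeverity=0, BaseTrapProbableCause=0, BaseTrapEventType=4, "
    "L2IfPort=549,entPhysicalIndex=1, MacAdd=0025-9e03-02f1,vlanid=107, "
    "FormerIfDescName=GigabitEthernet12/0/0,CurrentIfDescName=GigabitEthernet10/0/6, "
    "DeviceName= Switch)"
)

# key=value pairs; a value ends at a comma, a closing parenthesis, or whitespace.
KV_RE = re.compile(r"(\w+)=\s*([^,)\s]+)")


def parse_flapping_alarm(text):
    """Return the flapping MAC, VLAN, and interfaces, or None for other messages."""
    if "mac_flapping_alarm" not in text:
        return None
    fields = dict(KV_RE.findall(text))
    return {
        "mac": fields["MacAdd"],
        "vlan": int(fields["vlanid"]),
        "former_if": fields["FormerIfDescName"],
        "current_if": fields["CurrentIfDescName"],
    }


if __name__ == "__main__":
    print(parse_flapping_alarm(ALARM))
```

In this case the MAC address is that of the NE80 and the current interface is GE10/0/6, which points to the downstream network on GE10/0/6 as the place to look for the loop.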

Conclusion and Suggestion


To locate Layer 2 packet loss and intermittent disconnection problems, first check whether
MAC address flapping occurs in addition to checking basic factors such as network cables,
optical power of optical modules, and interface status. Then configure a static MAC address
and check whether the problems are resolved. Configuring a static MAC address only
keeps the MAC address entry from flapping; it does not remove the loop. To eliminate loops,
configure a loop prevention protocol.

1.7 FAQ
1.7.1 Can a Switch Transparently Transmit BPDUs?
• After the bpdu enable command is run on an interface, the interface sends received
  BPDUs to the CPU for processing.
  The local device determines whether to process BPDUs of a protocol depending on
  whether the protocol is enabled. For example, whether STP BPDUs on an interface are
  sent to the CPU depends on whether STP has been enabled on the interface using the stp
  enable command.
• After the bpdu disable command is run on an interface, the interface discards BPDUs.
  By default, an interface discards received BPDUs.
To configure a switch to transparently transmit BPDUs, enable Layer 2 protocol transparent
transmission on an interface by running the l2protocol-tunnel all enable command in the
interface view. To ensure successful forwarding of packets, configure the default VLAN on
the inbound and outbound interfaces of all devices on the forwarding path.

1.7.2 What Is the Basis for STP Calculation? Will the STP Topology
Change When the Port Rate Changes?
A spanning tree is calculated based on two metrics: ID and path cost.
IDs used in STP calculation include bridge ID (BID) and port ID (PID). On an STP network,
the device with the smallest BID is elected as the root bridge. The port priority affects role
selection in a specified MSTI.
Path cost is a variable used for link selection. STP selects more "robust" links and blocks
redundant links based on path costs to prune a network into a loop-free tree topology.
Port rate is used for cost calculation. A change in port rate changes the path cost and
triggers STP recalculation.
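
The two metrics can be illustrated with a short Python sketch. It elects the root bridge by the smallest BID and shows how a port rate change alters the preferred path. The bridge names, MAC addresses, and path names are hypothetical, and the cost formula assumes the IEEE 802.1t convention (20,000,000 divided by the rate in Mbit/s); the path cost standard actually in use depends on the device configuration.

```python
def bridge_id(priority, mac):
    """BID ordering key: lower priority value first, then lower MAC address."""
    return (priority, int(mac.replace("-", ""), 16))


def elect_root(bridges):
    """Return the name of the bridge with the smallest BID."""
    return min(bridges, key=lambda b: bridge_id(b["priority"], b["mac"]))["name"]


def path_cost(speed_mbps):
    # Assumption: IEEE 802.1t-style cost, 20,000,000 / rate in Mbit/s.
    return 20_000_000 // speed_mbps


def best_path(paths):
    """paths maps a path name to the link rates (Mbit/s) along it toward the root."""
    return min(paths, key=lambda name: sum(path_cost(s) for s in paths[name]))


# Hypothetical example data.
BRIDGES = [
    {"name": "SwitchA", "priority": 32768, "mac": "0025-9e03-02f1"},
    {"name": "SwitchB", "priority": 32768, "mac": "0025-9e03-0100"},
    {"name": "SwitchC", "priority": 4096, "mac": "0025-9e03-ffff"},
]

if __name__ == "__main__":
    print(elect_root(BRIDGES))
    print(best_path({"via_B": [1000, 1000], "direct": [100]}))
    # One link on via_B renegotiates to 10 Mbit/s: the other path now costs less.
    print(best_path({"via_B": [1000, 10], "direct": [100]}))
```

In the last call the total cost of via_B rises above that of the direct path, so the preferred path changes. This is the kind of recalculation that a port rate change triggers.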

1.7.3 Does a Switch Support MAC Address Flapping Detection?


Modular and fixed switches support MAC address flapping detection in different situations.
• Modular switches
  In V100R002, the switch supports global MAC address flapping detection on all LPUs
  except the S series LPUs. When global detection is enabled, the switch can only send
  trap messages when MAC address flapping is detected.
  In V100R002, run the mac-flapping alarm enable command to enable MAC address
  flapping detection.
  In V100R003 and later versions, the switch supports VLAN-based MAC address
  flapping detection and can perform actions when MAC address flapping is detected.
  In V100R003 and later versions, the loop-detect eth-loop alarm-only command can be
  run in the system or VLAN view to enable MAC address flapping detection.
  By default, global MAC address flapping detection is disabled in V100R003 and enabled
  in V100R006 and later versions.
  Since V200R001, switches have supported global MAC address flapping detection,
  VLAN whitelist, and quit-vlan action.

• Fixed switches
  Fixed switches (excluding S2700) of V100R003 and later versions do not support global
  MAC address flapping detection. They support only VLAN-based MAC address
  flapping detection and actions such as sending traps and blocking interfaces when MAC
  address flapping is detected.
  Run the following command in the VLAN view to enable MAC address flapping
  detection:
  VLAN view: loop-detect eth-loop alarm-only
  Since V200R001, switches have supported global MAC address flapping detection,
  VLAN whitelist, and quit-vlan action.

1.7.4 After LDT or LBDT Detects a Loop on an Interface, the
Interface Is Blocked. Can the Blocked Interface Continue to Send
Protocol Packets?
The LDT packet or tagged LBDT packet is a broadcast packet with the destination MAC
address of all Fs, so the blocked interface cannot send the LDT packet or tagged LBDT
packet. The destination MAC address of an untagged LBDT packet is a BPDU MAC address,
so the blocked interface can continue to send the untagged LBDT packet.

1.7.5 Can Loopback Detection Be Used with VLAN Mapping on
an Interface?
On an interface configured with VLAN mapping, if a loop exists in the VLAN before
mapping, the loop cannot be detected.

1.7.6 What Is the Destination MAC Address of SEP Packets?


The destination MAC address of P2P packets, such as SEP LSA packets, is 0025-9efb-3d6f.
The destination MAC address of broadcast packets in an SEP segment, such as SEP edge
interface election packets, is 0025-9efb-3d70.

1.7.7 How Many Modes Are Available to Block an Interface?


Four modes are available. You can run the block port { sysname sysname interface {
interface-type interface-number | interface-name } | hop hop-id | optimal | middle }
command in the SEP segment view to configure a mode in which an interface is blocked.
The parameters are as follows:
• sysname: specifies the name of the device where the interface to be blocked resides.
• hop: specifies the interface with the specified hop count from the primary edge interface
  as the blocked interface.
• optimal: specifies the interface with the highest priority as the blocked interface.
• middle: specifies the interface in the middle of the SEP segment as the blocked interface.
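
The four modes can be pictured with a small Python model of an SEP segment. This is only a sketch under stated assumptions: the segment is a list of interfaces ordered from the primary edge interface, hop 1 is taken to be the primary edge interface itself, the highest priority is taken to be the largest numeric value, and the middle is taken to be index len // 2. The device names, interface names, priority values, and the pick_blocked function are all hypothetical, and the real selection rules on a switch may differ in these details.

```python
# Hypothetical segment, ordered from the primary edge interface.
SEGMENT = [
    {"sysname": "SwitchA", "ifname": "GE1/0/1", "priority": 64},
    {"sysname": "SwitchB", "ifname": "GE1/0/1", "priority": 64},
    {"sysname": "SwitchB", "ifname": "GE1/0/2", "priority": 128},
    {"sysname": "SwitchC", "ifname": "GE1/0/1", "priority": 64},
]


def pick_blocked(interfaces, mode, *, sysname=None, ifname=None, hop=None):
    """Return the interface that the given block mode would select."""
    if mode == "sysname":
        for itf in interfaces:
            if itf["sysname"] == sysname and itf["ifname"] == ifname:
                return itf
        raise ValueError("interface not found in segment")
    if mode == "hop":
        # Assumption: hop 1 is the primary edge interface.
        return interfaces[hop - 1]
    if mode == "optimal":
        # Assumption: larger value means higher priority; first one wins a tie.
        return max(interfaces, key=lambda itf: itf["priority"])
    if mode == "middle":
        # Assumption: the middle is index len // 2.
        return interfaces[len(interfaces) // 2]
    raise ValueError("unknown mode: " + mode)


if __name__ == "__main__":
    for mode, kwargs in [
        ("sysname", {"sysname": "SwitchC", "ifname": "GE1/0/1"}),
        ("hop", {"hop": 2}),
        ("optimal", {}),
        ("middle", {}),
    ]:
        chosen = pick_blocked(SEGMENT, mode, **kwargs)
        print(mode, chosen["sysname"], chosen["ifname"])
```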

1.7.8 After the SEP Topology Changes, Which Ring Network
Protocols Will Update Their Forwarding Tables?
SEP will request RRPP, Smart Link, STP, and VPLS to update their forwarding tables.
[Quidway-sep-segment1] tc-notify ?
  rrpp        Rapid ring protection protocol
  segment     Segment
  smart-link  Smart-link
  stp         Spanning Tree Protocol
  vpls        Virtual Private Switched Network Service

1.7.9 What Is the Destination MAC Address of RRPP Packets?


The following table describes the destination MAC address of RRPP packets.

Packet Type            Destination MAC Address

HEALTH (HELLO)         000F-E207-8217
LINK-DOWN              000F-E207-8257
COMMON-FLUSH-FDB,      000F-E207-8297
COMPLETE-FLUSH-FDB
EDGE-HELLO,            000F-E207-82D7~000F-E207-8316
MAJOR-FAULT            EDGE-HELLO and MAJOR-FAULT packets use the
                       same destination MAC address depending on the ring ID.
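
When reading a packet capture, the table can be applied mechanically. The following Python sketch maps a destination MAC address to the RRPP packet type listed above. The function names are hypothetical. The sketch assumes that COMMON-FLUSH-FDB and COMPLETE-FLUSH-FDB share one address, as the table layout suggests, and it does not try to derive the ring ID from an address in the EDGE-HELLO range because the table does not give that mapping.

```python
def mac_to_int(mac):
    """Convert a MAC address such as 000F-E207-8217 to an integer."""
    return int(mac.replace("-", "").replace(":", "").replace(".", ""), 16)


FIXED = {
    mac_to_int("000F-E207-8217"): "HEALTH (HELLO)",
    mac_to_int("000F-E207-8257"): "LINK-DOWN",
    # Assumption from the table layout: both flush types share this address.
    mac_to_int("000F-E207-8297"): "COMMON-FLUSH-FDB / COMPLETE-FLUSH-FDB",
}

EDGE_LOW = mac_to_int("000F-E207-82D7")
EDGE_HIGH = mac_to_int("000F-E207-8316")


def classify_rrpp(mac):
    """Return the RRPP packet type for a destination MAC, or None if not RRPP."""
    value = mac_to_int(mac)
    if value in FIXED:
        return FIXED[value]
    if EDGE_LOW <= value <= EDGE_HIGH:
        return "EDGE-HELLO / MAJOR-FAULT"
    return None


if __name__ == "__main__":
    print(classify_rrpp("000f-e207-8217"))
    print(classify_rrpp("000F-E207-8300"))
    print(EDGE_HIGH - EDGE_LOW + 1, "addresses in the EDGE-HELLO range")
```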

1.7.10 What Are the Notes About Configuring RRPP?


You are not advised to configure RRPP and MSTP together. Before creating an RRPP ring,
disable STP on the interfaces that need to be added to the RRPP ring.
The RRPP convergence speed depends on the number of domains and rings. The convergence
is fast when the number of domains and rings is small.

1.7.11 How Does RRPP Implement Fast Switching?


The RRPP switchover time is guaranteed by the switchover mechanism and is independent of
the interval for sending Hello packets. Although the minimum interval for sending Hello
packets is 1s, Hello packets are used only for loop detection.
The switchover mechanism of an RRPP ring is as follows:
• If a link in the ring is faulty, the interface directly connected to the link goes Down.
• The transit node immediately sends a Link-Down packet to the master node to report the
  link status change.
• When receiving the Link-Down packet, the master node considers that the ring fails, so it
  unblocks the secondary interface and sends a packet to instruct other transit nodes to
  update Forwarding DataBases (FDBs).
• After other transit nodes update their FDBs, data streams are switched to a link in Up
  state.
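
The sequence above is event driven: nothing in it waits for a Hello timer. The following Python sketch models it as a minimal simulation. The Ring class, node names, and FDB contents are hypothetical, and updating an FDB is modeled simply as clearing it so that entries are relearned.

```python
class Ring:
    """Toy model of an RRPP ring: one master node plus named transit nodes."""

    def __init__(self, transit_names):
        # Ring is complete: the master keeps its secondary interface blocked.
        self.secondary_blocked = True
        # Hypothetical FDB content for each transit node.
        self.fdb = {name: {"0025-9e03-02f1": "port1"} for name in transit_names}
        self.events = []

    def link_down(self, reporting_node):
        """A transit node's interface goes Down and it reports to the master."""
        self.events.append(reporting_node + ": interface Down, send Link-Down to master")
        self._master_on_link_down()

    def _master_on_link_down(self):
        # The master reacts to the Link-Down packet itself, not to a Hello timeout.
        self.secondary_blocked = False
        self.events.append("master: unblock secondary interface, request FDB update")
        for name, table in self.fdb.items():
            table.clear()
            self.events.append(name + ": FDB updated")


if __name__ == "__main__":
    ring = Ring(["Transit1", "Transit2"])
    ring.link_down("Transit1")
    for event in ring.events:
        print(event)
```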

1.7.12 Why Does the display Command Not Display Statistics
About Health Packets on an RRPP Transit Node?
Because the RRPP transit node does not send Health packets, the statistics on Health packets
are 0. There are statistics on Health packets on the master node. Check whether received and
sent Health packets are normal according to the statistics on Health packets on the master
node.

You can run the display rrpp statistics domain domain-id command to check statistics on RRPP
packets.

1.7.13 How Is Load Balancing Implemented When RRPP Is
Deployed?
On an RRPP network, for the protected instance, each ring supports only one blocked port, so
load balancing cannot be implemented in each ring. You can configure two domains and add
interfaces to the two domains so that different blocked ports in two domains are used,
implementing load balancing.
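
The effect of two domains can be shown with a small Python model. It is a sketch only: the four-node ring, the link names, the VLAN ranges protected by each domain, and the choice of blocked links are all hypothetical.

```python
# Hypothetical four-node ring; each string names one physical link.
RING_LINKS = ["A-B", "B-C", "C-D", "D-A"]

# Assumption: each domain protects its own VLAN range and blocks a different link.
DOMAINS = {
    "domain1": {"vlans": range(100, 200), "blocked": "C-D"},
    "domain2": {"vlans": range(200, 300), "blocked": "A-B"},
}


def links_for_vlan(vlan):
    """Return the ring links that forward the given VLAN, or None if unprotected."""
    for domain in DOMAINS.values():
        if vlan in domain["vlans"]:
            return [link for link in RING_LINKS if link != domain["blocked"]]
    return None


if __name__ == "__main__":
    print(150, links_for_vlan(150))
    print(250, links_for_vlan(250))
```

With a single domain, one link in the ring would carry no data traffic. With two domains that block different links, every link forwards traffic for at least one domain.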

1.7.14 What Is the Maximum Number of Devices That Can Be
Deployed in an RRPP Ring?
Primary and secondary ports of the master node send Hello packets. If the secondary port
periodically receives Hello packets, the master node considers the RRPP ring in Complete
state and blocks the secondary port to eliminate loops. Hello packets are forwarded based on
the chip, so the forwarding speed is fast. The maximum number of devices that can be
configured on an RRPP ring is not limited. When many devices are configured in an RRPP
ring, it takes a long time to rectify any link or node fault. It is recommended that a maximum
of 16 devices be configured in an RRPP ring.

1.7.15 Do S Series Switches Support Sub-rings?


V200R001 supports only single-ring networks, in which downlink sub-rings cannot be used.

1.7.16 Can ERPS Be Used with Other Ring Network Protocols on
the Same Network?
ERPS cannot be used with other ring network protocols on the same network.

1.7.17 Does ERPS on S Series Switches Support Load Balancing?


One or two ERPS rings can be configured in a physical ring. You can configure different
protected instances for different ERPS rings. Each blocked interface is valid for only the
VLAN protected in the local ERPS ring. Data of different VLANs is transmitted through
different paths. This implements load balancing and link backup.

1.7.18 Can ERPS Be Configured on an Eth-Trunk?


ERPS can be configured on an Eth-Trunk.
