Manual
No part of this document may be reproduced, transmitted, transcribed, stored in a retrieval system, or translated into any
language or computer language, in any form or by any means, electronic, mechanical, magnetic, optical, chemical, manual
or otherwise without the prior written consent of Maipu Communication Technology Co., Ltd.
Maipu makes no representations or warranties with respect to the contents of this document and specifically disclaims any implied warranties of merchantability or fitness for any particular purpose. Further, Maipu reserves the right to revise this document and to make changes to its content from time to time without obligation to notify any person of such revisions or changes.
Maipu values and appreciates comments you may have concerning our products or this document. Please address comments
to:
All other products or services mentioned herein may be registered trademarks, trademarks, or service marks of their
respective manufacturers, companies, or organizations.
Accessibility (contents, index, headings, numbering):
Good    Fair    Average    Poor
Editorial (language, vocabulary, readability, clarity, technical accuracy, content):
Good    Fair    Average    Poor
Please check your suggestions to improve this document:
Improve introduction
Improve contents
Improve arrangement
Include images
Add more detail
Make more concise
Add more step-by-step procedures/tutorials
Add more technical information
Make it less technical
Improve index
Contents
Overview
  OSI Model
    Physical Layer
    Data Link Layer
    Network Layer
    Transmission Layer
    Session Layer
    Representation Layer
    Application Layer
VLAN Technology
  Overview and Principle
    Overview
    VLAN Principle
Overview
Main contents:
OSI model
OSI Model
The OSI model is composed of seven layers: the physical layer, data link layer, network layer, transmission layer, session layer, representation layer, and application layer (see Figure 1-1). Each layer handles specific communication tasks and exchanges data with the adjacent layers of the protocol stack through protocol-based communication. Communication between two network devices is implemented by passing data through the protocol stacks of the devices. For example, when a workstation communicates with a server, the task starts at the application layer of the workstation; each lower layer formats the information in turn until the data reaches the physical layer, where it is transmitted to the server over the network. The server receives the information at the physical layer of its protocol stack and passes it upward, with each layer interpreting the information, until it reaches the application layer. Each layer can be referred to by its name or identified by its position in the protocol stack; for example, the bottom layer can be called either the physical layer or the first layer.
The functions implemented at the bottom layers are related to physical communication, such as frame creation and transmission of the signals that carry packets. The middle layers coordinate network communication between nodes, ensuring uninterrupted sessions and error-free communication. The work of the highest layers affects the applications and the data representation of the software, including data formatting, encryption, and the management of data and file transmission. Collectively, these layers are called the protocol stack.
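The layered flow described above can be sketched in a few lines of Python. This is an illustrative model only (not Maipu code): each layer wraps the data from the layer above on the way down, and strips its own wrapper on the way up.

```python
# Minimal sketch of OSI encapsulation/de-encapsulation.
# Layer names follow this manual's terminology.

LAYERS = ["application", "representation", "session",
          "transmission", "network", "data link", "physical"]

def send(payload: str) -> str:
    """Each layer wraps the data from the layer above (encapsulation)."""
    for layer in LAYERS:
        payload = f"{layer}[{payload}]"
    return payload  # what travels over the wire

def receive(wire_data: str) -> str:
    """Each layer strips its own wrapper and hands the rest upward."""
    for layer in reversed(LAYERS):
        prefix, suffix = f"{layer}[", "]"
        assert wire_data.startswith(prefix) and wire_data.endswith(suffix)
        wire_data = wire_data[len(prefix):-len(suffix)]
    return wire_data

frame = send("GET /index.html")
print(frame)            # physical[data link[network[...]]] ...
print(receive(frame))   # the original application data
```

The receiver unwraps in the opposite order from the sender, which mirrors the description above: data travels down one stack, across the wire, and up the other stack.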
Physical Layer
The bottom layer of the OSI model is the physical layer. It covers items such as:
Network plugs
Network topology
Physical layer devices transmit and receive the signals that contain data, so they must generate, carry, and check voltages. During signal transmission, the physical layer handles the data transmission rate, monitors the data error frequency, and manages voltages and electrical levels.
Data Link Layer
Network Layer
In the protocol stack, the third layer from the bottom is the network layer. All networks are composed of physical routes (cable paths) and logical routes (software paths). The network layer reads the protocol address information in each packet and forwards the packet along the best path, physical or logical, so that data is transmitted efficiently. At this layer, packets can be sent from one network to another through routers. The network layer controls packet paths much like a traffic controller, routing packets over the most effective path. To determine the best path, the network layer collects information about network and node addresses; this process is called discovery.
The network layer can route data over different paths by creating virtual (logical) circuits. A virtual circuit is a logical communication path for sending and receiving data, and it exists at the network layer only. Because the network layer manages data along multiple virtual circuits, data may arrive out of sequence. The network layer checks the data sequence before the data is passed to the next layer and corrects it if necessary. The network layer also adjusts the frame size to meet the requirements of the receiving network.
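The manual does not specify which algorithm the network layer uses to pick the best path. As one common illustration, a lowest-cost path can be computed with Dijkstra's algorithm; the topology and link costs below are hypothetical.

```python
import heapq

def best_path(graph, src, dst):
    """Dijkstra's algorithm: return (cost, path) of the lowest-cost route."""
    pq = [(0, src, [src])]      # (cost so far, node, path taken)
    seen = set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, w in graph.get(node, {}).items():
            if nbr not in seen:
                heapq.heappush(pq, (cost + w, nbr, path + [nbr]))
    return None

# Hypothetical topology: link costs between routers A..D
net = {"A": {"B": 1, "C": 5}, "B": {"C": 1, "D": 4}, "C": {"D": 1}}
print(best_path(net, "A", "D"))  # (3, ['A', 'B', 'C', 'D'])
```

The direct A-C-D route costs 6 and A-B-D costs 5, so the algorithm prefers A-B-C-D at cost 3, which is the "most effective path" in the sense used above.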
Transmission Layer
Like the data link layer and the network layer, the transmission layer ensures the reliable transmission of data from the sending node to the destination node. For example, the transmission layer ensures that data is received in the same sequence in which it was sent, and the receiving node returns a response after the transmission. When virtual links are used in the network, the transmission layer is also responsible for tracking the ID assigned to each circuit. This ID, called a port, connection ID, or socket, is assigned by the session layer. The transmission layer also determines the level of packet error detection; the highest level ensures that packets are transmitted from one node to another without error within a tolerable time.
Session Layer
The session layer is responsible for establishing and maintaining the communication link between two nodes. It also determines the correct sequence for communication between nodes; for example, it can determine which node transmits first. The session layer can also determine the length of a transmission and how to recover from transmission errors. If the connection is interrupted at a lower layer, the session layer tries to re-establish the communication.
Representation Layer
This layer handles data formatting. Different software applications use different data formatting schemes, so data formatting is necessary. To some degree, the representation layer is similar to a syntax checker: it ensures that numbers and text are sent in a format that the receiving node can recognize. For example, data sent from an IBM mainframe may use the EBCDIC character format; for a workstation running Windows 95 or Windows 98 to read the information, the data must be expressed in the ASCII character format.
Application Layer
The application layer is the highest layer of the OSI model. It controls access to application programs and network services. The network services include file transmission, file management, remote access to files and printers, email, and terminal emulation. Programmers use this layer to connect workstations to network services, for example, to link an application to email or to provide database access over the network.
The OSI model also applies to communication between network hardware and software. To meet the standard, network hardware and software must implement the layers of the OSI model. The following table lists how network hardware and software map to specific OSI model layers.
Table 1-1 Network hardware and software related to OSI model layers
Physical layer: cable circuits, cable sockets, multiplex adapters, senders, receivers, transceivers, passive hubs, passive cable connectors, repeaters, and gateways.
Simple Ping
The simple ping command can be used in both the common user mode and the privileged user mode of the Maipu switch. The method is as follows:
Switch>ping 131.199.130.3
The returned response characters are as follows:
!    successful response
.    timeout
U    destination unreachable
&    TTL timeout
The command summarizes the results of sending 5 packets, including the proportion of successful responses. If the ping succeeds, the network is working normally at the network layer and the two hosts can communicate at the network layer.
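The response characters and the success proportion above can be interpreted with a small sketch (illustrative only; the character meanings follow the list above):

```python
# Interpret the response characters printed by the ping command.
MEANING = {"!": "success", ".": "timeout",
           "U": "destination unreachable", "&": "TTL timeout"}

def summarize(responses: str):
    """Summarize a run of ping response characters, e.g. '!!.!!'."""
    sent = len(responses)
    ok = responses.count("!")
    return {"sent": sent, "success": ok,
            "success_rate": f"{100 * ok // sent}%"}

print(summarize("!!.!!"))  # 4 of 5 packets answered -> 80% success
```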
Expanded Ping
Sometimes the simple ping command cannot provide the expected tests for certain faults. In this case, the privileged mode of the Maipu switch provides the expanded ping command. Expanded ping is interactive: it prompts for the packet quantity, size, timeout value, and data format. The usage method is as follows:
Switch# ping <CR>
You are then prompted to set the parameters. You can also read the help file of the command.
show process
This command displays the major tasks and their running status.
switch#show process
Displayed Content
NAME ENTRY TID PRI STATUS PC SP ERRNO DELAY
---------- ------------ -------- --- ---------- -------- -------- ------- -----
tExcTask 2a2aa8 2ffe458 0 PEND 2b8b38 2ffe368 3d0001 0
tLogTask 2ad798 2ffbad0 0 PEND 2b8b38 2ffb9f0 0 0
tExcTrace 103050 2fe98b8 10 PEND 2bf428 2fe9450 0 0
tSysWdog 2fc2e8 2ff7178 15 DELAY 2cc8e8 2ff70f8 0 3
tShell1 1291f0 13280c0 20 PEND 2bf428 1327840 c0002 0
tSysLog 43ebdc 16173e8 40 PEND 2bf428 1617318 3d0001 0
tFwdTask 356a18 235fd78 45 PEND 2bf428 235fcd8 0 0
tMonDscc 3e9fac 1e638f0 45 DELAY 2cc8e8 1e63848 0 66
tNetTask 356984 23626a0 50 PEND 2bf428 23625e8 0 0
tSysTimer 122f88 235d410 50 PEND 2bf428 235d378 0 0
tActive 2fb32c 16087c8 55 DELAY 2cc8e8 1608738 0 8
show cpu
Display the CPU usage of each task.
switch#spy cpu
switch#show cpu
Displayed Content
INTERRUPT 0% ( 0) 0% ( 0)
IDLE 99% ( 447) 100% ( 13)
TOTAL 99% ( 450) 100% ( 13)
0% 0% 0% 9% 0% 0% 0% 0% 0% 0%
0% 0% 0% 9% 0% 0% 0% 0% 0% 0%
0% 0% 0% 9% 0% 0% 0% 0% 0% 0%
0% 0% 0% 9% 0% 0% 0% 0% 0% 0%
0% 0% 0% 9% 0% 0% 0% 0% 0% 0%
0% 0% 0% 9% 0% 0% 0% 0% 0% 0%
1% 1% 1% 1% 1% 1% 1% 1% 1% 1%
1% 1% 1% 1% 1% 1% 1% 1% 1% 1%
1% 2% 1% 1% 1% 1% 1% 2% - -
- - - - - - - - - -
- - - - - - - - - -
- - - - - - - - - -
1% - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
- - - - - - - -
Note
CPU utilization for five seconds: the CPU usage in the recent 5 seconds
CPU utilization for one minute: the CPU usage in the recent 1 minute
CPU utilization for five minutes: the CPU usage in the recent 5 minutes
CPU utilization per second in the past 60 seconds: the CPU usage per second in the recent 60 seconds
CPU utilization per minute in the past 60 minutes: the CPU usage per minute in the recent 60 minutes
CPU utilization per quarter in the past 96 quarters: the CPU usage per quarter-hour in the recent 96 quarter-hours
-: the time is not up yet (no sample recorded)
show stack
Display the task stacks in the system:
switch#show stack
Displayed Content
NAME ENTRY TID SIZE CUR HIGH MARGIN
------------ ------------ -------- ----- ----- ----- ------
tExcTask 0x00002a2aa8 2ffe458 7984 240 472 7512
tLogTask 0x00002ad798 2ffbad0 4984 224 376 4608
tExcTrace 0x0000103050 2fe98b8 7984 1128 1360 6624
tMonitor 0x0000102198 12f1438 2032 136 200 1832
tSysWdog 0x00002fc2e8 2ff7178 3984 128 360 3624
tShell1 0x00001291f0 13280c0 16376 2176 3552 12824
tSysLog 0x000043ebdc 16173e8 5112 208 1088 4024
tFwdTask 0x0000356a18 235fd78 9984 160 1384 8600
tMonDscc 0x00003e9fac 1e638f0 7984 168 1048 6936
tNetTask 0x0000356984 23626a0 9984 184 1064 8920
tSysTimer 0x0000122f88 235d410 10224 152 328 9896
tCheckCpu 0x00004f14dc 12f0008 8176 176 4544 3632
tActive 0x00002fb32c 16087c8 3992 144 424 3568
tSysTask 0x0000449a54 2f43d30 9984 176 240 9744
tTnd00 0x00004f774c 2feead8 10232 2544 3448 6784
tSh00 0x00004fab9c 12f8098 20472 2600 5864 14608
tTffsPTask 0x00005609b0 2ff7e88 2032 136 416 1616
tTelnetd 0x00004f83fc 16066f0 10224 256 976 9248
tSnmpd 0x00004cf0f8 1322c90 28664 2616 4800 23864
tSnmpTmr 0x00004cee20 1323ea8 4080 256 536 3544
tIdle 0x0000102304 12f0a20 2040 128 408 1632
INTERRUPT 5000 0 800 4200
show semaphore
Display the major semaphores used in the system and the status:
VxWorks Events
--------------
Registered Task : NONE
Event(s) to Send : N/A
Options : N/A
show memory
Display the memory usage in the system:
switch#show memory
Displayed Content
Memory management mechanism, types, and usage.
SUMMARY
-------
Type Used bytes Free bytes Total bytes Used percent
Note
The memory of all memory management mechanisms (such as MBUF, SLAB, and FPSS, if they exist), except the CODE segment, is part of the used memory of the HEAP.
STATISTICS
----------
Used bytes Free bytes Total bytes Used percent
Note
HEAP: heap memory, the most basic memory area in the system; the other re-allocation memory management mechanisms are carved out of this area.
CODE: code segment memory, used for saving the code segment.
SLAB: a memory re-allocation management mechanism.
show arp
Display the ARP cache of the system.
switch#show arp
Displayed Content
Protocol Address Age (min) Hardware Addr Type Interface
Internet 128.255.41.40 2 0022.153b.55e4 ARPA vlan1
Internet 128.255.41.47 - 0001.7a5c.004a ARPA vlan1
Internet 128.255.43.254 0 0001.7a58.19ba ARPA vlan1
show ip socket
Display the information about the sockets in the active status:
switch#show ip socket
Displayed Content
Active Internet connections (including servers)
PCB Proto Recv-Q Send-Q Local Address Foreign Address vrf (state)
-------- ----- ------ ------ ---------------------- ---------------------- ------- -------
Recv-Q: the quantity of data in the receiving cache of the socket.
Local Address: the local IP address and port number bound to the socket (0.0.0.0.23 indicates any local IP address with port number 23).
Foreign Address: the foreign IP address and port number corresponding to the socket.
vrf: VPN route forwarding instance.
state: the status of the socket (meaningful for TCP only).
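The dotted Local Address form above (IP followed by the port as a final dotted component) can be split as follows; this is an illustrative helper, not part of the switch software:

```python
def split_addr(addr: str):
    """Split the dotted form used by show ip socket into (ip, port).
    The last dotted component is the port; 0.0.0.0 means any local IP."""
    ip, _, port = addr.rpartition(".")
    return ip, int(port)

print(split_addr("0.0.0.0.23"))        # ('0.0.0.0', 23): telnet on any local IP
print(split_addr("128.255.41.47.161")) # ('128.255.41.47', 161)
```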
show pool
The current cache pool can be displayed with three commands.
Show pool information (show the actual information about the cache chains):
Switch# sh pool
Displayed Content
Driver pool
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size clusters free usage
-------------------------------------------------------------------------------
1884 11008 10496 3906
-------------------------------------------------------------------------------
Size: 21247488 bytes
Data pool
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size clusters free usage
-------------------------------------------------------------------------------
64 18000 17983 1611
128 36000 35943 175
256 3424 3422 40
512 2400 2394 20
1024 180 180 0
2048 300 300 0
-------------------------------------------------------------------------------
Size: 14442240 bytes
*** pool: the name of the cache pool. For example, the data pool is the cache pool used by the upper-layer protocols, and the driver pool is the cache pool used by the drivers.
All MBUF pool size: the size of the memory occupied by all cache pools.
Data pool
fact free number: the actual number of mblks of traversed mblk links
DRV8SA : 0
DRV8S : 0
DRV16A : 0
DRV4M336: 0
DRVEXTSCC: 0
TOTAL : 1024
number of mbufs: 1024
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size clusters free usage
-------------------------------------------------------------------------------
1556 512 256 599
-------------------------------------------------------------------------------
Link pool
256 10 10 0
512 10 10 0
1024 10 10 0
2048 100 100 0
-------------------------------------------------------------------------------
Size: 461120 bytes
sys pool
SOCKET : 0
PCB : 0
RTABLE : 0
HTABLE : 0
ATABLE : 0
SONAME : 0
ZOMBIE : 1
SOOPTS : 0
FTABLE : 0
RIGHTS : 0
IFADDR : 0
CONTROL : 0
OOBDATA : 0
IPMOPTS : 0
IPMADDR : 0
IFMADDR : 0
MRTABLE : 0
DRVSCC : 0
DRV8SA : 0
DRV8S : 0
DRV16A : 0
DRV4M336: 0
DRVEXTSCC: 0
TOTAL : 8000
number of mbufs: 8000
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size clusters free usage
-------------------------------------------------------------------------------
64 800 800 4
128 200 199 27520
256 200 200 0
512 100 100 0
1024 80 80 0
2048 50 50 0
-------------------------------------------------------------------------------
Size: 767000 bytes
Driver pool
Statistics for the network stack mbuf
type number
--------- ------
FREE : 1388
DATA : 112
HEADER : 0
SOCKET : 0
PCB : 0
RTABLE : 0
HTABLE : 0
ATABLE : 0
SONAME : 0
ZOMBIE : 0
SOOPTS : 0
FTABLE : 0
RIGHTS : 0
IFADDR : 0
CONTROL : 0
OOBDATA : 0
IPMOPTS : 0
IPMADDR : 0
IFMADDR : 0
MRTABLE : 0
DRVSCC : 56
DRV8SA : 0
DRV8S : 0
DRV16A : 0
DRV4M336: 4
DRVEXTSCC: 4
TOTAL : 6000
number of mbufs: 6000
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size clusters free usage
-------------------------------------------------------------------------------
1600 6000 5936 2446
-------------------------------------------------------------------------------
Size: 10080000 bytes
Note
number of times failed to find space: the number of failed attempts to allocate an mbuf.
number of times waited for space: the number of times allocation had to wait for an mbuf.
CLUSTER POOL TABLE: the statistics of the cluster pools of the current mbuf pool.
size / clusters / free / usage: cluster size, total clusters, free clusters, and usage count.
netstat -m
Display the statistics of the system data pool:
switch#netstat -m
Displayed Content
Statistics for the network stack mbuf
type number
FREE : 7999
DATA : 0
HEADER : 0
SOCKET : 0
PCB : 0
RTABLE : 0
HTABLE : 0
ATABLE : 0
SONAME : 0
ZOMBIE : 1
SOOPTS : 0
FTABLE : 0
RIGHTS : 0
IFADDR : 0
CONTROL : 0
OOBDATA : 0
IPMOPTS : 0
IPMADDR : 0
IFMADDR : 0
MRTABLE : 0
DRVSCC : 0
DRV8SA : 0
DRV8S : 0
DRV16A : 0
DRV4M336: 0
DRVEXTSCC: 0
TOTAL : 8000
number of mbufs: 8000
number of times failed to find space: 0
number of times waited for space: 0
number of times drained protocols for space: 0
__________________
CLUSTER POOL TABLE
_______________________________________________________________________________
size clusters free usage
-------------------------------------------------------------------------------
64 800 800 9
128 200 199 20
256 200 200 0
512 100 100 0
1024 80 80 0
2048 50 50 0
-------------------------------------------------------------------------------
Note
The command displays the statistics of the system data pool. The display format and content are the same as the data pool section of the show pool detail command, which also displays the statistics of the system data pool.
show ip statistics
Display the statistics of the IP packets:
switch#show ip statistics
Displayed Content
Statistics for the IP protocol
total 1434
badsum 0
tooshort 0
toosmall 0
badhlen 0
badlen 0
infragments 0
fragdropped 0
fragtimeout 0
forward 0
cantforward 1403
redirectsent 0
unknownprotocol 0
toupper 31
nobuffers 0
reassembled 0
outfragments 0
noroute 0
rawsockout 0
badaddress 0
fastforwardtotal 0
fastforward 0
cannotfastforward 0
show ip icmpstate
Display the statistics of the ICMP packets:
switch#show ip icmpstate
Displayed Content
Statistics for ICMP protocol
6929 calls to icmp_error
0 error not generated because old message was icmp
Output histogram:
echo reply: 5
destination unreachable: 24
0 message with bad code fields
0 message < minimum length
0 bad checksum
0 message with bad length
Input histogram:
echo: 5
#10: 2
5 message responses generated
Switch Principles
This chapter describes the switching principles so that users can understand the later chapters.
Main contents:
In the late 1980s, the rapid increase in network traffic drove the development of the technology, and LAN performance improved steadily. The 10 Mbps rate was superseded by 100BASE-T and 100VG-AnyLAN. However, in the traditional media access method, CSMA/CD, many stations still share a common transmission medium.
The development of LAN switching technology goes back to the two-port bridge. The bridge is a store-and-forward device for connecting similar LANs. In the structure of an internetwork, the bridge is a point-to-point connection of the DCE class. In terms of protocol layers, the bridge stores and forwards data frames at the logical link layer, much as a repeater works at L1 and a router at L3. The two-port bridge and Ethernet developed at the same time.
Frame forwarding: forward the frames received from the input media to
the corresponding output media;
Frame Forwarding
The switch forwards frames according to the MAC address. When the
switch forwards frames, the following rules must be observed:
If the destination address and the source address of a frame are in the same network segment, the frame is discarded and no switching is performed.
When host D sends broadcast frames, the switch receives frames with the destination address ffff.ffff.ffff from port E3 and forwards them to ports E0, E1, E2, and E4.
When host D communicates with host E, the switch receives frames with the destination address 0260.8c01.5555 from port E3. It searches the address table and finds that 0260.8c01.5555 is not in the table, so it forwards the frames to ports E0, E1, E2, and E4.
When host D communicates with host F, the switch receives frames with the destination address 0260.8c01.6666 from port E3. It searches the address table and finds that 0260.8c01.6666 is at port E3; that is, the destination address and the source address are in the same network segment. Therefore, the switch does not forward the frame but drops it directly.
When host D communicates with host A, the switch receives frames with the destination address 0260.8c01.1111 from port E3. It searches the address table and finds that 0260.8c01.1111 is at port E0, so it forwards the frames to port E0 and host A receives them.
When a frame arrives on a given port, the switch draws a conclusion from the frame's source address field: the workstation with that address can be reached through that port. The switch can therefore update its MAC address forwarding database. To accommodate changes in the network topology, each entry in the database has an aging timer, which is started when the entry is added; the default value of the timer is 30 seconds, and an entry whose timer expires is removed. When a frame is received, the switch searches the database for an entry whose address field matches the frame's source address. If such an entry exists, its content is updated and its timer is reset. If no such entry exists, a new entry is added to the database: its address is the source MAC address of the received frame, its port number is the port on which the frame was received, and its timer is set to the initial value.
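The forwarding and learning/aging behavior described above can be sketched as a simplified model (not Maipu's implementation). The port names and MAC addresses follow the host D examples above, and the 30-second entry lifetime follows this manual:

```python
import time

class MacTable:
    """Sketch of switch MAC learning, aging, flooding, and filtering."""
    AGE = 30.0   # entry lifetime in seconds, per this manual

    def __init__(self):
        self.table = {}   # mac -> (port, time learned/refreshed)

    def learn(self, src_mac, in_port, now=None):
        # Learning: a frame from src_mac arrived on in_port; add or
        # refresh the entry and restart its aging timer.
        self.table[src_mac] = (in_port, now if now is not None else time.time())

    def lookup(self, dst_mac, now=None):
        now = now if now is not None else time.time()
        entry = self.table.get(dst_mac)
        if entry and now - entry[1] <= self.AGE:
            return entry[0]
        return None   # unknown or aged out

    def forward(self, dst_mac, in_port, all_ports, now=None):
        out = self.lookup(dst_mac, now)
        if out == in_port:
            return []                      # same segment: filter (drop)
        if out is None or dst_mac == "ffff.ffff.ffff":
            return [p for p in all_ports if p != in_port]   # flood
        return [out]                       # known unicast: forward

ports = ["E0", "E1", "E2", "E3", "E4"]
t = MacTable()
t.learn("0260.8c01.1111", "E0", now=0)            # host A learned on E0
print(t.forward("0260.8c01.1111", "E3", ports, now=1))  # ['E0']
print(t.forward("0260.8c01.5555", "E3", ports, now=1))  # unknown: flood
```

Filtering (destination on the same port the frame arrived on) returns an empty port list, and an entry older than AGE behaves as unknown, so traffic to it is flooded again until the address is relearned.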
In the switching mode, the switch must receive a certain amount of data and inspect it before making a forwarding decision. By increasing the length of the inspected data, L2 switching technology can be extended to L3, or even L4, switching technology.
The widely used multilayer switching technology combines L2, L3, and L4 switching to implement the "route once, switch many times" function.
A router is more powerful than a switch, but its forwarding rate is lower and its price is higher. The L3 switch is widely used because it has the wire-speed forwarding capability of a switch together with the strong control functions of a router.
VLAN Technology
Main contents:
Overview and principle
VLAN division
Typical application
Overview and Principle
Main contents:
Overview
VLAN principle
Overview
In Ethernet communication, when the number of hosts is large, network problems such as severe collisions, flooded broadcasts, and degraded performance may be encountered. VLAN technology emerged to solve these problems. Each VLAN is a broadcast domain: hosts within a VLAN can communicate with each other, but hosts in different VLANs cannot. As a result, broadcast packets are confined to a single VLAN.
VLAN Principle
To identify packets of different VLANs, a VLAN tag is added to the packets. The encapsulation format of VLAN packets complies with IEEE 802.1Q, as shown in the following figure.
DA: destination MAC address; SA: source MAC address; Type: protocol type of the packet. IEEE 802.1Q defines that a four-byte VLAN tag is encapsulated after the destination MAC address and the source MAC address to identify the VLAN. The VLAN tag contains four fields: Tag Protocol Identifier (TPID), priority, Canonical Format Indicator (CFI), and VLAN ID.
TPID: identifies the frame as VLAN-tagged; the length is 16 bits; the value is 0x8100.
Priority: indicates the 802.1p priority of the packet; the length is 3 bits.
CFI: indicates whether the MAC address is in canonical format; the length is 1 bit.
VLAN ID: identifies the VLAN of the packet; the length is 12 bits. The value range is 0-4095, where 0 and 4095 are reserved by the protocol, so the valid VLAN ID range is 1-4094.
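The four-byte tag layout above (16-bit TPID, then 3-bit priority, 1-bit CFI, and 12-bit VLAN ID packed into the 16-bit tag control field) can be built and parsed as follows; this is an illustrative sketch, not device code:

```python
import struct

TPID = 0x8100   # identifies a VLAN-tagged frame

def build_tag(vlan_id: int, priority: int = 0, cfi: int = 0) -> bytes:
    """Pack the 4-byte 802.1Q tag: TPID(16) + priority(3)/CFI(1)/VID(12)."""
    assert 1 <= vlan_id <= 4094, "0 and 4095 are reserved"
    tci = (priority << 13) | (cfi << 12) | vlan_id
    return struct.pack("!HH", TPID, tci)   # network byte order

def parse_tag(tag: bytes):
    tpid, tci = struct.unpack("!HH", tag)
    assert tpid == TPID, "not an 802.1Q tag"
    return {"priority": tci >> 13, "cfi": (tci >> 12) & 1,
            "vlan_id": tci & 0x0FFF}

tag = build_tag(vlan_id=100, priority=5)
print(tag.hex())        # 8100a064
print(parse_tag(tag))   # {'priority': 5, 'cfi': 0, 'vlan_id': 100}
```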
VLAN Division
VLAN can be divided into different types. The common types are as follows:
Port-based VLAN
MAC-based VLAN
IP subnet-based VLAN
Protocol-based VLAN
In the default configuration, the priority (from high to low) of the four VLAN types is: MAC-based VLAN, IP subnet-based VLAN, protocol-based VLAN, and port-based VLAN. On the same port, VLAN classification takes effect according to this priority, and only one classification takes effect.
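The priority order above can be sketched as a classification function for an untagged packet arriving on one port. The rule tables here are hypothetical examples; the first matching rule, in priority order, wins:

```python
# Hypothetical per-port VLAN classification rules.
mac_vlan    = {"0260.8c01.1111": 10}
subnet_vlan = {"192.168.1": 20}                 # keyed by /24 prefix for brevity
proto_vlan  = {("ethernet_ii", 0x0806): 30}     # e.g. ARP over Ethernet II
port_default_vlan = 1

def classify(src_mac, src_ip, encap, ethertype):
    """Try the four VLAN types in default priority order (high to low)."""
    if src_mac in mac_vlan:
        return mac_vlan[src_mac]                # 1. MAC-based VLAN
    prefix = src_ip.rsplit(".", 1)[0]
    if prefix in subnet_vlan:
        return subnet_vlan[prefix]              # 2. IP subnet-based VLAN
    if (encap, ethertype) in proto_vlan:
        return proto_vlan[(encap, ethertype)]   # 3. protocol-based VLAN
    return port_default_vlan                    # 4. port-based VLAN

print(classify("0260.8c01.1111", "192.168.1.5", "ethernet_ii", 0x0800))  # 10
print(classify("aaaa.bbbb.cccc", "192.168.1.5", "ethernet_ii", 0x0800))  # 20
print(classify("aaaa.bbbb.cccc", "10.0.0.1",    "ethernet_ii", 0x0800))  # 1
```

Note in the first call that the MAC rule wins even though the subnet rule also matches, which is what "only one VLAN classification takes effect" means.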
Port-Based VLAN
In a port-based VLAN, a port is added to the VLAN as a member, and the port can then forward the packets of that VLAN.
Port Types
The port modes can be classified into three types according to how packet tags are processed.
Access:
The port belongs to one VLAN; the default VLAN ID of the port is the same as its home VLAN ID; it connects to user devices. The default port type is Access.
Trunk:
The port allows multiple VLANs and can receive and send packets of multiple VLANs; only packets of the default VLAN are permitted without a tag; it is used for interconnecting network devices.
Hybrid:
The port can be added to multiple VLANs and can receive and send packets of multiple VLANs; packets of multiple VLANs are permitted without a tag; it is used for interconnecting both user devices and network devices.
The default VLAN of an Access port is its home VLAN and cannot be configured. A Trunk port or a Hybrid port can belong to multiple VLANs, and its default VLAN can be configured.
MAC-based VLAN
A MAC-based VLAN assigns a VLAN ID to packets according to the source MAC address of the received packets.
Untagged packets received on a port are processed as follows, depending on the configuration:
1. If the source MAC address matches the MAC address of a MAC-based VLAN, and the In port of the packet has been allocated to the VLAN with the corresponding VLAN ID, the packet is assigned the VLAN ID corresponding to that MAC VLAN.
2. If the packet matches no MAC address set for a MAC VLAN, the packet is assigned the default VLAN ID of the port.
IP subnet-based VLAN
An IP subnet-based VLAN assigns a VLAN ID to packets according to the source IP address of the received packets.
Untagged packets received on a port are processed as follows, depending on the configuration:
Protocol-based VLAN
A protocol-based VLAN assigns a VLAN ID to packets according to the encapsulation format and protocol type of the received packets.
The protocol VLAN defines protocol templates; a protocol template is composed of the frame encapsulation format and the protocol type. The same port can be configured with multiple protocol templates. When the protocol VLAN is enabled on a port and the port is configured with protocol templates, received untagged packets are processed as follows, depending on the configuration:
1. If the packet matches a protocol template, and the In port of the packet has been allocated to the VLAN with the corresponding VLAN ID, the packet is assigned the VLAN ID corresponding to the protocol template configured on the port.
Typical Application
In an enterprise, hosts in the same department can communicate with one another even when located in different places, while hosts in different departments cannot. The networking diagram is as follows:
(Figure: two departments, VLAN 10 and VLAN 20)
Link Aggregation
This chapter describes the link aggregation technology and its application.
Main contents:
Link aggregation
Typical application
Link Aggregation
This section describes the concept of the link aggregation.
Main contents:
LACP protocol
LACP Protocol
LACP, based on IEEE 802.3ad, is a protocol for implementing dynamic link aggregation. The LACP protocol communicates with the opposite end through Link Aggregation Control Protocol Data Units (LACPDUs).
After LACP is enabled on a port, the port advertises its system priority, system MAC address, port priority, port number, and operation key to the opposite end by sending LACPDUs. When the opposite end receives this information, it compares it with the information saved for its other ports to select the ports that can aggregate. In this way, the two parties can agree on ports joining or exiting a dynamic aggregation group.
Manual Aggregation
1. Overview
In a manual aggregation group, a port can be in the Selected or Unselected status. Only Selected ports can receive and send user service packets; Unselected ports cannot receive or send user service packets.
The system sets the port status (Selected or Unselected) according to the following principles:
If any port in the aggregation group is in the Up status, the Up port with the highest priority is selected to serve as the root port of the group.
The ports in the Up status with the same operation key as the root port become candidates for the Selected ports. The other ports are set to the Unselected status.
In a manual aggregation group, only the ports with the same configuration as the root port can become Selected ports. The configuration covers the rate, duplex mode, and up/down status. Users need to keep the basic configuration of each port the same through manual configuration.
In an LACP aggregation group, a port can be in the Selected or Unselected status.
Both the Selected ports and the Unselected ports in the Up status can receive and send LACP packets.
Only the Selected ports can receive and send user service packets; the Unselected ports cannot receive or send user service packets.
The system sets the port status (Selected or Unselected) according to the
following principles:
The local system and the opposite system negotiate. The status of the ports at both ends is determined by the end whose device ID has the higher priority. The negotiation procedure is as follows:
Compare the device IDs of the two ends (device ID = system priority + system MAC address): first compare the system priorities; if the system priorities are the same, compare the system MAC addresses. The end with the smaller device ID is prior (a lower system priority value and a smaller system MAC address mean a smaller device ID).
Compare the port IDs on the end with the prior device ID (port ID = port priority + port number): first compare the port priorities; if the priorities are the same, compare the port numbers. The port with the smaller port ID serves as the root port of the aggregation group (a lower port priority value and a smaller port number mean a smaller port ID).
A port becomes a candidate for the Selected ports when its operation key is consistent with that of the root port, it is in the Up status, and the configuration of its opposite port is the same as that of the opposite root port. Otherwise, the port is set to the Unselected status.
In an LACP aggregation group, only the ports with the same configuration as the root port can become Selected ports. The configuration covers the rate, duplex mode, and up/down status. Users need to keep the basic configuration of each port the same through manual configuration.
The following figure illustrates LACP aggregation. The priority of device S is higher than that of device T. The member ports of aggregation group 1 are A, B, C, D, E, and F. Port F is in the Down status. The rate of port E is 10M and the rate of the other ports is 100M. In this example, one aggregation group supports at most three ports.
LACP aggregation
1. Port A has the highest priority and is set to the Selected status first. Therefore, port A is the root port of aggregation group 1.
3. The link of port F is in the Down status, so its aggregation status is set to Unselected.
4. The rate of port E is different from that of root port A, so its aggregation status is set to Unselected.
5. The rate and duplex of port D are the same as those of root port A, but its port priority is lower than that of B and C; therefore, its aggregation status is set to Unselected.
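The selection rules above can be sketched in code. This is an illustrative model only; the tuple comparisons mirror the "smaller device ID / smaller port ID wins" rules, and the port values are taken from the figure's example, not from a real device.

```python
# Sketch of the LACP comparisons described above. Device ID = (system
# priority, system MAC); port ID = (port priority, port number). Smaller
# tuples win, matching the rules in the text.

def prior_device(dev_a, dev_b):
    """Return the device with the smaller device ID (priority first, then MAC)."""
    return min(dev_a, dev_b, key=lambda d: (d["sys_prio"], d["mac"]))

def select_ports(ports, max_selected):
    """Pick the root port by port ID, then Selected ports matching its rate/duplex."""
    up = [p for p in ports if p["up"]]
    root = min(up, key=lambda p: (p["prio"], p["num"]))
    candidates = [p for p in up
                  if (p["rate"], p["duplex"]) == (root["rate"], root["duplex"])]
    candidates.sort(key=lambda p: (p["prio"], p["num"]))
    return [p["name"] for p in candidates[:max_selected]]

ports = [
    {"name": "A", "prio": 1, "num": 1, "up": True,  "rate": 100, "duplex": "full"},
    {"name": "B", "prio": 2, "num": 2, "up": True,  "rate": 100, "duplex": "full"},
    {"name": "C", "prio": 2, "num": 3, "up": True,  "rate": 100, "duplex": "full"},
    {"name": "D", "prio": 3, "num": 4, "up": True,  "rate": 100, "duplex": "full"},
    {"name": "E", "prio": 2, "num": 5, "up": True,  "rate": 10,  "duplex": "full"},
    {"name": "F", "prio": 2, "num": 6, "up": False, "rate": 100, "duplex": "full"},
]
print(select_ports(ports, 3))   # ['A', 'B', 'C']
```

As in the figure, ports A, B, and C become Selected; D loses on priority, E on rate, and F is Down.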
Typical Application
Switch A:
Switch B:
MSTP
In an L2 switching network, a loop may cause packets to circulate and proliferate, generating a broadcast storm. As a result, all available bandwidth is occupied and the network becomes unavailable. The STP protocol was introduced to solve this problem. STP is an L2 management protocol; it selectively blocks redundant links to eliminate L2 loops in the network. At the same time, the protocol provides the link backup function.
This chapter describes the protocols of STP and focuses on the MSTP.
Main contents:
STP
RSTP
MSTP protocol
STP Overview
The basic idea of the STP protocol is very simple: loops do not occur in natural trees, so if a network grows like a tree, no loop will occur. The STP protocol defines the Root Bridge, Root Port, Designated Port, and Path Cost. The purpose is to construct a tree that prunes redundant loops, backs up links, and optimizes paths. The algorithm for constructing the tree is the Spanning Tree Algorithm.
STP exchanges BPDU information between bridges. First, the root bridge is selected. The selection is based on the bridge ID, which is composed of the bridge priority and the MAC address; the bridge with the smallest ID becomes the root bridge of the network. All of its ports connect to downstream bridges, so all of its port roles become designated ports. Then, each downstream bridge connected with the root bridge selects the most robust branch to serve as its path to the root bridge, and the role of the corresponding port becomes the root port. This operation is repeated out to the edge of the network. After the designated ports and the root ports are determined, a tree is generated. After 30 seconds (the default value), the designated ports and the root ports enter the forwarding status, and the other ports enter the blocking status. The STP BPDU is transmitted from the designated port of each bridge periodically to maintain the link status. If the network topology changes, the spanning tree is recalculated and the port status changes accordingly. This is the basic principle of the spanning tree.
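The root bridge election described above can be sketched as a simple comparison. The bridge names, priorities, and MAC addresses below are made-up example values, not part of any real topology.

```python
# Sketch of STP root bridge election: the bridge ID is (priority, MAC
# address), and the bridge with the numerically smallest ID wins.

def elect_root(bridges):
    """Return the name of the bridge with the smallest (priority, MAC) bridge ID."""
    return min(bridges, key=lambda b: (b["priority"], b["mac"]))["name"]

bridges = [
    {"name": "SW1", "priority": 32768, "mac": "00:11:22:33:44:55"},
    {"name": "SW2", "priority": 4096,  "mac": "00:11:22:33:44:66"},
    {"name": "SW3", "priority": 4096,  "mac": "00:11:22:33:44:11"},
]
print(elect_root(bridges))  # SW3: ties SW2 on priority, wins on smaller MAC
```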
RSTP Overview
To overcome the slow convergence of STP, the IEEE defined RSTP in IEEE 802.1w in 2001. The RSTP protocol improves the STP protocol in the following three aspects to speed up convergence (to within one second in the best case):
1. Set Alternate Port and Backup Port for the root port and the
designated port. When the root port fails, the alternate port becomes
the new root port and enters the forwarding status without any delay.
When the designated port fails, the backup port becomes the new
designated port and enters the forwarding status without any delay.
2. The designated port can enter the forwarding status quickly through handshaking with the downstream bridge on a point-to-point link. For a shared link connecting three or more bridges, the downstream bridge does not respond to the handshaking requests sent from the upstream designated port; it waits for double the Forward Delay time to enter the forwarding status.
3. The port connected with terminals but not connected with other
bridges is defined as the Edge Port. The edge port can enter the
forwarding status without any delay.
1. There is only one spanning tree in the entire switching network. When the network scale is large, the convergence time is long.
These defects cannot be removed within the single-spanning-tree framework. MSTP, which supports VLANs, was therefore introduced.
MSTP Protocol
Terms
Multiple Spanning Tree Regions
CST is the single spanning tree connecting all MST domains in the switching network. If each MST domain is regarded as a single device, CST is the spanning tree generated among these devices by the MSTP protocol.
1. The domain concept is used in the MSTP. One switching network can be divided into multiple domains. Multiple spanning trees are generated in each domain, and each spanning tree is independent. Between domains, the MSTP uses the CIST to ensure that no loop exists in the global topology.
2. The instance concept is used in the MSTP. Multiple VLANs are mapped to one instance to reduce communication overhead and resource usage. The calculation of each MSTP instance is independent (each instance corresponds to a spanning tree). Among these instances, the load of VLAN data can be shared.
3. MSTP can implement fast port status transition similar to the RSTP.
3. MSTP can implement the port status fast transfer similar to the RSTP.
The MSTP sets up a VLAN mapping table to associate VLANs with spanning trees. At the same time, it divides a switching network into multiple domains. Multiple spanning trees are generated in each domain, and each spanning tree is independent. The MSTP prunes a loop network into a loop-free tree network, avoiding the proliferation and indefinite circulation of packets in a loop network. At the same time, multiple redundant paths for data forwarding are provided, so that the load of VLAN data can be balanced in the process of data forwarding.
For example, in the following network there are four bridges A, B, C, and D, and VLANs 10, 20, 30, 40, 50, and 60. The four bridges run the MSTP protocol. Bridges B, C, and D are in the same MST domain; bridge A can be considered to be in an isolated domain. On bridges B, C, and D, map VLAN 10 and VLAN 20 to instance 1, map VLAN 30 and VLAN 40 to instance 2, and map VLAN 50 and VLAN 60 to instance 0.
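The VLAN-to-instance mapping on bridges B, C, and D can be sketched as a lookup table. The default-to-instance-0 behavior is a common MSTP convention assumed here for illustration.

```python
# Sketch of the MSTP VLAN-to-instance mapping described in the example.
# Each instance's spanning tree is calculated independently; VLANs not
# explicitly mapped are assumed to fall into instance 0 (the CIST).

vlan_to_instance = {10: 1, 20: 1, 30: 2, 40: 2, 50: 0, 60: 0}

def instance_for_vlan(vlan_id):
    """Return the MSTP instance whose spanning tree forwards this VLAN."""
    return vlan_to_instance.get(vlan_id, 0)

# Frames of VLANs mapped to the same instance follow the same spanning tree:
print(instance_for_vlan(20))   # 1
print(instance_for_vlan(99))   # 0 (default instance)
```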
The connection of the CIST is shown by the blue links in the following figure. Frames of VLANs 50 and 60 are forwarded along the active connection. Bridge A is the overall root of the entire CIST. Bridge B is the region root of the CIST, and port 1 of bridge B is the root port toward the CIST region root.
Normally, edge ports do not receive any BPDU packets. If an attacker forges BPDUs to attack devices, network oscillation may occur.
The MSTP provides the BPDU Guard function to prevent such attacks: after the BPDU Guard function is enabled, if a port whose AdminEdge is TRUE receives a BPDU packet, the port is shut down. At the same time, log information is generated to prompt the user. A disabled port can be restored only by the network administrator; it can also be restored automatically through the port management module.
Root Protection
The root bridge and the backup root bridge of the spanning tree should be in the same domain, especially the CIST root bridge and its backup. In the network design, the CIST root bridge and the backup root bridge are usually placed in a high-bandwidth core domain. However, owing to incorrect configuration or malicious attacks in the network, the legal root bridge may receive a BPDU with a higher priority. As a result, the legal root bridge loses its position as the root bridge and the network topology changes. Such illegal changes may divert traffic from high-speed links to low-speed links, congesting the network.
For a port enabled with the Root Guard function, the port role in all instances can only be the designated port. Once the port receives a BPDU with a higher instance priority, the port is blocked. If no configuration information with a higher priority is received any more, the port is restored to its original status.
Loop Protection
By receiving the BPDU packets sent from upstream devices, a device maintains the status of its root port and other blocked ports. Owing to link congestion or a unidirectional link fault, a port may fail to receive the BPDU packets sent from the upstream device, and the spanning tree information on the port then times out. In this case, the downstream device re-selects the port roles: the downstream port that cannot receive BPDU packets becomes the designated port, and the blocked ports transition to the forwarding status. A loop then occurs in the switching network.
The Loop Guard function suppresses the generation of such loops. For a port configured with Loop Guard, when the BPDU packets from the upstream device cannot be received and the spanning tree information times out, the port is set to the Blocking status in all instances when the port roles are recalculated, and the port does not participate in the spanning tree calculation. If the port receives BPDU packets again, it re-participates in the spanning tree calculation.
After the MSTP calculation, the forwarding paths of different VLANs are as shown in Figure 5-5. As a result, the load on each link is reduced. At the same time, each VLAN has a redundant backup link. When the working link fails, the redundant link takes effect immediately, which reduces the traffic loss caused by link failure.
QinQ Technology
Main contents:
Therefore, the QinQ technology came into being. QinQ expands the VLAN technology and increases the VLAN quantity to 4K × 4K via double layers of tags.
The QinQ technology is also called VLAN dot1q tunnel, 802.1Q tunnel, or VLAN Stacking technology. The standard comes from IEEE 802.1ad and is an expansion of the 802.1Q protocol. QinQ adds one more layer of 802.1Q tag (VLAN tag) to the original 802.1Q packet header. With the double layers of tags, the VLAN quantity is increased to 4K × 4K. QinQ encapsulates the private network VLAN tag of the user inside the public network VLAN tag so that the packet with double layers of VLAN tags can cross the backbone network (public network) of the operator. In the public network, the packet is forwarded according to the outer VLAN tag (that is, the public network VLAN tag), and the private network VLAN tag of the user is shielded.
The formats of the common 802.1Q packet with one layer of VLAN tag and the QinQ packet with two layers of VLAN tags are as follows:
QinQ features:
1. Shields the VLAN ID of the user, so as to save the public network VLAN ID resources of the service provider;
2. The user can plan the private network VLAN IDs freely, avoiding conflicts with the public network and other users' VLAN IDs;
QinQ diagram
The upstream packet of the CE1 switch carries one layer of VLAN tag. When the packet reaches the QinQ port of the PE1 switch, one outer VLAN tag is added to the packet according to the configuration of the QinQ port. The packet with two layers of VLAN tags is forwarded to PE2 via the public network. On the QinQ port of PE2, the outer VLAN tag is deleted, so the packet recovers its single layer of VLAN tag and is forwarded to CE2.
Basic QinQ: When receiving a packet, the QinQ port adds the VLAN tag of the default VLAN of the port to the packet, no matter whether the packet already has a VLAN tag. Before the packet is forwarded out from the QinQ port, the outer tag is deleted and the packet is then forwarded. The disadvantage of this method is that the encapsulated outer VLAN cannot be selected according to the VLAN tag of the packet.
Selective QinQ: Selective QinQ overcomes the disadvantage of basic QinQ. When receiving a packet, the QinQ port adds the specified outer VLAN tag to the packet according to the VLAN tag of the packet. If no outer VLAN tag is specified for the packet's VLAN tag, the VLAN tag of the default VLAN of the port is added to the packet.
TPID (Tag Protocol Identifier): a field in the VLAN tag, used to indicate the protocol type of the VLAN tag. The IEEE 802.1Q protocol defines the value of the field as 0x8100. The default value of the outer TPID of QinQ is 0x8100. The TPID of the outer VLAN tag in the QinQ packets of some manufacturers' devices is 0x9100 or 0x9200. The user can modify the TPID of the port on the public network side to achieve interoperability between devices of different manufacturers.
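The tag push/pop behavior of basic and selective QinQ described above can be sketched as follows. Frames are modeled as lists of (TPID, VLAN) tags; the `mapping` parameter and all VLAN values are illustrative assumptions, not switch configuration syntax.

```python
# Sketch of QinQ outer-tag handling on a provider edge port.

OUTER_TPID = 0x8100   # default outer TPID; some vendors use 0x9100 or 0x9200

def qinq_ingress(tags, default_vlan, mapping=None):
    """Push an outer tag. Selective QinQ picks the outer VLAN from the
    inner VLAN via `mapping`; basic QinQ (mapping=None) always uses the
    port default VLAN."""
    outer = default_vlan
    if mapping and tags:
        outer = mapping.get(tags[0][1], default_vlan)
    return [(OUTER_TPID, outer)] + tags

def qinq_egress(tags):
    """Pop the outer tag before forwarding toward the customer network."""
    return tags[1:]

frame = [(0x8100, 100)]                     # customer frame, inner VLAN 100
double = qinq_ingress(frame, default_vlan=1000,
                      mapping={100: 2000})  # selective: inner 100 -> outer 2000
print([v for _, v in double])               # [2000, 100]
print([v for _, v in qinq_egress(double)])  # [100]
```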
L2 Protocol Control
Technology
Main contents:
Typical application
L2 Protocol Tunnel
With the L2 protocol tunnel, the L2 protocol packets (such as BPDU and LACPDU) of the customer network can be transmitted transparently across the operator's network.
The upper part is the operator's network and the lower part is the user network, which includes user network A and user network B. Configure the L2 Protocol Tunnel function on the packet input and output devices at the two sides of the operator's network so that the BPDU and LACPDU packets of the user network can be transmitted transparently across the operator's network. In this way, the spanning tree calculation and link aggregation functions of the whole user network can be realized.
L2 Protocol Discard
With L2 protocol discard, the port directly discards the received BPDU and
LACPDU packets so that the packets do not take part in the corresponding
protocol processing.
L2 Protocol Peer
With L2 protocol peer, the port does not tunnel or discard the received BPDU and LACPDU packets, but directly forwards them to the upper protocol module for processing. This is the default behavior.
Currently, the bmga, dot1x, gmrp, gvrp, lacp, and stp (mstp) protocols support the L2 protocol tunnel function.
Typical Application
PE1 and PE2 are the devices of the operator network. Customer A and
Customer B are the devices of the user network.
Networking
The user enables the L2 tunnel function for STP protocol packets on the edge ports Port0/0/2 of PE1 and Port0/0/2 of PE2. The network between PE1 and PE2 can then pass the tunneled packets.
L2 Multicast
Main contents:
Terms
Introduction
Terms
1. L2 multicast comprehensive table: a table that integrates the L2 multicast information obtained through static configuration and dynamic learning. Each entry contains the VLAN, the multicast MAC address, and the output port list obtained through static configuration and dynamic learning.
Introduction
The public part of the L2 multicast is the middle layer connecting the bottom-layer chips and the L2 multicast applications. It integrates the L2 multicast applications (for example, entries configured by L2 static multicast and learned by the IGMP Snooping dynamic L2 multicast application) to form the L2 multicast forwarding table, and delivers the entries to the bottom-layer chips. Consequently, the hardware forwarding table is formed. Each forwarding entry is determined by the VLAN and the multicast MAC; the forwarding port list is the collection of ports to which the L2 multicast packets should be duplicated and forwarded.
Main contents:
Terms
Introduction
Typical Application
Terms
L2 static multicast table: a table maintained by L2 static multicast. Each table entry is the L2 static multicast information generated by static configuration. The information covers the VLAN, the multicast MAC, the member port list, and the forbidden port list.
Introduction
The L2 static multicast can generate L2 multicast information through static configuration. The VLAN, multicast MAC, member port list, and forbidden port list should be specified. The L2 static multicast table entry generates the related entries through the L2 multicast public part. At last, the entries are delivered to the hardware forwarding table.
Typical Application
As shown in the preceding figure, the video server is connected to the switch and sends multicast video programs. The receivers PC1, PC2, and PC3 are also connected to the switch. The ports connected with the video server and the receiver PCs belong to the same VLAN. Create an L2 static multicast table entry according to the VLAN and the multicast MAC. Then, set the port connected with PC1 as a member port and the port connected with PC2 as a forbidden port; do not configure the port connected with PC3. As a result, PC1 can receive the video programs, while PC2 and PC3 cannot.
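The forwarding decision in this example can be sketched as follows. The port names and the entry layout are illustrative assumptions; the rule shown is simply that forbidden ports never receive the group's traffic, member ports do, and unconfigured ports receive nothing for a statically known group.

```python
# Sketch of the static L2 multicast decision described above.

static_entry = {
    "vlan": 10,
    "mac": "01:00:5e:01:01:01",
    "members": {"port_pc1"},      # PC1's port: receives the stream
    "forbidden": {"port_pc2"},    # PC2's port: explicitly blocked
}

def receives_stream(port, entry):
    """Return True if the port is on the entry's forwarding port list."""
    if port in entry["forbidden"]:
        return False
    return port in entry["members"]

for port in ("port_pc1", "port_pc2", "port_pc3"):
    print(port, receives_stream(port, static_entry))   # True only for port_pc1
```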
Main contents:
Terms
Introduction
Terms
1. IGMP: Internet Group Management Protocol, a protocol used by hosts to advertise and maintain multicast membership toward routers or switches.
Introduction
The IGMP protocol creates and maintains multicast membership between hosts and routers. The IGMP protocol runs between a host and the multicast routers it connects to. On one side, the host notifies the multicast router through the IGMP protocol that it wants to join and receive the information of a specific multicast group (or specific multicast source); on the other side, the router queries through the IGMP protocol whether any members are in the active status in the local network segment, namely, it checks whether any multicast group member exists in the network segment and then collects the member information in the local network segment. The multicast router only cares whether any multicast group member exists in the local network segment; it does not care about the number of members in the network segment. If there is at least one group member, the router will forward the service data of the specified multicast group (or specified multicast source) to the network segment.
IGMP has three versions: IGMPv1, IGMPv2, and IGMPv3. The most common version is IGMPv2. IGMPv1 is defined in RFC 1112. It describes the process of general query and membership report. IGMPv2 is defined in RFC 2236. On the basis of IGMPv1, it adds the group member fast leave mechanism and the querier election function. IGMPv3 is defined in RFC 3376. On the basis of IGMPv2, the source filtering function is added: a host can specify that it receives the traffic of a specific multicast group only from certain multicast source hosts, or exclude specified sources.
As shown in the following figure, when IGMP Snooping is not running in the L2 device, the multicast data is flooded in the VLAN: the multicast traffic is forwarded to all ports in the VLAN. When IGMP Snooping is running in the device, the known multicast data is not flooded in the VLAN, but is forwarded only to the specified multicast member ports.
After an IGMP general query packet is received, the switch forwards the packet through all ports in the VLAN except the receiving port. The switch processes the receiving port as follows:
1. If the router port list already contains this dynamic router port, reset its aging timer.
2. If the router port list does not contain this dynamic router port, add the port to the router port list and start its aging timer.
After an IGMP membership report packet is received, the switch forwards the packet through all router ports in the VLAN. It parses from the packet the multicast group address that the host wants to join and processes the receiving port as follows:
In the preceding forwarding process, the multicast packets are not forwarded back to the port from which they were received.
Main contents:
Terms
Introduction
Terms
IGMP Proxy: The switch is logically divided into two parts. One part acts as an IGMP group member, responsible for sending IGMP membership reports to the router. The other part acts as a multicast router, which sends IGMP queries to the downstream ports and collects member information to form the member database. Different from IGMP Snooping, IGMP Proxy integrates the port member information to form its own IGMP membership report.
Introduction
The preceding figure shows the working principle of IGMP Proxy. The L2 switch running IGMP Proxy is logically divided into two parts: the IGMP group member and the multicast router. The multicast router part makes the switch appear to downstream hosts as a multicast router, sending IGMP query messages and collecting IGMP member information. It integrates the group member information to form the IGMP Proxy member database. The IGMP group member part reports the IGMP member information to the real multicast router according to the IGMP Proxy member database. Different from IGMP Snooping, the IGMP membership reports and leave messages of the downstream receiving hosts are terminated in the switch running IGMP Proxy, and the query messages sent by the multicast router are also terminated in the switch running IGMP Proxy. IGMP Proxy generates and sends IGMP queries, membership reports, and leave messages itself, whereas IGMP Snooping only forwards these messages.
Typical Application
Main contents:
Terms
Introduction
Terms
MVR: Multicast VLAN Registration.
Introduction
In the traditional multicast VOD mode, when users of different VLANs select programs, the multicast data is duplicated in each VLAN. This mode wastes a large amount of bandwidth and increases the load on layer 3 equipment. To solve the problem, you can configure the multicast VLAN function in the switch, that is, add the user interfaces belonging to different VLANs to the multicast VLAN and enable the IGMP Snooping function. Through VLAN conversion, the IGMP join and leave packets received by the multicast VLAN interfaces carry the tag of the multicast VLAN, and the forwarding table of the multicast VLAN is generated in the switch. As a result, the multicast data needs to be sent only once in the multicast VLAN, and users of different VLANs can all receive it. This mode of joining the user interfaces that should receive multicast data to the multicast VLAN as members is called Multicast VLAN Registration (MVR).
Typical Application
The MVR improves multicast applications. It can save bandwidth and reduce the burden on L3 devices. The MVR can be used in all multicast application environments. The following figure describes live web broadcasting.
Main contents:
Terms
Introduction
Terms
MVP: Multicast VLAN Plus.
Introduction
In the traditional multicast distribution mode, when the users belong to different VLANs, the upstream duplicates the multicast data for each VLAN. This occupies a large amount of bandwidth and adds an extra burden to the L3 device. To solve the problem, you can configure the MVP function in the switch: the home VLAN of the receiver joins the multicast VLAN as a sub-VLAN. As a result, receivers in both the main VLAN and the sub-VLANs of the multicast VLAN can receive the multicast data flow. Compared with the traditional multicast forwarding mode, the upstream only needs to send one copy of the data to the multicast VLAN; consequently, bandwidth is saved and the upstream pressure is relieved. Compared with MVR, it does not require that all receivers join the multicast VLAN: cross-VLAN multicast duplication can still be implemented, and users of different VLANs remain isolated, which ensures security.
Typical Application
The MVP can save bandwidth and reduce the burden on L3 devices. The following figure describes live web broadcasting:
Security Technology
Main contents:
802.1X technology
Port security
Port monitoring
Port isolation
Main contents:
Related terms
Introduction
Typical application
Related Terms
Supplicant system: the client, an entity at one end of the LAN link, which is authenticated by the device at the other end of the link. The client is usually a user terminal device. The user initiates the 802.1X authentication by running the client software.
PAE (Port Access Entity): the entity that executes the algorithms and protocol operations in 802.1X.
Introduction
802.1X Authentication System Structure
Length: the data length, that is, the length of the Packet Body. If it is 0, there is no data.
When the Type of the EAPOL message is EAP-Packet, the Packet Body is the EAP packet structure, as follows:
Code: the EAP packet type, including Request, Response, Success, and Failure. Success and Failure packets have no Data field, and their Length value is 4.
The Data field format of Request and Response packets is as follows: Type is the EAP authentication type, and the content of the Type data depends on the Type.
Data: the content of the EAP packet, depending on the Code type.
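The EAP packet layout described above (Code, Identifier, Length, then Type and Data for Request/Response) can be sketched as a small parser. This is an illustrative sketch of the standard EAP header per RFC 3748, not the switch's internal parser.

```python
import struct

# Sketch of parsing the EAP packet structure described above.
EAP_CODES = {1: "Request", 2: "Response", 3: "Success", 4: "Failure"}

def parse_eap(packet: bytes):
    """Parse Code (1 byte), Identifier (1 byte), Length (2 bytes, big-endian),
    and, for Request/Response only, the Type byte and Data."""
    code, identifier, length = struct.unpack("!BBH", packet[:4])
    info = {"code": EAP_CODES.get(code), "id": identifier, "length": length}
    if code in (1, 2):                  # only Request/Response carry Type+Data
        info["type"] = packet[4]
        info["data"] = packet[5:length]
    return info

# An EAP-Request/Identity packet (Type 1), as in step 2 of the flow below:
pkt = struct.pack("!BBHB", 1, 1, 5, 1)
print(parse_eap(pkt))
```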
To support the EAP authentication, RADIUS adds two attributes, that is,
EAP-Message and Message-Authenticator.
EAP-Message
As shown in Figure 9-5, the attribute is used to encapsulate the EAP packet. The type code is 79, and the String field is 253 bytes at most. If the length of the EAP packet is larger than 253 bytes, the packet can be fragmented and encapsulated in multiple EAP-Message attributes.
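The fragmentation rule above can be sketched as simple chunking. The tuple representation of a RADIUS attribute is an assumption for the sketch; the 253-byte String limit and the 2-byte attribute header are from the text and the RADIUS attribute format.

```python
# Sketch of splitting an oversized EAP packet across multiple EAP-Message
# (type 79) RADIUS attributes, each carrying at most 253 bytes of String.

MAX_STRING = 253

def to_eap_message_attrs(eap_packet: bytes):
    """Yield (type, length, value) attribute tuples; the RADIUS attribute
    length field counts the 2-byte header plus the value."""
    for i in range(0, len(eap_packet), MAX_STRING):
        chunk = eap_packet[i:i + MAX_STRING]
        yield (79, 2 + len(chunk), chunk)

attrs = list(to_eap_message_attrs(b"\x00" * 600))   # a 600-byte EAP packet
print([(t, ln, len(v)) for t, ln, v in attrs])      # [(79, 255, 253), (79, 255, 253), (79, 96, 94)]
```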
Message-Authenticator
As shown in Figure 9-6, the attribute is used to prevent the access request packets from being tampered with during EAP and CHAP authentication. A packet with the EAP-Message attribute must also contain the Message-Authenticator attribute; otherwise, the packet is regarded as invalid and discarded.
1. When the user needs to access the network, the user starts the 802.1X client program, inputs the applied-for and registered user name and password, and initiates a connection request (EAPOL-Start packet). Here, the client program sends the authentication request packet to the device side, starting an authentication process.
2. After receiving the authentication request data frame, the device side sends a request frame (EAP-Request/Identity packet) to ask the client program of the user to send the input user name.
3. The client program answers the request of the device side and sends the user name information to the device side via a data frame (EAP-Response/Identity packet). The device side encapsulates the data
8. The client can also send an EAPOL-Logoff packet to the device side to log out actively. The device side changes the port status from the authorized state to the unauthorized state and sends an EAP-Failure packet to the client.
Auto Vlan in the port-based access control mode is valid only on the ACCESS port. Auto Vlan in the MAC-based access control mode is valid only on the HYBRID port. In other access control modes, Auto Vlan is invalid.
Auto Vlan is also called Assigned Vlan. When an 802.1X user passes the authentication on the server, the server delivers the authorized VLAN information to the device side. If the delivered VLAN is illegal (the VLAN ID is wrong or the VLAN does not exist), the authentication fails. Otherwise, the authentication port is added to the delivered VLAN. After the user logs out, the port returns to the unauthorized state and is deleted from the Auto Vlan, and the default VLAN of the port returns to the previously configured VLAN.
The authorized Auto Vlan does not change or affect the port configuration, but its priority is higher than that of the VLAN configured by the user (the Config Vlan). That is to say, the VLAN that takes effect after the user passes the authentication is the authorized Auto Vlan, and the Config Vlan takes effect again after the user logs out.
Guest Vlan:
Guest Vlan in the port-based access control mode takes effect only on the ACCESS port. Guest Vlan in the MAC-based access control mode takes effect only on the HYBRID port. It does not take effect in other access control modes.
Users in the Guest Vlan can obtain the 802.1X client software, upgrade the client, or execute other application upgrade programs (such as anti-virus software and operating system patch programs).
After 802.1X is enabled and the Guest Vlan is configured, the port is added to the Guest Vlan in untagged mode. Users of the ports in the Guest Vlan can then initiate authentication. If the authentication fails, the port remains in the Guest Vlan; if the authentication succeeds, there are two cases:
1. If the authentication server delivers a VLAN, the port leaves the Guest Vlan and is added to the delivered VLAN. After the user logs out, the port returns to the Guest Vlan.
2. If the authentication server does not deliver a VLAN, the port leaves the Guest Vlan and is added to the Config Vlan. After the user logs out, the port returns to the Guest Vlan.
802.1 X Expansion
User-based authentication:
The standard 802.1X protocol is implemented on a per-port basis; that is, as long as one user on the port passes the authentication, the other users can use the network resources without authentication, but after that user logs out, the other users are also denied use of the network. Maipu switches support user-based authentication (based on the MAC address). When the port is configured for user-based authentication, each user on the port is authenticated separately, and only the users that pass the authentication can use the network resources. After one user logs out, only that user loses network access; the other authenticated users can still use the network.
The standard 802.1X protocol defines that the client and the server interact via EAP packets, with the device acting as an EAP relay: the device encapsulates the EAP data sent by the authentication server in EAPOL packets and forwards it to the client. This interaction mode is called EAP relay, and it requires that the authentication server support the EAP protocol; otherwise, the authentication server cannot interact with the client using EAP. Considering real deployments, a previously deployed authentication server may not support EAP, so the Maipu switch extends the protocol with an EAP termination mode. In this mode, the client's EAP data is not sent directly to the authentication server; instead, the device completes the EAP interaction with the client, extracts the user's authentication information from the EAP data, and sends it to the authentication server for authentication. In EAP termination mode, only MD5-based EAP authentication is supported.
When adopting the EAP termination mode, the service interaction flow is
as follows:
Figure 9-8 The service flow of the EAP termination mode of the 802.1X
authentication system
Comparing Figure 9-8 with Figure 9-7, we can see that when EAP termination mode is adopted, the EAP protocol packet is not sent to the authentication server but terminates at the device. The device extracts the necessary information from the EAP protocol packet and sends it to the authentication server for authentication.
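Since only MD5-based EAP is supported in termination mode, the response value the client computes is the CHAP-style digest defined for EAP-MD5 (RFC 3748 reuses RFC 1994): MD5 over the EAP identifier, the password, and the server challenge. A minimal sketch:

```python
import hashlib

def eap_md5_response(identifier: int, password: bytes, challenge: bytes) -> bytes:
    """EAP-MD5 response value: MD5(identifier | password | challenge),
    as in CHAP (RFC 1994), which EAP-MD5 reuses."""
    return hashlib.md5(bytes([identifier]) + password + challenge).digest()
```

In EAP termination mode the device verifies or relays this digest itself instead of forwarding the raw EAP exchange to the server.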
In the standard 802.1X function, the client and the authentication device exchange information via EAPOL (EAP over LAN) packets. In a real network, because of its complexity, the user to be authenticated and the authentication device may need to traverse intermediate switches. If the intermediate switches do not forward the EAPOL packets transparently, the authentication cannot be completed.
In a real network, besides many PC terminal users, there are network terminals (such as network printers) that do not carry, or cannot be installed with, an 802.1X client program. Authenticating this kind of user is therefore called non-client user authentication, that is, MAC address authentication. This authentication method does not require the user to install any client software: when the device detects the user's MAC address for the first time, it starts authentication for that user at once. The process does not require the user to enter a user name and password; after passing authentication, the user can access the network. This method is suitable for terminals that cannot run client software, and for PC users who do not want to install client software or enter a user name and password to authenticate.
When performing MAC address authentication, you can select the user name type used for the authentication. There are usually two modes:
MAC address user name: Use the user's MAC address as both the user name and the password for authentication.
Fixed user name: Regardless of the user's MAC address, all users authenticate with the local user name and password pre-configured on the device.
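The two user-name modes above can be sketched as follows (the helper and the MAC formatting are illustrative, not the device's actual behavior):

```python
def mac_auth_credentials(mac: str, mode: str, fixed_user=None, fixed_pass=None):
    """Build the username/password pair used for MAC address authentication.

    mode 'mac'   -- use the MAC address itself as both username and password
    mode 'fixed' -- use a pre-configured local username/password for all users
    """
    if mode == "mac":
        name = mac.replace(":", "").replace("-", "").lower()
        return name, name
    if mode == "fixed":
        return fixed_user, fixed_pass
    raise ValueError("unknown mode: " + mode)
```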
In an 802.1X authentication environment that uses a RADIUS server, you can configure the corresponding ACL name on the RADIUS server. When the user passes authentication, the server delivers the ACL name to the authentication device, which binds the user to the ACL so that the user's subsequent traffic is controlled by the ACL. The ACL must be pre-configured on the device; passing authentication only triggers a search-and-bind process, and if the search or binding fails, the user cannot come online.
Typical Application
802.1X Client Authentication
The Supplicant connects to the network via 802.1X authentication. The authentication server is a RADIUS server. Port 0/1, connected to the Supplicant, is in VLAN 1; the authentication server is in VLAN 2; the Update Server, used to download and upgrade the client software, is in VLAN 10; port 0/2 of the switch, connected to the Internet, is in VLAN 5.
Figure 9-9 Initial topology: Supplicant on port 0/1 (VLAN 1), Internet on port 0/2 (VLAN 5), authentication server on port 0/3 (VLAN 2), Update Server on port 0/4 (VLAN 10)
Port 0/1 is added to the Guest VLAN. Now the Supplicant and the Update Server are both in VLAN 10, so the Supplicant can access the Update Server and download the 802.1X client.
Figure 9-10 Port 0/1 in the Guest VLAN (VLAN 10): the Supplicant can reach the Update Server
When the user comes online after passing authentication, the authentication server delivers VLAN 5. Now the Supplicant and port 0/2 are both in VLAN 5, so the Supplicant can access the Internet.
Figure 9-11 After authentication: port 0/1 in the delivered VLAN 5, so the Supplicant can reach the Internet
Figure 9-12
Main contents:
Related terms
Introduction
Typical application
Related Terms
Trust Port: DHCP Snooping divides ports into trust ports and un-trust ports and applies restrictions to DHCP packets on un-trust ports, so as to enforce the security policy.
Option 82: Option 82 is a DHCP option used to record the location information of the DHCP client. The administrator can locate the DHCP client according to this option, so as to apply security controls.
Introduction
DHCP Snooping is a security feature of DHCP. It ensures that the client obtains its IP address from a legal server, preventing spoofing attacks. It also records the mapping between the IP address and the MAC address of each DHCP client for the administrator to view and for other security modules to use.
DHCP Snooping records the DHCP client's MAC address and the IP address it obtained by snooping the DHCP-REQUEST and DHCP-ACK packets received on the trust ports. The administrator can use the show dhcp-snooping command to view the IP addresses obtained by DHCP clients.
A trust port is a port directly or indirectly connected to a legal DHCP server. A trust port forwards received DHCP packets normally, ensuring that the DHCP client obtains a correct IP address.
An un-trust port is a port not connected to a legal DHCP server. If DHCP-ACK or DHCP-OFFER packets returned by a DHCP server are received on an un-trust port, they are discarded, preventing the DHCP client from obtaining a wrong IP address.
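The trust/un-trust forwarding rule above can be modeled as a small predicate (an illustrative sketch, not device code; message names as used in the text):

```python
# Replies that only a DHCP server sends, per the description above.
SERVER_MSGS = {"DHCP-OFFER", "DHCP-ACK"}

def dhcp_snooping_forward(msg_type: str, port_trusted: bool) -> bool:
    """Return True if the DHCP packet may be forwarded.
    Server replies arriving on an un-trust port are dropped, so clients
    cannot obtain an address from a rogue server."""
    if msg_type in SERVER_MSGS and not port_trusted:
        return False
    return True
```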
Support Option 82
Option 82 is a DHCP option used to record the location information of the DHCP client. The administrator can locate the DHCP client according to this option and apply security controls, such as restricting the number of IP addresses distributed to one port or VLAN.
Option 82 can contain at most 255 sub-options. The SM4100 series switch supports only two sub-options: sub-option 1 (Circuit ID) and sub-option 2 (Remote ID).
The SM4100 series switch supports two filling formats: the default format and the user-configured format.
The contents of the two sub options of the default format are as follows:
The contents of the two sub options of the user-configured format are as
follows:
1. After receiving a DHCP request packet, the device processes the packet according to whether it contains Option 82, the processing policy configured by the user, and the filling format, and then forwards the processed packet to the DHCP server.
The device counts the DHCP packets received on each port every second. If the number of packets received per second exceeds the set threshold, the excess packets are dropped directly. If the number of received DHCP packets exceeds the threshold for 20 consecutive seconds, the port is shut down; whether it recovers automatically depends on the port's configuration, and you can also recover it manually.
Typical Application
The typical application of the DHCP Flooding function in the network is as
shown in the following Switch A. The port connected to the client network
is set as the un-trust port and the port connected to the relay or server is
set as the trust port. This can ensure that the client can get the IP address
from the trust port (that is the legal server).
Main contents:
Related terms
Introduction
Typical application
Related Terms
IP Source Guard: Filter IP packets via IP or IP+MAC.
Introduction
With the IP Source Guard binding function, you can filter the packets forwarded by a port, preventing packets with invalid IP and MAC addresses from passing through and improving port security. After receiving a packet, the port searches the IP Source Guard binding entries and processes the packet according to the filter mode specified on the port.
When the filter mode of the port is IP: if the source IP address of the packet matches an IP address recorded in the binding entries, the port forwards the packet; otherwise, the packet is dropped.
When the filter mode of the port is IP+MAC: if the source MAC address and source IP address of the packet match the MAC address and IP address recorded in a binding entry, the port forwards the packet; otherwise, the packet is dropped.
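The two filter modes can be sketched as one check function (an illustrative model; the dict layout is an assumption, not the device's data structure):

```python
def ipsg_permit(packet, bindings, mode):
    """IP Source Guard check.

    packet   -- dict with source 'ip' and 'mac' addresses
    bindings -- list of dicts with bound 'ip' and 'mac'
    mode     -- 'ip' or 'ip+mac' filter mode of the port
    Returns True if the port forwards the packet, False if it drops it.
    """
    for b in bindings:
        if mode == "ip" and packet["ip"] == b["ip"]:
            return True
        if mode == "ip+mac" and packet["ip"] == b["ip"] and packet["mac"] == b["mac"]:
            return True
    return False
```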
The IP Source Guard binding entries have two sources: static binding entries configured manually through IP Source Guard, and the entries maintained by DHCP Snooping.
3. When IP Source Guard static entries are added, the hardware entries are updated automatically; when they are deleted, the hardware entries are deleted. If setting a hardware entry fails, the static table marks its Writed-Flag as not written.
5. The software table (IP Source Guard static entries and DHCP Snooping dynamic entries) and the hardware table are synchronized every minute. Because of the ACL resource limitation, not all software entries may fit into the hardware table, so available resources must be checked regularly. If resources become available, for example because some entries are deleted or the ACL resources are enlarged, the legal entries in the software table are written into the hardware table. The default ACL resource is two slices, that is, 256 entries; enabling one port occupies two, and the rest are used for filter entries.
Typical Application
Application in a non-DHCP Snooping environment
Main contents:
Related terms
Introduction
Typical application
Related Terms
Dynamic ARP Inspection: A security measure that discovers and prevents ARP spoofing attacks by checking the validity of ARP packets.
Introduction
The dynamic ARP inspection function can be used to discover and prevent ARP spoofing attacks.
It redirects all ARP packets (broadcast and unicast) on ports where ARP inspection is enabled to the CPU for checking, comparison, software forwarding, log recording and so on, so when there are many ARP packets, CPU resources are consumed. Therefore, it is not recommended to enable the function in the normal state. When an ARP spoofing attack is suspected in the network, you can enable the function to confirm and locate it.
The device does not check ARP packets from ports on which the dynamic ARP inspection function is disabled, but forwards them directly. Usually, a port with dynamic ARP inspection disabled is an upstream port of the device. The device checks ARP packets from ports on which dynamic ARP inspection is enabled against the DHCP Snooping table or the IP static binding table configured manually through IP Source Guard.
ARP Detection Policy
1. When the binding of the source IP address and source MAC address in the ARP packet matches a DHCP Snooping entry or a manually configured IP static binding entry, and the ingress port of the ARP packet and its VLAN are consistent with that entry, the ARP packet is valid and is forwarded.
2. When the binding of the source IP address and source MAC address in the ARP packet does not match the DHCP Snooping entries or the manually configured IP static binding entries, or the ingress port of the ARP packet and its VLAN are inconsistent with those entries, the ARP packet is invalid and is dropped. In addition, log information is printed.
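The validity check above can be sketched as a single match against the binding table (an illustrative model; the tuple layout is an assumption):

```python
def arp_packet_valid(sender_ip, sender_mac, port, vlan, bindings):
    """Dynamic ARP Inspection check: the sender IP/MAC pair, ingress port,
    and VLAN must all match one binding entry (from DHCP Snooping or a
    manually configured IP Source Guard static entry)."""
    return any(e == (sender_ip, sender_mac, port, vlan) for e in bindings)
```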
1. If the destination MAC address of the ARP packet is the local device's, the packet is delivered to the ARP protocol stack for processing and the local ARP cache is updated.
1. Receiving VLAN
2. Receiving port
4. The MAC address of the sender and the destination MAC address
The log information is not output in real time but periodically. The user can perform further processing according to the output log, such as locating the host that initiated the ARP attack.
Typical Application
Port Security
This section describes the basic theory of the port security and its
application.
Main contents:
Introduction
Typical application
Introduction
Port security is applied at the access layer. It can limit which hosts access the network through the device, permitting specified hosts to access the network while denying all others.
The port security function flexibly binds a user's MAC address, IP address, VLAN ID and port to prevent invalid users from connecting to the network, ensuring the security of network data and sufficient bandwidth for valid users.
The user can limit the hosts that can access the network via three kinds of rules: the MAC rule, the IP rule and the MAX rule. The MAC rule has three binding modes: MAC binding, MAC+IP binding, and MAC+VID binding. The IP rule can cover a single IP address or a range of IP addresses. The MAX rule limits the maximum number of MAC addresses the port can learn (in order); this maximum does not include the valid MAC addresses generated by the MAC and IP rules.
The MAC rule and IP rule can specify whether a packet matching the rule is permitted to communicate. With the MAC rule, you can flexibly bind a MAC address to a VLAN or to an IP address. Port security is realized in software, so the number of rules is not limited by hardware resources, which makes configuration more flexible.
The port security rules are triggered by the ARP packets of the terminal device. When the device receives an ARP packet, port security extracts the relevant information from it and matches it against the configured rules, in this order: first the MAC rule, then the IP rule, and finally the MAX rule. The L2 forwarding table of the port is then controlled according to the matching result, thereby controlling how the port forwards packets.
When port security regards a packet as illegal, it performs the corresponding processing. Currently there are three processing modes: protect, restrict, and shutdown. The protect mode drops the packet; the restrict mode drops the packet and sends a trap alarm (at most one alarm within two minutes while illegal packets are received); the shutdown mode performs the actions of the restrict mode and also shuts down the port.
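The three violation modes can be summarized as a small lookup (an illustrative model of the behavior described above, not device code):

```python
def violation_action(mode):
    """Actions taken on an illegal packet for each processing mode.
    shutdown is a superset of restrict, which is a superset of protect."""
    actions = {
        "protect":  ["drop"],
        "restrict": ["drop", "trap-alarm"],
        "shutdown": ["drop", "trap-alarm", "shutdown-port"],
    }
    return actions[mode]
```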
Typical Application
Refer to the related chapter of the configuration manual.
Port Monitoring
This section describes the basic theory of the port monitoring and its
application.
Main contents:
Introduction
Typical application
Introduction
The port monitoring function monitors the packets sent to the switch CPU and filters excess packets at the bottom layer, protecting the switch from attacks by large numbers of invalid packets.
Monitoring includes port monitoring and host monitoring. When the switch is attacked, the user first enables port monitoring; the monitoring program measures the packets sent to the CPU per port. From the statistics, the user identifies the attacked port, then enables host monitoring on that port and sets an upper threshold for packets sent to the CPU in a sampling period. Packets exceeding the threshold in a sampling period from the attacking host are filtered at the bottom layer: they do not reach the IP layer for routing and are not written into the hardware route table, saving CPU and hardware table resources. While packets from the attacking host are being filtered, the other hosts can still communicate normally. The monitoring program writes any host whose packets to the CPU exceed the upper threshold in a sampling period into a blacklist. In the next sampling period, only half of the upper threshold of packets from blacklisted hosts can reach the CPU; the rest are dropped. The port monitoring program measures and drops packets according to the packet classification.
The port monitoring program calculates the sampling result at the end of each sampling period and updates the blacklist.
5. other-packet: Packets other than the previous four kinds;
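The blacklist rate limiting described above can be sketched as follows (an illustrative model; the halving rule is taken directly from the text):

```python
def allowed_to_cpu(host_count, threshold, blacklisted):
    """Packets from one host allowed up to the CPU in a sampling period:
    a blacklisted host is limited to half the configured upper threshold;
    other hosts are limited to the full threshold."""
    limit = threshold // 2 if blacklisted else threshold
    return min(host_count, limit)
```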
Typical Application
Refer to the related chapters of the configuration manual.
Port Isolation
This section describes the basic theory of the port isolation and its
application.
Main contents:
Related terms
Introduction
Typical application
Related Terms
Port isolation: A port security function that prevents packet forwarding between one port and other specified ports of the switch.
Introduction
Port isolation is a port-based security feature. The user can specify the isolated ports of a given port as desired, achieving L2 and L3 data isolation between the port and its isolated ports. This improves network security and provides a flexible networking scheme.
By default, packets can be forwarded between any two ports in the same VLAN of the switch. To prevent specified ports in a VLAN from communicating, you can configure the isolated ports of a port in the specified port mode, so that the port configured with port isolation cannot communicate with its specified isolated ports.
Port isolation is independent of the VLAN the port belongs to. Currently, the switch supports configuring isolated ports in common port and aggregation port mode, and an isolated port can be a common port or an aggregation port. Port isolation only implements uni-directional packet dropping. Suppose ports B, C, and D are set as the isolated ports of port A: if the destination port of a packet entering from port A is B, C or D, the packet is dropped directly; however, if the destination port of a packet entering from B, C or D is port A, the packet is forwarded normally.
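The uni-directional drop rule can be expressed as a single set lookup (an illustrative model of the behavior just described):

```python
def isolated_drop(ingress, egress, isolation):
    """Port isolation is uni-directional: a packet is dropped only if its
    egress port is in the ingress port's configured isolated-port set."""
    return egress in isolation.get(ingress, set())
```

With the A/B/C/D example above, packets from A toward B, C or D are dropped, while packets from B, C or D toward A are forwarded.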
Typical Application
Illustration
PC1, PC2 and PC3 must not communicate with each other, but must communicate with the public network normally. Since ports in the same VLAN can normally communicate with each other, you can use the port isolation function to meet this requirement: isolate port 0/2 and port 0/3 on port 0/1; isolate port 0/1 and port 0/3 on port 0/2; isolate port 0/1 and port 0/2 on port 0/3. After the configuration, ports 0/1, 0/2 and 0/3 cannot communicate with each other, but can all communicate with port 0/27.
SPAN Technology
This chapter describes the port mirroring SPAN technology and application.
Main contents:
SPAN technology
Typical application
SPAN Technology
Switched Port Analyzer (SPAN) is used to monitor the data flow of switch ports. With SPAN, you can copy the frames on a monitored port (the source port) to another port on the switch (the destination port) connected to a network analysis device, in order to analyze the communication on the source port. The user analyzes the packets received on the destination port for network monitoring and troubleshooting. SPAN does not affect the normal packet switching of the switch; all frames entering and leaving the source port are copied to the destination port. However, if the destination port's traffic is excessive, for example when a 100Mbps destination port monitors a 1000Mbps port, frames may be dropped.
A SPAN session is the data flow between a group of monitored ports and one destination port. The data of multiple monitored ports can be mirrored to the destination port, and the mirrored data flow can be the input flow, the output flow, or both. You can configure SPAN on a port that is shut down; the SPAN session is then inactive, but as soon as the port is enabled, SPAN becomes active. Each line card supports four rx SPAN sessions and one tx SPAN session.
Local SPAN
Local SPAN performs port mirroring within one switch: all monitored ports and the destination port are on the same switch. Local SPAN mirrors the data of one or multiple monitored ports to the destination port.
Remote SPAN
RSPAN allows the monitored port and the destination port to be on different switches, realizing remote monitoring across the network. Each RSPAN session carries the monitored traffic on a specified RSPAN VLAN. RSPAN consists of the RSPAN Source Session, the RSPAN VLAN, and the RSPAN Destination Session; the RSPAN Source Session and RSPAN Destination Session are configured on different switches. When configuring the RSPAN Source Session, you specify one or multiple monitored ports and one RSPAN VLAN, and the monitored data is sent to the RSPAN VLAN. On another switch you configure the RSPAN Destination Session, specifying the destination port and the RSPAN VLAN; the RSPAN Destination Session sends the RSPAN VLAN data to the destination port.
The switches realizing the remote port mirroring function are divided into three kinds:
Traffic Types
The data of the monitored (source) port is captured for network analysis. The monitored data flow can be input, output or bi-directional, and can belong to different VLANs.
Destination port
The destination port can only be a single physical port or aggregation group, and one destination port can be used in only one SPAN session.
The destination port does not take part in STP calculation. Local SPAN mirrors the BPDUs of the monitored traffic, so any BPDU seen on the destination port comes from the source port;
The destination port must not enable the LACP or 802.1X function, to prevent the mirrored data from being affected;
The RSPAN destination port can only be a common port, not an aggregation port;
The destination port can serve as a common forwarding port, but to prevent the monitored data from being mixed with other traffic, it is recommended to remove the destination port from all VLANs.
RSPAN VLAN
The RSPAN VLAN should be a private, otherwise idle VLAN dedicated to RSPAN; its VLAN number can be 2-4094. You can select any idle VLAN during configuration, but you must ensure that all other devices on the path to the analysis device are configured with that VLAN and that the corresponding ports are added to it.
Except for the ports used to carry the RSPAN traffic, do not add any port to the RSPAN VLAN;
Limitations
1. SPAN and flow mirroring use the same chip resource. When port mirroring is enabled, avoid enabling flow mirroring; otherwise, hardware resources may run short.
Typical Application
Local SPAN Application
The following is a simple local SPAN environment.
Illustration
In the above figure, all packets of port 0/1 are mirrored to port 0/2. The network analyzer connected to port 0/2 is not directly connected to port 0/1, but through the mirroring, port 0/2 receives the packets of port 0/1.
Illustration
In the above figure, the mirrored packets of port 0/8 on the source device, switch 1, are transmitted via RSPAN VLAN 100 to destination port 0/1 on the destination device, switch 2, so that the packets sent and received on the source switch's ports can be monitored from the destination switch.
Main contents:
M-VRF
Load balance
The source of a route can be one of three types: when the forwarding device is directly connected to a network, a directly-connected route is generated automatically; the other two types are static routes configured manually and dynamic routes learned through routing protocols.
There are many paths by which packets can travel from one host to another, so the best path must be selected to forward the packets. The path is determined from the following aspects:
Path length: the path length can be measured in hops or in cost. In a distance vector routing protocol, the path length is the number of forwarding devices between the source host and the destination host. In a link state routing protocol, the path length is the sum of the costs of the links.
Reliability: measured by the error rate between the source host and the destination host. In most routing protocols, the reliability of a link is assigned by the network engineer.
Delay: the sum of the time spent traversing all network devices, links, and switching devices. Delay also depends on network congestion and on the distance between the source end and the destination end. Because many variables affect it, delay is an important measurement standard in best-path calculation.
To find a route, the next-hop address is used as the destination and its link layer address is resolved. The next-hop address may be the address of another host directly connected to the switch, or the address of a host in the network that is not directly connected to the switch; either kind of address can be routed.
To route packets, the switch searches the routing table for the correct route. Each route in the database contains the following two elements:
1. Destination address: a network address that the switch can reach. For the same primary network address, the switch may have more than one route to the same address.
2. Destination pointer: the pointer indicates either that the network is directly connected to the switch, or the address of the next switch, namely the next-hop switch.
The switch tries to match the most specific address first. In order of decreasing specificity, the matched address may be one of the following:
Subnet
Main network ID
Default address
If the destination address of a packet does not match any entry in the routing table, the packet is discarded and an ICMP destination-unreachable message is sent to the source address.
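The lookup order above is the classic longest-prefix match, which can be sketched with the standard library (an illustrative model of the lookup, not the switch's implementation):

```python
import ipaddress

def best_route(dst, routes, default=None):
    """Pick the most specific matching route (longest prefix wins),
    falling back to the default route; None means the packet is
    discarded and ICMP destination-unreachable is sent."""
    dst = ipaddress.ip_address(dst)
    matches = [ipaddress.ip_network(r) for r in routes
               if dst in ipaddress.ip_network(r)]
    if matches:
        return str(max(matches, key=lambda n: n.prefixlen))
    return default
```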
Illustration
The data flow sent from PC-1 to PC-2 reaches the default gateway, switch-a. Switch-a finds that the destination address 10.1.1.1 of the data flow is not a local address and searches the routing table. Owing to the static route 10.1.1.0/24, switch-a forwards the data flow to the next hop 10.1.2.1 (namely switch-b). Switch-b continues forwarding; the destination address of the data flow hits a directly connected route, and the data flow is successfully delivered to PC-2.
When the static route is a load-balancing route, data may be sent to the CPU continuously because the software and hardware choose different routes.
For example, suppose the hardware selects 1.1.1.3 as the next hop. If ARP for 1.1.1.3 is not yet resolved, the packets must be sent to the CPU for software forwarding. After the packets reach the CPU, if the software also uses flow load-balancing mode to select the next hop, it may select 1.1.1.2, because the software and hardware algorithms differ. As a result, ARP for 1.1.1.2 is resolved while 1.1.1.3 remains unresolved. The hardware then keeps selecting 1.1.1.3 as the next hop while the software selects 1.1.1.2; consequently, the data flow is continuously sent to the CPU and hardware forwarding cannot be performed.
Therefore, for hardware route switching devices, when static route load balancing is used, we recommend setting the software load balancing to per-packet mode, so that every next hop can have its ARP resolved by the software.
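The mismatch described above comes from two per-flow selectors using different hash functions over the same flow, which can be demonstrated abstractly (hash functions and next-hop list are hypothetical):

```python
def pick_next_hop(flow_key, next_hops, algo):
    """Per-flow next-hop selection: hash the flow key and index into the
    next-hop list. Two devices using different hash algorithms can pick
    different next hops for the same flow - the software/hardware
    disagreement described above."""
    return next_hops[algo(flow_key) % len(next_hops)]
```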
M-VRF
Main contents:
Terms
Introduction to M-VRF
Terms of M-VRF
VPN (Virtual Private Network): Through VPN technology, two or more network sites can be connected over the Internet. Within the VPN, all sites operate as if they were in a single private network.
M-VRF (Multi-VPN Routing and Forwarding): In the switch, each VPN has its own routing and forwarding table, and the customers at the sites of that VPN can only access the routes in that table.
Introduction to M-VRF
M-VRF supports VPNs. Multiple VRFs may exist in one switch, and each resource (interface, IP address, routing table) belongs to one VRF; resources in different VRFs cannot access each other. Through the Multi-VRF function, users can isolate networks, and overlapping address spaces are supported.
M-VRF does not modify the packet format; it only enhances security by partitioning resource attributes. Each resource in the system belongs to exactly one VRF. After an interface is configured with a VRF, packets sent or received through the interface can only access the resources of that VRF.
Load Balancing
Main contents:
1. Switching per packet: a good option when there are fewer than 64K concurrent links. Packets may arrive out of order, so it is unsuitable for applications that depend on packet order, such as voice traffic.
2. Switching per session: when the link used by a session carries heavy traffic but the other links are lightly loaded, the load of the different links may be unbalanced.
Note
The RIP protocol includes RIPv1 and RIPv2. RIPv1 does not support classless routes; RIPv2 does. Usually RIPv2 is used.
The RIP protocol is simple and easy to configure. The amount of routing information RIP advertises is directly proportional to the number of routes in the routing table, so a large number of routes consumes significant network resources. In addition, RIP defines a maximum of 15 hops, so it is only applicable to simple, small-to-medium networks.
As shown in the preceding figure, the RIP protocol runs over UDP: its protocol packets are encapsulated in UDP packets. On port 520, RIP receives the protocol packets sent by remote routing devices and updates the local routing table according to the routing information they carry. At the same time, it adds one to the metric and advertises the routes to the other adjacent routing devices. In this way, all routing devices in the routing domain can learn all routes.
The RIP protocol sends packets in the following three modes: broadcast,
multicast, and unicast. The usage of each mode is shown in the following
table.
There are two types of packets: Request packets and Response packets.
The RIP packet types and the functions are as follows.
As shown in the preceding figure, the RIP packets are encapsulated in the
UDP packets. In the IP header of the RIP packets, TTL is set to 1 to
prevent RIP packets from being forwarded by other routing devices.
The RIP header has two fields: the Command field identifies request packets (value 1) or response packets (value 2); the Version field identifies RIPv1 (value 1) or RIPv2 (value 2).
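The 4-byte RIP header (Command, Version, and a must-be-zero field, per RFC 2453) can be parsed with the standard struct module:

```python
import struct

def parse_rip_header(data: bytes):
    """Parse the 4-byte RIP header: Command (1 = request, 2 = response),
    Version (1 = RIPv1, 2 = RIPv2), and a 2-byte must-be-zero field."""
    command, version, _zero = struct.unpack("!BBH", data[:4])
    kind = {1: "request", 2: "response"}.get(command, "unknown")
    return kind, version
```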
RIP Entry includes three types: RIPv1 routing entry, RIPv2 routing entry,
and authentication information entry. RIP Entry types and description are
as follows.
The working flow of the RIP protocol, shown in the preceding figure, can be divided into two parts: the RIP protocol start-up flow, and the processing flow for received RIP packets.
When an interface starts to run RIP, request packets are sent out of the interface by broadcast (RIPv1) or multicast (RIPv2) to request all routing information from all adjacent routing devices, so that fast convergence can be achieved.
After response packets answering the requests are received, the routes in the route database are updated according to the routing information they contain, and the changed routes are advertised to the other adjacent RIP routing devices (triggered updates).
At the same time, the Update timer is started. Every 30 seconds by default, all routing information is advertised through response packets to the adjacent RIP routing devices. This keeps the databases of the RIP routing devices synchronized and refreshes the advertised routes, so that previously advertised routes do not time out or become invalid on other routing devices.
Route Database
The route database records all routing information about the RIP protocol.
Each routing entry is composed of the following elements:
6. Route tag: user-defined, used to mark a category of routes. For
example, it can mark that a route was obtained by redistributing BGP
routes.
In the RIP route database, the sources of the routing entries are as follows:
In RIPv1, the next-hop interface of the route is the interface on which
the route was learned. The next-hop IP address is the source IP address of
the response packet from which the route was learned.
In RIPv2, the routing information in a response packet can carry a
next-hop IP address. The next-hop interface of the route is still the
interface on which the route was learned. The next-hop IP address can be
one of the following: the source IP address of the response packet from
which the route was learned, or the next-hop IP address carried in the
routing information. If the next-hop IP address in the routing information
and the interface that receives the routing information are in the same
subnet, the next-hop IP address of the route is the one carried in the
routing information; otherwise, it is the source IP address of the
response packet. The purpose is to implement a redirection function.
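The next-hop selection rule above can be sketched as follows; the function name and arguments are illustrative, not part of any product API (a next hop of 0.0.0.0 in a RIPv2 entry conventionally means "use the packet source"):

```python
import ipaddress

def select_next_hop(src_ip: str, entry_next_hop: str, recv_if_net: str) -> str:
    """Use the next hop carried in the RIPv2 entry only if it lies on the
    same subnet as the receiving interface; otherwise fall back to the
    source IP address of the response packet."""
    if entry_next_hop != "0.0.0.0" and \
            ipaddress.ip_address(entry_next_hop) in ipaddress.ip_network(recv_if_net):
        return entry_next_hop
    return src_ip

# Next hop 10.1.1.3 is on the receiving subnet 10.1.1.0/24, so the route
# is redirected to it instead of pointing at the advertising device.
```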
As shown in the preceding figure, switch-A runs RIP, switch-B runs RIP
and OSPF, switch-C runs OSPF. In switch-B, the RIP redistributes the
learned OSPF route 11.0.0.0/8. As a result, switch-A can learn the route
11.0.0.0/8 that reaches the subnet. When switch-A learns the route, by
default, the next-hop is switch-B, namely, 10.1.1.2. Then, the packets
forwarded from switch-A to destination subnet 11.0.0.0/8 reach switch-C
through switch-B.
Route Updates
When a route is learned from an adjacent RIP routing device, the route is
used to update the route database in the following cases:
1. The route does not exist in the route database and the metric of the
route is less than 16 hops.
2. The route exists in the database, and the source IP address of the
existing route and that of the learned route are the same.
3. The route exists in the database, but its metric is equal to or
greater than the metric of the learned route.
To keep the hop count accurate, the metric is increased by 1 when the
routes in the route database are advertised. The maximum valid metric is
15; a route with a metric greater than 15 is considered unreachable.
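The update rules and the hop-count increment above can be sketched together; the data layout (a dict mapping prefix to metric and source address) is an assumption for illustration, not the device implementation:

```python
INFINITY = 16  # a RIP metric of 16 means unreachable

def accept_route(db: dict, prefix: str, learned_metric: int, learned_src: str) -> bool:
    """Apply the three update rules described above (illustrative sketch)."""
    advertised = min(learned_metric + 1, INFINITY)  # add one hop per transfer
    existing = db.get(prefix)
    if existing is None:
        ok = advertised < INFINITY            # rule 1: new route, reachable
    else:
        old_metric, old_src = existing
        ok = (old_src == learned_src          # rule 2: same advertiser
              or old_metric >= advertised)    # rule 3: equal or better metric
    if ok:
        db[prefix] = (advertised, learned_src)
    return ok
```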
RIP Timers
(Figure: RIP timer state machine. A valid route runs the invalid timer.
When the invalid timer times out, or an update sets the metric to 16
(unreachable), the route becomes invalid and held down, and the holddown
and flush timers start. When the flush timer times out, the route is
deleted from the database.)
The RIP protocol contains four timers: the update timer, invalid timer,
holddown timer, and flush timer. Each timer is described as follows.
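The state transitions shown in the timer figure can be sketched as follows; the class and method names are hypothetical, chosen only to mirror the figure:

```python
class RipRouteState:
    """Illustrative model of one route's timer-driven state transitions."""

    def __init__(self):
        self.state = "valid"  # invalid timer running on a valid route

    def invalid_timeout_or_metric_16(self):
        # Invalid timer expired, or an update set the metric to 16:
        # the route becomes unreachable; holddown and flush timers start.
        if self.state == "valid":
            self.state = "invalid+holddown"

    def flush_timeout(self):
        # Flush timer expired: delete the route from the database.
        if self.state == "invalid+holddown":
            self.state = "deleted"
```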
Counting to Infinity
The RIP protocol allows a maximum hop count of 15. A destination more
than 15 hops away is considered unreachable. This limit restricts the
network size and prevents routing information from being transferred
indefinitely. As routing information travels from one routing device to
another, the hop count increases by 1 at each transfer. When the hop
count exceeds 15, the route is deleted from the routing table.
Split Horizon
The rule of RIP split horizon is as follows: if a RIP routing device
learns routing information A from an interface, the response packets sent
out of that interface must not contain routing information A.
Poisoned Reverse
The purpose of poisoned reverse and the purpose of the split horizon are
the same, but the operations are different.
The rule of RIP poisoned reverse is as follows: if a RIP routing device
learns routing information A from an interface, the response packets sent
out of that interface still contain routing information A, but with the
metric set to 16 (namely unreachable).
Compared with split horizon, poisoned reverse has the advantage that
advertising the route as unreachable back to the source routing device
breaks an existing routing loop immediately, whereas split horizon has to
wait until the wrong route entry is deleted by timeout. The disadvantage
is that poisoned reverse increases the size of the response packets, so
the bandwidth consumed by the protocol increases.
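Both filtering rules can be sketched in one routine; the route-table layout (prefix mapped to metric and learning interface) is an assumption for illustration:

```python
INFINITY = 16  # unreachable metric in RIP

def build_response(routes: dict, out_interface: str, poisoned_reverse: bool = False):
    """Filter the routes advertised out of one interface.
    `routes` maps prefix -> (metric, interface the route was learned on)."""
    entries = []
    for prefix, (metric, learned_if) in routes.items():
        if learned_if == out_interface:
            if poisoned_reverse:
                # advertise the route back, but as unreachable
                entries.append((prefix, INFINITY))
            # plain split horizon: omit the route entirely
        else:
            entries.append((prefix, metric))
    return entries
```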
Holddown Timer
Through the holddown timer, after a routing device learns that a route is
unreachable, the route is not updated by received response packets for the
holddown period. The route entry information in those response packets may
originate from the device's own earlier advertisement.
Triggered updates
Poisoned reverse and split horizon break routing loops formed by any two
routing devices. A routing loop formed by three or more routing devices
can persist until the route metric counts up to infinity (16). Triggered
updates speed up route convergence and thus shorten the time needed to
break such a loop.
Related terms
Successor: the next router on the path from the current router to the
destination router;
The router provides only the routing information it actually uses to its
directly connected neighbors. The information sent to a neighbor can also
be filtered first and then sent.
1. IRMP saves all routes sent by all neighbors in the topology table,
rather than saving only the best route received so far;
IRMP Packet Types
Opcode   Type
1        Update
3        Query
4        Reply
5        Hello
6        IPX SAP (not supported for the moment)
The packet is retransmitted in unicast mode to any neighbor that does not
reply within the multicast timeout interval;
OSPF Features
Area: a collection of routing devices that share a topology database.
OSPF divides one AS into multiple areas; the topology of one area is
invisible to another area, which reduces the amount of routing information
in an AS. The area is used to contain link state updates and enables the
administrator to create a hierarchical network.
LSA (Link State Advertisement): the data unit describing the state of the
local routing device or network. For a routing device, it contains the
interface states and adjacency states of the device. Each link state
advertisement is flooded throughout the area. The routing device uses the
collected link state advertisements to form the link state database.
Stub Area: an area that has only one exit to the outside. Type 5 LSAs
cannot be flooded into the area.
Introduction to OSPF
Open Shortest Path First (OSPF) is a dynamic routing protocol. It can
detect network changes in the AS and form new routes after a short
convergence period. The convergence time is short and the routing traffic
involved is limited. In the OSPF protocol, each routing device maintains a
network topology database describing the AS, and every routing device has
the same database. Each record of the database is the local state of a
specific routing device. The routing device distributes its local state
in flooding mode throughout the AS.
All routing devices run the same algorithm in parallel. Each routing device
uses the link state database to generate a shortest path tree with itself as
the root. The shortest path tree provides the route to each destination in
the AS. The external routing information serves as leaves in the tree.
All OSPF exchanges are authenticated, which means that only trusted
routing devices can participate in AS routing. Multiple authentication
configurations can be used; in fact, each subnet can be configured with
independent authentication.
SW1, SW2, SW3, and SW4 comprise area 1; SW3 is the area border router
(ABR);
SW6, SW7, and SW8 comprise area 2; SW6 and SW8 are the area border
routers (ABRs);
SW8, SW9, and SW10 comprise area 3; SW8 is the area border router
(ABR);
Process of OSPF
The basic idea of OSPF: in the AS, each routing device running OSPF
collects the link state. Broadcast the link state in the entire system
through the flooding mode. Then, the entire system maintains the
synchronized link state database. Each routing device calculates a shortest
path tree with the device itself as the root and other network nodes as the
leaves through the database. Then, the best routes to many places in the
system are obtained.
The routing devices running OSPF form an AS. The AS can be divided into
multiple areas. Each routing device in an area requires the same AS
topology (link state database).
After the topology is obtained, routing device A runs the SPF algorithm to
generate one shortest path tree with itself as the root and records it in the
routing table. The route to the destination in the future is obtained from
the routing table.
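The SPF calculation is Dijkstra's shortest-path algorithm; a minimal sketch follows (the link-table layout is an assumption made for this example):

```python
import heapq

def spf(links: dict, root: str):
    """Dijkstra sketch of the SPF calculation described above.
    `links` maps node -> list of (neighbor, cost) pairs."""
    dist = {root: 0}   # shortest known cost to each node
    prev = {}          # predecessor on the shortest path tree
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in links.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return dist, prev

# Example: with root A and links A->B (1), A->C (4), B->C (1),
# the shortest path to C is A->B->C with total cost 2.
```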
The area boundary router belongs to multiple areas at the same time.
Therefore, the topology of the home area of routing device A will be
advertised to other areas, and the topology of other areas will be
advertised into the area. Through the exchange of topology in the
boundary routing devices, the home area of routing device A learns the
network topology of the entire AS area. In the OSPF, the boundary routing
devices form the backbone area.
The first step of the route calculation is to calculate a shortest path
tree with the device itself as the root; the second step is to calculate
the leaves (routes) on each node according to the shortest path tree. The
incremental calculation of the shortest path tree after a network
topology change is called incremental SPF (ISPF); the incremental
calculation of the leaves (routes) is called Partial Route Calculation
(PRC). Incremental calculation can significantly improve the calculation
performance of the routing devices and decrease the CPU load.
Generally, the initial interval can be set to 100 milliseconds, which can
respond to burst change quickly; the incremental interval can be set to
100 milliseconds or 1 second; the maximum interval can be set to 5
seconds or 10 seconds.
The ISPF only processes the network topology information; that is, it
only calculates the shortest path tree. By reorganizing the links, the
ISPF forms a graph database reflecting the network topology, and the
calculated shortest path tree is saved in this graph. When the link state
changes, the ISPF determines the affected part of the network topology
and recalculates only the affected part instead of the entire topology.
As shown in the preceding figure, RTA is the root node (the routing
devices performing calculation). When the cost of RTC-> RTD (blue link) is
changed into 50 from 100, the affected parts are RTD and RTE. Other
routing devices are not affected. ISPF will judge the range of the effect.
Then, only the routes released by RTD and RTE are calculated.
If the position of the network topology change differs, the affected
range differs, and so does the time spent on the ISPF calculation.
Therefore, the time spent can differ even in the same network structure.
If the links at the root node change (RTA->RTB and RTA->RTF), the
affected range covers the entire topology. In this case, ISPF is similar
to a full recalculation.
PRC Technology
For an IGP, any route is a leaf of a network node; the term leaf reflects
the relation between the route and the network node. Starting from the
root node, once the shortest path to a network node is determined, the
shortest path of the routes released by that node is determined.
Therefore, PRC uses the shortest path tree calculated by ISPF to
calculate the leaf routes. When any routing information changes, PRC
determines the changed routes (leaves), and only those routes are
selected and updated (based on the existing ISPF calculation result).
Owing to the restriction of the link information format in the OSPF protocol,
the routing information and the network node (released routing devices)
are not directly associated. The same routes released by different devices
are also not directly associated. Therefore, the PRC needs to re-organize
the database.
Take the route as the base point and organize all elements that release
the route, so that the best route can be selected from all of them when
routes are calculated. From the other direction, take the releaser as the
base point and assemble all routes released by it, so that when the ISPF
announces that the shortest path of a node has changed, all routes
released by that node can be updated directly.
The LSDB is composed of link state advertisements (LSA). The LSA can be
divided into 6 categories:
The border router of an area assembles the information about the local
area into a Summary LSA and floods it to the border routers of the other
areas in the AS. Those border routing devices analyze the received
Summary LSAs, generate new Summary LSAs, and flood them into their own
areas. All border routers and the links among them form the backbone
area. Backbone routers are mutually reachable; they can be connected
physically or through virtual links. When a virtual link is configured,
the transited area must be a transit area, not a stub area.
The ASBR of the AS sends the external routing information to all nodes
except the stub area in the AS. The routing devices in the stub area are
directed to the ASBR through the default route.
Type: the type of the packet that follows the OSPF header. OSPF has five
packet types: hello packets, type=1; database description packets,
type=2; link state request packets, type=3; link state update packets,
type=4; link state acknowledgement packets, type=5.
Area ID: the area where the packet is generated; when the packet passes
the virtual link, area ID is 0.0.0.0.
Hello packets are used to create and maintain adjacencies. They contain
the parameters that must be consistent when neighbors create an adjacency.
Network Mask: the mask of the interface where the packets are generated
Router priority: it is used in the case of selecting DR and BDR. When the
router priority is 0, the routing device does not have the selecting rights.
Router Dead Interval: if no hello packets are received within the router
dead interval, the neighbor is considered down and is deleted.
Backup DR: the IP address of the BDR selected by the interface generating
the packets
Neighbor: the list of the neighbors that can receive hello packets at the
interface generating the packets in the router dead interval.
Interface MTU: the size of the largest IP packet that can be transmitted
without fragmentation on the interface generating the packets. When the
packets are transmitted over a virtual link, the interface MTU is set to 0.
I-bit: initial bit, when the packet is the initial packet of the DD packet
sequence, the bit is 1.
M-bit: More bit; set to 1 when more packets follow in the DD packet
sequence, and 0 in the last packet of the sequence.
LSA Headers: the LSA header list of the link state database
Link State ID: works with link state type and advertising router to identify
a LSA.
Advertising Router: the router ID of the routing device generating the LSA.
LSA header
Link State ID: works with link state type and advertising router to identify
a LSA.
Advertising Router: the router ID of the routing device generating the LSA
V: Virtual link endpoint bit; set when the routing device generating the
packet is an endpoint of a virtual link.
E: External bit; set when the routing device generating the packet is an
ASBR.
B: Border bit; set when the routing device generating the packet is an
ABR.
Link Data: the data of the link, the meaning varies with the link type
Number of TOS: the number of TOS (Type of Service) metrics carried, kept
for backward compatibility of the protocol
Link State ID: for the Network LSA, it is the IP address of the DR interface
Attached Router: the list of the routing devices adjacent to the DR in the
network
Figure 11-22 Format of the Network and ASBR summary LSA packet
Link State ID: for type 3 LSA, it is the IP address of the advertised
network or subnet; for type 4 LSA, it is the router ID of the advertised
ASBR.
Network Mask: for type 3 LSA, it is the mask of the advertised network or
subnet; for type 4 LSA, the domain is set to 0.
Link State ID: for the ASE LSA, it is the IP address of the destination
E: External metric bit, the type of the external cost used by the route.
If the E bit is set to 1, the cost type is E2; if the E bit is 0, the
cost type is E1.
DC: set the bit in the case of a demand circuit
EA: set the bit when the source routing device has the capability of
receiving/sending external attributes LSA
N: used only in the hello packets, set it to 1 when the NSSA external LSA
is supported; set it to 0 when the NSSA external LSA is not supported;
when N is set to 1, the E bit must be 0.
P: used only in NSSA external LSA headers. If the P bit is set, the ABR
of the NSSA must convert the type 7 LSA to a type 5 LSA.
MC: set the bit when the source routing device forwards multicast
packets.
E: set the bit when the source routing device accepts ASE LSA
packets.
OSPF Features
1. OSPF is a kind of IGP, designed for use within an AS
4. In OSPF, the AS can be divided into multiple areas. This has the
following advantages: 1) routes within an area and routes between
areas are separated; 2) dividing the AS into areas reduces the SPF
calculation.
9. Flexible metric: in the OSPF, the metric is specified as the output cost
of the routing device interface. The path cost is the total of the cost of
all interfaces. The route metric can be specified by the system
administrator according to the network features (delay, bandwidth, and
cost).
10. Multiple equal-cost paths to the same destination: OSPF finds the
paths and balances the load across them.
12. Support stub area: when the area is set to stub area, the external
LSAs cannot be flooded to the stub area. In the stub area, the route to
the external destination is specified by the default route.
Memory of routing device: the link state database of the OSPF may
become very large, especially when many external link states are
advertised. In this case, the memory of the routing device must be very
large. In the process of updating and synchronizing the link state database,
large amount of memory is used.
CPU usage: in OSPF, CPU usage is related to the time spent running the
SPF algorithm and to the number of routing devices in the OSPF system. In
addition, when the link state database is very large, a great deal of CPU
is consumed during protocol convergence if a large number of packets must
be exchanged.
Designated router: the designated router in a multi-access network
receives and sends more packets than the other routing devices. When the
designated router fails, a new designated router must be elected. For
this reason, the number of routing devices connected to a single network
should be restricted.
Within an area, the database size can be reduced as follows: 1. the area
can use a default route, reducing the external routes that must be
imported; 2. an EGP (exterior gateway protocol) can carry its own
information across the OSPF AS instead of depending on an IGP (such as
OSPF) to transmit it; 3. the area can be configured as a stub area; 4. if
the external networks have regular addresses, the addresses can be
summarized; after summarization, the external information in OSPF
decreases dramatically.
OSPF is also suitable for small independent ASs or stub ASs, because of:
1. fast convergence; 2. support for multiple equal-cost paths to the same
destination.
IS (Intermediate System): similar to a router in TCP/IP, the basic unit
that generates routes and transmits routing information. Hereinafter, IS
and router have the same meaning.
ES (End System): equivalent to a host system in TCP/IP. An ES does not
participate in the IS-IS routing protocol; ISO has a dedicated ES-IS
protocol defining the communication between the end system and the IS.
Area- the routing area divided in the IS-IS protocol, including level-1 area
and level-2 area.
LSP- Link State PDU, carries the link state information that should be
published, including adjacency information and reachable subnet
information.
PSNP (Partial Sequence Number PDU): one type of SNP packet, used for
confirming LSP packets (point-to-point networks) and requesting LSP
packets (broadcast networks).
CSNP (Complete Sequence Number PDU): one type of SNP packet, used for
advertising abbreviated description information of the LSDB.
In this chapter, the IS-IS protocol for IPv4 and IPv6 is described. OSI
routing is not widely used, so it is not described in this document.
As shown in the preceding figure, the IS-IS protocol is based on the link
layer, independent from the network layer of the IPv4, IPv6, and OSI
protocol stack. In the broadcast network, the packets are sent in the
multicast mode. In the Ethernet, IS-IS uses the following MAC addresses.
AllIntermediateSystems 09-00-2B-00-00-05 The multicast MAC address of all IS
systems
AllEndSystems 09-00-2B-00-00-04 The multicast MAC address of all ES
systems
As shown in the preceding figure, the position of the IS-IS protocol in
the network protocol stack is directly on the link layer. Therefore, the
IS-IS protocol is encapsulated in link layer packets. The routing
information carried in an IS-IS packet is organized in TLV form, which
can be organized and extended flexibly. TLV: data type (1 byte) + data
length (1 byte) + data value (0-255 bytes). According to the IS-IS
protocol, a TLV that cannot be identified should be ignored rather than
dropped.
IS-IS is based on the link layer and is independent of the network layer,
and the routing information is organized flexibly in TLV form. In
addition, TLVs that cannot be identified are ignored. This gives IS-IS
its features of easy extension and smooth upgrade.
Type                                           Code  Function
LSP    Level 1 Link State PDU                  18    Publish routing information in the level-1 area
       Level 2 Link State PDU                  20    Publish routing information in the level-2 area
CSNP   Level 1 Complete Sequence Numbers PDU   24    Advertise the abbreviated database description information to level-1 neighbors
       Level 2 Complete Sequence Numbers PDU   25    Advertise the abbreviated database description information to level-2 neighbors
PSNP   Level 1 Partial Sequence Numbers PDU    26    Request or confirm LSP packets from level-1 neighbors
       Level 2 Partial Sequence Numbers PDU    27    Request or confirm LSP packets from level-2 neighbors
NET of IS-IS
Even when the IS-IS protocol is used to route for the TCP/IP protocol
stack, it remains an ISO CLNP protocol. The OSPF protocol uses a router
ID to identify a routing device; the IS-IS protocol uses an ISO network
address, the NET (Network Entity Title), to identify a routing device
(IS). The description of the NET is shown in the preceding figure. The
example in the figure is NET 47.0000.0000.0000.0011.00.
The Area ID identifies a level-1 area. The level-2 area is the backbone
of the network; only one level-2 area is allowed, so it does not require
an ID.
SEL (NSAP Selector, also N-SEL) is similar to the protocol ID in IP;
different transport protocols correspond to different SELs. In IS-IS, all
SELs are 00.
Note that this description of the NET is for routing the TCP/IP protocol
with IS-IS. The NET is defined in ISO 8348.
The preceding figure illustrates the two-layer network topology of the IS-
IS protocol. A typical IS-IS network is composed of a level-2 area serving
as the core backbone network and multiple level-1 areas serving as the
access network. Each level-1 area uses one or more level-2 switches to
access the level-2 area. The level-1 areas are connected through the
level-2 area, forming a two-level network topology. An IS-IS network can
also consist of a single level-1 area or a single level-2 area; more
detailed area division is not required.
The LSDBs of the areas are independent, and SPF route calculation is also
performed independently per area. Dividing areas splits the entire
network into many small routing domains, which reduces the size of the
LSDB and consequently the memory consumption and SPF calculation. But a
new problem arises: the SPF calculation can only implement route learning
within an area. How should route learning be performed between areas?
Level-1 areas and the level-2 area are connected through level-2
switches. A level-2 switch runs both the level-1 and the level-2 protocol
of IS-IS at the same time. Routing between level-1 and level-2 areas is
therefore handled on the level-2 switch: it advertises the routes learned
from the level-1 area into the level-2 area, and advertises the attach
tag into the level-1 area to show that it is connected to the level-2
core network.
The attach tag is marked in the level-1 routing information published by
the level-2 switch, indicating that it is connected to the level-2 core
network. As a result, all switches in the level-1 area generate a default
route to the level-2 switch, so all switches in the level-1 area have a
default route reaching the level-2 area.
Designated IS
Pseudo-node
Neighbor ID
The network nodes in the adjacent network topology are identified using
the neighbor ID in the LSDB, as shown in the preceding figure. There are
two types of nodes in the adjacent network topology: 1. an IS: in its
neighbor ID, the system ID is its own system ID and the Circ ID is always
0x00; 2. a pseudo-node, created by the DIS: in its neighbor ID, the
system ID is the system ID of the DIS and the Circ ID is the ID of the
DIS interface generating the pseudo-node; it must be non-zero to
distinguish it from the neighbor ID of an IS.
Creation of Neighbors
The adjacency information describes the IS systems that the host can
reach directly. The generated adjacency information is described in the
point-to-point mode.
For the broadcast network, to simplify the adjacent network topology, the
DIS virtualizes a Pseudo-node in the broadcast network. All IS systems in
the broadcast network generate adjacency information to the pseudo-node.
The adjacency information of the pseudo-node is the IS systems adjacent
to the broadcast network. The adjacency information of the Pseudo-node
is generated and published by the DIS.
If the LSDBs of the ISs are not synchronized, the calculated SPF trees
are inconsistent and routing loops may occur. Therefore, when the state
is stable, the LSDBs of all IS systems in an area must be synchronized.
The LSDB is composed of LSP packets. LSDBs can get out of sync because
IS-IS packets are transmitted directly over the link layer, without a
reliable transport mechanism, so LSP packets may be dropped in
transmission. Ensuring LSDB synchronization therefore means ensuring the
reliable delivery of LSP packets. The protection mechanisms differ
between point-to-point networks and broadcast networks.
Step 1: Calculate the SPF tree through the SPF algorithm according to the
network topology composed of the adjacency information of the LSDB. As
a result, the shortest path to each network node (namely the IS) and the
next-hop are obtained.
Illustration
As shown in the preceding network topology, there are four switches (A, B,
C, and D), namely four IS systems. The following describes the process of
route learning through the example of switch A learns the subnet
10.0.0.0/8 route of switch D. The metric of each link is 10. The DIS
selected from the Ethernet network is switch B.
IS C   0000.0000.0003   0000.0000.0003.00   Adjacency to D (0000.0000.0004.00), metric 10
IS D   0000.0000.0004   0000.0000.0004.00   Adjacency to C (0000.0000.0003.00), metric 10
In IS A, according to the information in the LSDB, take A as the start
point and use the SPF algorithm to calculate the SPF tree shown in the
preceding figure. The shortest path obtained to IS D (pseudo-nodes are
ignored when the shortest path is derived) is A->C->D. If the Ethernet
interface of A is vlan1 and the IP address of the Ethernet interface of C
is 3.3.3.3, then the next-hop interface toward IS D is vlan1, the
next-hop address is 3.3.3.3, and the metric is 20.
IBGP: BGP within the same AS. An IBGP neighbor is a routing device in the
same administrative control domain.
Supernet: a network advertisement whose prefix is shorter than the
natural mask of the network. For example, the natural mask of the class C
network 202.11.1.0 is 255.255.255.0. If we use 202.11.0.0/16 to
represent the network address, the mask is 16 bits, which is shorter than
24 bits; therefore, it is a supernet.
SYN (Synchronization): before BGP advertises a route, the route must be
in the current IP routing table; that is, BGP and the IGP must be
synchronized before the route is advertised.
BGP uses the TCP as the transmission protocol (port 179). Then, reliable
data transmission is provided. The retransmission and acknowledgement
of data are implemented by the TCP, instead of BGP. As a result, the
process is simplified. The reliability need not be designed in the protocol.
Create a TCP connection between two routing devices running BGP. Then,
the two routing devices are called peers. Once the connection is created,
the two peer routing devices acknowledge the connection parameters
through exchanging the open packets. The parameters include BGP
version number, AS number, hold time, BGP identifier, and other optional
parameters. After the two peers negotiate parameters successfully, the
BGP exchanges routes by sending update packets. The update packets
contain the list of reachable destinations passing each AS system (namely
NLRI), and the path attributes of each route. When the route changes,
incremental update packets are used between peers to transmit the
information. BGP does not require refreshing routing information
periodically. If the route does not change, the BGP peers only exchange
keepalive packets. The keepalive packets are sent periodically to ensure
the valid connection.
BGP Message Header
The BGP message header contains a 16-byte marker field, a 2-byte length
field, and a 1-byte type field. The following figure illustrates the
format of the BGP message header.
Length: the length field occupies 2 bytes. It indicates the length of the
message. The minimum allowed length is 19 bytes and the maximum is
4096 bytes.
Type: The type field occupies one byte. It indicates the type of the BGP
message. The four types of the BGP message are as follows:
Number Type
1 Open
2 Update
3 Notification
4 Keepalive
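A minimal parser for the header can be sketched as follows; RFC 4271 additionally requires the 16-byte marker to be all ones, which the sketch assumes (this detail is from the RFC, not from this manual):

```python
import struct

MARKER = b"\xff" * 16  # all-ones marker per RFC 4271
TYPES = {1: "Open", 2: "Update", 3: "Notification", 4: "Keepalive"}

def parse_bgp_header(data: bytes):
    """Parse and validate the 19-byte BGP message header."""
    marker, length, msg_type = struct.unpack("!16sHB", data[:19])
    if marker != MARKER:
        raise ValueError("bad marker")
    if not 19 <= length <= 4096:
        raise ValueError("bad length")  # 19..4096 bytes allowed
    return length, TYPES.get(msg_type, "Unknown")
```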
Open Messages
After the TCP connection is created, the first packet sent is the open
message. The open message contains the BGP version number, AS number,
hold time, BGP identifier, and other optional parameters.
If the open message is acceptable, it means that the peer routing devices
agree with the parameters. In this case, the keepalive message is sent to
acknowledge the open message.
In addition to the fixed BGP header, the open message contains the
following fields:
Version: the version field occupies one byte. It indicates the version
number of the BGP protocol. When the neighbors are negotiating, the peer
routing devices agree on the BGP version numbers. Usually, the latest
version supported by the two routing devices is used.
Hold Time: the field is two bytes. It indicates the maximum time the
sender will wait between successive keepalive or update messages from the
peer. The BGP routing device negotiates with the peer and sets the hold
time to the smaller of the two advertised values.
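The negotiation can be sketched in a few lines; the one-third keepalive interval below is common practice rather than something stated in this manual:

```python
def negotiate_hold_time(local_hold: int, peer_hold: int):
    """Return the negotiated hold time and a derived keepalive interval."""
    hold = min(local_hold, peer_hold)  # the smaller advertised value wins
    keepalive = hold // 3              # common practice: one third of hold
    return hold, keepalive

# e.g. local 180 s, peer 90 s -> the session uses a 90 s hold time
```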
BGP Identifier: the field is four bytes. It indicates the identifier of
the sending BGP routing device. The field is the ID of the routing
device, namely the highest loopback interface address or, failing that,
the highest IP address among the physical interfaces. The router ID can
also be set manually.
Optional parameter Length: the field is one byte. It indicates the total
length of the optional parameter fields (the unit is byte). If there are no
optional parameters, the field is set to 0.
Update Message
The Update message is used to exchange routing information between BGP
peers. It is used both to advertise routes to a BGP peer and to withdraw
them. The Update message contains the fixed BGP header and the following
optional parts:
Total Path Attribute Length: the field occupies two bytes; it indicates the
total length of the path attribute field.
Path Attribute: this variable-length field contains the list of BGP
attributes associated with the prefixes in the NLRI. The path attributes
provide information about the advertised prefixes, such as the preference
or next hop, and are used for route filtering and route selection. Path
attributes are classified into the following types:
LOCAL_PREF: the higher the value, the higher the route preference. The
LOCAL_PREF attribute is not included in Update messages sent to EBGP
neighbors. If the attribute is present in an Update message received from
an EBGP neighbor, the Update message is ignored.
AGGREGATOR: the attribute records the BGP peer (IP address) that performed
the route aggregation and its AS number.
This differs from the processing of LOCAL_PREF: the attribute allows an
external routing device to influence the route selection of another AS,
whereas LOCAL_PREF only affects route selection within the local AS.
Network Layer Reachability Information: this variable-length field contains
the list of reachable IP address prefixes advertised by the sender.
Keepalive Message
The keepalive messages are exchanged between peers periodically to
check whether the peer is reachable.
Notification Message
Error Code: one byte; the field indicates the error type.
Error Subcode: one byte; the field provides more details about the error.
Data: variable-length field; the field contains the data related to the
error, for example an invalid message header or an illegal AS number. The
following table lists the possible error codes and error subcodes.
Table 11-8 BGP Notification message error code and error subcode
ID  Description
1   BGP starts
2   BGP ends
3   BGP transmission connection opens
4   BGP transmission connection is terminated
5   BGP transmission connection fails to open
6   BGP transmission fatal error
7   Connect retry timer expires
8   Hold timer expires
9   Keepalive timer expires
10  Open message received
11  Keepalive message received
12  Update message received
13  Notification message received
Idle: the initial state. BGP stays in the Idle state until an operation
triggers a start event, usually the creation or restart of a BGP session.
Active: in this state, BGP attempts to create a TCP connection with the
neighbor. If the connection succeeds, BGP sends an Open message and moves
to the OpenSent state. If the connect retry timer expires, BGP restarts the
timer and falls back to the Connect state to listen for connections from
the peer.
OpenSent: in this state, the Open message has been sent and BGP waits for
the Open message from the peer. The received Open message is checked; if
any error is found, the system sends a Notification message and returns to
the Idle state. If no error is found, BGP sends a Keepalive message to the
peer and resets the keepalive timer.
Established: the final phase of neighbor negotiation. In this state, the
connection between the BGP peers is established, and Update, Notification,
and Keepalive messages can be exchanged between them.
Well-Known Mandatory
Well-Known Discretionary
Optional Transitive
Optional Non-Transitive
Optional Transitive: BGP is not required to support the attribute, but it
should accept paths carrying the attribute, and those paths should be
advertised with the attribute intact.
When multiple routes with the same prefix and prefix length to the same
destination exist, BGP selects the best route according to the following rules:
9. Preferentially select the route whose next-hop has the minimum IGP
metric;
11. Preferentially select the route with the minimum BGP ROUTER-ID;
13. Preferentially select the route from the lowest neighbor address;
14. If BGP load balancing is enabled, rules 10-13 are ignored, and all
routes with the same AS_PATH length and MED value are installed in the
routing table.
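The later tie-break rules (9, 11, and 13 above) can be expressed as a simple ordered comparison. This is a sketch that assumes the earlier rules have already left a tie; the dictionary keys are illustrative, and router IDs and addresses are compared as strings here for brevity:

```python
def best_route(routes):
    """Tie-break: lowest IGP metric to the next hop, then lowest BGP
    router ID, then lowest neighbor address."""
    return min(routes, key=lambda r: (r["igp_metric"],
                                      r["router_id"],
                                      r["neighbor_addr"]))

routes = [
    {"igp_metric": 10, "router_id": "2.2.2.2", "neighbor_addr": "10.0.0.2"},
    {"igp_metric": 10, "router_id": "1.1.1.1", "neighbor_addr": "10.0.0.1"},
]
print(best_route(routes)["router_id"])  # → 1.1.1.1 (lower router ID wins)
```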
Figure 11-39 Under otherwise equal conditions, the route with the higher
LOCAL_PREF value is preferred
User AS100 obtains routes from ISP1 and ISP2, and ISP1 is the preferred
ISP. When the device connected to ISP1 announces routes to switch-F, it
sets a higher LOCAL_PREF value. For the same destination, the routes
learned from ISP1 are preferred because their LOCAL_PREF is higher.
Figure 11-40 Under otherwise equal conditions, the route with the lower
MED value is preferred
A dual-homed topology is used between a user and an ISP. LINK2 is
preferred and LINK1 serves as the backup. When the user advertises routes
to the ISP, the update packets carried over LINK2 have the lower MED
value. If the routes advertised over the EBGP sessions on LINK2 and LINK1
differ in no other attribute, the route with the lower MED is preferred.
As a result, traffic from the ISP enters the user network via LINK2.
Route Filtering
Route filtering means that a BGP speaker can control which routes it sends
to and accepts from any BGP peer. Route filtering is implemented by
defining a route policy. The procedure is as follows:
1. Identify routes
3. Operate on attributes
Route filtering can be performed through an access list, prefix list, or
AS path access list. A route map can also be used to implement both
filtering and attribute operations.
The route reflector is recommended only in a large-scale, fully meshed
internal BGP network. The route reflector increases the overhead on the
route reflector server, and if it is configured incorrectly, routing loops
or instability may result. Therefore, the route reflector is not
recommended in every topology.
Alliance
The alliance is another method of handling the rapid growth of the IBGP
full mesh within an AS. Similar to the route reflector, the alliance is
recommended only in a large-scale, fully meshed internal BGP network.
The concept of the alliance arises because one AS can be divided into
multiple sub-ASs. Within each sub-AS, all IBGP rules apply; for example,
all BGP routing devices in the sub-AS must form a full mesh. Each sub-AS
has a different AS number, so external BGP must run between them. Although
EBGP is used
The alliance has a drawback: when changing the design from non-alliance to
alliance, the routing devices must be reconfigured and the logical
topology changes. In addition, if no BGP policy is set manually, the
alliance cannot select the best route.
Route Damping
Route damping (route attenuation) is a technology for controlling route
instability. It significantly reduces the instability caused by route
oscillation.
Route damping classifies routes as well-behaved or ill-behaved. A
well-behaved route demonstrates long-term stability, whereas an
ill-behaved route is unstable over short periods. An ill-behaved route is
penalized in proportion to its expected future instability, and unstable
routes are suppressed until they become stable.
The recent history of a route is the basis for evaluating its future
stability. To learn the route's history, you first need to know how many
times the route has flapped within a given period. With route damping,
each time the route flaps it is penalized. When the accumulated penalty
reaches a predefined limit, the route is suppressed. While suppressed, the
route can continue to accumulate penalties. The more frequently the route
flaps, the earlier it will be suppressed.
Similar rules are used to unsuppress and re-advertise the route. An
algorithm decays the penalty exponentially over time, based on parameters
defined by the user.
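The penalty/suppress/reuse mechanism above can be illustrated with a half-life decay sketch. All figures (penalty per flap, suppress and reuse limits, half-life) are assumed for illustration, not the device defaults:

```python
import math

PENALTY_PER_FLAP = 1000   # assumed penalty added on each flap
SUPPRESS_LIMIT = 2000     # suppress the route above this penalty
REUSE_LIMIT = 800         # re-advertise once the penalty decays below this
HALF_LIFE = 900.0         # assumed seconds for the penalty to halve

def decayed_penalty(penalty, elapsed):
    """Exponential decay of the accumulated penalty over `elapsed` seconds."""
    return penalty * math.pow(0.5, elapsed / HALF_LIFE)

penalty = 3 * PENALTY_PER_FLAP          # three flaps in quick succession
suppressed = penalty > SUPPRESS_LIMIT   # 3000 > 2000: route is suppressed
penalty = decayed_penalty(penalty, 2 * HALF_LIFE)   # wait two half-lives
print(suppressed, round(penalty))       # → True 750 (now below REUSE_LIMIT)
```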
After a routing device fails, its neighbors at the BGP routing layer
detect the neighborship going down and coming back up, which is called BGP
neighbor oscillation. The neighborship oscillation eventually causes route
oscillation. As a result, a route black hole may occur for a while after
the routing device restarts, or the neighbor's data traffic may bypass the
restarted routing device. Consequently, the reliability of the network is
decreased.
BGP graceful restart prevents route disturbance and accelerates route
convergence when a routing device fails, which ensures network
reliability.
1. The graceful restart capability is added to the BGP Open message. The
fields are as follows:
2. The EOR (End-of-RIB) flag is added to the BGP update packets to
indicate that the update is complete.
2. When a fault occurs, the forwarding plane of switch A retains the
routes and continues forwarding traffic according to them;
5. Route calculation is delayed until the EOR flag is received from the
neighbor or the defer timer expires.
6. The routes are calculated, the core routing table is updated, and the
routes are advertised.
2. After the restarting end fails, if a TCP error is detected, go to step
3; if no TCP error is detected, go to step 4.
4. Re-establish the neighborship and delete the restart timer if it
exists, then start the stale-path timer.
5. If the restart timer expires before the session is re-established, or
the fwd-flag in the corresponding address family of the Open message is
not 1, or the corresponding address family information is not included,
go to step 8.
6. Send routes to the restarting routing device, then send the EOR flag.
7. If the stale-path timer expires before the EOR is received, go to step 8.
8. Delete the retained stale routes and return to the normal BGP flow.
ACL Technology
This chapter describes the ACL technology and its applications. The
configurations related to the ACL function on the switch include the
action group configuration, traffic meter configuration, and time range
configuration.
Main contents:
ACL classification
Typical application
number. After identifying the traffic, ACL can execute the specified
operations on it, such as preventing it from passing through an interface.
An ACL comprises a series of rules, each used to match one specified type
of traffic. The sequence number of a rule decides its position in the ACL.
The ACL checks packets against the rules in ascending sequence order. The
first rule in the ACL that matches the packet decides how the packet is
processed: permit or deny. If no rule matches the packet, the packet is
denied; that is, packets that are not explicitly permitted are denied.
This shows that rule order is important.
The following figure shows the access authority of the ACL segments. The
action of the shaded part is deny and the action of the white part is
permit. After the last rule (that is, after rule 30 above), there is one
hidden rule, deny any, whose sequence number is larger than those of all
other rules in the ACL. The hidden rule is invisible and denies all
packets that do not match any previous rule. To keep the hidden rule from
taking effect, you need to configure a permit any rule manually so that
packets matching no other rule are permitted.
ACL Classification
According to its usage, an ACL can be divided into six types:
IP standard ACL
IP extended ACL
IPV6 ACL
IP Standard ACL
The IP standard ACL builds its rules only on the source address of the
packet to analyze and process the packet. For example, the following
standard IP ACL denies the packets sent from the host 171.69.198.102 but
permits the packets sent from all other hosts.
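The first-match evaluation described earlier, including the hidden trailing deny any rule, can be sketched as follows. The rule tuples and function name are illustrative; the sample rules mirror the standard ACL example above:

```python
def acl_lookup(rules, packet_src):
    """Evaluate rules in ascending sequence order; the first match wins.
    Packets matching no rule hit the hidden trailing 'deny any'."""
    for seq, action, match in sorted(rules):
        if match == "any" or match == packet_src:
            return action
    return "deny"   # hidden rule: deny any

rules = [
    (10, "deny",   "171.69.198.102"),
    (20, "permit", "any"),
]
print(acl_lookup(rules, "171.69.198.102"))  # → deny
print(acl_lookup(rules, "10.1.1.1"))        # → permit
print(acl_lookup([], "10.1.1.1"))           # → deny (hidden rule)
```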
IP Extended ACL
The IP extended ACL filters packets according to the IP upper-layer
protocol number, source IP address, destination IP address, source
TCP/UDP port number, destination TCP/UDP port number, TCP flags, ICMP
message type and code, and TOS priority. For example, the following IP
extended ACL denies the telnet packets sent from 171.69.198.0/24 to
171.69.198.0/24 but permits other TCP packets.
MAC Standard ACL
The MAC standard ACL builds its rules on the source MAC address of the
Ethernet packet to analyze and process the packet.
Hybrid ACL
The Hybrid ACL can filter packets according to IP protocol number, source
IP address, source MAC address, DSCP, VLAN and so on.
IPV6 ACL
The IPV6 extended ACL filters the packets according to the IPV6 upper-
layer protocol number, source IP address, destination IP address, source
TCP/UDP port number, destination TCP/UDP port number, and TOS priority.
For example, the following IPV6 ACL permits the IPV6 packets sent from
the host 1:2:3:4::5.
switch(config-v6-list)#
Typical Application
One basic function of ACL is used to limit the access for the network
resources, that is, one group of limited IP addresses access one group of
limited services. The most common used method of using ACL to control
the access authority is to create ACL to permit only the legal traffic to pass,
but prevent all illegal and un-authorized traffic. The following adopts one
example to describes the ACL function.
Application requirement:
In the intranet of a company, port 0/0 of the switch is connected to the
news server and the finance server; port 0/1 of the switch is connected
to the marketing department; port 0/2 of the switch is connected to the
accounting department.
Network topology:
1. Create the extended IP ACL 1001; permit all packets to reach the news
server via port 0/0; permit only the packets sent from the accounting
department to reach the finance server via port 0/0.
2. Apply the ACL 1001 at the input direction of port 0/1 and port 0/2.
Related Terms
SRTCM (Single Rate Three Color Marker): defined in RFC 2697. It uses
three parameters (CIR, CBS, and EBS) to implement single-rate control and
packet coloring. It includes a color-blind mode and a color-aware mode.
TRTCM (Two Rate Three Color Marker): defined in RFC 2698. It uses CIR,
CBS, PIR, and PBS to implement two-rate control and packet coloring. It
includes a color-blind mode and a color-aware mode.
The meter supports two modes, SRTCM and TRTCM. The function of the meter
is to re-mark or drop packets according to the traffic rate. The meter
applies a processing action to each colored packet: when configured to
drop colored packets, it performs traffic policing; when configured to
re-mark colored packets, it classifies packets according to the traffic
rate so that different QoS policies can be applied later in the data path.
After the meter is configured to color the packets, the counter in the
action group can count them.
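The SRTCM coloring described above can be sketched with two token buckets, in the spirit of RFC 2697 (color-blind mode). The class name, units, and sample parameters are assumptions for illustration:

```python
class SrTCM:
    """Color-blind single rate three color marker sketch.
    CIR in bytes/s; CBS and EBS in bytes (assumed units)."""
    def __init__(self, cir, cbs, ebs):
        self.cir, self.cbs, self.ebs = cir, cbs, ebs
        self.tc, self.te = float(cbs), float(ebs)  # buckets start full
        self.last = 0.0

    def color(self, size, now):
        # refill the committed bucket at CIR; overflow spills to the excess bucket
        new_tc = self.tc + (now - self.last) * self.cir
        self.last = now
        if new_tc > self.cbs:
            self.te = min(self.ebs, self.te + new_tc - self.cbs)
            new_tc = self.cbs
        self.tc = new_tc
        if size <= self.tc:            # within committed burst: green
            self.tc -= size
            return "green"
        if size <= self.te:            # within excess burst: yellow
            self.te -= size
            return "yellow"
        return "red"                   # out of profile: red

m = SrTCM(cir=1000, cbs=1500, ebs=1500)
print([m.color(1000, 0.0) for _ in range(3)])  # → ['green', 'yellow', 'red']
```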
Related Terms
Time domain: a set of time periods. One time domain can contain zero or
more time periods. The time range of the time domain is the union of its
time periods.
QoS Technology
This chapter describes the port-based QoS technology and its applications.
Main contents:
Priority mapping
Dropping mode
Rate restriction
Flow shaping
Priority Mapping
This section describes the theory of the priority mapping.
Main contents:
Related terms
Typical application
Related Terms
802.1p priority: the 802.1p priority is located in the L2 packet header.
It is used when there is no need to analyze the L3 packet header but QoS
must be guaranteed in an L2 environment. As shown in Figure 13-1, the
4-byte 802.1Q header contains a 2-byte TPID (Tag Protocol Identifier,
valued 0x8100) and a 2-byte TCI (Tag Control Information). The following
figure shows the detailed contents of the 802.1Q header.
802.1Q header
As shown in Figure 13-2, the Priority field in the TCI carries the 802.1p
priority. It comprises three bits, so its value range is 0-7.
DSCP priority: RFC 2474 redefines the ToS field of the IP packet header
as the DS field. Its first six bits carry the Differentiated Services
Code Point (DSCP), with a value range of 0-63; the last two bits are
reserved, as shown in Figure 13-3.
DS field
Local priority: a priority with local meaning that the switch assigns to
the packet. By default, it maps to a CoS queue, acting as the intermediary
between the DSCP or 802.1p priority and the CoS queue.
Re-mark the DSCP value of the packet according to the DSCP value of the
packet;
Map the egress 802.1p priority of the packet according to the local
priority of the packet;
Map the egress DSCP priority of the packet according to the local
priority of the packet;
After a packet enters the switch, it is mapped to a local priority
according to its 802.1p priority or DSCP, and then to a CoS queue. If
both the DSCP-to-local-priority mapping and the 802.1p-to-local-priority
mapping are configured, the former has higher priority (that is, the
DSCP-to-local-priority mapping takes effect).
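The precedence rule above (DSCP mapping wins over 802.1p) can be sketched as a small lookup. The mapping tables are assumed example values, not the device defaults:

```python
# assumed example tables; the real default mappings are device-specific
DOT1P_TO_LOCAL = {p: p for p in range(8)}          # 802.1p 0-7 → local 0-7
DSCP_TO_LOCAL  = {0: 0, 10: 1, 18: 2, 26: 3, 34: 4, 46: 5}

def local_priority(dot1p=None, dscp=None):
    """DSCP-to-local-priority mapping takes precedence over 802.1p."""
    if dscp is not None and dscp in DSCP_TO_LOCAL:
        return DSCP_TO_LOCAL[dscp]
    if dot1p is not None:
        return DOT1P_TO_LOCAL[dot1p]
    return 0   # assumed default local priority

print(local_priority(dot1p=3, dscp=46))  # → 5 (DSCP mapping wins)
print(local_priority(dot1p=3))           # → 3 (fall back to 802.1p)
```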
Main contents:
Related terms
Typical application
Related Terms
SP (Strict Priority): one of the queue scheduling algorithms. SP sends the
packets in the queues strictly in priority order, from high to low; only
when a higher-priority queue is empty are the packets in the
lower-priority queues sent. Queue 7 has the highest priority and queue 0
the lowest.
Typical Application
Scheduling mode
Illustration
The devices in the LAN are connected to the external network via port 0/1
of the switch. The packets sent by the devices in the LAN are mapped to
the output queues of port 0/1 according to rules such as priority mapping.
Suppose the packets to be sent in queues 0, 6, and 7 have strict real-time
requirements and the other queues have the same priority. You can
configure port 0/1 to schedule by WRR with the weight of queues 0, 6, and
7 set to 0. The three queues are then scheduled by strict priority and
forward packets first.
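The mixed scheduling above (weight-0 queues served by strict priority, the rest by WRR) can be sketched for one scheduling round. The function name, the one-round WRR simplification, and the sample queue contents are the author's assumptions:

```python
def dequeue_order(queues, weights):
    """One scheduling round: serve weight-0 queues strictly by priority
    (queue 7 highest), then serve each weighted queue up to its weight."""
    out = []
    for q in sorted(weights, reverse=True):   # queue 7 first
        if weights[q] == 0:                   # weight 0 → strict priority
            out += queues.get(q, [])
    for q in sorted(weights, reverse=True):   # then one WRR pass
        if weights[q] > 0:
            out += queues.get(q, [])[:weights[q]]
    return out

queues  = {7: ["v1"], 6: ["v2"], 0: ["v3"], 1: ["d1", "d2"], 2: ["d3"]}
weights = {7: 0, 6: 0, 0: 0, 1: 1, 2: 1}
print(dequeue_order(queues, weights))  # → ['v1', 'v2', 'v3', 'd3', 'd1']
```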
Drop Mode
This section describes the drop mode of the port.
Main contents:
Related terms
Typical application
Related Terms
SRED: Simple random early detection
Typical Application
Drop mode
Illustration
The devices in the LAN are connected to the external network via port 0/1
of the switch. The packets sent by the devices in the LAN are mapped to
the output queues of port 0/1 according to rules such as priority mapping.
By default, when the network is congested, the excess packets are simply
dropped, which is unfair to the later packets. Therefore, configure the
SWRED drop mode on the port, that is, drop packets randomly at a rate that
increases with the load, before the network actually becomes congested.
Speed Restriction
The port-based rate limit on the input direction is provided with a
granularity of 64 kbit/s. Traffic exceeding the rate is dropped. The
configured parameters are the bandwidth threshold (kbit; 64 kbit is the
minimum granularity) and the burst size (bytes; the granularity of the
burst size is 4 KB). Use the port rate limit to make traffic enter the
network at an even rate, preventing congestion at the source.
Flow Shaping
The flow shaping has two kinds:
Port-based flow shaping on the output direction makes packets leave at an
even rate. The configured parameters are the bandwidth threshold (kbit;
64 kbit is the minimum granularity) and the burst size (bytes; the
granularity of the burst size is 4 KB).
Queue-based output flow shaping likewise makes packets leave at an even
rate. The configured parameters are the queue number, committed
information rate, committed burst size, peak information rate, and peak
burst size. The granularity of both the committed information rate and
the peak information rate is 64 kbit/s; the granularity of both the
committed burst size and the peak burst size is 4 KB.
The switch classifies the queues into three types according to the
relation between the queue's traffic and cir/pir: it first schedules the
queues whose traffic is below cir, then the queues whose traffic is
between cir and pir, and finally the queues whose traffic exceeds pir.
After a packet enters the switch, it enters the corresponding virtual
queue according to its VLAN number. Queue scheduling and shaping can be
applied to the virtual queue. After VLAN queue shaping, the traffic enters
queue 9 of the port.
AAA Technology
This chapter describes the AAA security service theory, the RADIUS and
TACACS protocols, the ID authentication mechanism of the MP series router,
and the commonly used debug commands and their output.
Main contents:
AAA terms
AAA Terms
AAA: short for Authentication, Authorization and Accounting. It provides
one consistent framework for configuring these three kinds of security
functions. In effect, AAA configuration manages network security. Here,
network security mainly refers to access control, including:
NAS: short for Network Access Server. The AAA security services are
enabled on the router acting as NAS. When a user wants to set up a
connection with the NAS via a network (such as the telephone network) in
order to gain access to other networks (or to use certain network
resources), the NAS is used to identify the user (or the connection).
AAA uses protocols such as RADIUS and TACACS to carry out its security
functions, setting up communication between the NAS and the RADIUS or
TACACS security server. Besides, the local username, line password, and
enable password can be used as the ID authentication methods for access
control.
As shown in the above figure, suppose one method list is defined on the
NAS. In the list, R1 is consulted first for ID authentication, then R2,
T1, T2, and finally the local username database on the NAS. If a remote
user tries to dial in to the network, the NAS first queries R1 for ID
authentication. If the user passes the ID authentication of R1, R1 sends
a PASS response to the network access server and the user gets the
authority to access the network. If R1 returns a FAIL response, the user
is denied access to the network and the session ends. If R1 does not
respond, the NAS treats it as an ERROR and queries R2 for ID
authentication. This process continues through the specified methods
until the user passes the ID authentication, is denied, or the session
ends.
Note
The NAS tries the next method only when the previous method gives no
response. If the ID authentication fails at any point, that is, the
security server or local username database responds by denying the user
access, the ID authentication ends and no other ID authentication method
is tried.
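The method-list fallthrough just described (move on only on ERROR; PASS or FAIL ends the attempt) can be sketched as follows. The function and result names are illustrative:

```python
def authenticate(method_list, user):
    """Try methods in order; move to the next method only when the
    current one gives no response (ERROR). PASS or FAIL is final."""
    for method in method_list:
        result = method(user)          # "PASS", "FAIL" or "ERROR"
        if result == "PASS":
            return "access granted"
        if result == "FAIL":
            return "access denied"     # do not try further methods
    return "access denied"             # every method was unreachable

r1 = lambda u: "ERROR"                 # R1 does not respond
r2 = lambda u: "PASS"                  # R2 authenticates the user
print(authenticate([r1, r2], "alice"))  # → access granted
```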
Introduction to RADIUS
RADIUS is a UDP-based client/server protocol. The NAS serves as the
RADIUS client, while the RADIUS server is a background process that runs
on a UNIX or Windows NT host.
A RADIUS packet is carried in the data field of a UDP packet. Its length
is variable, and its attribute fields vary with the RADIUS packet type.
The following is the structure of the RADIUS packet.
Code field
Identifier field
Length field
Authenticator field
1-Request Authenticator
2-Response Authenticator
ResponseAuth = MD5(Code+ID+Length+RequestAuth+Attributes+Secret)
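The Response Authenticator formula above (as in RFC 2865) can be computed directly with the standard hashlib module; the sample code, identifier, and secret are illustrative:

```python
import hashlib

def response_authenticator(code, identifier, length, request_auth,
                           attributes, secret):
    """ResponseAuth = MD5(Code + ID + Length + RequestAuth
                          + Attributes + Secret), per the formula above."""
    packet = (bytes([code, identifier]) + length.to_bytes(2, "big")
              + request_auth + attributes + secret)
    return hashlib.md5(packet).digest()

# sample values: Access-Accept (code 2), ID 1, 20-byte packet, no attributes
auth = response_authenticator(2, 1, 20, b"\x00" * 16, b"", b"testing123")
print(len(auth))  # → 16 (MD5 digest fills the 16-byte Authenticator field)
```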
Attribute field
Byte offset: 0 = Type, 1 = Length, 2 onward = Value
The Type field indicates the Attribute type.
The Length field indicates the length of the whole Attribute, including Type,
Length and Value.
Introduction to TACACS
TACACS provides authentication, authorization, and accounting services.
TACACS transmits data in TCP packets and uses port 49 to receive them.
The format of the TACACS packet header is as follows. The packet header
is always transmitted in plaintext.
type field
1-authentication
2-authorization
3-accounting
seq_no field
flags field
It is the flags field. The lowest bit indicates whether the packet body is encrypted.
session_id field
It is the session ID, a random 4-byte number that does not change during
one session.
length field
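The header fields just listed (type, seq_no, flags, session_id, length) can be packed as a 12-byte structure in the TACACS+ style. The leading version byte and its sample value are assumptions not detailed in the text above:

```python
import struct

TAC_PLUS_AUTHEN, TAC_PLUS_AUTHOR, TAC_PLUS_ACCT = 1, 2, 3
TAC_PLUS_UNENCRYPTED_FLAG = 0x01   # lowest flag bit: body is cleartext

def pack_tacacs_header(msg_type, seq_no, flags, session_id, body_len,
                       version=0xC0):
    """12-byte header, sent as plaintext even when the body is encrypted.
    The version byte (0xC0 here) is an assumed example value."""
    return struct.pack("!BBBBII", version, msg_type, seq_no, flags,
                       session_id, body_len)

hdr = pack_tacacs_header(TAC_PLUS_AUTHEN, 1, 0, 0x12345678, 40)
print(len(hdr))  # → 12
```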
Introduction to ID Authentication Mechanism
Login Authentication
1. If neither AAA nor Line authentication is configured, login via the
console port or telnet passes the authentication directly; for SSH, you
should use local login.
When the user logs in via an interface or line, the system authenticates
the ID according to the method list referenced by that interface or line.
If the interface or line does not reference any method list, or the
referenced method list is not defined, the system uses the default method
list; if the default method list is not configured, the default method is
used.
For the login via console port, the default method is none; for telnet and
ssh login, the default method is local.
If the user logs in with a valid username, there is no need to enter a
username when authenticating in privileged mode; only the password is
required.
If the login user has an enable password, authentication uses that
password;
If there is no enable password, a user who logs in via the console port
passes the authentication directly, but a telnet user does not.
2. Configure AAA
After the user logs in to the router and requests to enter privileged
mode, the system authenticates the ID according to the default method
list; if the method list does not exist, the default method is used:
For login via the console port, the default method list is enable none;
For telnet and ssh login, the default method is enable.
EIPS Technology
The EIPS technology supports two modes. One is the sub-ring mode: when
processing intersecting rings, the two intersecting rings are decomposed
into one master ring and one sub ring, with one public link between them.
The other mode is the hierarchical mode: when processing two intersecting
rings, one ring is chosen as the master ring; after the public link
shared with the master ring is removed, the other ring becomes a
lower-level ring attached to the master ring.
EIPS domain: the EIPS domain is identified by an integer ID. A group of
switches configured with the same domain ID and interconnected form one
EIPS domain. An EIPS domain comprises EIPS rings, EIPS control VLANs, a
master node, transmission nodes, edge nodes, and assistant edge nodes.
EIPS ring: the EIPS ring is identified by an integer ID. It physically
corresponds to one ring Ethernet topology. Each EIPS ring is one local
unit of the EIPS domain, and the EIPS protocol takes effect on the EIPS
ring. The EIPS rings in a domain are divided into the master ring and sub
rings. In one EIPS domain there is only one master ring, but there can be
one or more sub rings. A sub ring intersects with the upper-level ring
via the edge node and the assistant edge node.
EIPS sub ring: It is the EIPS ring whose level is larger than 0.
EIPS control VLAN: defined in contrast to the data VLANs. In an EIPS
domain, the control VLAN can only be used to transmit EIPS protocol
packets. Each EIPS ring has one control VLAN: the master ring protocol
packets are transmitted in the master control VLAN, and the sub ring
protocol packets in the sub control VLAN. It is not permitted to
configure an IP address on the master or sub control VLAN interfaces. On
a switch, only the ports connected to the Ethernet ring belong to the
control VLAN, and only those ports can be added to it. A port on the
master ring belongs to both the master control VLAN and the sub control
VLAN; a port on a sub ring belongs only to the sub control VLAN. The
whole master ring is regarded as one logical node of the sub ring: the
EIPS protocol packets of the sub ring are transmitted transparently
across the master ring like user packets, while the EIPS protocol packets
of the master ring do not enter the sub ring and are transmitted only
within the master ring.
EIPS node: each switch on the EIPS ring is one node on the EIPS ring.
The nodes on one ring have the same EIPS domain ID and the EIPS ring
ID. Each EIPS node has two EIPS ports connected to the EIPS ring, which
are specified as the master port and standby port by the user during the
configuration.
Master node: the master node initiates the polling of the ring network
status (it periodically sends HEALTH packets from both the master and
standby ports; if either port receives the packet sent from the other,
the ring is complete; if no HEALTH packet is received for a long time,
the ring is regarded as failed). The master node is also the node that
decides what operation to execute after the network topology changes.
Complete State:
When all links on the ring network are in the Up state, the master node
can receive the HEALTH packets it sent from its standby port, which
indicates that the master node is in the Complete state. The status of
the master node reflects the status of the EIPS ring, so the EIPS ring is
also in the Complete state. In this state, the master node blocks the
standby port to prevent packets from forming a broadcast loop on the ring
topology.
Failed State:
When a link on the ring network goes into the Down state, the master node
stops receiving its own HEALTH packets and enters the Failed state. The
master node then enables the standby port to ensure that the
communication between the nodes on the ring network is not interrupted.
PRE-UP State:
When the master node is in the Failed state and receives a HEALTH packet
again, it first moves to the Pre-up state. If it continues to receive
HEALTH packets for a period, it moves to the Complete state. This
prevents network flapping.
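The three master node states just described can be sketched as a small state machine. This is a simplification: timer handling is omitted, and moving from Pre-up to Complete on the next HEALTH packet stands in for "continuing to receive HEALTH packets within a period":

```python
class MasterNode:
    """Sketch of the master node states: Complete, Failed, Pre-up."""
    def __init__(self):
        self.state = "Complete"
        self.standby_blocked = True      # ring complete: block standby port

    def health_timeout(self):
        """No HEALTH packet for a long time: the ring has failed."""
        self.state = "Failed"
        self.standby_blocked = False     # open standby port to restore paths

    def health_received(self):
        """HEALTH packet arrives back at the master node."""
        if self.state == "Failed":
            self.state = "Pre-up"        # wait before declaring the ring whole
        elif self.state == "Pre-up":
            self.state = "Complete"
            self.standby_blocked = True  # ring complete again: block standby
```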
Transmission node: all nodes on the EIPS ring other than the master node
are transmission nodes. A transmission node is responsible for monitoring
the status of its directly connected links and reporting status changes
to the master node via EIPS protocol packets; the master node then
decides how to respond. The two transmission nodes at which a sub ring
intersects the master ring act as the edge node and the assistant edge
node (the master ring contains only transmission nodes; edge nodes and
assistant edge nodes exist only on sub rings). If a transmission node on
the master ring shares a port with the edge node of a sub ring, it must
send the sub-ring protocol channel status detection packet on that port.
If a transmission node on the master ring shares a port with the
assistant edge node of a sub ring, it must forward the received sub-ring
protocol channel status detection packets to the corresponding assistant
edge node.
When both the master port and the standby port of a transmission node are
up, the transmission node is in the Link-Up state.
When the master port or the standby port of a transmission node is down,
the transmission node is in the Link-Down state. When a transmission node
in the Link-Up state finds that its master or standby port has gone down,
it moves from the Link-Up state to the Link-Down state and informs the
master node by sending a Link-Down packet.
A transmission node cannot return directly from the Link-Down state to
the Link-Up state. When the master port and standby port of a
transmission node in the Link-Down state recover to the Up state, the
node moves to the Preforwarding state and blocks the port that recovered
last. At the moment the ports recover, the master node does not yet know
about it, and its standby port is still enabled; if the transmission node
returned to the Link-Up state immediately, packets could form a broadcast
loop on the ring network. Therefore, the transmission node first moves
from the Link-Down state to the Preforwarding state.
Edge node and assistant edge node: the edge node and assistant edge node
are used to detect the status of the sub-ring protocol packet channel in
the master ring. The edge node initiates the detection mechanism, the
assistant edge node judges the channel status and reports it to the edge
node, and finally the edge node makes a decision according to the channel
status.
The edge node and the assistant edge node are both special transmission
nodes, so they have the same three states as the transmission node, but
the meanings differ slightly, as follows:
When the edge port is in the UP state, it indicates that the edge node
(assistant edge node) is in the Link-Up state.
When the edge port is in the Down state, it indicates that the edge node
(assistant edge node) is in the Link-Down state.
The state transitions of the edge node (assistant edge node) are basically
the same as those of the transmission node. The difference is that when a
change in port link status triggers a state transition of the edge node
(assistant edge node), the transition depends only on the status of the
edge port (refer to the previous introduction of the edge node status).
The edge node and the assistant edge node are the two participants in the
mechanism of detecting the sub-ring protocol packet channel status in the
master ring: the edge node initiates the detection, the assistant edge
node judges the channel status and reports it to the edge node, and the
edge node then makes the decision according to the channel status. The
mechanism is described in detail later.
EIPS port: An EIPS port is an abstract concept corresponding to one of the
links that form the EIPS ring. The link can be a single physical link or
an aggregation link formed by multiple physical links. On each EIPS node
there are always two ports connected to the EIPS ring. Because EIPS rings
may intersect, one EIPS port may belong to multiple EIPS nodes.
EIPS master port and EIPS standby port: The ports on the master
node and on a common transmission node (one that is neither an edge node
nor an assistant edge node) are divided into a master port and a standby
port. On the master node, when the loop is complete, the user data VLAN
of the standby port must be blocked; on a transmission node, the master
port and standby port have no special meaning.
EIPS public port and EIPS edge port: The ports on the edge node and
the assistant edge node are divided into a public port and an edge port.
The public port is connected to the public link of two intersecting rings
and belongs to multiple EIPS rings. The edge port belongs to only one sub
ring. When the public port fails, the failure does not need to be
reported to the master node of the sub ring, only to the master node of
the master ring.
0                15 16               31 32               47
Destination MAC address (6 bytes)
Source MAC address (6 bytes)
Type (Ether Type) (TPID) | PRI + CFI + VLAN ID | Frame Length
Source MAC address: 48 bits, the MAC address of the sending node;
Frame Length: 16 bits, the length of the Ethernet frame, fixed as 0x48;
The master ring protocol packets are broadcast in the main control
VLAN; the sub-ring protocol packets are broadcast in the sub control
VLAN;
The EIPS ports on the master ring nodes are added to both the main
control VLAN and the sub control VLAN; the EIPS ports on the sub ring
are added only to the sub control VLAN;
The protocol packets of the sub ring are processed as data packets by
the master ring, being blocked or forwarded together with the data
packets;
There are two reasons why the master node sends HEALTH packets from
both ports at the same time:
When the standby master node function is enabled, if one link in the
loop were down and the master node sent HEALTH packets from only one
port, a standby master node located on the side of the non-sending port
would never receive them, so the standby master node function could not
take effect.
Figure 15-1 Operation of the single ring in the non-fault state
Figure 15-2 The master node cannot receive the HEALTH packets
Figure 15-3 A transmission node detects that the physical line is down
When the edge node or assistant edge node receives a COMP-FLUSH-FDB
packet of the sub ring, it turns to the LINK-UP state unconditionally.
When the assistant edge node receives an EDGE-HEALTH packet, its state
machine turns to the LINK-UP state.
To prevent the edge node from being confused when receiving the
MAJOR-FAULT and COMP-FLUSH-FDB packets and entering a wrong state, the
assistant edge node sends a MAJOR-RESUME packet to the edge node when it
turns to the LINK-UP state. After receiving the packet, the edge node
also turns to the LINK-UP state.
Figure 15-4 The sub ring detects the master ring link status
Figure 15-5 The sub ring detects a master ring link fault
As shown in Figure 15-6, there is only one ring in the network topology.
Here, you only need to define one EIPS domain and one EIPS ring. The
feature of this networking is a fast response and short convergence time
when the topology changes, which suits applications where there is only
one ring in the network.
As shown in Figure 15-7, there are two or more rings in the network
topology, with two public nodes between the rings. Here, you only need
to define one EIPS domain, select one ring as the master ring, and use
the others as sub rings. The typical application of this networking is
that the master node of the sub ring can go upstream via two edge nodes,
providing upstream link backup.
Hierarchical EIPS
Main contents:
Master Node (master, M for short): It is the main decision maker and
control node on the ring of a domain. There is only one master node on a
single ring. The two ring ports of the master node are the master port
and the assistant port. When the link of the domain controlled by the
master node is complete, the assistant port blocks all data to avoid a
loop. When a link on the ring fails and the port of the faulty link is
not the assistant port of the master node, the forwarding function of
the assistant port is enabled.
The level of the major-level ring is the highest (level 0). Here, the
major ring is a complete ring. The low-level links are the incomplete
ring link sets left after removing the public links shared with the
upper layer.
In Figure 15-8, the nodes T1, T2, T3, and M form the major-level ring
(level 0, segment 0); the node M is the master node; the nodes T1, T2,
and T3 are the transmission nodes. When the major-level ring is not
faulty, EIPS blocks the services on the secondary port S.
In Figure 15-9, choose one of the intersecting rings as the major-level
ring; the other rings degenerate into low-level segment links. The nodes
T1, T2, T3, T4, and M form the major-level ring; the node M is the master
node; the nodes T1, T2, T3, and T4 are the transmission nodes. Levels and
segments are divided for the other links; (level 1, segment 1) includes
the nodes T1, T2, T3, and T4. Here, the node T2 is the edge control node;
the nodes T1 and T2 are the transmission nodes; the node T3 is the edge
assistant node. When the (level 1, segment 1) link is not faulty, the
node T2 blocks the edge port connected to (level 1, segment 1). The
major-level ring is a single ring, and a low-level segment link is a
single link. The larger the level number, the lower the level.
Node Roles
Master Node:
The major-level ring of a domain has one master node, that is, the
master node of the major-level ring. The master node actively initiates
the detection of the major-level ring status and decides what operation
to execute after the major-level ring topology changes.
The master node sends HEALTH packets periodically from its two ports; the
packets are forwarded by the transmission nodes on the ring. If the
master node can receive the HEALTH packets it sent, the major-level ring
link is complete; if neither port receives the HEALTH packets within the
specified time, the ring network link is regarded as failed.
Complete State
The major-level ring is in the stable state and there is no broken link
on the ring. The master node blocks the service forwarding function of
the protect VLAN on the assistant port, so as to prevent the network
storm caused by a loop. Meanwhile, the master node periodically sends
the HEALTH packet, which is forwarded by the transmission nodes when the
loop is normal and returns to a port of the master node.
Failed State
When a link of the major-level ring is disconnected, the master node
enters the Failed state after receiving the link-down event. If the port
corresponding to the faulty link is not the assistant port, the
assistant port enables the data forwarding function of the protect VLAN.
Because the topology of the major-level loop has changed, the master
node needs to send COMM-FLUSH-FDB control messages from the main port
and assistant port to inform all other nodes of the level segment to
clear their address entries for the node and the protected VLAN.
Init State
When the master node starts up, the link status of the current loop is
unknown, so the current status is set to Init until the actual status of
the loop is detected.
PRE-UP State
To avoid repeated flapping of the fault point and frequent switching of
the loop status, which would interrupt service data, the master node
waits for some time before entering the Complete state from the Failed
state. During this waiting time, the master node is in the PRE-UP state.
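The four master-node states and the PRE-UP hold-down described above can be sketched as follows. The class and method names, the event model, and the 3-second delay are assumptions for illustration; the manual does not specify an implementation.

```python
# Sketch of the master-node states (Init, Complete, Failed, PRE-UP):
# a returned HEALTH packet first moves the node into PRE-UP, and only
# after the hold-down elapses does it enter Complete, so a flapping
# link cannot toggle the ring status rapidly.
import time

class MasterNode:
    PRE_UP_DELAY = 3.0                      # assumed hold-down time, seconds

    def __init__(self):
        self.state = "Init"                 # loop status unknown at start-up
        self.pre_up_since = None

    def on_health_returned(self):
        """The HEALTH packet came back on the assistant port: the ring is whole."""
        if self.state in ("Init", "Failed"):
            # wait in PRE-UP first so a flapping fault point cannot
            # switch the loop status frequently
            self.state = "PRE-UP"
            self.pre_up_since = time.monotonic()
        elif (self.state == "PRE-UP"
              and time.monotonic() - self.pre_up_since >= self.PRE_UP_DELAY):
            # re-block the assistant port, flush FDB, enter Complete
            self.state = "Complete"

    def on_link_down(self):
        """A LINK-DOWN packet arrived or HEALTH timed out: unblock assistant port."""
        self.state = "Failed"
```

Driving the object with a health event, an elapsed hold-down, and a link failure walks it through PRE-UP, Complete, and Failed in turn.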
Transmission Node:
The transmission node is responsible for monitoring the status of the
links on the directly connected loop. When a link fails, it sends a
LINK-DOWN packet to inform the control node of the level segment, and
the control node decides how to handle it. When COMP-FLUSH-FDB or
COMM-FLUSH-FDB packets from the control node are received, the
transmission node updates the FDB entries related to the protected
service VLAN.
Complete State:
Failed State:
Init State:
When the transmission node starts up, the link status of the current
loop is unknown, so the current status is set to Init and an ASK packet
is sent to query the control node of the level segment.
Pre-forwarding:
This state appears at the moment the link recovers: the original Down
port comes back up. The EIPS control VLAN is enabled and can forward
EIPS protocol packets, but the service VLAN is still blocked. After the
loop enters the Complete state and the transmission node receives the
COMP-FLUSH-FDB packet from the control node, it enables the forwarding
function of the service VLAN and turns to the Complete state. If the
transmission node does not receive the COMP-FLUSH-FDB packet within the
specified time, it turns to the Complete state automatically.
The edge control node is the control node that has only one port on the
low-level segment link. There is no master node in the level segment
link. The edge control node periodically sends the HEALTH packet onto
the level segment link from the access port. When the link is complete,
the returned HEALTH packet can be received. The edge control node is
similar to the master node and has the following four states:
Complete State
The level segment link is in the stable state and there is no broken
link. The edge control node blocks the service forwarding function of
the protect VLAN on the access port, so as to prevent the network storm
caused by a loop. Meanwhile, the access port periodically sends HEALTH
packets, which are forwarded by the nodes of the low-level segment link
when the loop is normal and return to the access port of the edge
control node.
Failed State
When the access port of the edge control node does not receive the
returned HEALTH packets within the specified time, or a link-down event
is received from the level segment link, the node enters the Failed
state. If the port corresponding to the faulty link is not the access
port of the edge control node, the data forwarding function of the
protected service VLAN is enabled on the access port. Because the
topology of the level segment link has changed, the edge control node
needs to send a COMM-FLUSH-FDB control message to inform the other
nodes on the level segment link and the related upper-level nodes to
clear the FDB entries for the node and the protected VLAN.
Init State
When the edge control node starts up, the link status of the current
level segment link is unknown, so the current status is set to Init
until the actual status of the loop is detected.
PRE-UP State
To avoid repeated flapping of the fault point and frequent switching of
the loop status, which would interrupt service data, the edge control
node waits for some time before entering the Complete state from the
Failed state. During this waiting time, the edge control node is in the
PRE-UP state.
The edge assistant node is the non-control node that has only one port
on the low-level segment link. When it receives the HEALTH packet sent
by the control node of the level segment link, it returns the packet to
the control node from the receiving port, cooperating with the control
node to detect the level segment link status. If the edge assistant node
does not receive the HEALTH packet within the specified time, the link
between the edge assistant node and the control node is regarded as
failed. When the edge assistant node receives a LINK-DOWN packet of the
level segment link, the link between the edge assistant node and the
control node is also regarded as failed. The edge assistant node is
responsible for monitoring the status of the links on the directly
connected loop. When a link fails, it sends a LINK-DOWN packet to inform
the control node of the level segment. When the edge assistant node
finds that the link between itself and the control node of the level
segment link has failed, it serves as the temporary control node and
sends a COMM-FLUSH-FDB packet to inform the other nodes on the level and
the upper-level nodes to update the FDB entries related to the protected
service VLAN.
The master node of the main ring sends the HEALTH packets from its two
ports. If at least one port can receive the packet sent from the other
port, the main ring is complete, so the data forwarding function of the
protected service VLAN needs to be blocked on the assistant port.
Conversely, if the HEALTH packet is not received within the specified
time, or a LINK-DOWN packet of the main ring is received, the
major-level ring has failed. If the port corresponding to the faulty
link is not the assistant port, the protected service VLAN forwarding
function of the assistant port needs to be enabled, so as to ensure
normal communication among all nodes on the ring. Besides, the master
node of the main ring receives address update packets from the low-level
segment links but does not forward them.
The main port and the assistant port of a transmission node do not
differ in function. The port role depends only on the user
configuration.
Edge Port
The edge node has only one port connected to a level segment link, and
that port is the edge port. When an address refresh message
(COMP-FLUSH-FDB or COMM-FLUSH-FDB) is received on the edge port, and the
upper level has not yet been notified of the status change of the level
segment link that sent the control message, the node forwards the packet
to the upper level and updates the FDB entries of the port related to
the protected service VLAN.
Block: the port is blocked and data is prohibited from being forwarded via the port;
Forward: the port is enabled and data is permitted to be forwarded via the port;
For example, when the links on the main ring are normal, the master node
of the main ring blocks the assistant port so that data in the protected
service VLAN cannot pass the assistant port of the master node, avoiding
a loop. When a link on the main ring fails and the port corresponding to
the faulty link is not the assistant port of the master node, the master
node enables the assistant port and permits data in the protected
service VLAN to pass it, recovering the communication of service data.
Table 15-4
0                15 16               31 32               47
Destination MAC address (6 bytes)
Source MAC address (6 bytes)
Type (Ether Type) (TPID) | PRI + CFI + VLAN ID | Packet Length (Frame Length)
DSAP/SSAP | CONTROL | OUI = 0x00E02B
0x00BB | 0x99 | 0x0B | ERP_LENGTH
ERP_VER | ERP_TYPE | CTRL_VLAN_ID | LEVEL_ID | SEG_ID
0x0000 | SYSTEM_MAC_ADDR (high 4 bytes)
SYSTEM_MAC_ADDR (low 2 bytes) | HEALTH_TIMER | FAIL_TIMER
STATE | 0x00 | HEALTH_SEQ | 0x0000
RESERVED (0x000000000000) (six rows)
Frame Length: 16 bits, the length of the Ethernet frame, fixed as 0x48;
LEVEL_ID: 8 bits, the level number of the segment link; the major-level
ring is 0 and a low-level link is larger than 0;
IDLE=0
COMPLETE=1
FAILED=2
LINK-UP=3
LINK-DOWN=4
PRE-FORWARDING=5
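The STATE values listed above map naturally onto an enum; a small helper can then decode the STATE byte of a received packet into its name. Only the numeric values come from the text; the enum class and helper function are illustrative assumptions.

```python
# Model the EIPS STATE field values as an IntEnum and decode a received
# STATE byte to a readable name.
from enum import IntEnum

class EipsState(IntEnum):
    IDLE = 0
    COMPLETE = 1
    FAILED = 2
    LINK_UP = 3
    LINK_DOWN = 4
    PRE_FORWARDING = 5

def decode_state(state_byte: int) -> str:
    """Map the STATE field of a received EIPS packet to its name."""
    return EipsState(state_byte).name
```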
Non-fault Status
When the links and nodes on the single ring have no fault, the master
node periodically sends HEALTH packets from the main port; the packets
are forwarded by the transmission nodes and links on the ring and reach
the assistant port of the master node. The master node blocks the
protect VLAN forwarding function of the assistant port so that data in
the protect VLAN cannot be forwarded via the assistant port of the
master node, avoiding a loop. The control VLAN is not blocked, so EIPS
protocol packets can pass the blocked assistant port of the master node.
As shown in Figure 15-10, the master node M periodically sends HEALTH
packets; because the loop is not faulty, the HEALTH packets reach the
assistant port of the master node; the master node blocks the data
forwarding function of the protect VLAN on the assistant port, avoiding
a loop.
If the master node itself fails, the processing is different: if the
main port fails, the main port is blocked and the data forwarding
function of the assistant port is enabled; if the assistant port fails,
the assistant port remains blocked.
Fault Recovery
After a link fault on the ring disappears, the neighbor nodes of the
faulty link detect that the port's link fault has disappeared. The port
of the recovered link is set to a status that forwards ring network
control packets, so the port can forward EIPS protocol packets. The port
status is set to Pre-Forwarding, but the port still cannot forward
packets of the protect VLAN.
While the link is failed, the master node keeps periodically sending the
HEALTH packet from the main port. After the link fault disappears, the
master node regards the link as recovered when the assistant port
receives the HEALTH packet. To prevent link status flapping, it turns to
the PRE-UP state, starts the PRE-UP timer, and keeps the data VLAN
enabled. After the PRE-UP timer times out, it turns to the COMPLETE
state, re-blocks the data forwarding function of the protect VLAN on the
assistant port, and sends the COMP-FLUSH-FDB packet from the main port.
Meanwhile, the master node updates the FDB address table of the port.
After a transmission node on the ring receives the COMP-FLUSH-FDB
packet, it updates the FDB table of the port, sets the two ports
adjacent to the recovered link to the Forward state, and enables the
protect VLAN data forwarding function of the port.
To prevent loss of the COMP-FLUSH-FDB packet, if a node adjacent to the
recovered link does not receive the COMP-FLUSH-FDB packet within the
specified time, it sets the Pre-Forwarding port to Forward and enables
the protect VLAN data forwarding function of the port, so that the data
of the protect VLAN is forwarded according to the new topology. To
prevent a transmission node from receiving two COMP-FLUSH-FDB packets
and updating the port FDB twice, the transmission node records the
current loop status as Complete when it receives a COMP-FLUSH-FDB
packet. If the recorded loop status is already Complete, a further
COMP-FLUSH-FDB packet is not processed, avoiding repeated updating of
the port FDB table. To keep the status of all transmission nodes on the
ring consistent, the master node sends the COMP-FLUSH-FDB packets
periodically.
As shown in Figure 15-12, after the link fault between the nodes T2 and
T3 recovers, the nodes T2 and T3 detect that the link fault of the port
has disappeared and set the port of the recovered link to a status that
permits forwarding ring network control packets, so the port can forward
the Ethernet ring network protection control packets. The port status is
set to Pre-Forwarding, but the port still cannot forward packets of the
protect VLAN. If the HEALTH packets sent by the master node from the
main port can cross the recovered link and reach the assistant port, the
loop is regarded as recovered and working again, and the status turns to
PRE-UP; the PRE-UP timer is started. After the PRE-UP timer times out,
the status turns to COMPLETE. As shown in Figure 15-13, the master node
blocks the protect VLAN data forwarding function of the assistant port
and sends the COMP-FLUSH-FDB packet to inform the other nodes of the
loop recovery and make them update the FDB tables of their ports. After
the other nodes on the ring receive the COMP-FLUSH-FDB packet, they
update the FDB tables of their ports; the nodes adjacent to the
recovered link enable the Pre-Forwarding port so that the data of the
protect VLAN can pass, and the loop completes the fault protection
switchover.
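The duplicate-flush protection above (record the loop status on the first COMP-FLUSH-FDB, ignore repeats) can be sketched as follows. The class, method, and counter names are illustrative assumptions, not the actual implementation.

```python
# Sketch of duplicate COMP-FLUSH-FDB suppression: the node records the
# loop status as Complete when the first flush packet arrives, and
# ignores further flush packets while the recorded status is Complete,
# so the port FDB is flushed only once per recovery.

class FlushHandler:
    def __init__(self):
        self.loop_state = "Failed"
        self.flushes = 0                    # how many real FDB flushes ran

    def on_comp_flush_fdb(self):
        if self.loop_state == "Complete":
            return                          # already flushed for this recovery
        self.loop_state = "Complete"        # record the status first ...
        self.flushes += 1                   # ... then flush the port FDB once

    def on_link_down(self):
        self.loop_state = "Failed"          # the next recovery flushes again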
After the edge control node and edge assistant node detect the fault
status of the level segment link, the edge control node sends the
COMM-FLUSH-FDB packets from the edge port and the two ports of the
accessed level. If the faulty port is not the edge port of the edge
control node, the data forwarding function of the protected service VLAN
is enabled on the edge port of the edge control node. When the edge
assistant node detects that the local level segment link has failed, it
sends the COMM-FLUSH-FDB packets from the edge port and the two ports of
the accessed level.
When an edge node receives the COMP-FLUSH-FDB packet from the edge
access port, and the level of the edge access port is higher than or
equal to the level of the sending source, and the upper-level node does
not yet know about the link status change of the sending source level,
the edge node forwards the COMP-FLUSH-FDB packet to the upper level and
updates the FDB entries of the port related to the protected service
VLAN. After the two ports adjacent to the faulty link detect that the
link has recovered, the EIPS protocol packets are forwarded via the
recovered port, and the port status is set to Pre-Forwarding. If the
COMP-FLUSH-FDB packet of the local level segment is received, the data
forwarding function of the protected service VLAN is enabled on the
port; if the COMP-FLUSH-FDB packet is not received within the specified
time, the port times out automatically and is enabled.
Extended Functions
Realizing the Ethernet intelligent protection switchover is the basic
and main function of EIPS. The following describes several extended
functions.
Main contents:
Reliability realization
The four switches M1, M2, M3, and M4 are interconnected with each other,
forming one physical ring. Configure four EIPS rings on the physical
ring: the master node of R1 is M1 and the protect instance is inst 1;
the master node of R2 is M2 and the protect instance is inst 2; the
master node of R3 is M3 and the protect instance is inst 3; the master
node of R4 is M4 and the protect instance is inst 4. When the physical
ring is complete, the EIPS rings R1, R2, R3, and R4 are all complete.
M1, the master node of R1, blocks the data of inst 1 at its assistant
port S; M2, the master node of R2, blocks the data of inst 2 at its
assistant port S; M3, the master node of R3, blocks the data of inst 3
at its assistant port S; M4, the master node of R4, blocks the data of
inst 4 at its assistant port S. The data traffic of each instance can
take a different link, realizing load balancing.
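A minimal sketch of the four-instance setup above: each ring's master blocks its own protect instance at its assistant port, so the instances leave the physical ring over different links. The ring-to-master-to-instance mapping is taken from the text; the data shape and helper function are assumptions for illustration.

```python
# Map each EIPS ring to the master node that blocks its protect instance,
# mirroring the load-balancing example in the text.
rings = {
    "R1": {"master": "M1", "instance": "inst 1"},
    "R2": {"master": "M2", "instance": "inst 2"},
    "R3": {"master": "M3", "instance": "inst 3"},
    "R4": {"master": "M4", "instance": "inst 4"},
}

def blocking_master(instance: str) -> str:
    """Return which switch blocks the given protect instance at its assistant port."""
    for ring in rings.values():
        if ring["instance"] == instance:
            return ring["master"]
    raise KeyError(instance)
```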
Basic Theory
Each node on the ring collects the topology separately. When EIPS is
enabled on a node, the ports of the node actively send a multicast
topology request packet. After the other nodes on the same logical ring
receive the packet, they add one to the TTL value. The receiving port
returns a unicast topology response packet to the requester. The
response packet contains the basic information of the node, including
the node type, node status, information about the contained ports, and
so on. Meanwhile, the master node and the transmission nodes continue to
forward the topology request from their other port. Each node needs to
reply after receiving a topology request sent by another node. After the
requesting node receives a topology response packet, it saves the
information and determines the responder's location on the ring from the
TTL value in the packet. After all nodes have responded, the whole
topology structure can be described completely.
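The TTL-based position discovery above can be sketched in a few lines. The packet fields and function names are illustrative assumptions; only the mechanism (increment TTL on forward, sort responders by hop count) comes from the text.

```python
# Sketch of TTL-based topology collection: each transit node bumps the
# TTL before forwarding the request, and the requester orders the
# responders by the hop count they report back.

def forward_request(packet: dict) -> dict:
    """A transit node adds one to the TTL and passes the request on."""
    return {**packet, "ttl": packet["ttl"] + 1}

def build_topology(responses: list) -> list:
    """Order responding nodes by their distance from the requester."""
    return [r["node"] for r in sorted(responses, key=lambda r: r["hops"])]
```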
The topology collection reflects the topology status of the current
ring, that is, whether it is a complete ring topology structure. The
main ring and sub ring cannot see each other's topology structure; they
can only see whether another edge node is attached to a transmission
node.
For the edge node and assistant edge node, there is only one port, so
the visible topology is the topology collected by that port. For the
master node and the transmission node, when the topology is complete,
the topologies collected by the two ports are complete and consistent;
when the topology is incomplete, for example when one link is
disconnected, each port can only collect part of the topology, and the
collected parts need to be combined to form one complete topology. What
the user sees on the node is the complete topology combined from the
topologies collected by the two ports.
The real-time requirement is not high. Each node sends one topology
request every 10 seconds, so when the topology changes, the change is
not reflected at once and is rediscovered by the next topology
collection after 10 seconds. Each collection updates the previous
topology according to the new response packets. If one node is not
updated within 10 seconds, the node is regarded as no longer in the
topology range.
The meanings of the fields in the topology information header are as follows:
ttl: one byte, indicating the location of the node relative to the
requesting node; filled with 0 in the topology request packet;
increased by one after passing each node;
baseMac: 6 bytes, indicating the MAC address of the device; in the
topology request packet it is the device MAC address of the
requesting node; it is null in the topology response packet;
SMAC: 6 bytes, indicating the MAC address of the source port; in the
topology request packet it is the MAC address of the requesting port;
in the topology response packet it is the MAC address of the
responding port;
In the standard EIPS packet fields, the destination MAC address is the
MAC address of the initiating port of the topology request initiator.
The MAC address is obtained from the SMAC field in the information
header of the received topology request packet. ERP_TYPE in the standard
EIPS packet is TOPOLOGY (15). In the topology information header, type
is 2; ttl is the number of hops from the initiator to the responder;
DMAC is the destination MAC address, that is, the MAC address of the
initiating port of the initiator, which is the value of the SMAC field
in the header of the topology request packet; SMAC is the MAC address of
the sending port.
hop: one byte, indicating the hops from the responder to the initiator,
equal to the TTL value in the packet;
nt: four bits, short for node type, indicating the type of the
responding node;
ns: three bits, short for node status, indicating the current status of
the responding node;
b: one bit, short for border, indicating whether an edge node is
connected; 0 means no; 1 means yes;
bm: four bits, short for backup master, indicating whether it is the
backup master node; 0 means no; 1 means yes;
ar: four bits, short for actor role, only valid for the backup master
node; 0 means that the backup master node is not acting as the master
node; 1 means that the backup master node is serving as the master
node;
base mac: 6 bytes, the device MAC address of the responding node;
r_role: one byte, indicating the port role of the port that receives the
request packet;
r_b: four bits, short for r_blockstatus, indicating the BLOCK status of
the port that receives the request packet on the ring of the node; 0
means non-BLOCK; 1 means BLOCK;
r_l: four bits, short for r_linkstatus, indicating the LINK status of the
port that receives the request packet; 1 means UP; 2 means DOWN;
r_i: two bytes, short for r_index, indicating the number of the port
that receives the request packet;
r_n: 16 bytes, short for r_name, indicating the name of the port that
receives the request packet. To save memory space, only part of the
port name is kept: for a common port, omit "port" (for example, save
as 0/0/1 or 0/1); for an aggregation port, omit "linkaggregation"
(for example, save aggregation port 1 as 1 and aggregation port 2
as 2);
r_mac: 6 bytes, indicating the MAC address of the port that receives
the request packet;
s_role: one byte, indicating the role of the port that forwards the
request packet;
s_b: four bits, short for s_blockstatus, indicating the BLOCK status of
the port that forwards the request packet on the ring of the node; 0
means non-BLOCK; 1 means BLOCK;
s_l: four bits, short for s_linkstatus, indicating the LINK status of
the port that forwards the request packet; 1 means UP; 2 means DOWN;
s_i: two bytes, short for s_index, indicating the number of the port
that forwards the request packet;
s_n: 16 bytes, short for s_name, indicating the name of the port that
forwards the request packet, abbreviated in the same way as r_n
above;
s_mac: 6 bytes, indicating the MAC address of the port that forwards
the request packet;
To solve this problem, the EIPS nodes send the detection packet
LINK-HELLO to each other. LINK-HELLO adopts the standard EIPS packet
format and uses the SYSTEM_MAC_ADDR field and the two fields in front of
it for detection. The destination MAC address in the standard EIPS
packet is 0001.7A4F.4AB4, but it can be learned automatically from the
peer. ERP_TYPE is LINK-HELLO (14). SYSTEM_MAC_ADDR records the MAC
address of the peer port, and the two preceding fields record the port
number of the peer port. Meanwhile, the leading fields of the reserved
field in the packet record the port number of the sending port. Before
the peer information is learned, the eight bytes of peer information are
all 0.
As shown in Figure 15-16, if a node can receive the LINK-HELLO packet of
the neighbor, and SYSTEM_MAC_ADDR in the packet is the MAC address of
the local port and the port number is the number of the local port, the
line is regarded as bidirectional.
Only a LINK-HELLO packet whose SYSTEM_MAC_ADDR is the local port MAC
address and whose port number is the local port number takes part in the
timeout judgment. When a node does not know the peer MAC address, the
SYSTEM_MAC_ADDR and port number of the LINK-HELLO packet it sends are
set to all 0.
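The bidirectional check above reduces to a simple comparison: a received LINK-HELLO proves the line works in both directions only if the peer echoed back this port's own MAC address and port number. The field and function names below are illustrative assumptions.

```python
# Sketch of the LINK-HELLO bidirectional-link check: the neighbor's
# packet must carry our own port MAC and port number, which it can only
# do if it has already heard our LINK-HELLO on this line.

def link_is_bidirectional(hello: dict, local_mac: str, local_port: int) -> bool:
    """True when the neighbor's LINK-HELLO echoes back the local port identity."""
    return (hello.get("system_mac_addr") == local_mac
            and hello.get("port_number") == local_port)
```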
When receiving times out, one direction or both directions may be
disconnected. If one direction is disconnected, the neighbor can detect
it; if both directions are disconnected, the EIPS master node can detect
it. Therefore, when receiving times out, you only need to clear the
recorded MAC address of the neighbor; no further operation is needed.
If the port belongs to multiple EIPS nodes, the control VLAN of one of
those nodes must be chosen as the VLAN field when forming the LINK-HELLO
packet. For convenience, the control VLAN of the EIPS node with the
minimum node number is selected.
Reliability Realization
In the ring topology network, if the control platform of the master node
becomes abnormal and breaks down, but the data platform is complete, it
makes the data platform become ring. To avoid the problem, back up the
master node to realize the EIPS reliability. Therefore, the concept of
backup master node is put forward. The main function of the backup
master node is to serve as the master node when the control platform of
the master node breaks down. When it is detected that the topology is
complete, block the assistant port to avoid the ring and inform other
nodes to refresh FDB.
The backup master node can only be a transmission node. The edge node and assistant edge node, as well as any transmission node connected to the edge node or assistant edge node, cannot serve as the backup master node. To avoid the impact on the link of blocking both the assistant port of the backup master node and the assistant port of the master node, the assistant port of the configured backup master node must be directly connected to the assistant port of the master node, as shown in the following figure.
On the backup master node, set the HELLO packet and LINKDOWN packet to go to the CPU and also be forwarded. When the backup master node cannot receive the HELLO packet of the master node, it sends the HELLO1 packet (the format of the HELLO1 packet is the same as that of the HELLO packet except for the destination MAC address, which is 0001.7A4F.4AB5) to detect the integrity of the master node's data platform and the complete status of the ring. If the assistant port can receive the HELLO1 packet, it indicates that the loop is complete and the data platform of the master node is complete, but its control platform has broken down. In this case, the assistant port is blocked, the COMP-FLUSH-FDB packet is sent from the main port, and the working status of the backup master node is set to master node, as shown in Figure 15-21.
When the backup master node works as the master node, its working principle is basically the same as that of the master node. When a LINKDOWN packet on the ring is received, it enables the assistant port and sends the COMM-FLUSH-FDB packet to the ring via both ports. If the HELLO packet of the master node is received while the assistant port is in the BLOCK state, it enables the assistant port and switches its working status back to transmission node.
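The backup-master behavior described above amounts to a small state machine. The following is a hedged sketch; the state, event, and action names are illustrative, not taken from the actual implementation:

```python
# Hedged sketch of the backup-master transitions described above;
# state, event, and action names are illustrative.

def backup_master_event(state, event):
    if state == "transit" and event == "hello1_seen_on_assistant_port":
        # Loop and master data platform intact, master control platform
        # down: take over as master.
        return "master", ["block_assistant_port", "send COMP-FLUSH-FDB"]
    if state == "master" and event == "linkdown_received":
        # Ring broken: open the assistant port and flush FDBs.
        return "master", ["unblock_assistant_port", "send COMM-FLUSH-FDB"]
    if state == "master" and event == "master_hello_received":
        # Real master is back: step down to transmission node.
        return "transit", ["unblock_assistant_port"]
    return state, []

assert backup_master_event("transit", "hello1_seen_on_assistant_port") == \
    ("master", ["block_assistant_port", "send COMP-FLUSH-FDB"])
assert backup_master_event("master", "master_hello_received") == \
    ("transit", ["unblock_assistant_port"])
```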
ULFD Technology
This section describes the theory and realization of the ULFD protocol.
Take fiber as an example. The uni-directional link includes two types. One
is that the fibers are cross-connected; the other is that one fiber is not
connected or one fiber is disconnected. As shown in Figure 16-1, the fibers
of the two devices are cross-connected; as shown in Figure 16-2, the
hollow wire means that one fiber is not connected or one fiber is
disconnected. A typical case of Figure 16-2 is that one fiber is not connected at all or is broken.
The ULFD protocol has the following features. ULFD is a link layer protocol; it cooperates with the physical layer protocols to monitor the link status of the devices. The auto-negotiation mechanism of the physical layer detects physical signals and faults, while ULFD identifies the peer devices and uni-directional links and closes the unreachable ports. After the auto-negotiation mechanism and ULFD are both enabled, they work together to detect and close physical and logical uni-directional connections and prevent other protocols (such as STP) from becoming invalid. If the links of the two ends can each work at the physical layer, ULFD detects whether the links are connected correctly and whether the two ends can exchange packets; this detection cannot be realized via the auto-negotiation mechanism.
Org ID: 0x00017a
Flags (1 byte, bits 0-7):
Bit 0: Recommended timeout flag (RT)
Bit 1: ReSynch flag (RSY)
Bits 2-7: Reserved
The RSY flag indicates whether the packet is a normal probe keepalive packet or a probe packet that requests re-synchronization and detection. When the RSY flag is 1, the receiving end needs to return an echo packet.
TLV format:
If the TLV type is not in the TLV type range defined by ULFD, the TLV is regarded as invalid.
Protocol Action
The work of the ULFD protocol contains the following aspects:
Neighbor discovery: A port sends its own information and re-synchronization requests via probe packets, and the peer port discovers the neighbor from the content of each probe packet it receives. After the port receives one probe packet, it judges whether the sending port is in the neighbor table. If not, the sender is a new neighbor: it is added to the neighbor table and an echo packet is returned for uni-directional detection. If the sending port is in the neighbor table but the probe packet carries the RSY flag, the neighbor is requesting re-synchronization: an echo packet is sent to the port for uni-directional detection. If the sending port is in the neighbor table and the RSY flag is not set, the probe packet is a common keepalive packet and the neighbor's information is updated.
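The probe-handling decision described above can be sketched as follows; the packet representation (a dict with "mac", "port", and "rsy" keys) is an assumption made for illustration:

```python
# Hedged sketch of ULFD probe handling; the packet representation
# (a dict with "mac", "port", "rsy" keys) is an assumption.

def handle_probe(neighbor_table, probe):
    """Return "echo" when an echo packet must be sent back."""
    key = (probe["mac"], probe["port"])
    if key not in neighbor_table:
        neighbor_table[key] = probe   # new neighbor: learn it
        return "echo"                 # start uni-directional detection
    if probe["rsy"]:
        return "echo"                 # neighbor requests re-synchronization
    neighbor_table[key] = probe       # plain keepalive: refresh the entry
    return "keepalive"

table = {}
assert handle_probe(table, {"mac": "A", "port": 1, "rsy": False}) == "echo"
assert handle_probe(table, {"mac": "A", "port": 1, "rsy": False}) == "keepalive"
assert handle_probe(table, {"mac": "A", "port": 1, "rsy": True}) == "echo"
```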
Neighbor aging: After a neighbor is added to the neighbor table, the port sets an aging time Tlf according to the Message Interval value in the received probe packet. If the port does not receive a probe keepalive packet from the neighbor within the time Tlf, the neighbor is aged and deleted from the neighbor table.
In the normal mode, if the port does not receive the packet of the peer end in the keepalive stage, the port enters the unconfirmed status; if, in the uni-directional detection stage, the port does not receive the echo packet of the peer end, or the received echo packet does not contain the local port information, the link between the local port and the peer is regarded as uni-directional. The normal mode is often used to detect the uni-directional status caused by a crossover connection.
In the aggressive mode, if the port does not receive the packets of the peer end in the keepalive stage, so that all neighbors are aged, and no neighbor is learned after the link is re-established, the local port is regarded as unreachable (not a uni-directional link in the strict sense) and the local port is shut down. If, in the uni-directional detection stage, the port does not receive the echo packet of the peer end, or the received echo packet does not contain the local port information, the link between the local port and the peer is regarded as uni-directional. The aggressive mode is used to detect uni-directional connections caused by fiber crossover connection or disconnection.
Typical Application
When using ULFD, ensure that the corresponding ports on both ends are configured with the ULFD function and work in the same detection mode, and that the ULFD global setting of the device is enabled.
Illustration
Port 0/0 of the local switch A is connected to Port 0/1 of the peer switch B
via the fiber. Now, configure the ULFD function on the connection to detect
the connection status of the link.
Command                                         Description
SwitchA(config)#port 0/0                        Enter the port configuration mode
SwitchA(config-port-0/0)#ulfd port aggressive   Configure the ULFD work mode as aggressive on port 0/0
SwitchA(config-port-0/0)#exit                   Exit the port configuration mode
SwitchA(config)#ulfd message time 16            Configure the interval of sending packets as 16s
SwitchA(config)#ulfd enable                     Enable ULFD globally
SwitchA(config)#exit                            Complete the ULFD configuration
Command                                         Description
SwitchB(config)#port 0/1                        Enter the port configuration mode
SwitchB(config-port-0/1)#ulfd port aggressive   Configure the ULFD work mode as aggressive on port 0/1
SwitchB(config-port-0/1)#exit                   Exit the port configuration mode
OAM Technology
The chapter describes the MAN OAM technology and the applications. OAM
is short for Operation, Administration and Maintenance.
Main contents:
Maintenance Association End Point (MEP): It can receive and send any CFM packet. Each MEP is identified by an integer, called the MEP ID. A MEP is configured on a port and delimits the MD range. The MA and MD to which the MEP belongs decide the VLAN attribute and level attribute of the packets sent by the MEP. According to the location of the MEP in the MA, the MEP direction is either inward or outward. If the CFM packets of the MA are received on the port on which the MEP is configured, the MEP direction is outward; similarly, an outward MEP can only send packets to the network via the port on which it is configured. Conversely, if the CFM packets of the MA are received on other ports, the MEP direction is inward; an inward MEP cannot send packets to the network via the port on which it is configured.
This section describes some basic concepts and functions of Ethernet CFM.
Maintenance Domain
The maintenance domain is a part of the network covered by the
connectivity fault management. Its limit is defined by a series of
maintenance points (MP) configured on the ports, including MEP and MIP,
as shown in figure 17-1.
Maintenance domain
Figure 17-2 shows three maintenance domains, that is, customers, service
providers, and carriers, as well as the hierarchical structure of the
maintenance domains. CE is the edge device of the customer (Customer
Edge); PE is the edge device of the service provider (Provider Edge).
Maintenance Point
One maintenance point is one function point configured on a port, which takes part in the CFM protocol operation. According to their locations in the maintenance domain, maintenance points are divided into Maintenance association End Points (MEP) and Maintenance domain Intermediate Points (MIP).
A MIP can process and respond to some CFM packets (such as LTM packets, or LBM packets whose destination is at the same level as itself), but cannot send packets on its own initiative.
Figure 17-3 shows the case that MEP and MIP are on the devices of the
customers, service providers, and carriers.
When the maintenance domain is used to locate the fault, you can first use
LT or LB to determine the fault interval on Level 5. If the fault is between
two MIPs on Level 5, continue to use LT or LB to locate the fault on Level 3.
The packets sent or received by each MP belong to its MA, have the
features of the VLAN and layer, and do not interfere with each other. The
rest is deduced by analogy until the minimum fault area is found.
Similarly, MEP sends CCM, and remote MEP receives and processes it.
When the MD and MA configured by remote MEP are inconsistent with
those configured by the MEP that sends CCM, you can find out the
configuration error in the network.
Connectivity Check
The continuity check (CC) function is the most basic function in 802.1ag, used to check connectivity failures of the Ethernet flow between MPs. A connectivity failure may be caused by a fault or by a configuration error. The connectivity check is suitable for detecting unidirectional connectivity failures. Figure 17-4 shows an example of the CC function: the maintenance domain (Provider Domain) contains two Operator Domains (Operator A and Operator B).
Connectivity checking
After the MEP receives the CCM sent by the equivalent MEP in the same
maintenance domain and analyzes it correctly, the information of the peer
MEP is saved in the CCM database. The information includes MEP ID, MAC
address of MEP, remote error ID (RDI) of MEP, Sender ID of MEP, and so
on.
The local MEP compares the MEP ID of each received CCM to ensure that there is no repeated MEP ID in the local configuration; a repeated MEP ID indicates a configuration error.
The timeout of the CCM is 3.5 times the sending interval; that is, the connection between the local MEP and the remote MEP is regarded as faulty when three successive CCMs are lost.
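The timeout rule can be expressed as a one-line computation (the interval values used below are examples, not mandated defaults):

```python
def ccm_timeout(interval_s):
    # The CCM timeout is 3.5 times the sending interval, i.e. the
    # connection is declared faulty after three successive lost CCMs.
    return 3.5 * interval_s

assert ccm_timeout(1) == 3.5
assert ccm_timeout(10) == 35.0
```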
The destination address of the CCM is 01-80-C2-00-00-3y, where the four address bits y equal the MD Level of the CCM (0 through 7).
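Deriving the CCM destination address from the MD level can be sketched as follows (the string formatting is illustrative):

```python
def ccm_multicast_mac(md_level):
    """CCM group address 01-80-C2-00-00-3y, where y equals the MD level."""
    assert 0 <= md_level <= 7
    return "01-80-C2-00-00-3%X" % md_level

assert ccm_multicast_mac(5) == "01-80-C2-00-00-35"
assert ccm_multicast_mac(0) == "01-80-C2-00-00-30"
```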
CCM can reach any MEP in one MA. When other MEPs receive the CCMs
from one MA, first get the packet information and save it in the CCM
database, and then check whether the CCMs of all other MEPs in one MA
are received within the specified time.
Suppose a MEP sends one CCM. When the CCM reaches a MIP in the MA, the MIP continues to forward it; when the CCM reaches a destination MEP of the same MA, that MEP checks whether its level is the same as that of the CCM. If the timer has not timed out, the MEP processes the packet, resets the timer, and waits for the next CCM sent by the remote MEP.
Each MEP periodically multicasts CCMs outward and receives the CCMs sent by the other MEPs in the same MA. The local MEP is responsible for checking whether any MEP in the local CCM database has timed out. If a MEP times out, the connection with that remote MEP has failed; the error is reported to the network administrator.
When the sending interval of the received CCM is inconsistent with the
configured value in MA, it triggers error notification (FNG alarm). When
the MA IDs of the received CCMs are inconsistent, it indicates that there is
cross-connection error, which also triggers FNG alarm.
Loopback Check
Loopback (LB) check function is used to check the connection status with
the remote device. It is suitable for checking the bidirectional connectivity
failure. The LB function is shown in Figure 17-5.
Loopback check
Loop Back Messages (LBM) are sent actively from a MEP by executing a command, for example via the network management system. The target can be any MP in the MA. For another remote MEP in the MA, the local MEP can get its MAC address via CCM; for a MIP, the local MEP gets its MAC address by sending a Link Trace Message (LTM).
Each LBM has a unique serial number. After sending LBM, the serial
number of the packet is reserved for at least 5 seconds, used to
distinguish whether the received Loop Back Reply (LBR) is the correct
reply packet of the sent LBM.
The source and destination MAC addresses of the LBR are swapped relative to those of the LBM; the packet type is changed to LBR; the contents are otherwise the same as those of the LBM.
When the MEP receives an LBR, it checks whether the serial number is consistent with that of the latest LBM. If inconsistent, an error is indicated. If a MIP receives an LBR, the LBR is regarded as an error packet and is dropped.
LTM is the multicast packet. The multicast address is as shown in table 1-2.
01-80-C2-00-00-3y

MD Level of LTM    Four address bits y
7                  F
6                  E
5                  D
4                  C
3                  B
2                  A
1                  9
0                  8
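Analogously to the CCM address, the LTM destination address can be derived from the MD level; a small sketch:

```python
def ltm_multicast_mac(md_level):
    """LTM group address 01-80-C2-00-00-3y, where y = MD level + 8."""
    assert 0 <= md_level <= 7
    return "01-80-C2-00-00-3%X" % (md_level + 8)

assert ltm_multicast_mac(7) == "01-80-C2-00-00-3F"
assert ltm_multicast_mac(0) == "01-80-C2-00-00-38"
```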
The TLV of the LTM contains one original address (Original MAC) and one target address (Target MAC). The original address is the address of the port where the MEP that sends the LTM is located; the target address is the MAC address of the target MEP to which the LTM is sent. These are distinct from the destination and source addresses of the Ethernet data frame itself. Each LTM packet carries a unique serial number, which is incremented by one each time an LTM is sent.
Each MP at the same level on the path to the target address sends one LTR packet to the original address. The LTR is a unicast packet whose source address equals the target address of the LTM and whose destination address equals the original address carried in the TLV of the LTM.
When the FNG alarm appears, send the LTM packet to track and locate the
error link. MEP sends one LTM and MIP decides whether to receive the LTM
packet according to the level of the maintenance domain. When receiving
the packet, MIP first checks whether the TTL value of LTM is 0. If yes, drop
the packet. Otherwise, subtract one from TTL and then search for the
egress port to forward the LTM packet according to the target address and
VLAN ID of LTM in the FDB table. If the egress port is not found in the FDB
table, drop the LTM packet. When the LTM packet is forwarded, the other
information except for the source MAC address and TTL value does not
change. The MIP on the port replies one LTR packet to the source MEP
after one random delay. When the network fails, LTM can only reach the
MP before the faulty point. The MPs between the faulty point and the
target MEP do not reply LTR. In this way, the faulty area can be found.
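The MIP forwarding rules above (TTL check, decrement, FDB lookup) can be sketched as follows, assuming a dictionary-based packet and an FDB keyed by target MAC and VLAN (illustrative representations):

```python
# Hedged sketch of MIP processing of an LTM; the dict-based packet and
# the FDB keyed by (target MAC, VLAN) are illustrative assumptions.

def mip_process_ltm(ltm, fdb):
    if ltm["ttl"] == 0:
        return "drop"                      # TTL exhausted: drop the LTM
    ltm["ttl"] -= 1                        # subtract one from TTL
    egress = fdb.get((ltm["target_mac"], ltm["vlan"]))
    if egress is None:
        return "drop"                      # no egress port in the FDB
    # Forward unchanged except source MAC and TTL; an LTR is also
    # replied to the source MEP after a random delay (not modeled).
    return ("forward", egress)

fdb = {("00-00-5E-00-00-01", 10): 3}
ltm = {"ttl": 2, "target_mac": "00-00-5E-00-00-01", "vlan": 10}
assert mip_process_ltm(ltm, fdb) == ("forward", 3)
assert ltm["ttl"] == 1
```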
CFM Packet
The CFM packet type is 0x8902. The public head of the CFM packet is as
shown in Figure 17-7.
CCM packet
LTM packet
UNI-N of E-LMI
UNI-C of E-LMI
Typical application
CE Polling:
The UNI-C device transmits the E-LMI Check message (E-LMI Check STATUS ENQUIRY) to the UNI-N device for active polling; the polling interval is T391 seconds (10s by default). After every N391 polls (360 by default), UNI-C transmits one full status request message (FULL STATUS ENQUIRY), and UNI-N responds with the status and configuration information of the UNI and EVCs. UNI-N runs the T392 timer to wait for request messages from UNI-C; the configured value of T392 must be larger than T391.
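The polling pattern above (an E-LMI Check every T391 seconds, with every N391-th poll upgraded to a full status request) can be sketched as follows; the counting convention (starting from 1) is an assumption for illustration:

```python
def report_type(poll_count, n391=360):
    """Every N391-th poll becomes a FULL STATUS ENQUIRY; the rest are
    E-LMI Check STATUS ENQUIRY messages. Counting from 1 is assumed."""
    return "FULL STATUS ENQUIRY" if poll_count % n391 == 0 else "E-LMI Check"

assert report_type(1) == "E-LMI Check"
assert report_type(360) == "FULL STATUS ENQUIRY"
assert report_type(361) == "E-LMI Check"
```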
PE Informing:
E-LMI Message Type
MEF 16 defines two message types to realize E-LMI protocol interaction: the STATUS ENQUIRY message and the STATUS message. The content type (Report Type) carried by an E-LMI packet is divided into the following four types:
Single EVC Asynchronous Status: active EVC status informing packet; the
packet can only be sent by UNI-N to inform CE of the EVC status change
information.
The STATUS ENQUIRY message is sent by UNI-C to ask the UNI-N for the
configuration and status information of EVC and UNI. After receiving one
valid STATUS ENQUIRY message, UNI-N should send one STATUS
message to reply the request message.
STATUS Message:
The contents of the STATUS messages differ according to their Report Types. The content relation is as follows.
STATUS message
Information Element
Sequence Numbers X X X
Data Instance X X X
UNI Status X
EVC Status X X X
CFM
EVC
One EVC needs to be bound with the CFM management domain instance.
The connectivity between the UNIs in EVC can be got via CFM
management domain instance.
UNI
Bundling: Multiple EVCs can be configured on one UNI and one EVC can
map with multiple CE-VLAN IDs;
All to one Bundling: One UNI can be bound with only one EVC and all
CE-VLAN IDs map to the EVC;
The port of the UNI-N end needs to be configured as the MEP node of one
CFM domain and enable the CC function of CFM. In this way, UNI-N end
can get the connection status between the UNIs of EVC configured on the
PE device via the 802.1ag protocol, so as to get the current operation
status of the EVC.
After enabling the PE mode of the E-LMI protocol on the UNI-N, the UNI-N
waits for the request of UNI-C and makes the corresponding response.
When UNI-N finds that the status of the EVC bound to the UNI changes, it
actively sends the EVC status notification message to the PE.
UNI-C of E-LMI
The UNI-C of E-LMI only needs to enable the E-LMI protocol and run in the
CE mode. After being configured as the CE mode, UNI-C periodically sends
the E-LMI Check request to UNI-N and initiates one Full Status request to
ask UNI-N for the EVC and UNI configuration and status information when
finding that the Data Instance values of EVC and UNI do not match with
each other via the E-LMI Check message. Besides, the local UNI-C
information is updated.
Typical Applications
The following is one typical application of E-LMI.
Enable the E-LMI protocol on the UNI connection UNI1 between CE1 and
PE1. CE1 gets the UNI1 configuration information, and EVC_Provider
configuration and status information from PE1 via the E-LMI protocol, so
as to complete the auto configuration function of CE1.
Currently, Ethernet OAM mainly solves the OAM problems of Ethernet devices over the last mile, including link performance monitoring, fault detection and alarming, loopback test, and remote MIB variable requests.
All functions of Ethernet OAM can become valid only after the Ethernet
OAM connection is set up.
As shown in the above figure, the Ethernet OAM is located between MAC
Control layer and the LLC layer.
Protocol Structure
As shown in the above figure, Ethernet OAM comprises the OAM sublayer
and OAM client.
The OAM sublayer is responsible for flow dividing and for the remote loopback policy processing of the packets sent and received on the interface; the OAM client is responsible for the connection maintenance and remote loopback control of the protocol; OAM Control is responsible for sending and receiving the Ethernet OAM protocol packets.
As shown in the above figure, the destination address of the Ethernet OAM
packet is 01-80-C2-00-00-02; the OAM packet belongs to the low-speed
protocol (the protocol number is 88-09); the subtype is 0x03;
Data/Pad is the data content of the Ethernet OAM packet, which varies
with Code;
Information OAMPDU:
Loopback Control OAMPDU packet is used for remote loopback control. The
device can select whether to use the packet. To realize the loopback
control, the local DTE sends the loopback control command to the remote
DTE. If the loopback control function of the remote DTE is enabled, the
sent packet is returned to the sending party. The packet format is as
follows.
The device can select Active mode or Passive mode to set up the OAM
connection. The DTE (Data Terminating Entity) processing capabilities in
active mode and passive mode are as follows.
The above figure shows the state transitions of the Ethernet OAM connection. Besides the transitions described in the figure, there are several special transitions:
1. When the connection timeout timer times out, all states return to Active or Passive;
2. When the port goes down or the OAM function is shut down, all states return to Fault.
Status        Description
Fault         Ethernet OAM has not started running.
Active        Active status: periodically sends information OAMPDU packets containing the Local Information TLV to discover the connection.
Passive       Passive status: waits for an information OAMPDU packet containing the Local Information TLV to accept the connection.
Discovered    Connection discovered: periodically sends information OAMPDU packets containing the Local Information TLV and Remote Information TLV to negotiate the connection, and runs the connection timeout timer.
Local-stable  The local end has passed attribute matching: periodically sends information OAMPDU packets containing the Local Information TLV and Remote Information TLV to negotiate the connection, and runs the connection timeout timer.
Up            Connection established: periodically sends information OAMPDU packets containing the Local Information TLV and Remote Information TLV to keep the connection alive, and runs the connection timeout timer.
Event                                    Description
Ethernet OAM port UP                     The Ethernet OAM port becomes up.
Ethernet OAM port DOWN                   The Ethernet OAM port becomes down, including port down and Ethernet OAM function shutdown.
Information OAMPDU received              An information OAMPDU packet is received.
Local attribute matching passed          According to the information OAMPDU, the local attributes are matched and the matching passes.
Local attribute matching not passed      According to the information OAMPDU, the local attributes are matched and the matching fails.
Remote attribute matching passed         According to the Flags field of the information OAMPDU packet, the remote attribute matching is judged to have passed.
Remote attribute matching not passed     According to the Flags field of the information OAMPDU packet, the remote attribute matching is judged to have failed.
Connection timeout                       The connection is invalid and the timer times out.
The serious link event types of the Ethernet OAM connection and the
definitions are as follows:
Event           Definition
Link Fault      The PHY hardware detects a link fault in the receive direction.
Dying Gasp      An unrecoverable fault occurs at the local end; for example, Ethernet OAM is shut down.
Critical Event  An unpredictable serious event occurs (currently not further defined).
The link monitoring types and the definitions of Ethernet OAM connection
are as follows:
In the remote loopback test, the packets sent by the local end are looped back by the peer end, so as to test the parameters of the link, such as packet loss rate and delay.
In the remote loopback test mode, the processing of the OAM sublayer is
as follows:
EVC Technology
Main contents:
Related terms
Application description
Typical application
Related Terms
This section describes the related terms of EVC.
The difference is that there can be multiple EVPLs on one UNI, while there
can be only one EPL on one UNI.
Maipu switch does not support this kind of EVC directly, but can
support indirectly by configuring the port separation and L3 forwarding
features between UNIs.
All to one Bundling: One UNI can be configured with only one EVC; all
CE-VLAN IDs are mapped to the EVC;
The port of the UNI-N end needs to be configured as the MEP node of one CFM domain with the CC (Continuity Check) function of CFM enabled. In this way, the UNI-N end can get the connection status between the UNI ends of the EVC configured on the PE device via 802.1ag, so as to get the current operation status of the EVC.
Application Description
EVC provides the public attributes and configurations, cooperating with the
modules to realize the service functions. For details, refer to EVC
Configuration Manual. The main attributes of EVC are described as follows:
EVC type: There are two types, that is, point-to-point and multipoint-to-multipoint. Point-to-point means that there are only two UNI ports in one EVC.
Local MEP and remote MEP of EVC: The MEP is the end point used to maintain the connection and can send/receive any CFM packet. Each MEP is identified by one integer, called the MEP ID.
QINQ type: There are two kinds, double and mapping. Double supports mapping multiple CEVLANs to one single EVC, while mapping supports mapping only one single CEVLAN to one single EVC.
QINQ mode: There are two kinds, that is, one and multiple. The one mode does not need the SVLAN and CEVLAN of the EVC to be configured, adopting the port default values; the multiple mode has no such limitation.
Associate EVC to the local port and run QinQ function on the port to set up
the EVC connection. Bind EVC on the port, get the QinQ information in EVC
according to EVC ID and convert the QinQ information to the port
configuration. The UNI mapping type of the port should match with the
information in the bound EVC. The detailed matching rules are as follows:
You can bind an EVC only to a Hybrid or Trunk port, not to an Access port. If the UNI mapping of the port is ALL-TO-ONE, the port can be bound to only one EVC and all CEVLANs are mapped to that EVC.
If the UNI mapping of the port is BUNDLING, the port can be bound to multiple EVCs and each EVC can be configured with multiple CEVLANs. The CEVLANs in the multiple bound EVCs cannot be the same, and the SVLANs cannot conflict with each other.
If the UNI mapping of the port is MULTIPLEXING, the port can be bound to
multiple EVCs, but each EVC can be configured with only one CEVLAN. The
CEVLANs in the multiple bound EVCs cannot be the same and SVLANs
cannot conflict with each other.
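The binding rules above can be summarized as a validation sketch; representing an EVC as a set of CE-VLANs and the UNI mapping modes as strings is an illustrative assumption, and SVLAN conflict checks are omitted:

```python
# Hedged sketch of the EVC binding checks; representing an EVC as a set
# of CE-VLANs and the modes as strings is an illustrative assumption.
# SVLAN conflict checks are not modeled.

def can_bind(port_mode, existing_evcs, new_evc):
    if port_mode == "ACCESS":
        return False                       # only Hybrid/Trunk ports
    if port_mode == "ALL-TO-ONE":
        return len(existing_evcs) == 0     # at most one EVC on the port
    if port_mode == "MULTIPLEXING" and len(new_evc["cevlans"]) != 1:
        return False                       # one CE-VLAN per EVC
    used = set().union(*[e["cevlans"] for e in existing_evcs])
    return not (used & new_evc["cevlans"]) # CE-VLANs must not repeat

assert not can_bind("ACCESS", [], {"cevlans": {10}})
assert can_bind("MULTIPLEXING", [{"cevlans": {10}}], {"cevlans": {20}})
assert not can_bind("BUNDLING", [{"cevlans": {10, 11}}], {"cevlans": {11}})
```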
2. The application combination of EVC and ELMI (for ELMI, refer to ELMI
Technical Manual)
Bind the E-LMI protocol on the connected ports of the PE and CE devices,
and run the E-LMI protocol as the PE and CE modes. With the E-LMI
switching, the CE device can get the configuration information and status
information of all EVCs bound on the ports connected to the CE device
from the PE device. Meanwhile, when the EVC status on the PE port
changes, actively inform the CE device to update at once via the E-LMI
protocol.
One EVC needs to be bound with the CFM management domain instance.
With the CFM management domain instance, you can get the connectivity
between the UNIs in the EVC.
The current status of EVC depends on the status of all local ports and
remote ports in EVC. The status of the remote port needs to be got via
802.1ag. Therefore, EVC needs to concern and process the following
events in 802.1ag: add remote MEP, remote MEP status UP, delete remote
MEP, remote MEP status DOWN, and delete CFM management domain
information. Process the events to update the current status of EVC.
Typical Application
The following figure shows one typical application instance of combining
EVC and E-LMI.
Enable the E-LMI protocol on UNI1 between CE1 and PE1. CE1 gets the
UNI1 configuration information and the configuration and status
information of EVC_Provider from PE1 via the E-LMI protocol, so as to
complete the auto configuration function of CE1.
LLDP Technology
Main contents:
Overview
Overview
LLDP (Link Layer Discovery Protocol) is the link layer protocol defined in
802.1ab. It organizes the information of the local device as TLV
(Type/Length/Value) to be encapsulated in LLDPDU (Link Layer Discovery
Protocol Data Unit), which is sent to the direct-connected neighbor.
Meanwhile, LLDP saves LLDPDU received from the neighbor in MIB
(Management Information Base). With LLDP, the device can save and
manage the information of itself and direct-connected neighbor device for
the network management system to query and judge the communication
status of the link. LLDP does not configure or control network elements or traffic; it only reports the L2 configuration. 802.1ab also enables network management software to use the information provided by LLDP to discover certain L2 inconsistencies.
When some LLDP configuration of the local device changes (such as holdtime or the set of advertised TLV types), or when the polling mechanism, if enabled, finds that the LLDP configuration information of the local system has changed, the rapid transmitting mechanism is used so that other devices discover the change as soon as possible: a specified number of LLDPDUs (3 by default) are transmitted continuously at once, and then the normal transmitting period is resumed.
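The rapid transmitting mechanism can be sketched as a schedule computation; the interval values below are illustrative, not Maipu defaults:

```python
def tx_schedule(change_at, fast_count=3, fast_gap=1, normal_gap=30):
    """After a local configuration change, send `fast_count` LLDPDUs in
    quick succession, then fall back to the normal transmit period."""
    times = [change_at + i * fast_gap for i in range(fast_count)]
    times.append(times[-1] + normal_gap)   # first normal-period frame
    return times

assert tx_schedule(0) == [0, 1, 2, 32]
```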
When LLDP is disabled globally, or a port on which LLDP is enabled is shut down, added into an aggregation group, has LLDP disabled, or the system is reloaded, one CLOSE TLV LLDPDU is transmitted to inform the neighbor, so that the neighbor device learns quickly that the local device's LLDP is disabled.
Set the aging time of the local information on the neighbor device by
configuring holdtime. The default value is 120s. The maximum value of
holdtime is 65535s.
The device does not support sending Protocol Identity TLV, but can receive
this type of TLV.
Link Aggregation TLV: Whether the port supports the link aggregation
and whether to enable the link aggregation;
As shown in the above figure, the port 0/0/1 of SW1 is connected with
port 0/0/1 of SW2; port 0/0/2 of SW1 is connected with port 0/0/2 of SW3.
Configure LLDP function on the three devices. The three devices can
exchange information via LLDPDU and query the neighbor information of
each other. The remote NMS can be connected to the device for network
management and topology collection, so as to realize the cluster
management.
Main contents:
Related terms
Introduction
Related Terms
Dynamic MAC address: the MAC address automatically learned from packets received by the switch. When the port receives one packet, the switch searches whether the source MAC address of the packet is in the MAC address table. If not, it associates the port, VLAN, and source MAC address and saves the entry in the MAC address table.
Static MAC address: the static forwarded MAC address configured by the
user via the shell command or snmp proxy; the static MAC address and
the dynamic MAC address have the same function, but compared with the
dynamic MAC address, the static MAC address does not age.
Filter MAC address: the static filtered MAC address configured by the user via the shell command or the SNMP proxy. When the source or destination MAC address of a packet received by the switch is a filter MAC address, the packet is directly discarded.
Aging time: the existing time of the dynamic MAC address in the MAC
address table after the switch learns the MAC address.
Introduction
A MAC address entry contains the address information used to forward
packets between ports. There are three types of MAC address entries:
static, dynamic, and filter. A MAC address entry consists of the MAC
address, VLAN, port number, and entry type.
A static MAC address can only be set manually or via other software.
Compared with dynamic MAC addresses, static MAC addresses do not age and
cannot be learned; they can only be added and deleted manually. By
function, static MAC addresses are divided into three kinds: those that
forward packets normally (FWD), those that only send the packet to the
CPU without forwarding it (TRAP), and those that both send the packet to
the CPU and forward it (F&T).
A filter MAC address is global and functions on the whole switch. If a
MAC address is configured as a filter address, the host with that address
is prohibited from accessing the network via the switch; that is, any
packet whose destination or source MAC address matches it is dropped.
A dynamic MAC address is learned from the source MAC address of packets
after the switch receives them. The MAC address entry is associated and
saved according to the MAC address, VLAN ID, and port, and the MAC
address table updates its entries in this way. When the switch receives a
packet whose destination MAC address is in the MAC address table, it
forwards the packet directly. Otherwise, it floods the packet to the
other member ports of the VLAN to which the receiving port belongs. In
either case, if the source MAC address is not yet in the table, the
switch writes it into the MAC address table, that is, it learns one MAC
address. When the number of MAC addresses learned by a port reaches the
maximum value, the port stops learning and floods packets with unknown
destinations. If, after learning a MAC address, the device does not
receive another packet with that source MAC address before the aging time
of dynamic MAC addresses expires, the MAC address entry is deleted when
the aging time arrives.
Port-based MAC address learning limitation means that the user can
configure a limit on the number of dynamic MAC addresses learned by each
port. Usually, the maximum number of MAC addresses that one port can
learn is 32767. When the number of MAC addresses learned by the port
reaches 32767, the port learns no more; new MAC addresses cannot be
learned until existing MAC addresses age out, and packets with unlearned
destination addresses are flooded.
The function of static forwarding MAC addresses and dynamic MAC addresses
is fast forwarding; that is, the MAC address table is a fast forwarding
table that makes packets be forwarded rapidly and correctly via the
specified port, so as to prevent packets from being broadcast in the
whole VLAN.
Note
Static MAC address entries configured manually by the user and filter MAC
address entries cannot be overwritten by dynamic MAC address entries, but
dynamic MAC address entries can be overwritten by static MAC address
entries and black-hole MAC address entries.
Basic concepts
Technology theory
Realizing method
Typical application
Basic Concepts
With the evolution of network technology and network convergence, the
data transmission and switching mode that uses the packet as the basic
unit will be dominant in the next generation network. Both IP networks
and MPLS networks are representatives of packet switching networks.
However, the next generation network (NGN) cannot be constructed
overnight. The current PDH/SDH networks serving PSTN public voice
communication services will exist for a long time, and the existing TDM
devices of users on the network will still be used. To protect users'
investments in TDM devices, the next generation packet switching network
must provide the capabilities of accessing TDM services and transmitting
TDM data transparently.
For transparently transmitting TDM circuit switching services over the
packet switching network, several standards organizations have put
forward their own standards and solutions. Currently, TDM circuit
emulation is the most mature.
The standards put forward by MEF focus on how to encapsulate the original
TDM service into Ethernet frames, while the MFA standards focus on how to
carry the TDM service on the MPLS network. ITU-T standards also focus on
the data layer; they provide a mode for MPLS to carry TDM service data
and a mode for IP to carry TDM service data. Besides, ITU-T defines the
clock transmission solutions that are important for the TDM service.
Commonly-used Terms
PWE3 (Pseudo Wire Emulation Edge to Edge): IETF defines the meaning of
PW in RFC3985, that is, an emulation that uses a packet switching network
to carry a native service;
IWF (Interworking Function): the device that switches data between two
different networks;
CE (Customer Edge): the device that initiates and terminates the TDM
service;
PE (Provider Edge): the device that provides the PW, which is equivalent
to the IWF;
Bundle: the bit flow sent by the TDM circuits of the PE devices at the
two sides of the PW; it can comprise any number of 64Kbps time slots in
one E1 or T1. A Bundle is a uni-directional data flow; it is usually
paired with an opposite Bundle to form full-duplex communication. There
can be several Bundles between two PE devices.
Technical Theory
The IETF PWE3 working group plays a leading role in making the standards
for transparent transmission of the TDM service.
1. SAToP protocol
RFC4553 (SAToP) provides the emulation function for low-rate PDH circuit
services such as E1/T1/E3/T3. SAToP transmits unstructured (that is,
unframed) E1/T1/E3/T3 service data. It segments the TDM service as a
serial data stream, encapsulates it, and transmits it on the PW tunnel.
Among the elements of the TDM emulation service described in the above
section, the protocol can provide transparent transmission of the TDM
service and transmission of the synchronous timing information, but it
cannot identify the TDM frame structure. Therefore, the TDM frame
structure and the signaling in the TDM frame cannot be identified or
processed, and can only be transmitted transparently. The protocol is the
simplest mode of transparently transmitting low-rate PDH services in the
TDM circuit emulation scheme. It is also because it is simple to realize
that IETF released it as a formal RFC standard.
The L2TPv3 mode adopts the L2TPv3 packet header to encapsulate the PWE3
packet and uses different session IDs to distinguish different PWs. The
mode can adopt L2TPv3 protocol negotiation to set up the outer tunnel and
distribute different session IDs to the different PWs in the tunnel via
the protocol. It is more flexible in use than the UDP/IP mode.
The MPLS mode adopts the MPLS label to encapsulate the PWE3 packet and
adopts the LSP as the outer tunnel of the PW. The PW label is the
innermost label of the MPLS label stack. In the MPLS mode, the user can
perform dynamic PW label negotiation.
2. CESoPSN protocol
Compared with SAToP, CESoPSN provides a structured TDM service emulation
function; that is, it can identify, process, and transmit the frame
structure and the signaling in the TDM frame. Take E1 as an example. A
structured E1 comprises 32 time slots. Except for time slot 0, the other
31 time slots can each carry one 64Kbps voice service. Time slot 0 is
used to transmit the signaling and the frame alignment symbol. The
CESoPSN protocol can identify the frame structure of the TDM service, so
idle time slot channels do not need to transmit data; only the time slots
in use by the CE device are encapsulated from the E1 service flow into
the PWE3 packet. Meanwhile, the functions of identifying and transmitting
the CAS and CCS signaling in the E1 service flow are provided.
Besides the TDM service data, CESoPSN provides the scheme of identifying
and transmitting the CAS signaling.
3. TDMoIP protocol
The PW encapsulation modes (UDP/IP mode, L2TPv3 mode, MPLS mode, and MEF
mode) on different PSN networks are described above. Both SAToP and
CESoPSN take the TDM bit flow as the payload encapsulated in the PW,
while TDMoIP adds three new TDM payload types: the AAL1 payload, AAL2
payload, and HDLC payload. Currently, the TDM service products developed
by Maipu support the HDLC TDM payload.
Besides, the PWE3 working group of IETF defines the structured circuit
emulation scheme for the high-end and low-end channel of SONET/SDH to
transmit the VC11/VC12 and VC2 TDM service data transparently via the
PWE3 mode.
Figure 21-2 Mapping relation between the function layer and MEF packet
encapsulation
Key Technologies
Data Jitter Buffer
After PW packets cross the packet switching network and reach the egress
PE device, their arrival intervals may vary and the packets may arrive
out of order. To ensure that the TDM service data flow can be
re-constructed on the egress PE device, the jitter buffer technology is
needed to smooth the intervals of the PW packets and re-order the packets
that arrive out of order. The capacity of the jitter buffer is a
performance trade-off: a jitter buffer with large capacity can absorb
large variations in packet transmission intervals in the network, but
introduces a large delay when re-constructing the TDM service data flow.
Providing a jitter buffer whose capacity the user can configure and
adjust is a good policy; the user can configure it flexibly according to
different network delay and jitter. Currently, the TDM circuit emulation
products developed by Maipu support configuring jitter buffers of
different capacities via commands.
emulation mode to transmit the TDM service transparently, the data delay
comprises the following aspects: packet encapsulation delay, service
processing delay, and network transmission delay.
2. Service processing delay is the time for the device to process the
packet, including packet validity check, packet filtering, parity
check and calculation, packet encapsulation, and packet receiving
and sending. The delay depends on the service processing capability
of the device; for a given device, it is fixed.
The TDM service delay depends on the above three kinds of delays.
Comparing the two modes, the unstructured mode is simpler: it does not
need to identify the frame format in the TDM data flow and is more
commonly used. For devices in the traditional data network that take
E1/T1 as a synchronous serial interface (that is, ignore the frame
format) and adopt clear-channel transmission, the unstructured TDM
Pseudo Wire Emulation is more convenient.
Realizing Methods
PWE3 Packet Format
Currently, Maipu only supports PWE3 packets encapsulated in the UDP/IPv4
mode. As shown in Figure 21-3, the TDM service data is encapsulated in
the TDMoIP PAYLOAD field of the packet.
The format of the UDP/IPv4 header is as shown in Figure 21-4. The source
IP address is the local address of the Pseudo Wire; the source addresses
of all PWE3 packets sent from the local device are the same. The
destination IP address is the remote address of the Pseudo Wire; the
destination IP addresses of PWE3 packets sent to different Pseudo Wires
are different. The UDP destination port number is fixed as 2142, the port
number assigned by IANA for TDM over IP; it identifies a PWE3 packet
encapsulated in the UDP/IP mode. The UDP source port number is used to
distinguish the PWE3 packets of different Bundles on one Pseudo Wire, and
its value range is 1-8063.
The control word provides a method of exchanging TDM circuit status and
PSN network status for the PWE3 packet. The format is as shown in Figure
21-5. RES is a reserved field and must be set to 0. The L bit indicates a
local failure; the L bit is set to 1 when a fault is detected or reported
locally. A fault at the TDM physical layer makes the data incomplete, so
the bit can be used to indicate a physical-layer failure and to trigger
generation of the AIS signal at the remote side. After the TDM fault is
fixed, the L bit is cleared. The R bit indicates a remote receiving
fault; the R bit set to 1 means that the remote end does not receive
packets from the Ethernet port. The R bit can be used to advertise
congestion or other network faults, and receiving the remote fault
indication can trigger a fallback mechanism to avoid the congestion. The
R bit is set to 1 after a pre-set number N of successive packets are not
received; after packets are received again, the R bit is cleared.
The FRG field indicates the fragmentation type and is used for the CAS
multi-frame structure in the CESoPSN protocol. When FRG is 00, the whole
multi-frame is in one packet; 01 means that the packet carries the first
fragment of the multi-frame; 10 means that the packet carries the last
fragment; 11 means that the packet carries a middle fragment. The LENGTH
field indicates the total bytes of the control word, payload, and RTP
header (if present); it is used when the packet length is less than 64
bytes. When the length is 64 bytes or more, the field is set to 0. The
SEQUENCE NUMBER field is the serial number of the packet. The initial
value is random and it increases with each sent packet; when reaching the
maximum value, it wraps around to 0. The field is used to check whether
packets are lost.
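The control word fields described above can be decoded with simple bit operations. The sketch below is illustrative only: the exact bit offsets are defined by the figure referenced in the text, so the layout used here (L, R, two reserved bits, FRG, 10-bit LENGTH, 16-bit SEQUENCE NUMBER, most significant bit first) is an assumption:

```python
# Decode a 32-bit PWE3 control word per the field description above.
# Assumed bit layout (MSB first): L(1) R(1) RES(2) FRG(2) LENGTH(10) SEQ(16).

def parse_control_word(word):
    return {
        "L":   (word >> 31) & 0x1,     # local TDM failure indication
        "R":   (word >> 30) & 0x1,     # remote receive failure indication
        "FRG": (word >> 26) & 0x3,     # 00 whole multiframe, 01 first,
                                       # 10 last, 11 middle fragment
        "LEN": (word >> 16) & 0x3FF,   # set only when packet < 64 bytes
        "SEQ": word & 0xFFFF,          # per-packet sequence number
    }

def next_seq(seq):
    # The sequence number increases per packet and wraps around to 0.
    return (seq + 1) & 0xFFFF

cw = parse_control_word(0x84000005)    # L=1, FRG=01, SEQ=5
assert cw["L"] == 1 and cw["FRG"] == 1 and cw["SEQ"] == 5
assert next_seq(0xFFFF) == 0           # wrap-around at the maximum value
```

A receiver can detect packet loss by comparing each arriving SEQ with `next_seq` of the previous one.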
The RTP header is used to carry clock information and assist the
receiving end in recovering the TDM clock across the PSN network. The
format is as shown in Figure 21-6. V is the version and is fixed as 2. P
is the padding bit and is fixed as 0. CC is the CSRC count and is fixed
as 0. M is the marker bit and is fixed as 0. The PT field indicates the
payload type; each Bundle has a unique value. SN is the serial number of
the packet and is the same as the SEQUENCE NUMBER in the control word. TS
is the time stamp; it has two generating modes: the absolute mode (the
time stamp comes from the clock recovered on the TDM line and increases
by 1 every 0.125ms) and the relative mode (the time stamp comes from a
common clock and increases by 1 for each received bit). SSRC indicates
the synchronization source.
SAToP Protocol
The TDM port on the PE works in the unframed mode and ignores the TDM
frame structure information; the received data is regarded as a bit flow
with a fixed rate. As shown in Figure 21-7, SAToP processes the TDM flow
with the byte (8 bits) as the unit. Every N received TDM bytes are
encapsulated into the TDM payload of a PWE3 packet and sent to the PSN
network. After the PE device at the other side of the Pseudo Wire
receives the packet, it decapsulates the TDM payload from the PWE3 packet
and sends it to the TDM port.
number of encapsulated bytes: the more bytes encapsulated per packet, the
larger the packet encapsulation delay, and the fewer packets generated
per unit time.
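This trade-off can be quantified directly from the E1 line rate of 2.048 Mbps. A minimal sketch (the payload sizes chosen are examples, not recommended values):

```python
# SAToP packetization trade-off at the E1 line rate (2.048 Mbps):
# more payload bytes per packet -> larger fill delay, fewer packets/s.
E1_RATE_BPS = 2_048_000

def packetization_delay_ms(payload_bytes):
    # Time to accumulate payload_bytes from the E1 bit flow.
    return payload_bytes * 8 / E1_RATE_BPS * 1000

def packets_per_second(payload_bytes):
    return E1_RATE_BPS / (payload_bytes * 8)

# 256 payload bytes: 1 ms fill delay and 1000 packets per second.
assert packetization_delay_ms(256) == 1.0
assert packets_per_second(256) == 1000.0
# Doubling the payload doubles the delay and halves the packet rate.
assert packetization_delay_ms(512) == 2.0
assert packets_per_second(512) == 500.0
```

Choosing the payload size is therefore a balance between delay and per-packet header overhead.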
CESoPSN Protocol
The TDM port on the PE works in the framed mode, which is divided into
non-CAS and CAS modes according to the TDM service type.
Non-CAS mode
As shown in Figure 21-8, CESoPSN processes the TDM flow with the frame as
the unit. After every N frames are received, the data of the specified
time slots (time slots 4 and 25) is encapsulated into the TDM payload of
a PWE3 packet and sent to the PSN network. After the PE device at the
other side of the Pseudo Wire receives the packet, it decapsulates the
TDM payload from the PWE3 packet, inserts the data into the specified
time slots (time slots 4 and 25) respectively, and sends it to the TDM
port.
CAS mode
As shown in Figure 21-9, the TDM flow has the CAS multi-frame structure;
an E1 CAS multi-frame comprises 16 basic frames, and time slot 16 of each
basic frame is used to carry the signaling and multi-frame
synchronization. CESoPSN processes the TDM flow with the CAS multi-frame
as the unit. The data of the specified time slots (time slots 2, 4, and
25) in each basic frame is encapsulated into the TDM payload of the PWE3
packet in order, beginning with the first basic frame of the multi-frame
and ending with the last basic frame.
In the fragmentation mode, the packet encapsulation delay = the number of
basic frames in the sub multi-frame × the frame time = the number of
basic frames in the sub multi-frame / the frame rate. Take the E1 CAS
multi-frame as an example. The E1 rate is 2.048Mbps; each frame contains
32 time slots; 8000 frames are transmitted every second, so the frame
time is 0.125ms; each sub multi-frame contains 4 basic frames. Therefore,
the delay for encapsulating one PWE3 packet is 4 × 0.125ms = 0.5ms. The
packet encapsulation delay increases with the number of basic frames in
the sub multi-frame: the more basic frames in the sub multi-frame, the
larger the packet encapsulation delay.
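The delay arithmetic above can be checked with a few lines of Python. The constants follow the E1 figures given in the text (8000 frames per second, 0.125 ms frame time); the helper name is illustrative:

```python
# CESoPSN packet fill delay: basic frames per packet x E1 frame time.
E1_FRAME_TIME_MS = 0.125   # 8000 frames per second

def encapsulation_delay_ms(frames_per_packet):
    return frames_per_packet * E1_FRAME_TIME_MS

# The worked example above: a sub multi-frame of 4 basic frames -> 0.5 ms.
assert encapsulation_delay_ms(4) == 0.5
# A whole 16-frame CAS multi-frame in one packet would take 2 ms to fill.
assert encapsulation_delay_ms(16) == 2.0
```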
HDLC Mode
SAToP and CESoPSN circuit emulation modes are called flow mode
(transparent transmission mode), because the encrypted in the packet is
the original bit flow. The purpose is to transmit the TDM bit flow without
any change between two TDM devices.
In the HDLC mode, however, only the HDLC frames present in the TDM bit
flow are transmitted, as shown in Figure 21-11. No matter whether the TDM
flow is framed or not, it is processed with the HDLC frame as the unit;
that is, the device searches for the frame head and frame tail of HDLC
frames in the bit flow. When a complete HDLC frame is received, its data
is encapsulated into the TDM payload and sent to the PSN network. After
the PE device at the other side of the Pseudo Wire decapsulates the PWE3
packet, the payload is reassembled into an HDLC frame and inserted into
the TDM bit flow.
As shown in Figure 21-12, the gateway (IWF) at the clock source side
sends timing information to the peer gateway regularly. The timing
information is provided with the T1/E1 emulation packet. At the other
side, the gateway extracts the time stamp from the packet and recovers
the service clock (f-service) via an algorithm.
The core theory of the algorithm is that the left IWF device sends the
packet to the destination IWF device according to its own source clock.
The destination IWF device uses one queue to buffer the packet, and uses
its own local clock to send it out. If the source clock and the destination
local clock are not consistent, even if only a very small difference, it
results in the depth change of the buffer queue in the destination device.
Therefore, we can judge whether the local clock is consistent with the
source clock according to the depth of the queue. If the queue depth
keeps increasing, the local clock is slower than the source clock and
needs to be adjusted faster; if the queue depth keeps decreasing, the
local clock is faster than the source clock and needs to be adjusted
slower. This is a negative
feedback mechanism. After it becomes stable, we will find that the local
clock at the destination is the same as the source clock in the long run. In
this way, the frequency synchronization is complete between two IWF
devices on the IP network.
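The negative-feedback idea can be sketched as a toy simulation. Everything here is an illustrative assumption, not Maipu's actual algorithm: the tick model, the gain, and the buffer units are chosen only to make the feedback visible:

```python
# Toy negative-feedback clock recovery driven by jitter-buffer depth.
# A growing queue means the local clock drains too slowly, so steer the
# frequency up; a shrinking queue steers it down.

def recover_clock(source_hz, nominal_hz=2_048_000, gain=10.0,
                  target_depth=50.0, steps=20000):
    depth = target_depth
    local_hz = nominal_hz
    for _ in range(steps):
        # Net buffer fill per tick: the source writes, the local clock drains.
        depth += (source_hz - local_hz) / 8000.0
        # Proportional negative feedback on the depth error.
        local_hz = nominal_hz + gain * (depth - target_depth)
    return local_hz

# A source running 500 Hz below nominal is tracked closely after settling.
f = recover_clock(source_hz=2_047_500)
assert abs(f - 2_047_500) < 1.0
```

After the loop settles, the recovered frequency equals the source frequency in the long run, which is the frequency synchronization property described above.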
jitter of the IP network is not cumulative, so you can use statistical
methods such as averaging to perform the filtering.
Figure 21-13 The connection and aggregation of MAN private lines
As shown in Figure 21-13, the TDM circuit emulation technology can be
used to connect and aggregate MAN private lines. For example, a district
LAN is connected to the PBX switches of the branches in the district to
provide the E1 voice access function and realize communication within the
district; this can also be realized by connecting the district to the
PSTN. The TDM circuit emulation service emulates the TDM physical
transmission mode and does not perceive the actual services transmitted
in the E1. DDN, FR, and ATM services over E1 can all be transmitted
transparently via the TDM circuit emulation mode.
TDMoIP Gateway in the figure is the PWE3 device. The PWE3 packet
formats on paths are as shown in Figure 21-14.
Loopback Detection
Technology
Introduction to Loopback
Detection
Ethernet is a broadcast network. When the destination of a packet cannot
be identified, the switch broadcasts the packet within a VLAN. When there
is a loop in the network, packets are forwarded repeatedly; eventually
the network bandwidth is exhausted and communication becomes impossible.
Enable the loopback detection function on a port to send Loopback packets
at an interval and check whether there is a loop in the network. When the
port receives a Loopback packet sent by the local device, the switch
parses the source port from the loopback packet, sets that port to
ERR-DISABLE, and prints log information.
This section describes the theory of the loopback detection protocol and
how to realize it.
There are two cases of loops. In one case, the loop is between different
ports of the switch; for example, because of a cabling error, two ports
of one switch are connected to each other. In the other case, the loop is
on a single port of the switch; for example, the port is connected to a
bridge device and the Ethernet port of the bridge loops. In the first
case, you can use STP to detect the loop, but in the second case, STP is
useless and you should adopt other detection methods.
The theory of port loopback detection is to send a special packet
periodically. In the normal state, the device that receives the packet
drops it. If there is a loop, the packet is returned to the source port.
By comparing the received packet with the sent packet, you can determine
whether there is a loopback.
Ethernet type field (2 bytes): the protocol type number of the loopback
packet, 0x9000;
Port Index field (2 bytes): the number of the port that sends the
loopback packet;
The port configured with loopback detection sends a detection packet at
an interval. The DMAC of the packet is one MAC of the switch (derived
from the base MAC); the SMAC is one MAC of the switch (derived from the
base MAC); the Skip counter is 0; the Message type is 0x0100; the Receipt
number is the port number. If the port does not belong to any VLAN, only
one untagged loopback packet is sent. Otherwise, if the port belongs to
one or multiple VLANs, besides one untagged loopback packet, a tagged
loopback packet is sent for each VLAN in which the port is tagged.
When a port that is not configured with loopback detection receives a
loopback packet, it drops the packet. Otherwise, the switch checks
whether the DMAC and SMAC of the packet are MAC addresses of the device.
If yes, it reports the port loopback to the user; if the port is in the
controlled state, the switch also shuts down the port. Otherwise, it does
not shut down the port and only reports the port loopback to the user.
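The build-and-recognize cycle above can be sketched in Python. The field order and offsets are illustrative assumptions (the text gives the values but not the exact layout), and the base MAC is a made-up example:

```python
# Sketch of a loopback detection frame: DMAC and SMAC are both derived
# from the switch base MAC; field values follow the text (Ethernet type
# 0x9000, Skip counter 0, Message type 0x0100, Port Index = sending port).
# Field ORDER here is an assumption for illustration.
import struct

BASE_MAC = bytes.fromhex("001122334455")   # hypothetical switch base MAC

def build_loopback(port_index):
    return struct.pack("!6s6sHBHH", BASE_MAC, BASE_MAC,
                       0x9000,       # Ethernet type of the loopback packet
                       0,            # Skip counter
                       0x0100,       # Message type
                       port_index)   # Port Index of the sending port

def is_own_loopback(frame):
    # A loop exists if the frame carries our own base MAC as DMAC and SMAC.
    dmac, smac, etype, _, _, port = struct.unpack("!6s6sHBHH", frame[:19])
    return etype == 0x9000 and dmac == BASE_MAC and smac == BASE_MAC, port

looped, port = is_own_loopback(build_loopback(7))
assert looped and port == 7   # loop found: port 7 would go to ERR-DISABLE
```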
Typical Application
When using the loopback detection, ensure that the corresponding port is
configured with the loopback detection function and works in the same
detection mode.
Illustration
Command Description
switch1(config)#loopback-detection enable - Enable the port loopback detection globally
switch1(config)#port 0/1 - Enter the port configuration mode
switch1(config-port-0/1)#port hybrid tagged vlan 10 - Add port 0/1 to VLAN 10 in tagged mode
switch1(config-port-0/1)#loopback-detection enable interval-time 10 - Set the interval of sending the loopback detection packets of port 0/1 to 10s
switch1(config-port-0/1)#loopback-detection enable - Enable the port loopback detection
switch1(config-port-0/1)#exit - Complete the loopback detection configuration
Command Description
switch2(config)#port 0/2-0/4 - Enter the port configuration mode
switch2(config-port-range)#port hybrid tagged vlan 10 - Add port 0/2, port 0/3, and port 0/4 to VLAN 10 in tagged mode
Main contents:
Super-VLAN theory
Super-VLAN realization
Typical application
Super-VLAN Theory
Super-VLAN, also called VLAN aggregation, associates multiple Sub-VLANs,
and the IP address is configured on the Super-VLAN interface. Each
Sub-VLAN is one broadcast domain, and different Sub-VLANs are separated
from each other at L2. To realize intercommunication between different
Sub-VLANs, the ARP proxy function is needed: with the ARP proxy, the
switch forwards and processes ARP request and response packets, so as to
realize L3 intercommunication between L2-separated ports. Hosts in the
Sub-VLANs use the IP address of the Super-VLAN interface as the gateway
for L3 communication. In this way, IP addresses are saved.
Super-VLAN Realization
After the Super-VLAN_1 interface receives an ARP request packet, the
switch checks (for example, whether the ARP proxy is configured) and
judges whether forwarding is needed. If yes, it modifies the sender MAC
address of the ARP packet to the MAC address of the interface and
forwards the packet to the other Sub-VLANs.
PC3 answers with an ARP response packet, and the switch interface
receives the ARP response.
After the switch receives the ARP response, it performs a series of
processing and judges whether to answer the original ARP requester (PC1).
When answering, it modifies the source MAC address of the ARP response
packet to the MAC address of the interface.
After PC1 receives the ARP response packet, the IP packet sent from PC1
to PC3 is sent to the Super-VLAN_1 interface. The switch forwards the IP
packet to PC3 via the Super-VLAN_1 interface.
After the Super-VLAN_1 interface receives the ARP response packet of PC3,
the switch searches the ARL table according to the destination IP address
of the ARP packet. According to the recorded binding of IP address, VLAN
ID, and port, the switch knows which VLAN PC1 is in and from which port
the packet to PC1 should be sent out. In this way, the packet does not
need to be forwarded in all other VLANs.
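The ARP proxy decision can be modeled in a short sketch. All names, MAC addresses, and table contents below are invented for illustration; they are not Maipu data structures:

```python
# Toy model of the Super-VLAN ARP proxy: hosts in different Sub-VLANs are
# L2-separated, so the Super-VLAN interface relays ARP on their behalf,
# substituting its own MAC as the sender MAC.

SUPER_VLAN_MAC = "00:00:5e:00:01:0a"       # hypothetical interface MAC
SUB_VLAN_OF = {"PC1": 5, "PC3": 8}         # host -> Sub-VLAN binding

def proxy_arp(request):
    sender, target = request["sender"], request["target"]
    if SUB_VLAN_OF[sender] == SUB_VLAN_OF[target]:
        return None                         # same Sub-VLAN: no proxy needed
    # Different Sub-VLANs: rewrite the sender MAC to the interface MAC
    # before forwarding the request into the target's Sub-VLAN.
    return {"sender": sender, "target": target,
            "sender_mac": SUPER_VLAN_MAC,
            "out_sub_vlan": SUB_VLAN_OF[target]}

fwd = proxy_arp({"sender": "PC1", "target": "PC3"})
assert fwd["sender_mac"] == SUPER_VLAN_MAC and fwd["out_sub_vlan"] == 8
```

Because the sender MAC is rewritten, PC3's reply returns to the Super-VLAN interface, which then answers PC1, exactly the relay sequence described above.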
Typical Application
Create Super-VLAN 10 and the Sub-VLANs: VLAN 5, VLAN 6, and VLAN 8. Port
0/2 and port 0/3 belong to VLAN 5; port 0/4 and port 0/5 belong to VLAN
6; port 0/6 and port 0/7 belong to VLAN 8. L2 separation is performed
between the different VLANs, so all Sub-VLANs use the L3 interface of the
Super-VLAN as the gateway to communicate with the outside, so as to
realize L3 communication between different Sub-VLANs.
Port 0/2 and port 0/3 are added to VLAN 5; port 0/4 and port 0/5 are
added to VLAN 6; port 0/6 and port 0/7 are added to VLAN 8. Create
Super-VLAN 10 and enable the ARP proxy function; add VLAN 5, VLAN 6, and
VLAN 8 to Super-VLAN 10. Create the VLAN interface of Super-VLAN 10
(interface vlan 10) and configure a suitable IP address on the VLAN
interface. The configuration of the basic functions of Super-VLAN is then
complete.
L3 Multicast Technology
This chapter describes the IP multicast theory and the related multicast
protocols. IGMP (Internet Group Management Protocol) is mainly used to
manage the group membership relation between hosts and the route/switch
device. The dynamic multicast routing protocol is used to maintain a
consistent multicast route table across the whole network. The multicast
common part maintains a multicast forwarding table calculated from the
multicast route table. When multicast service packets are received, the
route/switch device searches the multicast forwarding table to confirm
whether and how to forward the packets.
Note: The term route/switch device used in this chapter means a router or
an L3 switch with the routing function.
Main contents:
Introduction to multicast
Introduction to Multicast
When the destination of information (including data, voice, and video) is
a group of users in the network, several transmission modes can be
adopted. For example, the unicast mode sets up a separate data
transmission path for each user; the broadcast mode transmits the
information to all users in the network, and they all receive the
broadcast information whether they need it or not. Both modes waste a lot
of bandwidth resources. Moreover, the broadcast mode is not conducive to
the security and confidentiality of the information.
If there are the route/switch devices that do not support the multicast, the
multicast route/switch device can adopt the tunnel mode to encapsulate
the multicast packets in the unicast IP packet and then send it to the
neighboring route/switch device. And then the neighboring multicast
route/switch device removes the unicast IP head and continues to perform
the multicast transmission until reaching the destination.
Multicast distribution tree: In the multicast model, the source host can
send information to any host that has joined the multicast group. The
path along which the IP multicast service packets travel in the network
is called the multicast distribution tree, which includes the source tree
and the shared tree.
Source tree: The root of the tree is the multicast information source.
The branches form the distribution tree that reaches the receiving
stations via the network. A source tree that runs through the network
along the shortest path is called a shortest path tree (SPT).
Shared tree: The shared tree does not use the information source as the
tree root, but adopts a selected point in the network as the public root,
which is called the Rendezvous Point (RP).
IP Multicast Address
The IP multicast address is used to identify one IP multicast group. IANA
assigns the Class D address range to multicast, which spans 224.0.0.0 to
239.255.255.255. The first four bits of an IP multicast address are
always 1110.
Only 23 of the remaining 28 bits of the IP multicast address are mapped
to the MAC address, so 32 IP multicast addresses are mapped to one MAC
address.
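This 23-bit mapping is easy to demonstrate: the low 23 bits of the IP address are copied into the well-known 01:00:5e multicast MAC prefix, so addresses differing only in the 5 unmapped bits collide. A minimal sketch:

```python
# Map an IP multicast address to its Ethernet multicast MAC address:
# the low 23 bits of the IP address fill the 01:00:5e:00:00:00 prefix.
import ipaddress

def multicast_mac(ip):
    low23 = int(ipaddress.IPv4Address(ip)) & 0x7FFFFF   # keep 23 bits
    mac = 0x01005E000000 | low23
    return ":".join(f"{(mac >> s) & 0xFF:02x}" for s in range(40, -1, -8))

assert multicast_mac("224.0.0.1") == "01:00:5e:00:00:01"
# 224.1.1.1 and 225.1.1.1 differ only in the 5 unmapped bits,
# so 32 IP multicast groups share one MAC address.
assert multicast_mac("224.1.1.1") == multicast_mac("225.1.1.1")
```

This collision is why L2 multicast filtering based on MAC addresses alone cannot perfectly separate all IP multicast groups.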
IP Multicast Features
In common TCP/IP routing, the transmission path of a packet is from the
source address to the destination address, transmitted hop by hop in the
IP network. In the IP multicast environment, however, the destination of
the packet is not one host but a group of hosts, identified by a group
address. All information receivers join one group, and once they join,
the data sent to the group address is transmitted to them at once; all
members in the group receive the packets. Therefore, to receive the
packets, a host must first become a member of the multicast group, while
the sender of the packets does not need to be a member. In the multicast
environment, the data is sent to all members in the group, and users that
are not members of the group do not receive the data.
1. There is no limitation on the location or number of group members.
That is to say, an individual host can join or leave a multicast
group at any time, the members can be at any place on the Internet,
and one host can be a member of more than one multicast group at a
time;
2. A host can send packets to a multicast group even if it is not a
member of the group. Transmitting a multicast packet to all hosts in
a multicast group is like unicast: only one packet needs to be sent,
to the group address;
3. The route/switch device does not need to save the membership of all
hosts. It only needs to know whether there are hosts belonging to a
multicast group on the segment of each physical interface; each host
only needs to save the information of the multicast groups it has
joined.
The multicast routing protocol in dense mode is suitable for small
networks. It assumes that each subnet in the network has at least one
receiver interested in the multicast group, so the multicast packets are
spread to all nodes in the network and the related resources (such as
bandwidth and CPU of the route/switch devices) are consumed. To reduce
the consumption of precious network resources, the multicast routing
protocol in dense mode prunes the branches without multicast data
forwarding and only reserves the branches that contain receivers.
To send data to the specified address, the sender first registers at the
Rendezvous Point and then sends the data to the Rendezvous Point. When
the data reaches the Rendezvous Point, the multicast packets are copied
and transmitted to the receivers along the distribution tree paths. The
copying only happens at the branch points of the distribution tree and
repeats automatically until the packets reach the destinations.
The typical multicast routing protocol in sparse mode is PIM-SM.
The RPF process uses the existing unicast routing table to determine the upstream and downstream adjacent nodes. A packet is forwarded downstream only when it arrives on the interface (called the RPF interface) that faces the upstream adjacent node. RPF ensures that packets are forwarded correctly according to the multicast routing configuration and avoids loops that could otherwise arise for various reasons. Loop avoidance is an important problem in multicast routing. The core of RPF is the RPF check: after receiving a multicast packet, the route/switch device first performs the RPF check, forwards the packet only if the check passes, and drops it otherwise. The RPF check proceeds as follows:
1. The route/switch device looks up the RPF interface of the multicast source or of the RP in the unicast routing table. When the source tree is used, it looks up the RPF interface of the multicast source; when the shared tree is used, it looks up the RPF interface of the RP. The RPF interface for an address is the output interface the route/switch device would use to send an IP unicast packet to that address;
2. If the multicast packet arrives on the RPF interface, the RPF check passes and the route/switch device forwards the packet out the downstream interfaces. Otherwise, the packet is dropped.
The following figure shows the RPF check process when the source tree is
used.
RPF check
As the RPF check process shows, the check uses the interface on the shortest path from the route/switch device back to the multicast source or RP, which is why the mechanism is called Reverse Path Forwarding.
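The two-step check above can be sketched as follows. This is a minimal illustration only; the dictionary-based route table and the interface names are hypothetical, not part of any device implementation.

```python
def rpf_check(packet_iif, rpf_address, unicast_route_table):
    """Return True if the packet passes the RPF check.

    rpf_address is the multicast source (source tree) or the RP
    (shared tree); unicast_route_table maps an address to the
    output interface used to reach it (the RPF interface).
    """
    rpf_interface = unicast_route_table.get(rpf_address)
    return packet_iif == rpf_interface

# A packet claiming to come from 10.0.0.1 must arrive on the
# interface the unicast table uses to reach 10.0.0.1.
routes = {"10.0.0.1": "eth0"}
assert rpf_check("eth0", "10.0.0.1", routes) is True   # forwarded
assert rpf_check("eth1", "10.0.0.1", routes) is False  # dropped
```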
IP Multicast Application
Information Distribution
IP multicast allows a company's data to be distributed to many users. For example, a company with several chain stores can use multicast to transmit price information to the cash registers of the stores, or a media provider can deliver live, real-time information over the Internet to users that support multicast, for applications such as remote employee management and remote education.
Data Broadcast
Traditional data broadcast is based on broadcasting and occupies a large amount of Internet bandwidth. With multicast technology, TV and radio stations can deliver programs only to the users that actually want the data, and can also reduce network maintenance costs.
IGMP querier: The IGMP querier periodically sends IGMP query packets to discover whether any host on the LAN attached to the route/switch device is a member of, or is applying to join, a multicast group. In addition, in version 2 the querier sends a group-specific query in response to an IGMP leave message from a group member; in version 3 it can also send a source-specific query for specified multicast sources. Usually a host does not generate query packets; it returns a membership report packet as required only when it receives a query packet.
With the above mechanism, the multicast route/switch device builds a table that records which subnets attached to its interfaces have active members of each multicast group, together with a timer for each group. The table records one member of each multicast group; it does not need to record all members. When the route/switch device receives packets for a group G, it forwards them only to the interfaces that have members of group G. How packets are forwarded between route/switch devices is determined by the multicast routing protocol and is not a function of IGMP.
IGMP V1
Packet Format
IGMP is part of IP. IGMP packets are encapsulated in IP packets with IP protocol number 2. IGMP packets are transmitted with TTL 1, and the checksum is carried in the IP header.
Version number: 1
The host that receives the query fills in the report packet the address of the multicast group it has joined, and multicasts the report to that multicast address;
When the other hosts that have joined the multicast group receive this report, they suppress sending their own report packets;
IGMP V2
Improvement Compared with V1
Query selection process
group address
Type: type;
Three kinds of IGMP messages are involved in the interaction between the host and the route/switch device:
The two query messages are distinguished by the group address. For the general query, the group address is 0; for the group-specific query, it contains the address of the multicast group being queried.
The Maximum Response Time field is valid only in membership queries. It defines the maximum waiting time before answering a membership query (in units of 1/10 s). In all other messages the sender sets it to 0 and the receiver ignores the field.
host multicasts a V2 membership report to the group with TTL 1. If the host receives a report from another host (version 1 or 2) before its own timer expires, it stops the timer and does not send a report, which reduces duplicate reports.
When it receives a membership report, the route/switch device adds the group to its multicast group member list and sets a timer for it with the value Group Membership Interval (GMI). Receiving another report for the group refreshes the timer. If the timer expires, the route/switch device concludes that there is no local group member and that it does not need to forward multicast packets for that group on the attached network.
When a host joins a multicast group, it sends a V2 membership report at once, in case it is the first member of the group on the network. Because the report may be lost, the host re-sends the membership report at least once after the Unsolicited Report Interval (URI).
When the querier receives a leave group message from a group member, it sends a group-specific query to the group being left, to confirm whether other active group members remain on the subnet. Any other active group members answer with membership reports. If no report message arrives within the last member query interval, the route/switch device concludes that the group has no local member.
Inter-operation of V1 and V2
Route/Switch Device Serving as Multicast Host
If a route/switch device that supports V2 receives a V1 IGMP membership query, it records that the current querier is a V1 device and sets a timer. The timer is reset whenever a V1 membership query is received. If the timer expires, the route/switch device returns to V2 behavior.
IGMP V3
Improvement Compared with V1 and V2
Adds the V3-specific membership report packet; one packet can report multiple group records, and each group record can indicate which sources are to be received or refused;
When the number of sources in a query packet is 0, the query packet is 4 bytes longer than the V2 packet;
The Max Resp Code field: when the value is 128 or larger, a floating-point transformation is performed to obtain Max Resp Time;
With the INCLUDE and EXCLUDE filtering modes, the formats of the membership report packet and the leave packet are unified;
Group Address
Type: type;
The value actually used is Max Resp Time (in units of 1/10 s). The relation between Max Resp Time and Max Resp Code is as follows:
If Max Resp Code < 128, Max Resp Time = Max Resp Code;
If Max Resp Code >= 128, Max Resp Code represents a floating-point value in the following format:
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|1| exp | mant |
+-+-+-+-+-+-+-+-+
When a general query is sent, the group address is 0; when a group-specific or group-and-source-specific query is sent, it is the address of the queried group;
S: the S (Suppress Router-Side Processing) flag;
The value actually used is QQI. The relation between QQI and QQIC is similar to that for Max Resp Code: when QQIC is smaller than 128, QQI = QQIC; when it is 128 or larger, it is processed as a floating-point value.
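Under the encoding shown above (which applies to both Max Resp Code and QQIC, per RFC 3376: 1-bit marker, 3-bit exp, 4-bit mant, giving (mant | 0x10) << (exp + 3)), the decoding can be sketched as follows. The helper is illustrative only:

```python
def decode_igmpv3_code(code):
    """Decode an IGMPv3 Max Resp Code (or QQIC) into its actual value.

    Values below 128 are taken literally; values of 128 and above are
    a floating-point encoding: |1| exp(3 bits) | mant(4 bits)|, which
    expands to (mant | 0x10) << (exp + 3).
    """
    if code < 128:
        return code
    exp = (code >> 4) & 0x07
    mant = code & 0x0F
    return (mant | 0x10) << (exp + 3)

assert decode_igmpv3_code(100) == 100        # literal value
assert decode_igmpv3_code(0x80) == 128       # smallest encoded value
```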
Number of Sources (N): the number of sources listed in the query. For a general query and a group-specific query, N is 0; for a group-and-source-specific query, N is non-zero. The value of N is limited by the MTU of the network.
Type: type
Multicast Address
Record Type: the group record type; the value range is 1-6, with the following meanings:
Here, group records of types 1 and 2 are current-state group records, and group records of types 3 and 4 are state-change group records.
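The six record types can be tabulated as below. The type names come from RFC 3376 rather than from the text above; in that specification, types 1-2 are current-state records, 3-4 are filter-mode-change records, and 5-6 are source-list-change records.

```python
# IGMPv3 group record types as defined in RFC 3376.
RECORD_TYPES = {
    1: "MODE_IS_INCLUDE",
    2: "MODE_IS_EXCLUDE",
    3: "CHANGE_TO_INCLUDE_MODE",
    4: "CHANGE_TO_EXCLUDE_MODE",
    5: "ALLOW_NEW_SOURCES",
    6: "BLOCK_OLD_SOURCES",
}

def is_current_state(record_type):
    """Types 1 and 2 report current state; the rest report changes."""
    return record_type in (1, 2)

assert is_current_state(2)
assert not is_current_state(4)
```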
Each source in the source list of each group has a source state, comprising the source address and a source timer.
When a group is interested in all sources, the group state is EXCLUDE and the source list is empty.
When there is no IS_EX or TO_EX report in the network, the filter mode of the group state is INCLUDE; once an IS_EX or TO_EX report is received, the filter mode of the group state changes to EXCLUDE.
When the group is in EXCLUDE mode, there are two source lists. One is the list of sources confirmed as unwanted, whose packets are not forwarded; the other is the list of sources that may be, or are confirmed to be, wanted (these sources are needed when the mode switches back to INCLUDE), and packets from the sources in this list are forwarded.
When the group is in INCLUDE mode, there is only one list: the sources whose packets are to be forwarded. When the timers of all sources in the list have expired, the list is empty and the group is deleted.
The group timer runs only in the EXCLUDE filter mode; in INCLUDE mode only the source timers run, and when a source timer expires the source is deleted (when the last source is deleted, the group is deleted). When the group timer expires, the filter mode of the group switches from EXCLUDE to INCLUDE.
Only sources whose packets are forwarded have source timers; in EXCLUDE mode the sources in the non-forwarded list have no source timer. When a source timer expires: if the group is in INCLUDE mode, the source is deleted; if the group is in EXCLUDE mode, the source is moved from the forwarding source list to the non-forwarding source list.
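The forwarding decision implied by the two filter modes can be sketched as a small function. This is a simplified illustration of the per-source decision only; the timer handling described above is omitted.

```python
def forward_source(filter_mode, source, include_list, exclude_list):
    """Decide whether traffic from `source` is forwarded for a group.

    In INCLUDE mode only sources in the include (forwarding) list are
    forwarded; in EXCLUDE mode every source is forwarded except those
    in the exclude (non-forwarding) list.
    """
    if filter_mode == "INCLUDE":
        return source in include_list
    return source not in exclude_list  # EXCLUDE mode

assert forward_source("INCLUDE", "10.1.1.1", {"10.1.1.1"}, set())
assert not forward_source("EXCLUDE", "10.2.2.2", set(), {"10.2.2.2"})
```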
IGMPv1 query: the query packet is 8 bytes long and Max Resp Code is 0;
IGMPv2 query: the query packet is 8 bytes long and Max Resp Code is not 0;
IGMPv3 query: the query packet is at least 12 bytes long.
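These distinguishing rules can be sketched as a small classifier. The helper is illustrative only; the function name is not part of any protocol implementation.

```python
def classify_igmp_query(length, max_resp_code):
    """Classify an IGMP membership query by packet length and
    Max Resp Code, following the rules above."""
    if length >= 12:
        return "IGMPv3"
    if length == 8:
        return "IGMPv2" if max_resp_code != 0 else "IGMPv1"
    return "unknown"

assert classify_igmp_query(8, 0) == "IGMPv1"
assert classify_igmp_query(8, 100) == "IGMPv2"
assert classify_igmp_query(12, 100) == "IGMPv3"
```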
The membership report packets of IGMPv1 and v2 are sent to the group being joined. The leave packets of IGMPv2 are sent to the all-routers group (224.0.0.2); the membership report packets of IGMPv3 are sent to the all-IGMPv3-routers group (224.0.0.22).
When an IGMPv3 route/switch device receives query packets sent by a lower-version route/switch device, it can be manually configured as IGMPv1 or IGMPv2. If it is not configured to the lower version, alarm information is generated.
While the v2 host timer is running, BLOCK records for the group are not processed and all TO_EX packets are processed as TO_EX{};
While the v1 host timer is running, BLOCK records are not processed, all TO_EX packets are processed as TO_EX{}, and the TO_IN{} records of v2 leave packets are not processed.
When the v1 host timer of the group expires: if the v2 host timer is not running, processing of the group returns to the v3 mode; otherwise the v2 processing mode is adopted. When the v2 host timer expires, processing returns to the v3 mode.
DR: Designated Router, responsible for forwarding multicast packets and sending join/prune and register messages on a multi-access network (such as Ethernet);
PIM-SM Protocol
In a PIM-SM domain, a route/switch device running the PIM-SM protocol periodically sends Hello messages, which are used to discover neighboring PIM route/switch devices and to elect the DR on a multi-access network. The DR is responsible for sending join/prune messages and register messages.
When the source host sends multicast data to the group, the DR encapsulates the source data in a register message and unicasts it to the RP (No. 5 in the following figure). The RP decapsulates the register message into packets and forwards them to the group members along the shared tree. The RP can then send a join/prune message (No. 3 in the following figure) for the specific source toward the source, joining the shortest path tree of the source. In this way, packets are sent to the RP along the shortest path tree without encapsulation. When multicast packets arrive along the shortest path, the RP sends a register-stop message to the DR of the source, so that the DR stops the register encapsulation process. Thereafter the multicast data of the source is no longer registered and encapsulated, but is sent to the RP along the shortest path tree of the source (A→B→RP), and the RP then forwards the packets onto the shared tree. Finally, the packets are sent to the group members along the shared tree (RP→C→E).
PIM-SM uses the election mechanism of BSR and RP. One or more Candidate-BSRs are configured in the PIM-SM domain, and a rule is used to elect the single BSR shared by the whole domain. Candidate-RPs are also configured in the PIM-SM domain. The Candidate-RPs unicast packets containing their addresses and the multicast groups they can serve to the BSR, and the BSR then periodically generates BootStrap messages containing the set of Candidate-RPs and the corresponding group addresses. The BootStrap messages are sent hop by hop throughout the domain; the route/switch devices receive and store them. If a DR receives IGMP join packets from a directly connected host and has no routing entry for the group, it uses the hash algorithm to map the group address to one candidate RP and multicasts a join/prune message hop by hop toward that RP. If the DR receives multicast packets from a directly connected host and has no routing entry for the group, it uses the hash algorithm to map the group address to one candidate RP, encapsulates the multicast data in a register message, and unicasts it to the RP.
DR Selection
The rules of selecting DR are as follows:
2. If any neighbor route/switch device on the interface sends PIM Hello packets that do not carry the priority field, the DR is selected according to IP address; that is, the device with the largest IP address becomes the DR.
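The rules above can be sketched as follows. Rule 1 is not shown in this excerpt, so the sketch assumes the standard PIM behavior (highest priority wins, with the largest IP address as tie-breaker); the addresses are illustrative.

```python
import ipaddress

def elect_dr(neighbors):
    """Elect the DR from (ip, priority) tuples on one interface.

    If every neighbor advertises a priority in its Hello, the highest
    priority wins, with the largest IP address as tie-breaker
    (assumed rule 1); if any neighbor omits the priority (None here),
    fall back to the largest IP address alone (rule 2 above).
    """
    if any(prio is None for _, prio in neighbors):
        return max(neighbors, key=lambda n: ipaddress.ip_address(n[0]))[0]
    return max(neighbors,
               key=lambda n: (n[1], ipaddress.ip_address(n[0])))[0]

assert elect_dr([("10.0.0.1", 1), ("10.0.0.2", 5)]) == "10.0.0.2"
assert elect_dr([("10.0.0.9", 1), ("10.0.0.2", None)]) == "10.0.0.9"
```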
BSR Selection
Initially, the route/switch device configured as a candidate-BSR enters the Pending-BSR state, sets its Bootstrap timer to a random value (5 s to 23 s), and begins to monitor Bootstrap messages.
Bootstrap messages are addressed to the all-PIM-routers multicast group 224.0.0.13 with TTL set to 1. When a PIM route/switch device receives a Bootstrap message, it sends the message out all interfaces except the receiving interface. This process ensures both that the Bootstrap message is spread throughout the multicast domain and that every PIM route/switch device receives it, so that every device knows which route/switch device is the BSR.
RP Selection
A route/switch device can be configured as the candidate-RP (C-RP) of specific multicast groups or of all multicast groups. After receiving the Bootstrap message and learning the BSR location, the C-RP unicasts a Candidate-RP-Advertisement message to the BSR. The message carries the RP address of the initiator, its priority, and the multicast group addresses served by the C-RP.
The BSR collates all C-RPs, listing their priorities and groups, to form the RP set. The BSR announces the RP set to the whole multicast domain via Bootstrap messages, which also include an 8-bit hash mask. When a route/switch device receives an IGMP message or a PIM join message and needs to join a shared tree, it checks the RP set obtained from the BSR and selects the RP for the multicast group with the specified hash algorithm.
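The group-to-RP hash mapping can be sketched using the hash function defined for PIM-SM in RFC 2362 (the document does not spell the function out, so the formula below is taken from that specification; the addresses are illustrative):

```python
import ipaddress

def rp_hash_value(group, mask_len, crp):
    """RFC 2362 hash value for mapping a group to a candidate RP.

    group and crp are dotted-quad strings; mask_len is the hash mask
    length carried in the Bootstrap message.
    """
    g = int(ipaddress.ip_address(group))
    c = int(ipaddress.ip_address(crp))
    mask = (0xFFFFFFFF << (32 - mask_len)) & 0xFFFFFFFF
    return (1103515245 * ((1103515245 * (g & mask) + 12345) ^ c)
            + 12345) % (1 << 31)

def select_rp(group, mask_len, candidate_rps):
    """Pick the C-RP with the highest hash value (ties broken by the
    highest C-RP address)."""
    return max(candidate_rps,
               key=lambda rp: (rp_hash_value(group, mask_len, rp),
                               int(ipaddress.ip_address(rp))))

rps = ["192.0.2.1", "192.0.2.2"]
rp = select_rp("239.1.1.1", 30, rps)
assert rp in rps
```

All devices in the domain run the same deterministic mapping over the same RP set, so they all pick the same RP for a given group.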
PIM-DM Protocol
Neighbor Setup
After a PIM route/switch device starts, it periodically (by default, every 30 s) sends hello packets (addressed to the all-PIM-routers group 224.0.0.13) to set up neighbor relationships. A route/switch device that receives a hello packet adds the sender to its neighbor list and starts a timer for it, set to the value of the holdtime field in the hello packet.
When the service packets flow from E to I and I finds that it has no downstream neighbor or local group member and its output interface list is empty, I sends a prune message to the upstream device E (note: the prune is sent out the input interface, and the destination address is the address of the group to be pruned) to request pruning. If E finds that it has only one neighbor (for example, a point-to-point connection between E and I), E prunes I immediately after receiving the prune from I. If, after pruning, E finds that its own output interface list is empty, it continues to send the prune upstream. After receiving the prune from E, C finds that there is a local group member on its network (refer to IGMP), so it ignores the prune from E.
In PIM-SM mode, when a source begins to send a multicast service flow, the first-hop DR connected to the source registers the source information with the RP. In this way, the RP in PIM-SM always knows the source information of all multicast service flows in the domain. In practice, to meet network management requirements, the whole network is divided into multiple PIM domains, each with its own RP that manages the source information of all multicast service flows in that domain. Usually the RP of a domain cannot learn the source information of other PIM domains, so it cannot receive the multicast service flows of other domains. However, users belonging to different domains may want to receive the multicast service flows of other domains. To provide all multicast service flows, a domain would have to depend on the RPs of other domains, which is not what carriers want. MSDP was introduced to solve this problem.
MSDP sets up peer connections between domains. The defined information exchange lets the RPs of the domains share the active source information in the network, while the RP of each domain maintains the receiver information of its own domain. Therefore, for a multicast service flow with receivers, the RP can initiate a join toward the source directly, without depending on the RPs of other domains. After the service flow reaches the RP via the source tree (SPT), the RP transmits it to the receivers in the domain via the shared tree (RPT). In this way, the multicast service flow can be transmitted within the domain without depending on the RPs of other domains.
The MSDP peer relationship is set up between the RPs of the domains over a TCP connection. When the RP of a domain learns of a new active source in the domain, it sends an SA (Source-Active) message to all peers with which it has established the peer relationship. An MSDP peer uses a modified RPF check to decide whether to accept an SA message sent by another peer. After accepting an SA message, it forwards the message to its other peers until all MSDP routers in the network have received it. If the RP that receives the SA has a (*, G) entry, the RP creates an (S, G) entry and joins toward the source via the SPT, importing the service flow into the domain; the rest is handled by the PIM-SM protocol. In addition, each MSDP router periodically advertises the source information of its own domain via SA messages, letting the MSDP peers of all other domains know that the source is still sending the service flow.
MSDP Application
Inter-domain MSDP
PIM-SM can be regarded as a multicast IGP, because it is designed to run within a single domain. How to distribute multicast packets across AS boundaries while maintaining the autonomy of each AS is a problem for PIM-SM. The PMBR (PIM Multicast Border Router) in the PIM-SM protocol was intended to solve this problem. A PMBR is located at the edge of an AS and sets up branches for all RPs in the AS. Each branch is expressed as (*, *, RP); the wildcards indicate all source and group addresses mapped to the RP. When the RP receives traffic from the source, it forwards the traffic to the PMBR, and the PMBR forwards the traffic to the other domain. When the adjacent domain no longer needs the traffic, it sends a prune to the PMBR, and the PMBR sends the prune to the RP, as follows:
PMBR solution
To solve the above problem, the following two issues must be addressed:
1. When the source is in one domain but a group member is in another domain, the RPF process must remain valid;
PIM can use BGP routes to determine the RPF toward another domain, but when unicast and multicast use different links, the RPF check may fail. Static multicast routes can prevent the RPF problem, but using static multicast routes at large scale is not realistic. MBGP, an extension of BGP, solves this problem.
Inter-domain MSDP
Through its message exchange, MSDP shares the source information known to the RPs of different ASs. To PIM-SM, the shared sources appear to be in the same domain. In this way, the receiver depends only on the RP in its local domain, realizing AS autonomy.
1. Traffic bottleneck;
The hash algorithm and auto-RP filtering of the PIMv2 BootStrap protocol can relieve the above problems but cannot solve them completely. Anycast RP is a method that allows a single group to be mapped to multiple RPs. The RPs can be distributed throughout the domain and use the same RP address, so that a virtual RP is formed; MSDP is the basis for forming this virtual RP.
MPLS Technology
Main contents:
Terms
Introduction to MPLS
MPLS architecture
Introduction to CSC
MPLS OAM
Label -Label
Introduction to MPLS
MPLS integrates the latest developments in routing and switching solutions. It combines the simplicity of L2 switching with the flexibility of L3 routing, and provides the following features:
Frame Relay, ATM, PPP, HDLC, SDH, and DWDM are supported, which ensures the interconnection of multiple types of networks.
MPLS Architecture
Separation of Control and Forwarding
The MPLS architecture is divided into two independent units: the control
unit and forwarding unit, as shown in the following figure:
The control unit uses the standard routing protocol (such as OSPF and
BGP4) to exchange routing information and maintain routing tables. At the
same time, it uses the label control protocol (such as LDP, MP-BGP, and
RSVP) to exchange the label forwarding information with the
interconnected label switching devices to create and maintain the label
forwarding table.
The ingress LSR of the MPLS domain determines a FEC for each IP packet entering the MPLS domain, looks up the label value corresponding to that FEC, and encapsulates it into the IP packet to form a label packet, which is then transmitted within the MPLS domain.
An MPLS packet can carry multiple label headers; this structure is called a label stack. The labels are organized in last-in, first-out order. The outermost label is called the stack-top label and the innermost label the stack-bottom label (simple IP unicast routing does not use the label stack, but other MPLS-based applications, including MPLS VPN, rely on it). Each label is composed of the following fields:
Time-to-Live
This field is 8 bits and encodes the TTL. Its function is the same as that of the TTL field in the IP header: it prevents forwarding loops caused by improper configuration, faults, or slow convergence of the routing algorithm, and it restricts the scope of the packet.
Bottom of Stack
The field is 1 bit. When set to 1, it indicates that the corresponding label is the last label (S) in the label stack; it is 0 in all labels other than the stack-bottom label.
EXP (CoS)
The field is 3 bits and carries CoS information (similar in function to the ToS bits in the IP packet).
Label Value
The field is 20 bits and contains the actual value of the label. When an LSR receives a label packet, it first checks the label value at the stack top. Normally, the LSR determines the next-hop node from the label value and replaces the current stack-top label with a new label. Label values 0-15 are reserved, with the following meanings:
Label Value    Description
0    IPv4 explicit null label. When this label is at the stack top, the next step is to pop the label and forward the packet according to the new stack-top label. If it is the only label in the label stack (that is, the stack is empty after the pop), the packet is forwarded based on the IPv4 packet header.
1    Router alert label. When the stack-top label of a received packet is 1, the packet is delivered to the local software; forwarding of the packet is determined by the next entry in the label stack.
2    IPv6 explicit null label. Its usage is similar to that of label value 0.
3    Implicit null label. LDP uses it to ask the upstream neighbor to pop the label (penultimate-hop popping). This label value never appears in the label encapsulation.
4-15    Reserved
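The four fields described above can be packed into and parsed from a 32-bit label stack entry as follows. This is an illustrative sketch; the bit layout (20-bit label, 3-bit EXP/CoS, 1-bit bottom-of-stack, 8-bit TTL, from most to least significant) is the standard MPLS shim header, and the field values used are arbitrary.

```python
def pack_label(label, exp, s, ttl):
    """Pack one MPLS label stack entry into a 32-bit word:
    20-bit label value, 3-bit EXP/CoS, 1-bit bottom-of-stack,
    8-bit TTL."""
    return (label << 12) | (exp << 9) | (s << 8) | ttl

def unpack_label(word):
    """Split a 32-bit label stack entry back into its fields."""
    return {"label": word >> 12,
            "exp": (word >> 9) & 0x7,
            "s": (word >> 8) & 0x1,
            "ttl": word & 0xFF}

# An arbitrary label at the bottom of the stack with TTL 64.
entry = pack_label(label=100, exp=0, s=1, ttl=64)
assert unpack_label(entry) == {"label": 100, "exp": 0, "s": 1, "ttl": 64}
```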
The basic unit of an MPLS network is the Label Switching Router (LSR). Switches or routers that can distribute labels and forward packets according to labels are LSRs. According to the functions they provide, LSRs can be divided into edge LSRs (LERs) and core LSRs.
With the penultimate-hop popping mechanism, the border LSR can ask its upstream neighbor to pop the label (by advertising the implicit null label value 3 to the upstream neighbor through a signaling protocol such as LDP). In Figure 25-3, router R6 pops the label from the packet and then sends the pure IP packet to router R7. Finally, router R7 performs a simple L3 lookup and sends the packet to its destination.
Label Space
The concept of label space concerns the assignment and distribution of labels. It defines the scope within which labels are used and whether labels on different interfaces can be repeated. There are two types of label space:
An interface that uses interface resources as labels generally uses the per-interface label space. If the LDP peer is connected through a specific interface and labels are carried in that interface's data, the label space scoped to each interface can be used; in this case, a label is unique only within each interface.
When interfaces share label resources, the per-platform label space is used; in this case, a label is unique within the platform (the LSR).
LDP Identifier
The LDP identifier is 6 bytes long and identifies a specific label space of a specific LSR. The first four bytes are an IP address assigned to the LSR, and the remaining two bytes identify the specific label space within the LSR. For the per-platform label space, the last two bytes of the LDP identifier are always 0. The format of the LDP identifier is as follows:
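The 6-byte layout can be sketched as below. This is an illustrative encoding only; the address 192.0.2.1 is a hypothetical LSR ID.

```python
import ipaddress
import struct

def ldp_identifier(lsr_id, label_space=0):
    """Build the 6-byte LDP identifier: a 4-byte LSR IP address
    followed by a 2-byte label space number (0 for the per-platform
    label space)."""
    return ipaddress.ip_address(lsr_id).packed + struct.pack("!H", label_space)

ident = ldp_identifier("192.0.2.1")   # per-platform label space
assert len(ident) == 6
assert ident[4:] == b"\x00\x00"       # last two bytes are 0
```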
If there are two physical links between two LSRs and both are ATM links using the per-interface label space, multiple label spaces must be advertised between the LSRs, and multiple LDP identifiers must be used.
LDP Session
The LDP session is used to exchange label information between LSRs. If one LSR advertises multiple label spaces to another LSR, a separate LDP session must be created between the LSRs for each label space.
LDP Transmission
LDP uses TCP to ensure reliable transmission of the LDP session. If multiple LDP sessions are required between two LSRs, each LDP session corresponds to a different TCP connection.
LDP Discovery
LDP discovers peers and creates adjacencies through its discovery mechanism. LDP supports basic discovery and extended discovery.
If an LSR receives an LDP hello message, a potentially reachable LDP peer exists, and the label space used by that peer can be obtained.
Assume the label space of LSR1 is LSR1:a and the label space of LSR2 is LSR2:b. The following describes the process by which LSR1 creates an LDP session.
After discovering each other by exchanging LDP hello messages, the two parties create an adjacency. They then determine the active party according to their transport addresses: if the transport address of LSR1 is larger than that of LSR2, LSR1 serves as the active party and initiates the TCP connection (port 646) to LSR2, while LSR2 serves as the passive party and waits for the connection to be established.
If LSR1 does not use the optional transport address TLV, its transport address is the source address of the hello messages it sends to LSR2.
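The active/passive decision can be sketched as follows; the comparison of dotted-quad transport addresses is the numeric one described above, and the addresses used are illustrative.

```python
import ipaddress

def ldp_session_role(local_transport_addr, peer_transport_addr):
    """Determine which side initiates the TCP connection (port 646).

    The LSR with the larger transport address is the active party;
    the other side waits passively for the connection.
    """
    local = ipaddress.ip_address(local_transport_addr)
    peer = ipaddress.ip_address(peer_transport_addr)
    return "active" if local > peer else "passive"

assert ldp_session_role("192.0.2.10", "192.0.2.1") == "active"
assert ldp_session_role("192.0.2.1", "192.0.2.10") == "passive"
```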
After LSR1 and LSR2 establish the transport-layer connection, they exchange LDP initialization messages and negotiate the LDP session parameters, including the LDP protocol version, the label distribution mode, and the session hold timer value. When the parameters are negotiated successfully, a session between LSR1:a and LSR2:b is created. The following describes the initialization process of a session in terms of its state machine.
After the LDP session is created, the LSR must send an LDP protocol message within each session hold time. If there is no other message to send, it sends a session keepalive message.
If an LSR wants to end the LDP session, it sends a closing notification message to its LDP peer.
For a specific FEC, if the LSR distributes and assigns a label without needing a label request message from the upstream, the mode is called Downstream Unsolicited.
For a specific FEC, if the LSR distributes and assigns a label only after receiving a label request message, the mode is called Downstream on Demand.
There are two types of label control modes in MPLS: Independent Control
and Ordered Control.
When the independent control mode is used, each LSR can advertise label
mapping to the connected LSR at any time.
When the ordered control mode is used, an LSR sends a label mapping message upstream only if it has received the label mapping message for the FEC from the FEC's next hop, or if it is the egress node of the LSP.
There are two label retention modes in MPLS: the liberal retention mode and the conservative retention mode.
For a specific FEC, LSR Ru receives a label mapping from LSR Rd. When Rd is not the next hop of Ru: if Ru saves the binding, Ru uses the liberal retention mode; if Ru discards the binding, Ru uses the conservative retention mode.
LDP Graceful Restart
For LDP to support graceful restart, support for the Fault Tolerant (FT) Session TLV must be added as an optional parameter of the initialization message. An LSR sends initialization messages carrying the FT Session TLV to its peers to indicate that it can retain its MPLS LSP and FEC information across an LDP restart. The neighbor routers must also support the graceful restart capability and retain the MPLS LSPs created with the restarting router.
In the FT Session TLV, LDP advertises two times to its peers: FT Reconnect Timeout and Recovery Time. FT Reconnect Timeout is the time allowed for re-establishing the connection after a restart; Recovery Time is the time allowed for LDP recovery after a restart.
When restarting, LDP starts the restart process for each neighbor. Before
the reconnect timer expires, it reconstructs the connection with the
neighbor, waits for the neighbor to send the label mapping messages
retained across the restart, updates the locally retained forwarding
information accordingly, and finally sends new label mapping messages to
all neighbors. When the restart is over, all forwarding information that
was not updated is deleted. On the neighbor routers, when the new
connection is created, the MPLS LSPs created with the restarting router
are marked as to-be-aged, and label mapping messages are sent to the
restarting router. The LSPs marked as to-be-aged are then updated
according to the label mapping messages received from the restarting
router. When the restart is complete, all to-be-aged MPLS LSPs that were
not updated are deleted.
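The final bookkeeping step of the recovery described above can be sketched as follows; the function name and the FEC/label dictionaries are hypothetical, used only to show the "keep refreshed, delete stale" rule.

```python
# Illustrative sketch of LDP graceful-restart cleanup: after restart,
# only LSPs re-advertised by the neighbor before the recovery timer
# expires survive; the rest of the retained state is deleted as stale.

def restart_recovery(retained_lsps, refreshed_lsps):
    """retained_lsps / refreshed_lsps map FEC -> label."""
    stale = {fec: lbl for fec, lbl in retained_lsps.items()
             if fec not in refreshed_lsps}
    kept = dict(refreshed_lsps)          # updated forwarding information
    return kept, stale

retained = {"10.1.0.0/16": 100, "10.2.0.0/16": 101}
refreshed = {"10.1.0.0/16": 100}         # neighbor re-sent only one mapping
kept, stale = restart_recovery(retained, refreshed)
print(kept)    # {'10.1.0.0/16': 100}
print(stale)   # {'10.2.0.0/16': 101} -- deleted when recovery ends
```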
PDU length: a two-byte integer indicating the PDU length, excluding the
version number and PDU length fields.
LDP identifier: 6 bytes, identifying the label space of the LSR sending
the PDU. The first four bytes carry the router ID (IP address) of the LSR;
the last two bytes identify the label space within the LSR.
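The header layout just described can be packed byte-for-byte; this is a minimal sketch of the encoding (version 1, length excluding the first four bytes, 4-byte LSR ID plus 2-byte label space), not production code.

```python
# Sketch of the LDP PDU header layout: 2-byte version, 2-byte PDU length
# (excluding the version and length fields themselves), then the 6-byte
# LDP identifier (4-byte router ID + 2-byte label space).
import socket
import struct

def ldp_pdu_header(router_id: str, label_space: int, payload: bytes) -> bytes:
    version = 1
    lsr_id = socket.inet_aton(router_id)            # 4-byte router ID
    ldp_identifier = lsr_id + struct.pack("!H", label_space)
    # PDU length counts the LDP identifier plus the message payload,
    # but not the version and length fields.
    pdu_length = len(ldp_identifier) + len(payload)
    return struct.pack("!HH", version, pdu_length) + ldp_identifier + payload

hdr = ldp_pdu_header("192.0.2.1", 0, b"")
# 2 (version) + 2 (length) + 6 (LDP identifier) = 10 bytes on the wire,
# while the length field itself reads 6.
```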
U bit: unknown message bit. If the LSR receives an unknown message with
the U bit set to 0, it returns a notification message to the message
source; if the U bit is 1, it silently discards the unknown message.
State TLV: indicates the event type of the notification message. For
details, see the universal TLV coding mode of LDP-state TLV.
Session hold time: in seconds; indicates the value that the sending LSR
proposes for the session hold timer.
Address list TLV: the sending LSR advertises its interface addresses
through the address list TLV. For the coding format, see the universal TLV
coding mode of LDP-address list TLV.
Address list TLV: the sending LSR withdraws interface addresses through
the address list TLV. For the coding format, see the universal TLV coding
mode of LDP-address list TLV.
FEC TLV: indicates the FEC part of the FEC/label mapping; for the code
format, see the universal TLV coding mode of LDP-FEC TLV.
Label TLV: indicates the label part of the FEC/label mapping; for the code
format, see the universal TLV coding mode of LDP-label TLV.
FEC TLV: indicates the FEC unit corresponding to the label request.
FEC TLV: indicates the FEC unit corresponding to the label request
withdraw message.
Label request message identifier TLV: identifies the label request message
that the label request withdraw refers to.
FEC TLV: indicates the FEC unit corresponding to the label withdraw
message; for the code format, see the universal TLV coding mode of
LDP-FEC TLV.
Label TLV: indicates the withdrawn label; for the code format, see the
universal TLV coding mode of LDP-label TLV.
FEC TLV: indicates the FEC unit corresponding to the label release
message; for the code format, see the universal TLV coding mode of
LDP-FEC TLV.
Label TLV: indicates the released label; for the code format, see the
universal TLV coding mode of LDP-label TLV.
Note
FEC units cover many types, and the coding depends on the unit type.
A 1-byte field indicates the unit type of the FEC.
A variable-length field carries the FEC unit value, whose format depends
on the type.
Address family: two bytes, coded per RFC 1700; for example, the value for
IPv4 is 1.
Prefix length field: one byte, indicates the length of the prefix in bits.
Address family: two bytes, coded per RFC 1700; for example, the value for
IPv4 is 1.
Host address length: one byte, indicates the length of the host address in
bytes.
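The prefix FEC element layout above can be sketched in a few lines; the element-type value 2 and the minimal-byte packing follow the standard LDP encoding, while the function name is an assumption for illustration.

```python
# Sketch of a Prefix FEC element: 1-byte element type (2 = Prefix),
# 2-byte address family (1 = IPv4), 1-byte prefix length in bits, then
# the prefix packed into the minimum number of whole bytes.
import socket
import struct

def prefix_fec_element(prefix: str, prefix_len: int) -> bytes:
    PREFIX_TYPE, AF_IPV4 = 2, 1
    nbytes = (prefix_len + 7) // 8                # bits -> whole bytes
    addr = socket.inet_aton(prefix)[:nbytes]      # keep only needed bytes
    return struct.pack("!BHB", PREFIX_TYPE, AF_IPV4, prefix_len) + addr

elem = prefix_fec_element("10.2.0.0", 16)
# 4 header bytes + 2 prefix bytes = 6 bytes total for a /16
```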
Labels cover three types: universal label, ATM label, and frame relay label.
The following describes the code of universal label.
State code: a 32-bit unsigned integer indicating the event type. The code
format is as follows:
Message type: 0 means the state TLV is not related to a specific message;
otherwise, the field carries the type of the message the state TLV refers
to.
BGP/MPLS VPN
BGP/MPLS VPN is a mechanism that allows a service provider (SP) to use its
IP backbone network to provide L3 VPN service for users. In this
mechanism, BGP is used to publish VPN routing information across the SP
backbone, and MPLS is used to forward VPN traffic from one VPN site to
another.
In the preceding figure, each PE contains two VRFs, and connects two sites.
The two interfaces connecting sites belong to two different VRFs. Site1 and
site2 belong to one VPN; site3 and site4 belong to another VPN.
An IGP such as OSPF or RIP runs between PE and CE; BGP can also be used,
or routing information can be exchanged through static routes. For PE2,
the route 10.2.1.0/24 learned from site2 is saved in the routing table of
VRF1.
PE1 (the ingress PE) receives VPN packets and looks up the relevant
VRF routes. Based on the route, it searches the MPLS forwarding table
to obtain the corresponding output label L2. Because the next hop of
the route is the loopback interface of PE2, it also obtains the
corresponding tunnel label L1 from the MPLS forwarding table. The two
labels form an MPLS label stack, which is pushed onto the front of the
received VPN packet before the packet is forwarded to the P device.
PE2 (the egress PE) receives the MPLS packets (which now carry only
one label, L2). According to the stack-top label L2, it determines the
VRF, pops the label, looks up the relevant VRF route, and forwards the
packet to CE2 accordingly.
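The two-level label-stack handling above can be modeled step by step. This is a toy sketch: the label values, function names, and the packet/stack representation are all illustrative, not a router implementation.

```python
# Sketch of BGP/MPLS VPN forwarding: PE1 pushes two labels, the P node
# handles only the outer tunnel label, and PE2 maps the inner VPN label
# to a VRF.

def pe1_ingress(packet, vpn_label, tunnel_label):
    # PE1 pushes the VPN label (L2), then the tunnel label (L1) on top.
    return [tunnel_label, vpn_label], packet

def p_node(stack, packet, out_label):
    # A P device swaps only the outer tunnel label; with penultimate-hop
    # popping it would pop it instead, which is why PE2 sees one label.
    return [out_label] + stack[1:], packet

def pe2_egress(stack, packet, vrf_by_label):
    # PE2 uses the (now top) VPN label to pick the VRF, pops it, and
    # routes the bare IP packet toward CE2.
    vrf = vrf_by_label[stack[0]]
    return vrf, packet

stack, pkt = pe1_ingress("ip-packet", vpn_label=2, tunnel_label=1)  # [1, 2]
stack, pkt = p_node(stack, pkt, out_label=11)                       # [11, 2]
vrf, pkt = pe2_egress(stack[1:], pkt, {2: "VRF1"})  # tunnel label popped
print(vrf)  # VRF1
```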
The advantage of this mode is that MPLS is not required between the ASBRs.
The disadvantage is that all VPN routes must be maintained on the ASBR,
and one interface/sub-interface must be assigned for each cross-domain
VPN, so a scalability problem exists.
In this mode, the ASBR does not need to assign a VRF for each VPN, import
the VPNv4 routes, or assign an interface/sub-interface for each VPN.
However, the ASBR must maintain all VPNv4 routes, assign labels for the
routes, and install the ILM entries locally, so the load on the ASBR is
heavy.
In figure 25-7, CE1 and CE2 belong to the same VPN; PE1 and ASBR1
belong to AS1; PE2 and ASBR2 belong to AS2.
In the P-network of the same AS, each device runs an IGP (such as
OSPF) to mutually advertise routes, including the loopback interface
addresses.
In the P-network of the same AS, each device enables MPLS and
advertises label mappings through a signaling protocol (such as LDP).
In the routing table of PE1, there is a route to the loopback
interface of ASBR1 with the output MPLS label L1; in the routing table
of ASBR2, there is a route to the loopback interface of PE2 with the
output MPLS label L2.
An IGP such as OSPF or RIP runs between PE and CE; BGP can also be
used, or routing information can be exchanged through static routes.
For PE2, the route 10.2.1.0/24 learned from site2 is saved in the
routing table of VRF1.
Run MP-IBGP between PE1 and ASBR1. ASBR1 receives the VPN route
10.2.1.0/24 advertised by ASBR2 with output label L4. When ASBR1
receives MPLS packets (carrying only one label, L5), it swaps L5 for
L4 and forwards the MPLS packets to ASBR2.
ASBR2 receives the MPLS packets (now carrying only one label, L4),
swaps L4 for L3, and then searches the routing table. Because the next
hop of the route is the loopback interface of PE2, label L2 is
obtained from the MPLS forwarding table. L2 is pushed onto the top of
the packet's label stack before the MPLS packets are forwarded to P2.
PE2 (the egress PE) receives the MPLS packets (carrying only one
label, L3). According to the stack-top label L3, it determines the
VRF, pops the label, looks up the relevant VRF route, and forwards the
packets to CE2 accordingly.
Internet access is a basic service requirement for the enterprise
customers served by the MPLS VPN service provider. Taking address
overlapping, access control, and security into consideration, two common
internet access solutions are provided.
Many customers do not want VPN users to access the internet directly; they
want a firewall to control the VPN users' internet access. This is a
typical requirement for enterprise internet access. In the VPN
environment, specific sites are designated for internet access, and each
VPN site sends its internet traffic to one or more central sites.
In the network shown in figure 25-8, all sites in the VPN access the
internet through a central site, which provides the service between the
VPN members and the internet through a firewall and NAT. VPN members
forward their internet-bound traffic to the central site by importing a
default route whose next hop is the central site CE. The central site
forwards the internet-bound traffic to the enterprise firewall, which
controls access and performs NAT according to the enterprise security
policy; finally, the firewall forwards the traffic to the internet.
In this access mode, the configuration is deployed only within the
enterprise VPN; the carrier's participation is not required, and the
enterprise can control the internet access security policy for its
intranet users. However, this mode requires the enterprise to have strong
security management capability, and the carrier cannot perform unified
management of internet access.
The unified access control mode places the internet access control and the
common access point at the carrier's egress. To solve the problem of
address overlapping between different VPNs, the carrier provides a VPN and
a firewall for each VPN that accesses the internet, so each VPN's users
access the internet through their own firewall. In this mode, a default
route pointing to the internet gateway is configured at the VPN site at
the internet egress and advertised through BGP to the other sites of the
VPN. The other sites forward their internet-bound traffic along the
default route, in the regular L3VPN mode, to the VPN site on the PE side
of the carrier's internet access point. After being processed by the
firewall, the traffic is forwarded to the internet.
Some customers want to access the internet through the VPN but do not need
to receive complete internet routing information. This requirement can be
met through static default route access.
The following describes this access mode, taking VPNA users accessing the
internet as an example. VPNA has two users: CE1 and CE4. First, configure
a cross-VRF default static route for internet access in the VRF routing
tables of PE1 and PE2 respectively. The next hop of the route on PE1 is
the internet gateway; on PE2, the next hop is PE1. To ensure the return
path from the internet, configure in the global routing table of PE1 a
cross-VRF route reaching CE1, whose next hop is CE1 in the VRF, and in the
global routing table of PE2 a cross-VRF route reaching CE4, whose next hop
is CE4 in the VRF. Advertise these routes through the IGP into the MPLS
network.
PE1 receives the internet-bound packets sent by PE2, searches the
internet route in the global routing table, and forwards the packets to
the internet gateway.
When packets returning from the internet reach PE1, PE1 searches the
global routing table and finds the route to CE4 advertised by PE2
through the IGP, then forwards the packets to PE2 using the label of
that route. PE2 finds the static route to CE4 in the global routing
table and forwards the packets to CE4.
Introduction to CSC
C SC Co ncept
With the promotion and spread of BGP/MPLS VPN, more and more end users
implement network interconnection through MPLS VPN. To save the cost of
independently constructing or leasing L2 transmission links, many
small-to-medium carriers begin to lease VPN lines from a large MPLS
carrier to interconnect their POPs. This is called Carrier's Carrier
(CSC).
The basic structure of CSC is not significantly different from that of an
MPLS VPN network. The carrier network usually refers to the large-scale
network that provides label-switched VPN access service for
small-to-medium carriers and end users; see the backbone carrier network
in figure 25-10. The customer carrier network is built on top of the
carrier network and provides internet access or VPN access for end users;
see User Carrier POP1 and User Carrier POP2 in figure 25-10.
Theoretically, the number of layers is not restricted, so the model is
scalable.
After P1 receives the packet, it pops the external label L1 and then
sends the packet (label stack L4, L7, destination 10.2.1.1) to CSC-PE2.
MPLS L2VPN
Terms
VPLS: Virtual Private LAN Service; extends an Ethernet LAN across the
IP/MPLS network, providing users with a transparent, WAN-spanning virtual
LAN service.
SVC: Spoke VC
uPE: User-facing PE
Basic Concepts
MPLS L2VPN provides L2 VPN service over the MPLS network. With MPLS L2VPN
technology, carriers can provide users with L2 VPN services over different
media through the MPLS network, including ATM, FR, VLAN, Ethernet, PPP,
and HDLC. The same MPLS network also provides common IP, L3 VPN, traffic
engineering, and QoS services, so carriers can save network construction
investment.
With MPLS L2VPN, the carrier only needs to provide L2 connectivity for
users and does not participate in the route calculation of VPN users.
However, like traditional L2 VPN (for example, VPN provided by ATM PVC),
MPLS L2VPN has the N-squared problem: within a VPN, a connection between
any two CEs requires a link between CE and PE. For a PE device, if a VPN
has N sites, N-1 physical or logical connections between CE and PE must be
created. Because the PE device does not participate in the route
calculation of users, the scalability of L2VPN is better than that of
L3VPN, but L2VPN is less flexible.
The PPVPN working group of the IETF produced many framework drafts, of
which the two most important are Martini and Kompella. The Martini draft
implements MPLS L2VPN by extending LDP; the Kompella draft implements it
by extending MP-BGP. The Martini draft has become a standard, and Maipu
supports this mode.
MPLS Networks
MPLS L2VPN covers Virtual Private Wire Service (VPWS) and Virtual Private
LAN Service (VPLS). VPWS is a point-to-point virtual leased line
technology that supports most link layer protocols. VPLS provides LAN-like
service over the MPLS network: distributed users can access each other as
if they were attached to the same LAN.
VPWS
The basic principle of MPLS L2VPN is similar to that of BGP/MPLS VPN. It
also uses the label stack to implement the transparent transmission of
packets in the MPLS network. External label (tunnel label) is used to
transfer packets from one PE to another. Internal label (in MPLS L2VPN, it
is called VC label) is used to distinguish different connections in different
VPNs. The receiver PE determines the destination CE according to the VC
label. In the process of forwarding, the label stack changes as follows:
Illustration
V: internal VC label
T, T1: external Tunnel label, in the MPLS forwarding, the tunnel label will
be replaced.
To allow LDP to distribute VC labels, RFC 4447 extends the LDP protocol by
adding the VC FEC type. In addition, because the two PEs exchanging VC
labels are usually not directly connected, LDP must create a session with
a targeted peer and then transfer the VC FEC and VC labels over that
session. The process of distributing VC labels through LDP is the same as
the distribution process for other labels.
The L2VPN implemented by extending LDP can carry ATM, FR, Ethernet/VLAN,
PPP, and HDLC, but it requires the link layer protocols of all sites in
the VPN to be the same: the L2VPN can be created only when, for example,
all sites are Ethernet or all sites are ATM. The disadvantage of L2VPN in
Martini mode is that only point-to-point VPN L2 connections can be
created; an automatic VPN discovery mechanism is not supported.
The Martini-mode L2VPN focuses on how to create a virtual circuit between
two CEs. It uses VC-TYPE + VC-ID to identify a VC: VC-TYPE indicates the
VC type (ATM, Ethernet, VLAN, PPP, and so on), and VC-ID identifies the VC
and must be unique on the PE device. The PEs connecting the two CEs
exchange VC labels through the LDP protocol and bind the corresponding CEs
through the VC-ID.
When the LSP connecting the two PEs has been created and the label
exchange and binding are complete, the VC is established, and the two CEs
transmit L2 data through it.
VPLS uses IP/MPLS domain to classify the network and restrict the L2
service to the entrance/edge network. According to the networking
requirements, the MAN using VPLS technology includes the following two
modes.
In the signaling control plane, VPLS uses the LDP signaling protocol to
create a pair of unidirectional MPLS VC-LSPs across the backbone network,
forming the corresponding PW between PEs. Ethernet data units are
transmitted across the backbone through the PW. A VC-LSP can be configured
statically or set up dynamically by the LDP protocol. A created PSN tunnel
can carry multiple VPLS services, and it shields the transmitted data to
protect security across the backbone network.
For data forwarding in a MAN built with VPLS technology, the PE devices
independently learn MAC addresses and maintain the MPLS FIB table, and
encapsulate/de-encapsulate the received L2 data according to RFC 4447. The
data is exchanged between PEs through the PSN tunnels created by MPLS
LSPs. One VPLS instance corresponds to one enterprise customer, and the PE
maintains a separate MPLS FIB for each VPLS instance. The key of each MPLS
FIB entry is the mapping between MAC address and PW, that is, between MAC
address and LSP. Note that one PW is composed of two LSPs; a MAC address
is associated with the label of the reverse-direction LSP so that the data
can be properly forwarded.
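The per-instance MAC-to-PW table described above behaves like a learning bridge; this toy class (names and port labels are illustrative, not Maipu's data structures) shows the learn/lookup cycle.

```python
# Minimal sketch of a per-VPLS-instance forwarding table: source MAC
# learning binds a MAC to the AC or PW it was seen on; unknown
# destinations must be flooded.

class VplsFib:
    def __init__(self):
        self.mac_to_port = {}            # MAC -> AC or PW name

    def learn(self, src_mac, in_port):
        # Source learning: remember which AC/PW the address arrived on.
        self.mac_to_port[src_mac] = in_port

    def lookup(self, dst_mac):
        # Known unicast goes out one port; unknown traffic is flooded.
        return self.mac_to_port.get(dst_mac, "flood")

fib = VplsFib()
fib.learn("00:aa:bb:cc:dd:01", "pw-to-PE2")
print(fib.lookup("00:aa:bb:cc:dd:01"))   # pw-to-PE2
print(fib.lookup("00:aa:bb:cc:dd:99"))   # flood
```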
When the PE maintains the MPLS FIB entries, a problem similar to MAC
address aging on a switch arises. VPLS handles it through the signaling
protocol by sending an address withdraw message. The function is
implemented through a FEC TLV (identifying the VPLS instance) contained in
an LDP address withdraw message, plus an optional MAC address TLV.
RAW: the packets may or may not contain an 802.1Q VLAN tag, but the
tag is meaningless to the two connected nodes and is transparently
transmitted.
Packets received from the AC, that is, from the VLAN interface, may or may
not contain a tag. If a tag is present, it can be the Service-Tag (S-TAG)
pushed by the SP network to distinguish users, or the customer VLAN tag
(C-TAG). To identify S-TAG versus C-TAG, the customer configuration is
checked: the packet's TPID is first matched against the TPID of the OVID,
then against the TPID of the IVID (that is, the per-chip configured inner
TPID); if the two TPIDs are equal, the tag is treated as the OVID.
Tagged packets are also received from the PW. If the PW is in TAGGED mode
and the TPID in the packet equals the configured TPID, the outer tag is
treated as an S-TAG; otherwise it is a C-TAG. If the PW is in RAW mode,
any tag contained in the packet is a C-TAG. The C-TAG is transparently
transmitted during VPLS processing; it is never deleted or replaced.
1. Packet Encapsulation on AC
Ethernet access: the uplink Ethernet frame header from the CE and the
downlink Ethernet frame header from the PE do not contain an S-TAG. If the
frame header contains a VLAN tag, it is the user packet's internal VLAN
tag and is meaningless to PE devices. This internal VLAN tag is called the
C-TAG.
A. RAW Mode
Packets sent from the AC have their tags left untouched in the VPLS
processing, no matter whether an S-TAG or a C-TAG exists. Whether an S-TAG
is added to the packets is determined by the port configuration and VLAN
configuration.
B. TAGGED Mode
If a packet sent from the AC contains an S-TAG, the VPLS processing checks
whether it equals the S-TAG of the AC: if they are equal, no operation is
performed; if not, the tag is replaced. If the packet does not contain an
S-TAG, the S-TAG of the AC is added.
2. PW Encapsulation
The encapsulation mode in PW also contains two types: RAW mode and
Tagged mode.
A. RAW Mode
If the PW uses RAW mode, the PW acts as a virtual link between two
Ethernet ports and packets are transparently transmitted. The packets may
contain tags, but the tags are meaningless to the ingress and egress PEs,
and the S-TAG is not transmitted over the PW.
Packets received from the AC are output to the PW. If a packet contains an
S-TAG, the S-TAG is deleted first and then two MPLS labels are pushed
before forwarding; if the packet has no S-TAG, the two MPLS labels are
pushed directly.
B. TAGGED Mode
Packets received from the AC are output to the PW. If an S-TAG is already
present, two MPLS labels are pushed before forwarding; if the packet does
not contain an S-TAG, an empty tag (VID = 0) is added and then the two
MPLS labels are pushed before forwarding.
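The RAW versus TAGGED rules for AC-to-PW traffic can be condensed into one function; the frame representation and the label placeholders here are illustrative assumptions.

```python
# Sketch of PW encapsulation on egress from the AC. A frame is modeled
# as (s_tag, payload); s_tag is None when the frame is untagged.

def ac_to_pw(frame, pw_mode):
    s_tag, payload = frame
    if pw_mode == "raw":
        # RAW: the service tag is never carried over the PW.
        s_tag = None
    elif pw_mode == "tagged":
        # TAGGED: a service tag must be present; add an empty tag
        # (VID = 0) if the frame arrived untagged.
        if s_tag is None:
            s_tag = 0
    # Then push the two MPLS labels (VC label + tunnel label).
    return ("tunnel-label", "vc-label", s_tag, payload)

print(ac_to_pw((100, "data"), "raw"))      # service tag stripped
print(ac_to_pw((None, "data"), "tagged"))  # empty tag (VID 0) added
```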
Basic VPLS
The full-mesh interconnection structure is adopted in the basic VPLS.
H-VPLS
Hierarchical VPLS (H-VPLS) is a technology that enhances VPLS scalability.
It extends the access scope of the service provider's VPLS, reduces
network complexity to ease management, and lowers construction and
operation cost. With plain VPLS, adding one PE requires a full mesh of
connections to every other PE; if LDP is used, every PE device in the VPLS
must be reconfigured, and the amount of signaling traffic grows on the
order of N-squared. With H-VPLS, adding a PE only requires modifying the
configuration of the PE it connects to, and the signaling traffic does not
suffer the N-squared problem.
H-VPLS introduces new roles: the uPE, or user-end PE, and the nPE, or
network-end PE, which is the PE in the SP network that the uPE connects
to. A uPE can be an L2 device with only Ethernet switching function, or an
L3 device with both switching and routing functions. One end connects to
the PE of the SP network; the other end (multiple interfaces) connects
multiple user CE devices in the building. The uPE is part of the VPLS and
connects to the PE by creating a PW, also called an SVC.
The core network in H-VPLS is a full-mesh topology, while the edge network
is a hub-and-spoke star topology. In the preceding figure, the uPE is the
hub and the multiple CEs are the spokes. The core layer and the edge layer
are connected through pseudo wires.
If the H-VPLS network were fully meshed like basic VPLS, each uPE would
act as a PE of the basic VPLS, and the number of sessions would be far
greater than among the fully meshed PE devices of H-VPLS. H-VPLS therefore
improves VPLS scalability and avoids the N-squared problem caused by
expansion. For a new uPE, only the uPE and the PE it connects to need to
be configured; you do not need to modify the other PE devices.
For the signaling protocol between PE and uPE, one mode implements the PW
from PE to uPE through the spoke VC function of LDP; the other mode is
H-VPLS based on QinQ, which is applicable only to Ethernet links.
In H-VPLS, a uPE can access multiple CEs, which can belong to one or
several different VPLS instances. Between nPE and uPE, a label or a
VLAN-ID is used to distinguish the VPLS instances. If the VLAN-ID is used,
QinQ technology is required, because the user data frames may already
contain a VLAN tag. CEs in the same VPLS instance connected to the same
uPE can exchange information through L2 switching on the uPE, without
involving the nPE.
When CE2 wants to send data to the remote CE1 of the VPLS instance (a CE
connected across the SP WAN), the Ethernet frame is first sent to uPE1. If
uPE1 has not learned the frame's destination MAC (or the frame is
broadcast or multicast), it sends the frame out all ports of the instance
(AC, SVC, and PW) except the receiving port. When PE2 receives the frame,
if the MAC is not learned, the frame is again broadcast on all ports (PW,
AC, and other SVCs) of the VPLS instance; if the destination MAC is
learned, the frame is sent only on the corresponding PW. When PE1 at the
other end receives the data frame, it forwards it according to the
destination MAC in the same way: if the MAC is not learned, the frame is
broadcast on the other ports of the VPLS instance; if it is learned, the
frame is sent to the corresponding AC and delivered to CE1.
The connection between uPE and PE can adopt VC, which is called Spoke-
VC (SVC). Use the SVC to identify the VPLS instance of the packets
entering PE. For the SVC, there are two conditions:
An SVC can also connect two VPLS instances (for example, two cross-MAN
VPLS instances); this is called multi-domain VPLS, and the two PEs
connected by the SVC are called border PEs. If multiple domains need to be
interconnected, the border PEs of each VPLS are fully meshed through SVCs,
forming a multi-level VPLS network.
2. QinQ Access
A. Enable QinQ at the CE access port, pushing an outer VLAN tag onto
the received packets to serve as the multiplexing/demultiplexing
tag. Between the MTU and PE1, the packets are transparently
transmitted to PE1 through the QinQ tunnel.
B. PE1 first determines the home VSI according to the VLAN tag pushed
by the MTU, then pushes the multiplexing/demultiplexing label (MPLS
label) corresponding to the PW according to the destination MAC of
the packet, and finally forwards the packet.
The address withdraw message carries a MAC TLV. The devices receiving the
message delete or re-learn the MAC addresses according to the parameters
specified by the TLV.
The destination of a MAC address withdraw message depends on the fault
type; the basic principle is to notify all devices that may have learned
the MAC addresses. The fault types include AC interface fault, Mesh-PE
device fault, and Spoke-PE device fault.
When an AC interface is faulty, send the MAC address withdraw message to
all Mesh-PE devices and Spoke-PE devices.
When a Mesh-PE device is faulty, notify all Spoke-PE devices.
When a Spoke-PE device is faulty, notify all Mesh-PE devices and the other
Spoke-PE devices.
In the basic networking environment, PWs are created among all PEs of the
same VPLS instance to form a full-mesh topology, so each PE can reach
every other PE through a PW; at the same time, each PE connects to its CEs
through access circuits (ACs). Under split horizon, broadcast, multicast,
or flooded frames received from a PW are not sent to any PW (including the
receiving one) of the same VPLS instance, but they can be sent to the ACs.
Broadcast, multicast, or flooded frames received from an AC can be sent to
all other PWs and ACs of the same VPLS instance, excluding the receiving
AC itself. In other words, packets received from public-network PWs are
never forwarded to other public-network PWs; they are forwarded only
toward the private network.
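The split-horizon flooding rule above reduces to a simple filter; this sketch uses made-up port names and a naming convention ("pw…" vs "ac…") purely for illustration.

```python
# Sketch of VPLS split horizon: frames flooded from a core PW go only
# to ACs, never back to another PW; frames from an AC go everywhere
# except the receiving AC.

def flood_targets(in_port, ports):
    """ports: list of (name, kind) where kind is 'pw' or 'ac'."""
    out = []
    for name, kind in ports:
        if name == in_port:
            continue                  # never send back out the input port
        if in_port.startswith("pw") and kind == "pw":
            continue                  # split horizon: PW -> no other PW
        out.append(name)
    return out

ports = [("pw1", "pw"), ("pw2", "pw"), ("ac1", "ac"), ("ac2", "ac")]
print(flood_targets("pw1", ports))   # ['ac1', 'ac2']
print(flood_targets("ac1", ports))   # ['pw1', 'pw2', 'ac2']
```

Because the core PEs are fully meshed, dropping PW-to-PW flooding is exactly what keeps the core loop-free without running spanning tree there.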
A core network created in this mode has no loops.
If a uPE is dual-homed to two PEs, L2 split horizon alone cannot prevent
loops; the spanning tree protocol must be enabled between the uPE and the
nPEs.
devices for VPLS customers. When you add, delete, or re-deploy CEs in the
L2 VPN:
  VPWS: you must re-configure each peer PE.
  VPLS: you must re-configure only the connected PEs.
Signaling protocol:
  VPWS: LDP; the pseudo wire between PEs is called VC.
  VPLS: LDP; the pseudo wire between PEs is called PW or SVC.
Encapsulation mode:
  VPWS: Add the VPWS label, and then add the label of the external MPLS
  tunnel. Take FR AC access as an example: when the AC interface
  encapsulation between CE and PE is FR, packets are received on the PE;
  the VPWS label is added before the FR header, and then the external
  MPLS label is added.
  VPLS: Add the VPLS label, and then add the label of the external MPLS
  tunnel. Take FR AC access as an example: when the AC interface
  encapsulation between CE and PE is FR, packets are received on the PE
  (format: FR header + Ethernet header + Data); the FR header is removed,
  the VPLS label is added before the Ethernet header of the FR packet,
  and then the MPLS label is added.
AC access:
  VPWS: Multiple types of ACs are supported, such as PPP, HDLC, Ethernet,
  VLAN, FR, and ATM.
  VPLS: Multiple types of ACs are supported, such as PPP, HDLC, Ethernet,
  VLAN, FR, and ATM.
Packet processing flow (network: CE1--------PE1--------P--------PE2--------CE2;
assume data flows from CE1 to CE2 and the VC label is exchanged through
LDP between PE1 and PE2):
  VPWS: CE1 to PE1: PE1 adds the VPWS label, then adds the global route
  label and sends the packet to PE2. After PE2 receives the packet, it
  removes the labels and sends it out the CE2 interface.
  VPLS: CE1 to PE1: after PE1 receives the packet, it learns the MAC
  address of CE1 and then searches the MAC address table of the VPLS
  instance using the destination MAC as the key. If the destination MAC
  is found, the packet is sent to the PW toward PE2: the VPLS label is
  added, then the global MPLS label, and the packet is sent to PE2. After
  PE2 receives the packet, it learns the source address and searches the
  table; if an AC is found, it removes the labels and sends the packet
  out the CE2 interface; if the destination is not found in the table, it
  floods the packet within the VPLS instance following the split horizon
  principle.
Traffic engineering means planning how traffic is carried across the
network. Despite the efforts of network designers, the actual traffic in
the network never matches the predicted values: traffic sometimes grows
faster than expected, yet designers cannot upgrade the network at once.
Rapid traffic growth, emergencies, or network accidents can raise the
bandwidth demand in certain places while some links in the network remain
underutilized. The core idea of traffic engineering is to shift traffic:
traffic congesting one link is moved onto links that are not fully
utilized. Traffic engineering is not proprietary to MPLS; it is a general
solution. MPLS-based traffic engineering is an attempt to apply
connection-oriented traffic engineering technology and integrate it with
IP routing technology.
At the ingress of the MPLS network (which can be considered the source of
the data), MPLS traffic engineering controls the path toward a specific
destination: it creates the LSP, reserves network bandwidth along the
route, balances the traffic load, and makes full use of the link
bandwidth. MPLS traffic engineering is abbreviated MPLS-TE.
MPLS-TE guarantees bandwidth for each traffic flow by creating tunnels.
After a tunnel is created, the data is mapped to a FEC and forwarded
inside the tunnel along the LSP path. At the head end, the tunnel appears
as a tunnel interface, and any traffic that should traverse the tunnel
must be sent through that interface. In network routing, the tunnel
interface can be reached through static or dynamic routes, and routes
pointing to the tunnel interface can be distributed by dynamic routing.
The link state routing protocol (OSPF or IS-IS), according to the known
network topology and the advertised MPLS-TE network topology
information, calculates the shortest path of the required MPLS-TE tunnel.
neighbor address
TE metric: the cost used when calculating the tunnel path on the link.
Command: mpls traffic-eng admin-weight
Maximum physical link bandwidth: the maximum physical bandwidth on the
link interface. Command: bandwidth
Maximum reserved bandwidth: the maximum bandwidth that can be reserved on
the link. Command: ip rsvp bandwidth
Unreserved bandwidth for each priority: the unassigned reserved bandwidth
of each priority of tunnel on the link. Command: N/A
Attribute flag: link attributes defined by the user; the link is included
in or excluded from the path calculation according to the attribute.
Command: mpls traffic-eng attribute-flags
After the MPLS-TE tunnel path is calculated, the path information is passed to RSVP, which then creates the tunnel according to that path.
After the tunnel is created (UP), RSVP-TE protects the tunnel in "soft-state" mode. Soft state means that the relevant states are maintained by refreshing Path and RESV messages: each node periodically sends Path messages downstream and RESV messages upstream, and each node waits for upstream Path messages and downstream RESV messages. If the wait times out, the node assumes the tunnel no longer needs to be maintained and deletes the corresponding resource reservation. Independence means that a node does not immediately send a Path message downstream upon receiving an upstream Path message, nor immediately send a RESV message upstream upon receiving a downstream RESV message (except when a Path or RESV message is received for the first time). Each node refreshes on its own cycle, and the cycles are not identical: the refresh interval is jittered by up to 50% (up or down) to avoid global cycle synchronization. With a 30-second refresh cycle, the actual refresh times therefore vary between 15 s and 45 s (for example 30 s, 45 s, 15 s, 30 s).
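The jittered refresh timer described above can be sketched as follows. This is a minimal illustration of the idea, not the switch's implementation; the 30-second base cycle and the ±50% jitter are taken from the text.

```python
import random

def next_refresh_interval(base_seconds: float = 30.0) -> float:
    """Pick the next Path/RESV refresh interval, jittered by up to
    +/-50% of the base cycle so that nodes do not synchronize their
    refresh cycles globally."""
    return base_seconds * random.uniform(0.5, 1.5)

# With a 30 s base cycle, every refresh falls somewhere in [15 s, 45 s].
samples = [next_refresh_interval(30.0) for _ in range(5)]
assert all(15.0 <= t <= 45.0 for t in samples)
```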
Static Route
A static route can direct specific traffic through the MPLS-TE tunnel.
For routes that reach the MPLS-TE tunnel endpoint, load can be balanced between the IGP path and the MPLS-TE tunnel. The nodes downstream of the tunnel endpoint are outside the scope of traffic engineering, so you need not worry about whether those nodes support traffic engineering.
A next hop generated by the automatic route cannot load-balance with a next hop generated by the forwarding adjacency: the automatic route has absolute priority. The forwarding adjacency works by influencing the network topology used in the SPF route calculation, whereas the automatic route works by replacing the next hop of the relevant routes after the SPF calculation. Therefore, the next hop generated by the automatic route always wins.
For MPLS-TE tunnels or IGP paths, the condition for load balancing is that the metric values of the paths are the same.
Tunnel Protection
In the network, the links and switch nodes carrying tunnel traffic may fail because of internal faults, and the failure of any link or node can cause network breakdown and data loss. When a failure occurs, the IGP running in a large-scale network needs a long time to re-converge, and RSVP-TE also needs a long time to update the path used by the tunnel. During this period, which can last minutes, the data in the tunnel is lost. To prevent this, RSVP-TE provides sophisticated tunnel protection and restoration functions that reduce the data loss or interruption caused by link or node failures.
There are two types of protection: full path protection and fast re-route (FRR) protection. MP switches support FRR.
As shown in the preceding figure, the backup LSP protects the R2 node and, at the same time, the R1->R2 link.
1. One-to-One
One-to-One mode means that one backup LSP protects one protected tunnel. See the following figure: the red LSP is the backup LSP, called the Detour LSP in this mode, and it protects the primary LSP (TUNNEL). The Detour LSP starts from switch S1, which is called the Point of Local Repair (PLR); it is the ingress device of the detour. The Detour LSP bypasses S2, the downstream node of the PLR (S1), and its destination is the egress of the protected tunnel. It meets the primary LSP at S3 and is merged into it; this action is called "merge", so S3 is called the Merge Point (MP). Strictly speaking, the merge operation is not mandatory, but without it multiple LSP signaling sessions would have to be maintained beyond the MP; therefore the merge is performed.
A Detour LSP in One-to-One mode exists only in dependence on the protected LSP: if the protected LSP is deleted, the Detour LSP associated with the tunnel is deleted as well.
In One-to-One mode, the ingress node of the primary LSP initiates the FRR request, and each node in the primary LSP (including the ingress) tries to create a Detour LSP with itself as the starting point. The scalability of this protection mode is therefore poor. MP switches do not support One-to-One mode.
2. Facility Mode:
In the facility mode of node protection, the tail node of the bypass tunnel is the next-next hop (NNHOP) of the PLR, shown as device S3 in figure 25-21; the bypass tunnel goes around S2, the downstream node of the PLR.
In the facility mode of link protection, the tail node of the bypass tunnel is the next hop (NHOP) of the PLR; the bypass tunnel goes around the S1->S2 link between the PLR and its downstream node (S2).
MP series switches support the facility mode protection, including the link
protection and node protection.
Graceful Restart
Graceful Restart (GR) means that the forwarding service is not interrupted while the protocol restarts.
will send the path message. S3 device will send the recovery path
message for helping S2 device to restore the state.
MPLS OAM
Introduction to MPLS OAM
According to the actual demands of carrier networks, network management work can be classified into three types: Operation, Administration, and Maintenance (OAM). Operation covers prediction, planning, and configuration of the routine network and services; maintenance covers testing and fault management.
The OAM function is very important in public networks because it simplifies network operation, verifies network performance, and reduces operating cost. In networks providing QoS, OAM is particularly important. OAM functions have been defined for traditional SDH/SONET and ATM. MPLS, as the key bearer technology of the scalable next-generation network, provides multi-service capability with QoS.
In an MPLS network, when a label switched path (LSP) fails to forward user data, the control plane needs a method to detect MPLS LSP data plane faults. The detection methods of traditional IP networks are not sufficient: IP ping and traceroute cannot verify the connectivity of an MPLS network, traditional traceroute cannot locate MPLS LSP faults hop by hop or return LSP-related information, and successful IP forwarding does not mean that the LSP is connected. In addition, standard ICMP packets cannot return LSP-related information such as the label stack and downstream mapping.
A method for detecting faults in the MPLS network is therefore required. This document describes a simple but effective mechanism, MPLS LSP ping/traceroute, for detecting faults of an MPLS LSP.
2. Basic Principle
LSP ping uses packets belonging to a specific FEC to verify the integrity of the LSP (from ingress LSR to egress LSR) for that FEC. The information of the FEC concerned is carried in the MPLS echo request message.
In an LSP ping operation, the echo request packets are encapsulated in UDP packets that carry a sequence number and an NTP timestamp; the destination port is the well-known port 3503. When MPLS processes LSP ping request messages, it applies the same forwarding treatment as for other packets of the FEC. When the ping command is used to test connectivity, the packets travel to the LSP egress, where the LSR examines them to verify whether it is the actual egress of the FEC.
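The encapsulation parameters named above can be sketched as follows. The payload layout here is purely illustrative (the real echo request also carries FEC TLVs and other fields), and the function name is hypothetical; only the well-known port number 3503 and the two parameters come from the text.

```python
import struct
import time

LSP_PING_UDP_PORT = 3503  # well-known destination port for MPLS echo requests

NTP_EPOCH_OFFSET = 2208988800  # seconds between the NTP (1900) and Unix (1970) epochs

def build_echo_request(seq: int) -> bytes:
    """Illustrative payload carrying the two parameters the text names:
    a sequence number and an NTP-style timestamp (a sketch, not the
    real MPLS echo request format)."""
    ntp_seconds = (int(time.time()) + NTP_EPOCH_OFFSET) & 0xFFFFFFFF
    return struct.pack("!II", seq & 0xFFFFFFFF, ntp_seconds)

payload = build_echo_request(seq=1)
assert len(payload) == 8  # 4-byte sequence number + 4-byte timestamp
```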
LSP traceroute is used to locate faults. The LSR that initiates the test sends ping packets toward the destination LSR with an initial TTL of 1, incremented by 1 for each successive probe. The LSRs along the path examine the packets and return the relevant control plane and data plane information.
MPLS BFD
1. Introduction to the Protocol
In asynchronous mode, BFD control packets are sent periodically between the two systems. If a system fails to receive BFD control packets from the remote end within a certain time, it declares the session down and notifies the control plane or the forwarding plane.
2. Creating a Session
When BFD is used to detect faults of an MPLS LSP, a BFD session is created between the ingress LSR and the egress LSR, and the BFD control packets travel along the same data path as the LSP.
A. The ingress LSR (the active party of the session) sends an echo
request packet carrying its local session discriminator.
B. The egress LSR (the passive party of the session) replies with an
echo reply packet carrying its local session discriminator.
C. The ingress LSR sends a BFD control packet to the egress LSR,
with the Your Discriminator field set to the session discriminator
of the egress LSR, and enters the Down state.
D. The egress LSR receives the BFD control packet of the ingress
LSR, sends a BFD control packet to the ingress LSR, and enters
the Down state.
E. After the ingress LSR receives the BFD control packet of the
egress LSR, its state changes from Down to Init. It determines the
local sending interval and detection time according to the time
parameters carried in the packet, starts the BFD transmit timer,
and sends BFD control packets at the negotiated interval.
F. The egress LSR receives the BFD control packets of the ingress
LSR, and its state changes from Down to Up.
G. After the ingress LSR receives the BFD packets of the egress LSR,
its state changes from Init to Up.
Session creation is thus a three-way handshake; after it completes, the
session becomes Up and the corresponding parameters are negotiated.
Subsequent state changes are driven by the fault detection results and
handled accordingly. The state machine transitions are as follows:
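The handshake above can be modeled as a small state machine. This is an illustrative sketch (timer negotiation and the AdminDown state are omitted), not the switch's implementation.

```python
class BfdSession:
    """Minimal sketch of the BFD handshake state machine
    (Down -> Init -> Up), driven by the state carried in
    received control packets."""
    def __init__(self):
        self.state = "Down"

    def receive(self, remote_state: str) -> None:
        if self.state == "Down":
            if remote_state == "Down":
                self.state = "Init"   # peer is also starting up
            elif remote_state == "Init":
                self.state = "Up"     # peer already saw our Down packet
        elif self.state == "Init":
            if remote_state in ("Init", "Up"):
                self.state = "Up"
        elif self.state == "Up":
            if remote_state == "Down":
                self.state = "Down"   # peer detected a failure

# Ingress LSR: starts Down, sees the egress's Down packet (step E),
# then the egress's Up packet (step G).
ingress = BfdSession()
ingress.receive("Down")
assert ingress.state == "Init"
ingress.receive("Up")
assert ingress.state == "Up"
```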
Overview
With the rapid development of IP network scale and services, the number of IP network users keeps increasing and more and more problems of the IP network appear, such as insufficient address space and security issues. To solve these Internet problems, especially the shortage of address space, IETF began defining the next-generation Internet protocol, the successor to IPv4, in 1992; it is called IPng or IPv6.
Main contents:
ICMPv6 protocol
IPv6 address
Traffic class (Type): 8 bits, indicating the differentiated service the packet requires. RFC 1883 originally defined this as a 4-bit field named Priority; the field was later renamed, and in the latest IPv6 Internet drafts it is called the traffic class. The definition of the field is independent of IPv6 and is currently not fixed in any RFC; its default value is all zeros.
Flow label: 20 bits, used to identify packets that belong to the same service flow. One node can be the source of multiple service flows; the flow label together with the source node address uniquely identifies a service flow. RFC 1883 originally defined the field as 24 bits, but after the traffic class field grew to 8 bits, the flow label field was shortened in compensation.
Payload length: 16 bits, giving the length in bytes of the packet payload, that is, the bytes of the packet that follow the IPv6 header. Note that the IPv6 extension headers are included in the payload length.
Next header: indicates the protocol type of the header that follows the IPv6 header. Like the IPv4 protocol field, the next header field can indicate an upper-layer protocol such as TCP or UDP, but it can also indicate the presence of an IPv6 extension header.
Hop limit: 8 bits. Each node that forwards the packet decrements this field by 1; if it reaches 0, the packet is dropped. IPv4 has the Time To Live field with a similar function, but unlike IPv4, IPv6 deliberately defines no upper bound on packet lifetime: detecting and discarding outdated packets is left to higher-layer protocols.
Source address: The length is 128 bits, indicating the address of the
sender of the IPv6 packet.
Destination address: The length is 128 bits, indicating the address of the receiver of the IPv6 packet. The address can be a unicast, multicast, or anycast address. If the routing extension header is used (to define particular routes the packet must pass through), the destination address can be that of an intermediate node rather than the final destination.
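The fixed 40-byte header layout described by the fields above can be parsed as follows; this is a minimal sketch for illustration, not production packet-handling code.

```python
import struct

def parse_ipv6_header(hdr: bytes) -> dict:
    """Parse the 40-byte fixed IPv6 header into its fields."""
    ver_tc_flow, payload_len, next_header, hop_limit = struct.unpack("!IHBB", hdr[:8])
    return {
        "version": ver_tc_flow >> 28,                 # 4 bits, always 6
        "traffic_class": (ver_tc_flow >> 20) & 0xFF,  # 8 bits
        "flow_label": ver_tc_flow & 0xFFFFF,          # 20 bits
        "payload_length": payload_len,                # 16 bits, includes extension headers
        "next_header": next_header,                   # 8 bits, e.g. 6 = TCP, 17 = UDP
        "hop_limit": hop_limit,                       # 8 bits
        "src": hdr[8:24],                             # 128-bit source address
        "dst": hdr[24:40],                            # 128-bit destination address
    }

# A hand-built header: version 6, flow label 0x12345, payload 20 bytes,
# next header UDP (17), hop limit 64, all-zero addresses.
hdr = struct.pack("!IHBB", (6 << 28) | 0x12345, 20, 17, 64) + bytes(32)
fields = parse_ipv6_header(hdr)
assert fields["version"] == 6 and fields["hop_limit"] == 64
assert fields["flow_label"] == 0x12345
```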
ICMPv6 Protocol
IP nodes need a special protocol for exchanging packets that carry information about IP itself; ICMP serves exactly this purpose. When IPv4 was upgraded to IPv6, ICMP was revised accordingly; the latest ICMPv6 is defined in RFC 2463. ICMP packets are used to report errors and status information, and also support packet-level Internet testing (ping) and route tracing.
ICMP packets are generated in response to certain errors. For example, if a gateway device cannot process an IP packet for some reason, it may generate an ICMP packet and return it directly to the source node of the packet; the source node then takes measures to correct the reported error. For instance, if the gateway device cannot process an IP packet because the packet is too long to be sent onto the network link, it generates an ICMP error packet indicating that the packet is too long. On receiving it, the source node can use the packet to determine a more suitable packet length and re-send the data in a series of new IP packets.
RFC 2463 defines the following error packet types (excluding the informational packets defined in the document):
1. Destination unreachable;
2. Packet too big;
3. Timeout;
4. Parameter problem.
Destination unreachable
This packet is generated when a gateway device or the source host cannot forward a packet for reasons other than traffic congestion. The error packet defines several codes, including:
2: Address unreachable. This code indicates a problem resolving the IPv6 destination address to a link-layer address, or a problem delivering the packet on the destination link.
3: Port unreachable. This occurs when no higher-layer protocol (such as UDP) is listening on the destination port and the transport-layer protocol has no other way to report the problem to the source node.
Packet too big
When a gateway device cannot forward a received packet because the packet is larger than the MTU of the outgoing link, it generates a "packet too big" error. This ICMPv6 error packet carries a field indicating the MTU of the link that caused the problem, which makes it a useful error packet during path MTU discovery.
Timeout
When a gateway device receives a packet with hop limit 1, it must decrement the value before forwarding the packet. If decrementing makes the hop limit field 0 (or the gateway device receives a packet whose hop limit is already 0), the gateway device must drop the packet and send an ICMP timeout packet to the source node. When the source node receives such a packet, it can conclude either that the original hop limit was set too small (the actual route of the packet is longer than expected) or that a routing loop caused the delivery failure. This packet is the basis of the traceroute function, with which a node can identify every gateway device on the path from the source node to the destination node. It works as follows: first, the hop limit of a packet to the destination is set to 1; the first gateway device the packet reaches decrements the hop limit to 0 and returns a timeout packet, so the source node identifies the first gateway device on the path. If the packet must pass a second gateway device, the source node re-sends a packet with hop limit 2, and that gateway device decrements the hop limit to 0 and generates another timeout packet. The process ends when a packet reaches the destination address, by which time the source node has collected a timeout packet from every intermediate gateway device.
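The probing loop described above can be simulated in a few lines. This is a toy model of the mechanism, not real packet I/O: the path is given as a list of gateway names, and each "timeout packet" is represented by appending the responding gateway.

```python
def traceroute(path, max_hops=16):
    """Simulate hop-limit-based route tracing.  `path` is the list of
    gateway devices between source and destination; the probe's hop
    limit is incremented until a probe reaches the destination."""
    discovered = []
    for hop_limit in range(1, max_hops + 1):
        if hop_limit <= len(path):
            # This gateway decrements the hop limit to 0, drops the probe,
            # and returns an ICMPv6 timeout packet to the source.
            discovered.append(path[hop_limit - 1])
        else:
            break  # the probe reached the destination; tracing is complete
    return discovered

assert traceroute(["R1", "R2", "R3"]) == ["R1", "R2", "R3"]
```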
Parameter problem
When some part of the IPv6 header or an extension header is faulty, the gateway device cannot process the packet and simply drops it. The gateway device should generate an ICMP parameter problem packet indicating the type of fault (such as an erroneous header field, an unrecognized next header type, or an unrecognized IPv6 option) and use a pointer value to indicate the offset of the erroneous byte.
ICMPv6 also includes a function unrelated to error reporting. All IPv6 nodes must support two kinds of packets: echo request and echo reply. An echo request packet can be sent to any valid IPv6 address and contains an echo request identifier, a sequence number, and some data. The identifier and sequence number are optional but can be used to match replies to different requests; the data of the echo request is also optional and can be used for diagnosis. When an IPv6 node receives an echo request packet, it must return an echo reply packet containing the same request identifier, sequence number, and data carried in the original request. The ICMPv6 echo request/reply pair is the basis of the ping function. Ping is an important diagnostic tool because it provides a way to confirm whether a particular host is connected to the same network as other hosts.
discovery part of ARP and ICMP in IPv4, and adds a mechanism for detecting unreachable neighbors. The neighbor discovery protocol implements gateway device and prefix discovery, address resolution, next-hop address determination, redirection, neighbor unreachability detection, and duplicate address detection, as well as functions such as link-layer address change, inbound load balancing, anycast addresses, and proxy advertisement. The neighbor discovery protocol uses five types of ICMPv6 messages to implement these functions. The five message types are as follows:
1. Router Solicitation: when an interface comes up, the host sends a Router Solicitation message asking gateway devices to generate a Router Advertisement message immediately, instead of waiting for the next scheduled time;
5. Redirect: the gateway device informs the host, via a Redirect message, of a better next hop toward a specific destination address when the current route is not the best one.
IPv6 has a design requirement that, even in a constrained network, hosts must work correctly without a routing table stored on the gateway device or fixed configuration. Therefore, the host must configure itself automatically and learn how to send data to each destination. The storage that holds this information is called a cache; its data structure is a queue of records, called entries. The information in each entry is valid only for a limited time, and entries must be cleaned out of the cache to bound its size. The host needs to maintain the following information for each interface:
Prefix list: the list of prefixes of on-link addresses. Entries of the prefix list are created from information received in Router Advertisements. Each entry has an associated invalidation timer value (taken from the advertisement) used to discard the prefix when it becomes invalid; a special unlimited timer value marks a prefix as valid forever unless a new (finite) value is received in a later advertisement. The link-local prefix is always in the prefix list with an unlimited invalidation timer, regardless of whether any gateway device advertises it, and received Router Advertisements must not modify the invalidation timer of the link-local prefix.
Default router list: the list of routers to which packets may be sent. Entries of the router list point to entries in the neighbor cache. The default gateway selection algorithm is: prefer gateway devices known to be reachable over gateway devices whose reachability is unconfirmed. Each entry has an associated invalidation timer value (taken from Router Advertisement information) used to delete entries that are no longer advertised.
4. DELAY: the neighbor's reachability is no longer confirmed, but traffic was recently sent to it. Rather than probing the neighbor at once, send a probe after a short delay, which gives upper-layer protocols a chance to confirm reachability;
5. PROBE: the neighbor's reachability is no longer confirmed; send unicast Neighbor Solicitation probes to verify reachability.
When a node sends a packet to a destination, it uses the destination cache, prefix list, and default router list to determine the appropriate next-hop IP address, and then queries the neighbor cache to determine the link-layer address of that neighbor.
The next hop of an IPv6 unicast address is determined as follows:
The sender performs longest-prefix matching against the prefix list to determine whether the destination is on-link. If it is, the next-hop address is the destination address itself; otherwise the sender selects a next hop from the default router list. If the default router list is empty, the sender assumes the destination is on-link.
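The decision procedure above can be sketched with the standard ipaddress module. The function name and the policy of simply taking the first default router are illustrative assumptions; real implementations also consider router reachability.

```python
import ipaddress

def next_hop(dst, prefix_list, default_routers):
    """Sketch of next-hop determination: prefix-list match decides
    on-link vs. off-link; off-link traffic goes to a default router;
    an empty default router list means assume on-link."""
    addr = ipaddress.IPv6Address(dst)
    # Any covering prefix means the destination is on-link.
    if any(addr in p for p in map(ipaddress.IPv6Network, prefix_list)):
        return addr          # on-link: the next hop is the destination itself
    if default_routers:
        return ipaddress.IPv6Address(default_routers[0])
    return addr              # empty default router list: assume on-link

# On-link destination: delivered directly.
assert next_hop("2001:db8:1::5", ["2001:db8:1::/64"], ["fe80::1"]) \
    == ipaddress.IPv6Address("2001:db8:1::5")
# Off-link destination: forwarded via the default router.
assert next_hop("2001:db8:2::5", ["2001:db8:1::/64"], ["fe80::1"]) \
    == ipaddress.IPv6Address("fe80::1")
```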
After learning the IPv6 address of the next-hop gateway device, the sender checks the neighbor cache for the link-layer address. If no entry exists for that next-hop IPv6 address, the gateway device proceeds as follows:
When address resolution completes, the link-layer address is obtained and saved in the neighbor cache; the entry then enters the reachable state and the queued packets can be transmitted.
For multicast packets, the next hop is always regarded as on-link, and the link-layer address corresponding to a multicast IPv6 address is derived in a way that depends on the link type.
When the neighbor cache is used to transmit a unicast packet, the sender checks the associated reachability information and validates the neighbor's reachability according to the neighbor unreachability detection algorithm. When a neighbor becomes unreachable, next-hop determination is performed again to check whether another path to the destination is reachable.
If the IP address of the next-hop node is known, the sender looks up the neighbor's link-layer information in the neighbor cache. If there is no entry, the sender creates one, sets its state to INCOMPLETE, starts address resolution, and queues the packets whose address resolution is not yet complete. For interfaces with multicast capability, the address resolution process sends a Neighbor Solicitation message and waits for a Neighbor Advertisement. When a Neighbor Advertisement response is received, the link-layer address is saved in the neighbor cache and the queued packets are sent.
When transmitting unicast packets, each time the sender reads an entry of the neighbor discovery cache it checks the associated reachability information according to the neighbor unreachability detection algorithm; unreachability detection may cause the sender to send a unicast Neighbor Solicitation to verify that the neighbor is reachable.
When traffic is sent to a destination for the first time, next-hop determination is performed; as long as the destination can still communicate normally, the destination cache entries continue to be used. If the neighbor unreachability detection algorithm decides at some point that communication has failed, next-hop determination is performed again. For example, traffic through a faulty gateway device should switch to a gateway device that works normally, and traffic to a mobile node may be re-routed to its mobile agent.
When the node re-determines the next hop, it need not drop the entire destination cache entry; the PMTU and round-trip timer information in it remains useful.
The gateway device must unconditionally drop Router Solicitation and Router Advertisement messages that fail the validity check.
The router discovery function is used to identify the gateway devices connected to a given link and to obtain the prefixes and configuration parameters related to address autoconfiguration.
Besides responding to solicitation messages, gateway devices periodically send multicast Router Advertisement messages to announce their reachability on the link. Each host receives Router Advertisement messages from the gateway devices connected to the link and builds its default router list (the gateway devices used when the path to a destination is unknown). If gateway devices generate Router Advertisement messages frequently, hosts can learn of their existence within a few minutes; otherwise, neighbor unreachability detection is used.
The Router Advertisement message should contain the prefix list used to determine on-link reachability. The host uses the prefixes obtained from Router Advertisement messages to decide whether a destination is on-link (directly reachable) or off-link (reachable only through a gateway device). A destination may be on-link and yet not covered by any prefix learned from Router Advertisements; in this case, the host regards the destination as off-link, and a gateway device sends a Redirect message to the sender.
When a host sends a Router Solicitation message to a gateway device, the gateway device should send a Router Advertisement message at once, which speeds up configuration of the node.
2. Address resolution
IPv6 nodes resolve IPv6 addresses to link-layer addresses with Neighbor Solicitation and Neighbor Advertisement messages; address resolution is not performed for multicast addresses.
The node starts the address resolution process with a multicast Neighbor Solicitation message, which asks the target gateway device to return its link-layer address. The source gateway device includes its own link-layer address in the Neighbor Solicitation message and multicasts the message to the solicited-node multicast address associated with the target address. The target gateway device returns its link-layer address in a unicast Neighbor Advertisement message. Through this pair of messages, the source and destination gateway devices learn each other's link-layer addresses.
3. Re-direction function
The gateway device must determine the link-local address of each neighboring gateway device, so that the target address of a Redirect message can identify the neighboring gateway device by its link-local address.
Because a source terminal may not respond correctly to Redirect messages, or may ignore unauthenticated ones, the gateway device must limit the rate at which it sends Redirect messages to save bandwidth and processing overhead.
All paths between a host and its neighbor nodes should perform neighbor reachability detection, including host-to-host, host-to-gateway, and gateway-to-host communication. It can also be used between gateway devices to detect a failed neighbor or a failed forward path to a neighbor.
A neighbor is considered reachable if confirmation has recently been received that the neighbor's IP layer received a packet sent to it. Neighbor unreachability detection uses two methods of confirmation: one is a hint from an upper-layer protocol indicating that the connection is making forward progress; the other is for the gateway device to send a unicast Neighbor Solicitation message and receive a Neighbor Advertisement in response. To reduce unnecessary network traffic, probe messages are sent only to the neighbor in question.
Since IETF published RFC 2461, the standard text of the neighbor discovery protocol, in December 1998, neighbor discovery has become an essential protocol for IPv6 nodes, solving the interoperation problem among all nodes connected to a link.
The current IPv6 standards are stable and the related products and devices developed by international manufacturers have matured, but the demand of the China market for IPv6 technology is not yet clear. Therefore, IPv6 technology in China is still in the practice and operation phase of trial networks. As commercial deployment of IPv6 network applications accelerates, the neighbor discovery protocol will be used ever more widely.
IPv6 Address
The most obvious difference between IPv4 and IPv6 addresses is their length: an IPv4 address is 32 bits long, while an IPv6 address is 128 bits long. RFC 2373 not only explains the notation for IPv6 addresses, but also describes the different address types and their structures. An IPv4 address can be divided into two or three parts (network ID, node ID, and subnet ID); the IPv6 address has a larger address space and supports more fields.
There are three types of IPv6 address: unicast, multicast, and anycast. Unicast and multicast addresses are similar to their IPv4 counterparts; IPv6 no longer supports the IPv4 broadcast address, but adds the anycast address.
An IPv6 address is four times as long as an IPv4 address, so writing one out is correspondingly more involved. The basic notation for an IPv6 address is X:X:X:X:X:X:X:X, where each X is a 4-digit hexadecimal integer (16 bits). Each hexadecimal digit represents 4 bits, each integer contains 4 digits, and each address comprises 8 integers, for 128 bits in total (4 × 4 × 8 = 128). For example, the following are valid IPv6 addresses:
CDCD:910A:2222:5498:8475:1111:3900:2020
1030:0:0:0:C9B4:FF12:48AA:1A2B
2000:0:0:0:0:0:0:1
These integers are hexadecimal; the letters A-F represent 10-15. Every integer in the address must be present, but leading zeros need not be written. This is the standard IPv6 address notation; there are two further common forms. Some IPv6 addresses contain a long run of zeros (as in examples 2 and 3 above). In this case, the standard permits compressing the run of zeros, so the address 2000:0:0:0:0:0:0:1 can be written as 2000::1.
The double colon means that the address can be expanded to a complete 128-bit address. Only 16-bit groups that are all zeros can be replaced in this way, and the double colon may appear only once in an address.
In mixed IPv4 and IPv6 environments, a third notation is available: the lowest 32 bits of the IPv6 address can be used to carry an IPv4 address. Such an address is written in the mixed form X:X:X:X:X:X:d.d.d.d, where each X is a 16-bit hexadecimal integer and each d is an 8-bit decimal integer. For example, 0:0:0:0:0:0:10.0.0.1 is a valid IPv6 address; combining the two notations, it can also be written as ::10.0.0.1.
Since an IPv6 address divides into two parts (subnet prefix and interface
ID), it is convenient to write a node address together with an additional
value, in a notation similar to a CIDR address, indicating how many
leading bits of the address form the prefix. The prefix length is appended
to the IPv6 node address after a slash, as in
1030:0:0:0:C9B4:FF12:48AA:1A2B/60. In this address, the prefix length
used for routing is 60 bits.
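The prefix notation above maps directly onto `ipaddress.IPv6Interface`, which splits a node address from its routing prefix:

```python
import ipaddress

# The example node address with its 60-bit prefix length.
iface = ipaddress.IPv6Interface("1030:0:0:0:C9B4:FF12:48AA:1A2B/60")
print(iface.network)   # the routing prefix: the first 60 bits, rest zeroed
print(iface.ip)        # the full 128-bit node address
```

The `network` attribute keeps only the prefix bits, which is exactly what a router uses for forwarding.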
1. Unicast: the identifier of a single interface. A packet sent to a unicast
address is delivered to the interface identified by that address.
Unicast
The unicast address identifies a single IPv6 interface. A node can have
multiple IPv6 network interfaces, and each interface must have an
associated unicast address. The unicast address can be viewed as one
piece of information carried in a 128-bit field that designates one
particular interface; alternatively, the bits can be interpreted as several
smaller pieces of information. Either way, taken together they form one
128-bit address that identifies one node interface.
How much structure an IPv6 address reveals to a node depends on who
views the address and why. For example, a node may only need to know
that the whole 128-bit address is a unique identifier, without caring
where the node sits in the network. A gateway, on the other hand, can
use parts of the address to decide which particular network a packet
belongs to, or to identify a unique node on the subnet.
For example, an IPv6 unicast address can be viewed as an entity with
two fields: one identifies the network, and the other identifies the node's
interface on that network. The network ID can itself be divided into
several parts identifying different levels of the network. Like an IPv4
CIDR address, an IPv6 unicast address can be split at a chosen boundary
into two parts: the high-order part of the address contains the prefix
used for routing, while the low-order part contains the network interface
ID.
The simplest view treats the IPv6 address as an undifferentiated 128-bit
value, but in terms of format it divides into two segments: the interface
ID and the subnet prefix. The lengths of the two are variable; the length
of the interface ID depends on the length of the subnet prefix. A gateway
device close to the addressed node interface (far from the backbone
network) can identify the interface with relatively few bits, while a
gateway device close to the backbone needs only a few address bits to
specify the subnet prefix. In that case, most of the address is used to
identify the interface ID.
RFC 2373 changes and simplifies IPv6 address allocation. Allocation
based on physical location is dropped, and the provider-based unicast
address becomes the aggregatable global unicast address. As the name
change suggests, addresses can be aggregated both as previously
defined, by provider, and, newly, by exchange point, which gives a more
balanced address classification. The NSAP and IPX address spaces
remain reserved, and 1/8 of the address space is allocated to
aggregatable addresses. Apart from the multicast addresses and one
class of reserved addresses, the rest of the IPv6 address space is
unassigned, reserving ample room for future growth.
1. Interface ID
In the IPv6 addressing structure, every IPv6 unicast address needs an
interface ID. The interface ID plays a role like that of a MAC address. A
MAC address is burned into the NIC by its manufacturer and is globally
unique; no two NICs can have the same MAC address, and the address
identifies the interface at the network link layer. The interface ID of an
IPv6 host address is based on the IEEE EUI-64 format, which builds a
64-bit interface ID from the existing MAC address; the result is unique
both globally and locally. The appendix of RFC 2373 explains how the
interface ID is constructed.
The 64-bit interface ID can uniquely identify each network interface,
which means that in theory there can be 2^64 distinct physical
interfaces, about 1.8 × 10^19 different addresses, while using only half
of the IPv6 address space.
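The EUI-64 construction can be sketched in a few lines: insert FF FE in the middle of the 48-bit MAC address and invert the universal/local bit, in the style of the RFC 2373 appendix. The MAC address below is an arbitrary example:

```python
def eui64_interface_id(mac: str) -> str:
    """Build a 64-bit interface ID from a 48-bit MAC address (EUI-64 style)."""
    b = bytearray(int(x, 16) for x in mac.split(":"))
    b[0] ^= 0x02                                      # invert the universal/local bit
    eui = bytes(b[:3]) + b"\xff\xfe" + bytes(b[3:])   # insert FF FE in the middle
    groups = [(eui[i] << 8) | eui[i + 1] for i in range(0, 8, 2)]
    return ":".join(f"{g:x}" for g in groups)

print(eui64_interface_id("00:25:96:12:34:56"))   # 225:96ff:fe12:3456
```

The resulting 64 bits form the low half of an IPv6 unicast address, for example appended to a link-local or global prefix.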
FP field: the format prefix of the IPv6 address. It is three bits long and
identifies which part of the IPv6 address space the address belongs to.
For aggregatable global unicast addresses the field is 001.
TLA ID field: the top-level aggregation ID, containing the highest level of
address routing information. It is the coarsest routing information in the
internetwork. The field is currently 13 bits, allowing at most 8192
different top-level routes.
RES field: 8 bits, reserved for future use. It may eventually be used to
extend the top-level or next-level aggregation ID field.
Interface ID field: 64 bits, holding the 64-bit IEEE EUI-64 interface ID.
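The field layout just described can be pulled out of an address with plain bit shifts. This is only a sketch based on the fields named above (FP, TLA ID, RES, interface ID); the 40 bits between RES and the interface ID carry the lower-level aggregation IDs, which are not detailed here and are skipped:

```python
import ipaddress

def unicast_fields(addr: str) -> dict:
    """Decompose an aggregatable global unicast address into the fields above."""
    n = int(ipaddress.IPv6Address(addr))
    return {
        "fp": n >> 125,                       # 3-bit format prefix (001 = aggregatable)
        "tla": (n >> 112) & 0x1FFF,           # 13-bit top-level aggregation ID
        "res": (n >> 104) & 0xFF,             # 8-bit reserved field
        "interface_id": n & ((1 << 64) - 1),  # 64-bit EUI-64 interface ID
    }

print(unicast_fields("2001:0:0:0:0:0:0:1"))
```

For 2001::1 the top three bits are 001, so the FP field comes out as 1, marking it as an aggregatable global unicast address.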
The first 1/256 of the IPv6 address space, all addresses whose first 8
bits are 0000 0000, is reserved. Most of this space is empty; what is
used serves as special addresses. The special addresses include:
IPv6 addresses that embed an IPv4 address: there are two kinds of
these. One lets an IPv6 node reach nodes that support only IPv4; the
other lets IPv6 gateway devices carry IPv6 packets across an IPv4
network in tunnel mode.
The IPv4-compatible address is used by nodes that understand both IPv4
and IPv6 to send IPv6 packets through IPv4 gateway devices in tunnel
mode. The IPv4-mapped address is used by an IPv6 node to reach a node
that supports only IPv4.
For organizations unwilling to apply for globally unique IPv4 network
addresses, net-10-style private addresses combined with IPv4 network
address translation provide one option. The gateway devices used by
such organizations should not forward these addresses, but nothing
prevents them from forwarding the addresses or distinguishes them from
other valid IPv4 addresses; a gateway device can be configured to
forward them.
To provide this capability, IPv6 carves two distinct address ranges out of
the globally unique Internet space. The link-local address is used to
number hosts on a single network link; an address is identified as
link-local by the first 10 bits of its prefix.
Gateway devices never forward packets whose source or destination is a
link-local address, so they do not process such packets beyond the link.
The middle 54 bits of the address are set to 0. The 64-bit interface ID
again uses the IEEE structure, and this part of the address space allows
a single network to attach up to (2^64 − 1) hosts.
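A link-local address can be recognized by its 10-bit prefix (1111 1110 10, i.e. FE80::/10). The host address below is an illustrative example:

```python
import ipaddress

lla = ipaddress.IPv6Address("fe80::225:96ff:fe12:3456")
print(lla.is_link_local)      # True: falls inside fe80::/10
print(bin(int(lla) >> 118))   # the 10-bit link-local prefix, 0b1111111010
```

Routers must drop rather than forward any packet carrying such an address, which is what makes the range safe to reuse on every link.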
Multicast
Like the broadcast address, the multicast address is useful on local
networks such as classic Ethernet, where every node can see all data
transmitted on the line. As each transmission begins, every node checks
the destination MAC address of the packet; if it matches the MAC
address of the local node's interface, the node accepts the packet. For
broadcast, a node only needs to listen, with no decision to make, which
is simple. Multicast is a little more complicated: a node must register a
multicast address, and when it finds that the destination address is a
multicast address, it must confirm whether that address is one it has
registered.
The format of the IPv6 multicast address differs from that of the IPv6
unicast address. A multicast address can serve only as a destination
address; no packet takes a multicast address as its source address. The
first byte of the address is all 1s, marking it as multicast. The rest of
the multicast address consists of the following three fields:
Flag field: it comprises four single-bit flags. Currently only the fourth bit
is specified; it indicates whether the address is a well-known multicast
address assigned by the Internet numbering organization or a transient
multicast address used for a particular occasion. If the flag bit is 0, the
address is well-known; if the flag bit is 1, the address is transient. The
other three flag bits are reserved for future use.
Range field: 4 bits, indicating the multicast scope, that is, whether the
multicast group includes only the nodes on one local network, in one site
or one organization, or nodes at any location in the IPv6 global address
space. The possible values of the four bits are:
Group ID field: 112 bits, identifying the multicast group. The same group
ID can denote different groups depending on whether the address is
transient or well-known and on the address range. Permanent multicast
addresses use assigned group IDs with special meanings; membership in
a group is determined by the group ID together with the range.
All IPv6 multicast addresses begin with FF: the first 8 bits of the address
are all 1. The remaining flag bits are currently undefined, so if the third
hexadecimal digit of the address is 0 the address is well-known, and if it
is 1 the address is transient. The fourth hexadecimal digit gives the
range, which may also be an unassigned or reserved value.
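These rules can be sketched as a small parser. For the well-known all-nodes address ff02::1, the flags come out as 0 (well-known), the range as 2 (link scope), and the group ID as 1:

```python
import ipaddress

def multicast_fields(addr: str):
    """Split an IPv6 multicast address into its flag, range, and group ID fields."""
    n = int(ipaddress.IPv6Address(addr))
    assert n >> 120 == 0xFF          # first byte all 1s: a multicast address
    flags = (n >> 116) & 0xF         # 0 = well-known, 1 = transient
    scope = (n >> 112) & 0xF         # the 4-bit range field
    group = n & ((1 << 112) - 1)     # the 112-bit group ID
    return flags, scope, group

print(multicast_fields("ff02::1"))   # (0, 2, 1)
```

Changing the third hexadecimal digit to 1, as in ff12::1, would mark the same group ID as a transient address with the same range.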
2. Multicast group
IPv4 already has multicast applications, since some applications send the
same data to many nodes. Assigned multicast addresses combine with
multicast ranges to express various meanings and serve different
applications. Previously registered multicast addresses include groups of
gateway devices, DHCP service, audio and video services, and network
gaming services. For details, refer to RFC 2375.
Anycast
A multicast address is, in a sense, shared by multiple nodes: every node
that is a member of the multicast address expects to receive all packets
sent to it. A gateway device connected to five different local Ethernet
networks forwards a copy of each multicast packet onto every network
(assuming at least one node on each network has registered the
multicast address). The anycast address is similar: multiple nodes share
one anycast address. The difference is that only one node needs to
receive the packets sent to the anycast address. Anycast is useful for
providing certain types of service, especially services that need no fixed
relationship between client and server, such as domain name servers
and time servers. One name server is as good as another and should
work the same regardless of distance; similarly, a nearby time server is
preferable. Therefore, when a host sends a request to an anycast
address to get information, the server nearest to that anycast address
should respond.
2. Anycast routing
To see how a route is chosen for an anycast packet, consider the longest
common routing prefix of the group of hosts sharing one anycast
address: they must share some common network address prefix, and
that prefix defines the region containing all the anycast nodes. For
example, an ISP can require each of its customers to provide a time
server, with all the time servers sharing one anycast address. The prefix
defining the anycast region is the one allocated to the ISP for
redistribution. Routing within the region is determined by where the
hosts sharing the anycast address sit: inside the region, the anycast
address carries a routing entry with pointers to the network interfaces of
all nodes sharing the address. In the case above the region is bounded,
but anycast hosts may also be scattered across the global Internet; in
that case the anycast address must be added to every routing table in
the world.
Hop-by-Hop Options header: this extension header must immediately
follow the IPv6 header. It contains optional data that every node along
the packet's path must examine. So far only one option is defined, the
jumbo payload option, which indicates that the payload length of the
packet exceeds what the 16-bit payload length field of IPv6 can express.
Whenever the payload of a packet (including the hop-by-hop options
header) exceeds 65535 bytes, the packet must carry this option. If a
node cannot forward the packet, it must return an ICMPv6 error packet.
Routing header: this extension header names particular nodes the
packet must pass through on the way to its destination; it contains the
list of those nodes' addresses. The destination address in the IPv6
header is then not the packet's final destination, but the first address
listed in the routing header. When the node with that address receives
the packet, it processes the IPv6 header and routing header and sends
the packet on to the second address in the routing header list, and so
on, until the packet reaches its final destination.
IPv6 extension headers make options possible without hurting
performance. A developer can use options when necessary, without
worrying that gateway devices will treat packets carrying extension
options differently, unless the routing extension header or hop-by-hop
option is present. Even when those two are set, a gateway device can
still perform the necessary processing more easily than with IPv4
options.
Extension Header ID
All IPv6 headers are the same long and look nearly the same The unique
difference is the next header field. In the IPv6 packet without extension
header, the value of the field means the upper protocol. That is to say, if
there is the TCP field in the IP packet, the 8-bit binary value of the next
header field is 6 (from RFC 1700); if there is UDP packet in the IP packet,
the value is 17. The next header field value indicates that whether there is
the next extension header and what is the next extension header.
Therefore, the IPv6 headers can be linked, beginning from the basic IPv6
header to link the extension headers one by one.
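The chaining can be illustrated with the assigned next header values from the IANA protocol numbers registry (0 hop-by-hop, 43 routing, 44 fragment, 50 ESP, 51 authentication, 60 destination options, plus 6 TCP and 17 UDP):

```python
# Assigned next header values (IANA protocol numbers registry).
NEXT_HEADER = {
    0: "Hop-by-Hop Options", 6: "TCP", 17: "UDP", 43: "Routing",
    44: "Fragment", 50: "ESP", 51: "Authentication", 60: "Destination Options",
}

def walk_chain(values):
    """values: the next header value read from each successive header."""
    return [NEXT_HEADER[v] for v in values]

# Basic IPv6 header -> routing header -> fragment header -> TCP segment:
print(walk_chain([43, 44, 6]))
```

Each header's next header field names the header that follows it, so a receiver simply reads the values one after another until it reaches the upper-layer protocol.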
1. IPv6 header
2. Hop-by-Hop Options header
3. Destination Options header (options processed by the first destination
and by the destinations listed in the routing header)
4. Routing header
5. Fragmentation header
6. Authentication header
7. ESP header
8. Destination Options header (options processed only by the final
destination)
9. Upper-layer header
From the order above, only the destination options extension header may
appear more than once in a single IP packet, and only when the packet
contains the routing extension header. The order is not absolute. For
example, when the rest of the packet must be encrypted, the ESP header
must be the last extension header. Similarly, the hop-by-hop option
takes precedence over all other extension headers, because every node
that receives the IPv6 packet must process it.
Extension headers are identified through the next header field of the
IPv6 header, which is 8 bits, so there can be at most 256 different
values. Even leaving some values aside, all possible upper-layer header
values must still be supported: the field identifies not only the extension
headers but also every other protocol that can be encapsulated in an IP
packet. Many values are therefore already assigned, and the unassigned
values are limited.
Some extension header protocol IDs in IPv6 come from IPv4, such as the
authentication header and the ESP header. Many extension headers
have already been assigned, but new options may also be defined
through the hop-by-hop options extension header and the destination
options extension header. Besides conserving protocol values for the
next header field, the option extension headers make new options easy
to deploy. If an IP packet is sent with a new header type and the
destination node supports the new type, everything goes well;
conversely, if the new header type is unknown to the destination node,
the node has to drop the packet. By contrast, all IPv6 nodes must
support the hop-by-hop options extension header, the destination
options extension header, and some basic options (refer to the next
section). So if the destination node receives a packet with a destination
options extension header, it can still respond even when it does not
support an option inside the extension header. An option can also
request that the destination node return an ICMP error packet indicating
that it does not understand the option.
Next header field and header extension length field: all IPv6 extension
headers contain the next header field. The header extension length field
occupies 8 bits and gives the length of the option header in units of 8
bytes, excluding the first 8 bytes of the extension header; that is, if the
option extension header is only 8 bytes long, the field value is 0. The
field limits an extension header to at most 2048 bytes. The remaining
part of the extension header holds the options it carries.
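The length arithmetic works out as follows (8-byte units, not counting the first 8 bytes):

```python
def ext_header_total_bytes(hdr_ext_len: int) -> int:
    """Total size of an option extension header from its 8-bit length field."""
    return (hdr_ext_len + 1) * 8

print(ext_header_total_bytes(0))    # the minimum extension header: 8 bytes
print(ext_header_total_bytes(255))  # the maximum the field allows: 2048 bytes
```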
Options
The IPv6 option contains the following three fields:
Option type: an 8-bit identifier giving the type of the option. Even when
the destination node cannot identify the option, the first 3 bits tell it how
to handle the option.
Option data length: an 8-bit integer giving the length of the option data
field; its maximum value is 255.
Option data: this field contains the option-specific data, up to 255 bytes.
The first two bits of the option type field tell the destination node what
to do when it cannot identify the option. There are the following four
option types:
00: Skip the option and continue processing the remaining part of the
extension header;
01: Drop the packet;
10: Drop the packet; whether or not the destination address of the
packet is a multicast address, send an ICMP packet to the source
address of the packet;
11: Drop the packet; only if the destination address of the packet is a
unicast or anycast address (that is, a non-multicast address), send an
ICMP packet to the source address of the packet.
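The dispatch on the top two bits can be written out directly; this is a sketch following the RFC 2460 rules for unrecognized options:

```python
def unknown_option_action(option_type: int, dest_is_multicast: bool) -> str:
    """What a node does with an option type it cannot identify."""
    top2 = option_type >> 6
    if top2 == 0b00:
        return "skip option, keep processing"
    if top2 == 0b01:
        return "drop packet"                 # drop silently, no ICMP
    if top2 == 0b10:
        return "drop packet, send ICMP"      # regardless of multicast destination
    # 0b11: report only to non-multicast destinations
    return "drop packet" if dest_is_multicast else "drop packet, send ICMP"

# The jumbo payload option type is 194 (0xC2): its top bits are 11.
print(unknown_option_action(0xC2, False))
```

The 11 encoding on the jumbo payload option means a node that does not understand it will still tell a unicast sender what went wrong, while staying quiet for multicast to avoid ICMP storms.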
The third bit of the option type indicates whether the value of the option
data may change while the packet travels from the source address to the
destination address: if it is 0, the option data cannot change; if it is 1,
the option data is variable. The hop-by-hop options extension header
and the destination options extension header contain the same two
padding options (Pad1 and PadN). Pad1 is special: it is a single byte, all
bits set to 0, with no option data length field and no option data.
PadN is identified by one of the four option types above and uses
multiple bytes to pad the extension header. If the extension header
needs N bytes of padding, the value of the option data length field is
N−2; that is, the option data field occupies N−2 bytes, all set to 0.
Together with the one-byte option type field and the one-byte option
data length field, N bytes are filled in total.
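The two padding options can be built directly (the PadN option type value is 1):

```python
def pad1() -> bytes:
    """Pad1: a single all-zero byte; no length field, no data."""
    return b"\x00"

def padn(n: int) -> bytes:
    """PadN: fill n >= 2 bytes; type byte 1, data length n-2, then n-2 zero bytes."""
    assert n >= 2
    return bytes([1, n - 2]) + bytes(n - 2)

print(padn(5).hex())   # 5 bytes of padding: type, length 3, three zero bytes
```

Padding exists because options in these headers must sit at particular alignments; Pad1 fills a single byte, PadN anything longer.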
As in other option extension headers, the first two fields give the next
header protocol and the extension header length (here, since the whole
header is only 8 bytes, the extension header length field is 0). The
jumbo payload option begins at the third byte of the extension header:
the third byte is the option type, with value 194, and the fourth byte
(the jumbo payload option data length) is 4. The last field of the option
is the jumbo payload length, giving the actual number of bytes in the IP
packet (including the hop-by-hop options extension header, but
excluding the IPv6 header).
A node can use the jumbo payload option to send jumbo IP packets only
when every gateway device along the path can process it. The option
therefore lives in the hop-by-hop extension header, which every gateway
device on the path is required to examine. The jumbo payload option
permits the IPv6 packet payload length to exceed 65535 bytes, up to
nearly 4 billion bytes (2^32 − 1). When the option is used, the 16-bit
payload length field of the IPv6 header must be 0, and the jumbo
payload length field in the extension header must be no less than
65535. If either condition is not satisfied, the node that receives the
packet should send an ICMP error packet to the source node describing
the problem. There is one further restriction: a packet carrying a
fragmentation extension header cannot use the jumbo payload option at
the same time, because a packet using the jumbo payload option cannot
be fragmented.
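A hop-by-hop header carrying only the jumbo payload option fits exactly in 8 bytes, which is why its extension header length field is 0. A sketch of building one:

```python
import struct

def jumbo_hop_by_hop(next_header: int, payload_len: int) -> bytes:
    """Hop-by-hop options header carrying only the jumbo payload option."""
    assert payload_len > 65535          # otherwise the normal length field suffices
    return struct.pack(
        "!BBBBI",
        next_header,  # protocol of the header that follows
        0,            # header extension length: 0 -> the minimum 8 bytes
        194,          # the jumbo payload option type
        4,            # option data length: the 32-bit jumbo length that follows
        payload_len,  # actual payload length, up to 2**32 - 1
    )

hdr = jumbo_hop_by_hop(6, 100000)   # e.g. a jumbo packet carrying TCP
print(len(hdr), hdr[2])
```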
filled. Moreover, every gateway device on the path must process the
whole address list whether or not it appears in the list, so processing a
source-routed packet is slow. IPv6 defines one generic routing extension
header with two one-byte fields: the routing type field and the segments
left field. The routing type field indicates the type of routing header in
use, while the segments left field gives the number of additional
gateway devices, listed in the remaining part of the extension header,
that the packet must still pass through on the way to its destination. The
remainder of the extension header is type-specific data whose format
depends on the routing header type. RFC 1883 defines one type, the
type 0 routing header.
The type 0 routing extension header solves the main problem of IPv4
source routing: only the gateway devices in the list process the routing
header, and the other gateway devices need not. Up to 256 gateway
devices can be specified in the list. The routing header is processed as
follows:
The source node builds the list of gateway devices that the packet must
pass through and constructs a type 0 routing header. The header
contains the list of gateway devices, the final destination node's
address, and the segments left value, an 8-bit integer giving the number
of gateway devices the packet must still traverse before reaching the
destination node.
When the source node sends the packet, it sets the destination address
of the IPv6 header to the address of the first gateway device in the
routing header list.
The packet is forwarded until it reaches the first stop on the path, the
destination address of the IPv6 header (the first gateway device in the
routing header list). Only that gateway device examines the routing
header; intermediate gateway devices along the way ignore it.
At the first stop and at every later stop, the gateway device checks the
routing header to ensure that the segments left value is consistent with
the address list. If the segments left value is 0, the gateway node is the
final destination of the packet, and the node continues to process the
rest of the packet.
If the node is not the packet's final destination, it takes its own address
out of the destination address field of the IPv6 header and replaces it
with the address of the next node in the routing header list. It then
decrements the segments left field by 1 and sends the packet on to the
next stop. The other nodes in the list repeat this process until the packet
reaches its final destination.
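The per-hop address swap above can be simulated in a few lines; the hostnames here are placeholders, not real devices:

```python
def type0_route(hops, final_dest):
    """Return the successive IPv6-header destination addresses of a
    type 0 source-routed packet."""
    addrs = hops + [final_dest]       # the routing header address list
    segments_left = len(addrs)
    dest = addrs[0]                   # the source puts the first hop in the IPv6 header
    segments_left -= 1
    visited = [dest]
    while segments_left > 0:          # each listed node swaps in the next address
        dest = addrs[len(addrs) - segments_left]
        segments_left -= 1
        visited.append(dest)
    return visited

print(type0_route(["gw1", "gw2"], "dst"))
```

The packet's IPv6 destination field is thus rewritten at each listed hop, ending with the true destination, exactly as in the steps above.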
As described above, a source node can send packets of up to 1280 bytes
without considering fragmentation. A 1500-byte packet might also pass
unfragmented, but IPv6 recommends that all nodes run the path MTU
discovery mechanism and permits fragmentation only at the source
node. That is to say, before sending any packets, the source probes the
path to the destination node and computes the largest packet that can
be sent without fragmentation. To send a packet whose length exceeds
that maximum, the source node must fragment it. In IPv6,
fragmentation happens only at the source node and is expressed with
the fragmentation header.
Next header field: The 8-bit field is common for all IPv6 headers.
Fragment offset field: similar to the IPv4 fragment offset field. The field
has 13 bits and counts in units of 8 bytes, giving the position of the first
byte of the data in this packet (fragment) relative to the first byte of the
fragmentable data in the original packet. That is to say, if the value is
175, the data in the fragment starts at byte 1400 of the original packet.
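The offset arithmetic is simply a multiplication by the 8-byte unit:

```python
def fragment_byte_offset(offset_field: int) -> int:
    """Convert the 13-bit fragment offset field (8-byte units) to a byte offset."""
    return offset_field * 8

print(fragment_byte_offset(175))   # the example value above
```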
Reserved field: Currently, the 2-bit field is not used and is set as 0.
ID field: similar to the IPv4 ID field, but 32 bits rather than IPv4's 16.
The source node assigns a 32-bit ID to each fragmented IPv6 packet,
identifying the packets recently sent (within the packet lifetime) from
the source address to the destination address. Only part of an IPv6
packet can be fragmented: the fragmentable part comprises the payload
and those extension headers that are processed only on reaching the
final destination. The IPv6 header and the extension headers that
gateway devices must process on the way to the destination node, such
as the routing header or hop-by-hop options header, must not be
fragmented.
GRE Technology
Main contents:
Terms
Typical Application
Terms
VPN: Virtual Private Network. VPN technology connects two or more
network sites across the Internet; within the VPN, all sites operate as if
they were on a single private network.
GRE technology creates a tunnel between a source end and a destination
end. Packets that will traverse the tunnel are encapsulated with a new
packet header (the GRE header) and then, carrying the tunnel
destination address, placed into the tunnel. When the packets reach the
tunnel destination, the GRE header is stripped off. GRE packets are
transmitted after an IP header is added, so GRE rides above the IP
layer; the protocol ID in the IP header is 47.
GRE header: added after the payload packet enters the tunnel; it carries
the GRE protocol and passenger protocol information.
The simplest GRE header contains four bytes: when the C, K, and S flag
bits are all 0, the GRE header contains only the information in bits 0 to
31.
Bit 0 is the checksum flag bit. The checksum field is valid only when this
flag is set to 1.
Bit 2 is the key flag bit. The key field is valid only when this flag is set to
1.
Bit 3 is the sequence number flag bit. The sequence number field is valid
only when this flag is set to 1.
The protocol type field carries the type value of the payload packet. In
general these values match the Ethernet frame type values; for
example, the protocol type for IP packets is 0x0800.
Checksum field
The checksum field carries the checksum of the GRE header; the
checksum must cover both the GRE header and the payload packet.
Key field
The key field carries the tunnel's key. For the tunnel to come up, the
same key must be configured at both ends (or no key at either end).
Sequence field
The sequence field carries the sequence number of the packets. If the
sequence flag bit is set, packets traversing the tunnel carry sequence
numbers, starting from 0 and incremented by 1 for each packet sent.
The far end records the sequence number of each received packet and
discards packets whose sequence numbers are invalid.
The shadowed part is the new IP header; the part in the pane is the GRE
header; the rest is the real IP packet, serving as the data.
45 00 05 f4 8f e3 00 00 7f 2f fd 85 c0 a8 01 02 c0 a8 01 01 00 00 08 00
45 00 05 dc 72 3f
2f is the protocol type carried in the outer IP header: GRE (47).
0000 0800 is the GRE header: all the flag bits are 0, which means the
GRE packet carries no checksum, key, or sequence number; the
passenger protocol is IP (0x0800).
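The dump above can be decoded programmatically: a 20-byte outer IPv4 header, the 4-byte GRE header, then the start of the inner IPv4 header:

```python
import struct

dump = bytes.fromhex(
    "450005f48fe300007f2ffd85c0a80102c0a80101"  # outer IPv4 header (20 bytes)
    "00000800"                                  # GRE header: no C/K/S flags, passenger = IP
    "450005dc723f"                              # start of the inner IPv4 header
)

outer_proto = dump[9]                             # IPv4 protocol field: expect GRE (47)
gre_flags, passenger = struct.unpack("!HH", dump[20:24])
print(outer_proto, gre_flags, hex(passenger))
```

Byte 9 of an IPv4 header is the protocol field, so 0x2f there is what hands the packet to GRE processing on receipt.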
Packet receiving: if the destination of a packet is the router, the packet
is handed to the upper-layer protocol for processing. If the protocol is
GRE (47), the router looks up the corresponding tunnel interface and
processes the GRE header: it performs a series of checks, strips the
external IP header, changes the recvif field of the mbuf to the local
tunnel interface, and finally places the packet on the IP input queue.
Packet sending: if a packet is routed out a tunnel interface, a GRE
header is added according to the interface configuration, followed by an
IP header carrying the source and destination addresses specified for
the tunnel. The packet is then routed by the tunnel's destination address
out the actual physical interface.
After the packets reach switch2, switch2 routes them. Because of the
static route, switch2 decides to forward the packets through the tunnel,
and the packets are encapsulated.
Encapsulation
In this case, the packets to be forwarded are the payload packets (IP
packets here). The tunnel prepends a GRE header whose protocol type is
set to 0x0800 (the IP protocol type), then prepends an IP header (the
delivery header) whose protocol value is set to 47 (the GRE protocol ID).
The destination address of the IP header is set to 21.1.1.1, the
destination of Tunnel1, and its source address to the source address of
Tunnel1. After the encapsulation is complete, the packets are sent from
interface 12.1.1.1.
Forwarding
When switch3 receives the packets, it hands them to the IP layer for
routing. The IP header that switch3 parses is the delivery header (the
payload packet is encapsulated, so switch3 cannot see the payload
packet's IP header). It therefore forwards by the destination address of
the delivery header, 21.1.1.1.
This continues until the packets reach switch4, the destination of the
tunnel.
De-capsulation
When switch4 receives the packets, it also parses the delivery header.
Since the destination address 21.1.1.1 is its own, it checks the protocol
field of the IP packets. The protocol field is 47, so the packets belong to
the GRE tunnel. The tunnel first removes the delivery header, then
checks the protocol type in the GRE header; the protocol type is 0x0800,
so the tunnel hands the payload packet to the IP layer for processing,
completing de-capsulation.
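The encapsulation and de-capsulation walkthrough can be modeled with plain dictionaries. This is only a conceptual sketch, not real packet handling; the inner addresses are hypothetical, while the tunnel endpoints are the 12.1.1.1/21.1.1.1 from the example:

```python
def gre_encapsulate(payload_pkt, tunnel_src, tunnel_dst):
    """Model GRE encapsulation: payload -> GRE header -> delivery IP header."""
    return {
        "delivery": {"src": tunnel_src, "dst": tunnel_dst, "proto": 47},
        "gre": {"proto": 0x0800},    # passenger protocol: IP
        "payload": payload_pkt,
    }

def gre_decapsulate(pkt, my_addr):
    """Model what switch4 does: check the delivery header, then strip it."""
    assert pkt["delivery"]["dst"] == my_addr   # addressed to this tunnel endpoint
    assert pkt["delivery"]["proto"] == 47      # GRE
    assert pkt["gre"]["proto"] == 0x0800       # passenger is IP
    return pkt["payload"]                      # handed back to the IP layer

inner = {"src": "192.168.1.10", "dst": "192.168.2.10"}   # hypothetical payload packet
pkt = gre_encapsulate(inner, "12.1.1.1", "21.1.1.1")
print(gre_decapsulate(pkt, "21.1.1.1") == inner)
```

De-capsulation returns exactly the payload that entered the tunnel, which is the whole point: the transit network only ever routes on the delivery header.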
The disadvantage of GRE is its management cost when the number of
tunnels grows large. GRE tunnels are configured manually, so the cost of
configuring and maintaining them grows with the number of tunnels,
and whenever a tunnel endpoint changes, the tunnel must be
reconfigured.
Typical Application
The GRE tunnel technology can meet the requirements of Extranet VPN
and Intranet VPN.
Transition Technology
Main contents:
Tunnel technology
Tunnel Technology
Tunnel technology provides a way to carry IPv6 data over the existing
IPv4 routing infrastructure: IPv6 packets are treated as unstructured,
opaque data, encapsulated inside IPv4 packets, and carried across the
IPv4 network. By creation mode, tunnel technology divides into manually
configured tunnels and automatically configured tunnels. Tunnels reuse
the existing IPv4 network and give IPv6 nodes a way to communicate
during the transition, but they cannot solve the problem of
interconnecting IPv6 nodes with IPv4 nodes.
Within tunnel technology, the following are widely used: manually
configured tunnels, automatically configured tunnels, 6to4, 6over4, and ISATAP.
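The encapsulation idea can be sketched as follows. This is an illustration only, assuming a bare IPv4 header with the checksum left at zero; protocol number 41 is the standard value for IPv6-in-IPv4 encapsulation.

```python
# Sketch of the tunnel idea above: an IPv6 packet is treated as opaque
# data and wrapped in an IPv4 header whose protocol field is 41.
import struct
import socket

IPPROTO_IPV6 = 41  # IPv4 protocol number meaning "payload is an IPv6 packet"

def tunnel_encapsulate(ipv6_packet: bytes, v4_src: str, v4_dst: str) -> bytes:
    """Wrap an IPv6 packet in an IPv4 header for transport across the IPv4 core."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(ipv6_packet),   # version/IHL, TOS, total length
        0, 0,                              # identification, flags/fragment
        64, IPPROTO_IPV6, 0,               # TTL, protocol=41, checksum (0 here)
        socket.inet_aton(v4_src), socket.inet_aton(v4_dst),
    )
    return header + ipv6_packet
```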
2. 6to4 tunnel
SLA Technology
This chapter describes the SLA theory and how to realize it.
Main contents:
SLA terms
Introduction to SLA
Introduction to SLA
SLA Terms
SLA: Service Level Agreements; it sends packets of a specified protocol
to detect and monitor network communication.
RTR: Response Time Reporter; because SLA calculates and outputs reports
according to packet transmission, it is also called RTR (Response Time
Reporter).
ICMPECHO: an RTR entity that sends ICMP PING packets to detect network
communication; the detection outputs the packet round-trip delay, packet
loss, and so on.
UDPECHO: an RTR entity that regularly sends UDP packets to detect the
communication of UDP packets in the network; the detection outputs the
round-trip delay and packet loss of the packets (these are data detection
packets, not connection packets).
RTR GROUP: an RTR group is a set of one or more RTR entities. A group
comprises single RTR entities; a group cannot itself become a member of
a group. One RTR entity can belong to multiple RTR groups, but it can
appear in a given group only once.
RTR SCHEDULE: schedules an RTR entity or RTR group to detect network
communication.
CODEC: It is used for the coding and decoding of the VoIP signals.
Introduction to SLA
There are many factors that can disturb the normal running of a network:
the complexity of the network environment, configuration mistakes by the
administrator, network device failures, and even irresistible factors.
Regularly detecting network communication and recording the detection
results, both while building the network and while running it, is
therefore important for solving problems when the network fails. SLA was
developed for this purpose as a network detection and monitoring tool.
Its basic theory is to use different kinds of RTR entities to represent
different kinds of network detection and to initiate schedules for those
entities to carry out the detection. Meanwhile, with its rich schedule
policies, SLA can track and monitor network communication in detail.
RTR Entity
"RTR entity" is a common concept, not tied to a specific entity type.
Currently, the RTR entity types of the system include the MACSLA entity,
used to detect L2 connectivity; the ICMPECHO, ICMP-PATH-ECHO,
ICMP-PATH-JITTER, and UDPECHO entities, used to detect network
communication; the JITTER entity, used to detect the transmission of VoIP
packets in the network; and the FLOW-STATISTICS entity, used to detect
interface traffic.
ICMPECHO Entity
The ICMPECHO entity is used to detect the basic communication of the
network. It sends ICMP PING packets to one destination address in the
network, so as to detect the transmission delay and packet loss of
packets from the source to the destination.
Common network devices all support PING, so the entity is effective for
detecting the basic communication of the network. With the rich schedule
policies and log recording function, the network administrator can learn
the current and historical network communication, while reducing the work
of typing PING commands by hand.
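As a hypothetical sketch of the figures such an entity reports, the helper below aggregates the round-trip times of one schedule's probes (with `None` standing for a timeout) into delay statistics and a packet-loss percentage. The field names in the returned dictionary are illustrative, not the device's actual output format.

```python
# Hypothetical aggregation of ICMPECHO probe results: given round-trip
# times in ms (None = timed out), compute delay figures and loss percentage.
def icmpecho_stats(rtts_ms):
    replies = [r for r in rtts_ms if r is not None]
    sent, received = len(rtts_ms), len(replies)
    loss_pct = 100.0 * (sent - received) / sent
    if not replies:
        return {"sent": sent, "loss_pct": loss_pct}
    return {
        "sent": sent,
        "loss_pct": loss_pct,
        "rtt_min": min(replies),
        "rtt_avg": sum(replies) / received,
        "rtt_max": max(replies),
    }
```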
ICMP-PATH-ECHO Entity
The ICMP-PATH-ECHO entity is used to detect the basic communication of
the network. It regularly sends ICMP PING packets to one destination
address in the network, so as to get the packet transmission delay and
packet loss from the detection end to the destination end, as well as the
delay and packet loss between the detection end and each intermediate
device on the path to the destination.
Common network devices all support PING, so the entity is effective for
detecting the basic communication of the network. With the rich schedule
policies and history recording function, the network administrator can
learn the network communication (for example, which network device on the
path has serious delay) and its history.
ICMP-PATH-JITTER Entity
The ICMP-PATH-JITTER entity is used to detect the basic communication of
the network. It regularly sends ICMP PING packets to one destination
address in the network, so as to get the packet transmission delay,
jitter, and packet loss from the detection end to the destination end, as
well as the delay, jitter, and packet loss between the detection end and
each intermediate device on the path to the destination.
Common network devices all support PING, so the entity is effective for
detecting the basic communication of the network. With the rich schedule
policies and history recording function, the network administrator can
learn the network communication (for example, which network device on the
path has serious delay) and its history.
JITTER Entity
Introduction to VoIP and the related communication detection standards
VoIP is short for Voice over IP. It converts voice or fax to data so that
it can share one IP network (the Internet) with data traffic for
transmission. Because the cost of transmitting voice and fax over the
Internet is low, the technology is widely applied. Compared with the
traditional telephone, transmitting voice on an IP network means
digitizing the analog voice with a voice codec, packetizing it, and then
transmitting it to the receiving end over the IP network using the
best-effort IP transmission mechanism. After collecting the packets, the
receiving end decodes them to recover the analog voice. From this process
we can see that the packet delay and packet loss caused by network
transmission quality, the cost of converting between analog voice and
data in the codec, the compression/decompression cost, echo cost,
processing delay, and so on all affect Internet VoIP transmission
quality. Transmitting voice on an IP network therefore involves many
factors that differ from the traditional telephone network and the
traditional data network, and these factors limit VoIP quality.
of the human. The corresponding relation between VoIP quality and MOS
provides the foundation for network configuration, standards, and
monitoring.
Note
The value range of ICPIF is 5-55. If the ICPIF value is smaller than or
equal to 5, it is called low impairment and the VoIP quality is best; if
the ICPIF value is 55 or more, it is called high impairment and the VoIP
quality is poorest. An ICPIF value lower than 20 is regarded as
acceptable. (Since 2001, ICPIF has no longer been recommended by ITU-T
and has been replaced by the E-MODEL, but communication quality is still
commonly measured according to ICPIF.)
ICPIF value    MOS
14 - 23        3
24 - 33        2
34 - 43        1
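The interpretation in the note above can be applied with a small helper. The numeric boundaries follow the text; the label for the middle range (between acceptable and high impairment) is an assumption of this sketch.

```python
# Classify an ICPIF value per the note above: <=5 is low impairment (best),
# <20 is still acceptable, >=55 is high impairment (poorest). The "degraded"
# label for the remaining range is an assumed name, not from the manual.
def classify_icpif(icpif: int) -> str:
    if icpif <= 5:
        return "low impairment (best)"
    if icpif < 20:
        return "acceptable"
    if icpif >= 55:
        return "high impairment (poorest)"
    return "degraded"
```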
Codec                     Default packet      Default packet interval   Default packet   Default sending
                          length              between packets           quantity         frequency
G.711 mu-Law (g711ulaw)   160 + 12 RTP bytes  20 ms                     1000             Once every 1 minute
G.711 A-Law (g711alaw)    160 + 12 RTP bytes  20 ms                     1000             Once every 1 minute
G.729A (g729a)            20 + 12 RTP bytes   20 ms                     1000             Once every 1 minute
The JITTER entity can simulate these three kinds of codec, or a
customized codec, to send UDP packets with the corresponding rate,
interval, and size, and it measures the round-trip time, uni-directional
packet loss, and uni-directional delay. Based on these statistics, it
calculates the ICPIF value and finally estimates the MOS value from the
ICPIF value.
Use the JITTER entity to test how the network transmits VoIP packets. Two
factors enter the ICPIF calculation: the uni-directional delay of the
packets and the packet loss. Ie is called the device impairment factor
and is related to the packet loss; Ie can be obtained from the percentage
of packet loss. The relation is as follows:
The expected factor is used to indicate the trade-off between user access
convenience and VoIP quality. For example, compare the countryside, where
the signal is difficult to receive, with the plain, where the signal is
good: the expected VoIP quality of a wireless telephone in the former is
sure to be lower than that of a cable phone in the latter. Currently, the
relation between common user access modes and the expected factor is as
follows:

Communication service type            Max. expected factor
General cable communication link      0
The measured MOS value is just one suggestion about the network's ability
to transmit VoIP packets; it may differ somewhat from an actually
measured MOS.
During the JITTER measuring process, UDP packets are used (because VoIP
packets are encapsulated in UDP packets) to simulate the transmission of
VoIP packets, and the ICPIF value and MOS value are calculated according
to the transmission status, so as to detect the quality of the network
for transmitting VoIP packets. The size of the sent UDP packets, the
number of packets, and the interval between them depend on the type of
codec being simulated. The user can also customize a codec to configure
these parameters.
To make the measurement more exact and to be compatible with Cisco,
configure the RTR Responder at the destination end of the measurement.
The Responder sets up the connection with the source end and responds to
the detection packets sent by the source end, making the measurement
result more exact. To use JITTER entity detection, the Responder must be
configured at the destination end.
The source end and the Responder end use an internal protocol, the Cisco
SAA control protocol, for connection setup and communication detection.
The protocol is encapsulated in UDP packets and belongs to the
application layer. The SAA control protocol is a private protocol of
Cisco; its main packet formats are the SAA connection request packet, the
SAA connection response packet, and the SAA packet.
When using JITTER entity detection, the SLA source end first builds an
SAA connection request packet according to the specified parameters and
sends it to destination monitoring port 1967. The SAA connection request
packet is as follows:
Note
Here, the version field indicates the version of the SAA control
protocol; currently it is 1. Id identifies the SAA connection request,
that is, one connection. The frame length indicates the length of the SAA
connection request packet: 52 bytes when the life time field is 2 bytes
and 56 bytes when the life time field is 6 bytes. The 4-byte reserved
area is all zeros. The 2-byte command type indicates the connection
property; 0004 is a JITTER detection connection. The 6-byte reserved area
is currently an unknown area and is usually 000100000000. Next come the
4-byte destination IP address and the 2-byte destination port number,
indicating the destination IP and port of the JITTER connection. The
2-byte or 6-byte life time field indicates the life time of the
connection from setup to disconnection, in ms; it equals the number of
packets sent in one run times the packet-sending interval, plus the
packet timeout. Last come the packet end flag field, usually 0001001c,
and the all-zero filling field.
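The fields described above can be assembled as follows. Note the hedges: the text gives the command type (0x0004), the 52-byte total with a 2-byte life-time field, the 0001001c end flag, and the all-zero padding, but it does NOT give the widths of the version, id, and frame-length fields; the 4-byte widths below are assumptions made only so the sketch produces a 52-byte frame.

```python
# Sketch of the SAA connection request described above (JITTER connection,
# 2-byte life-time variant). Version/id/length field widths are ASSUMED.
import struct
import socket

def saa_connect_request(conn_id: int, dst_ip: str, dst_port: int,
                        lifetime_ms: int) -> bytes:
    frame = struct.pack("!III", 1, conn_id, 52)   # version=1, id, frame length
    frame += b"\x00" * 4                           # 4-byte reserved area
    frame += struct.pack("!H", 0x0004)             # command type: JITTER
    frame += bytes.fromhex("000100000000")         # 6-byte reserved area
    frame += socket.inet_aton(dst_ip)              # destination IP
    frame += struct.pack("!H", dst_port)           # destination port
    frame += struct.pack("!H", lifetime_ms)        # 2-byte life time (ms)
    frame += bytes.fromhex("0001001c")             # packet end flag
    frame += b"\x00" * (52 - len(frame))           # all-zero filling
    return frame
```

For example, for 1000 packets at a 20 ms interval with a 5000 ms timeout, the life time would be 1000 * 20 + 5000 = 25000 ms.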
When the RESPONDER receives the request packet, it processes it and sends
back the SAA connection response packet. If setting up the connection
succeeds, the detection starts; otherwise the connection is cut off. The
format of the SAA connection response packet is as follows:
Note
Here, the version field indicates the version of the SAA control
protocol; currently it is 1. Id identifies the SAA connection request,
that is, one connection. The packet length indicates the length of the
SAA connection response packet, which is 8 bytes. The 2-byte response
code is 0x0000 for success and 0x0002 for failure. Last is the 2-byte
reserved area, which is all zeros.
After receiving the response packet from the RESPONDER end, the source
end processes it. If the response indicates failure, the connection is
cut off; if it indicates success, the source end starts to fill in SAA
packets and send them to the RESPONDER end for detection. After receiving
each packet, the RESPONDER end processes it, fills in the required
contents, and sends it back to the source end, completing the packet
detection. The format of the JITTER packet is as follows:
Note
For the JITTER entity, the results that need to be saved include the
packet round-trip delay, jitter, uni-directional delay (which requires
synchronizing the clocks of the source and destination ends), and packet
loss. The ICPIF and MOS values can be calculated from these parameters.
After setting up the connection, the source end sends UDP detection
packets to the destination port according to the options negotiated by
the SAA control protocol. Before sending a packet, it fills the sending
time (ST1) and the sending serial number (QS1) into the packet; the
destination end fills in the receiving time (RT1) and the receiving
serial number (QR1), and, before sending the packet back, fills in the
delay caused by its processing time (DT1). If the sending end receives
the packet back within the timeout, it records the receiving time (AT1).
ST2, QS2, RT2, QR2, DT2, and AT2 are recorded for the second packet in
the same way. Then:
RTT = (RT1 - ST1) + (AT1 - RT1) - DT1 = AT1 - ST1 - DT1
JITTER_SD = (RT2 - RT1) - (ST2 - ST1) = i2 - i1
JITTER_DS = (AT2 - AT1) - ((RT2 + DT2) - (RT1 + DT1)) = i3 - i2
Here, i1 is the interval between sending the first and second packets; i2
is the interval between the responder receiving the first and second
packets; i3 is the interval between receiving the response packets of the
first and second packets.
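The formulas above can be written out directly. This is a plain transcription of the arithmetic, with the timestamps passed in as numbers (any consistent time unit):

```python
# RTT and jitter from the timestamps of two consecutive probe packets:
# ST = send time, RT = responder receive time, DT = responder processing
# delay, AT = arrival time of the reply at the source.
def rtt(st1, rt1, dt1, at1):
    # RTT = (RT1-ST1) + (AT1-RT1) - DT1 = AT1 - ST1 - DT1
    return at1 - st1 - dt1

def jitter_sd(st1, st2, rt1, rt2):
    # i2 - i1: responder receive interval minus source send interval
    return (rt2 - rt1) - (st2 - st1)

def jitter_ds(rt1, rt2, dt1, dt2, at1, at2):
    # i3 - i2: reply-arrival interval minus responder departure interval
    return (at2 - at1) - ((rt2 + dt2) - (rt1 + dt1))
```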
Meanwhile, if the clocks of the source end and destination end are
synchronized, the uni-directional delays are:
Delay_SD = RT1 - ST1
Delay_DS = AT1 - RT1 - DT1
And then you can calculate ICPIF from the uni-directional delay and the
lost packets: Icpif = Idd + Ie - A. After calculating ICPIF, you can get
the MOS value from the converting relation of ICPIF and MOS, which gives
a standard for measuring how the network transmits VoIP packets.
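Putting the last step together: the ICPIF formula above combines the delay impairment Idd, the device (packet-loss) impairment Ie, and the expected factor A. The ICPIF-to-MOS step below uses only the partial mapping quoted earlier in this section (14-23 gives 3, 24-33 gives 2, 34-43 gives 1); values outside those ranges are deliberately left unmapped in this sketch.

```python
# ICPIF and MOS per the relations quoted in this section.
def icpif(idd: float, ie: float, a: float) -> float:
    # Icpif = Idd + Ie - A
    return idd + ie - a

def mos_from_icpif(value: float):
    # Partial ICPIF -> MOS mapping from the table earlier in this section.
    for low, high, mos in ((14, 23, 3), (24, 33, 2), (34, 43, 1)):
        if low <= value <= high:
            return mos
    return None  # outside the ranges quoted in the text
```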
UDPECHO Entity
The UDPECHO entity detects UDP packets transmitted in the IP network. The
destination address and port of the sent packets must be specified in the
entity. By scheduling the entity, you can monitor the transmission of UDP
packets in the IP network.
Through valid monitoring, the UDPECHO entity can record the round-trip
delay and packet loss of UDP packets in the IP network, and can even
record the monitored history information in logs, so that the network
administrator can learn the network communication and fix faults.
The connection request and response packets of the SAA UDPECHO entity are
the same as those of the SAA JITTER entity, but the detection packets of
the UDPECHO entity differ from those of the JITTER entity. The packet
format is as follows:
Usually the 2-byte packet ID is 00 01; it identifies the data frame
between the sender and the responder rather than a request or response
packet. The 2-byte DT field is 00 00 for the sender and 00 02 or 00 01
for the responder. Part1 and Part2 are optional, and their contents are
related to the rtr attributes data-pattern and packet size. The filling
format is: Part1 takes all even bits of data-pattern; if the value of an
even bit is smaller than or equal to f, a 0 is prepended; if the value of
the even bit is larger than f, ff is filled in. For Part2, first
n = data-pattern length / 2, and then the first to nth values (ASCII
codes) are taken from data-pattern. By default, the SAA packet length is
16 bytes; if the current filled length does not reach 16 bytes, the
remaining vacancy is filled with the ASCII code.
MAC SLA Entity
The MAC SLA entity detects the traffic of an Ethernet link. Currently,
MAC SLA is realized on top of the Delay Measure function of the CFM
protocol, so to configure and run a MAC SLA entity, you need to configure
the CFM domain, service instance, and MEP; the MAC SLA entity then runs
between the specified CFM domain, service instance, and MEP.
Currently, MAC SLA supports detecting four quality parameters of Ethernet
link traffic, including uni-directional delay, bi-directional delay,
jitter, and delay. When a quality parameter exceeds its threshold, the
corresponding log information is output.
The detection function of the MAC SLA entity is widely used in Ethernet
and can reflect the network quality.
RTR Group
An RTR group is a set of one or more RTR entities. One RTR member can
belong to multiple RTR groups; a group cannot itself become a member of a
group, and a group can contain a given member only once. The RTR group is
identified by the group ID; the group name is generated automatically by
the system.
The purpose of the RTR group is to schedule one RTR set: scheduling an
RTR group is equivalent to scheduling all the RTR entities currently in
the group, and the detection results are saved in each RTR entity's
history records.
RTR Schedule
Configuring an RTR entity or RTR group alone does not start any
detection; the detection is performed only after a schedule is initiated.
An RTR schedule is the policy for scheduling and detecting an RTR entity
or group.
An RTR schedule can take a single entity member or one RTR group as its
object, but cannot take a group and an entity at the same time. The RTR
schedule is identified by the schedule ID and is not tied to the RTR
entity type, but the scheduling interval must take into account the
attributes of the RTR entity, or of the members of the RTR group, being
scheduled.
The RTR schedule provides rich schedule policies. You can start
scheduling at once or after some time, or even set an absolute time for
starting the schedule. Besides, the schedule can expire after the set
number of runs or persist forever.
Request-data-size:32
Timeout:5000(ms)
Frequency:60(s)
TargetOnly:FALSE
Verify-data:FALSE
Alarm-type:LOG
Threshold-of-rtt:6 (direction be)
Threshold-of-pktLoss:200000000 (direction be)
Threshold-of-jitter:5 (direction be)
Number-of-history-kept:100
Periods:3
In-scheduling:FALSE
Status:DEFAULT
--------------------------------------------------------------
ID:4 name:Jitter4 Created:TRUE
****************type:JITTER****************
CreatedTime:THU JAN 01 05:15:53 2009
LatestModifiedTime:THU JAN 01 05:52:41 2009
Times-of-schedule:0
Entry-state:Pend
TargetIp:1.1.1.2 targetPort:3434
Codec:G.729A Packet-size:32 Packet-number:1000
Packet-transmit-interval:20(ms)
frequency:60(s)
TimeOut:5000(ms)
Alarm-type:LOG-AND-TRAP
Threshold-of-dsDelay:8(direction be)
Threshold-of-dsJitter:8(direction be)
Threshold-of-dsPktLoss:3(direction be)
Threshold-of-sdDelay:8(direction be)
Threshold-of-sdJitter:8(direction be)
Threshold-of-sdPktLoss:2(direction be)
Threshold-of-rtt:6(direction be)
Threshold-of-mos:10000000 (direction be)
Threshold-of-icpif: 100000000 (direction se)
Number-of-history-kept:120
Periods:1
Status:DEFAULT
--------------------------------------------------------------
Rtr id 3 is an ICMP-PATH-JITTER entity. The entity was created at THU JAN
01 05:15:50 2009 and last modified at THU JAN 01 05:48:03 2009; it has
been scheduled 0 times, that is, scheduling has not started. During each
schedule, only 10 ICMP packets are sent to the destination end and the
intermediate devices; the valid payload is 32 bytes; the timeout is
5000 ms; the schedule frequency is 60 s; only the network of the
destination end and between the source and the intermediate devices is
detected; the data is not checked. For the alarm mode: none means no
alarm, log means the shell prompt, log-and-trap means the shell prompt
plus sending trap information to inform the NMS, and trap means only
sending a trap to inform the NMS. The threshold of the round-trip delay
is 6 ms, and an alarm is raised by alarm-type when the actually detected
round-trip delay is no less than the threshold; the threshold of the
packet loss is 200000000; be means alarming when no less than the
threshold, se means alarming when smaller than or equal to the threshold,
alarming by alarm-type; the jitter threshold is 5 ms. 100 history records
are kept, and new records cover the old ones when 100 is exceeded; the
history record is saved once every three detections. The entity is not in
the debug state; the link status is DEFAULT; if the destination is
reachable, the status is REACHABLE.
--------------------------------------------------------------
3.3.3.2 Rtt:1 Jitter:0 Pkt loss:0
1.1.1.2 Rtt:2 Jitter:0 Pkt loss:0
History of record from source to dest:
CurHistorySize:1 MaxHistorysize:100
THU JAN 01 02:30:03 1970
Rtt:2 Jitter:0 Pkt loss:0
--------------------------------------------------------------
The network environment is: source - router 1 - destination. The
round-trip delay from the source to router 1 (3.3.3.2) is 1 ms, the
jitter is 0, and there is no packet loss; the round-trip delay from the
source to the destination 1.1.1.2 is 2 ms, the jitter is 0, and there is
no packet loss. A record is saved according to the schedule interval of
60 s.
Note
1. If another history record arrives when the number of history
   records has already reached 100, the new record covers the oldest
   record.
2. The NTP protocol must be configured to synchronize the clocks.
After configuring the RTR entity 5, view the history records of rtr entity 5:
After configuring the RTR entity 6, view the history records of rtr entity 6:
Enable the debug during the entity detection and you can see the specific
debug information.
VRRP Technology
This chapter describes the VRRP protocol theory and how to realize it.
Main contents:
Master: one VRRP state; the active device is in this state and is
responsible for forwarding IP packets.
Backup: one VRRP state; the standby device is in this state and ensures a
timely switchover when the active device fails.
Here, the gateway can be any network device with the IP forwarding
function, such as a switch or router. To make it easy for the reader to
understand, the following uses "router" to denote the gateway.
VRRP solves this problem. It is designed for LANs with multicast or
broadcast capability (such as Ethernet). VRRP makes a group of routers on
the LAN (one MASTER and several BACKUPs) form one virtual router, called
a backup group.
The virtual router (that is backup group) has its own IP address. The
router in the backup group has its own IP address. The hosts in the LAN
just need to know the IP address of the virtual router, but do not need to
know the IP address of the master router or the IP address of the backup
router. They set their default route as the IP address of the virtual router.
Therefore, the hosts in the network communicate with other networks via
the virtual router. When the master router in the backup group fails,
another router in the backup group becomes the new master and continues
to provide routing service for the hosts in the network, realizing
uninterrupted communication with the outside network.
VRRP packets are carried directly in IP packets; the protocol number is 112 (0x70).
VRID: the Virtual Router Identifier (VRID) configured on the interface.
Priority: the priority configured on the interface. The priority of the
router that owns the virtual IP address (the router whose interface IP is
the VIP) is 255; the priorities of the other routers are 1-254, and the
default value is 100.
VRRP Workflow
Simply speaking, VRRP is a fault-tolerance protocol. It ensures that when
the next-hop router of a host fails, another router replaces it in time,
so as to keep the continuity and reliability of the communication. To
make VRRP work, configure the virtual router number and virtual IP
address on the routers. In this way, one virtual router is added to the
network, and the hosts on the network can communicate through the virtual
router without knowing any information about the physical routers. One
virtual router comprises one master router and several backup routers.
The master router performs the real forwarding function. When the master
router fails, one backup router becomes the new master router and takes
over its work.
The VRRP protocol defined in RFC 2338 was made on the basis of Cisco's
private HSRP protocol, but VRRP simplifies the mechanism put forward by
HSRP, reducing the additional load that the redundancy function brings to
the network. For example, HSRP defines six states for the virtual router,
while VRRP has only three, which reduces the complexity of the protocol.
In the stable state, HSRP has two states that can send packets, while in
VRRP only the router in the Master state forwards packets, and there is
only one kind of packet, which reduces the occupied bandwidth. HSRP
packets are carried over UDP, while VRRP packets are encapsulated
directly in IP packets. Meanwhile, VRRP supports using an actual
interface IP address as the virtual IP address.
VRRP routers form different virtual routers according to the VRID. The
routers that form one virtual router are divided into the master router
and backup routers, which are determined by the following rules:
1. Select the master router according to priority. The router with the
   highest priority becomes the master router, and its state is Master.
   If the routers have the same priority, compare the IP addresses of
   their interfaces; the one with the larger IP address becomes the
   master router.
2. The other routers serve as backup routers and monitor the status of
   the master router in real time. While the master router works
   normally, it periodically sends a VRRP multicast packet (224.0.0.18)
   informing the backup routers in the group that it is in the normal
   state. If the backup routers in the group do not receive packets from
   the master router for a long time, they turn to Master. When there are
   multiple backup routers in the group, there may briefly be multiple
   master routers. Each of them then compares the priority in the
   received VRRP packets with its local priority: if the local priority
   is smaller than the priority in the VRRP packet, its status turns to
   Backup; otherwise, it keeps its status. In this way, the router with
   the highest priority becomes the new master router, which completes
   the backup function of VRRP.
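The election rules above can be sketched as a simple comparator: highest priority wins, and on a tie the larger interface IP address wins. Routers are given as (priority, ip) pairs in this illustration.

```python
# Sketch of VRRP master election: highest priority wins; on a priority
# tie, the router with the numerically larger interface IP address wins.
import ipaddress

def elect_master(routers):
    """Return the (priority, ip) pair of the router that becomes Master."""
    return max(routers,
               key=lambda r: (r[0], int(ipaddress.IPv4Address(r[1]))))
```

For example, with two routers at priority 100 on 10.0.0.1 and 10.0.0.2, the one at 10.0.0.2 becomes Master.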
The virtual router has three states: Initialize, Master, and Backup.
Master status:
Must answer ARP requests for the virtual IP address; the ARP response
carries the MAC address corresponding to the virtual router IP
address;
Must receive packets whose destination is the related IP address (if
it is the IP address owner);
When turning to Master from another state, send gratuitous ARP packets;
BACKUP status:
Cannot answer ARP requests for the virtual router IP address;
Cannot send protocol packets, but must receive the (multicast)
protocol packets;
INITIALIZE status:
VRRP Features
VRRP has the following features:
Load balance: a function with high VRRP added value. Multiple virtual
routers are used to back up multiple gateways, and the terminals are
configured with different virtual router IP addresses to realize load
balancing.
It shows that the switch sends a VRRP packet from interface VLAN1; the
VRID is 1 and the priority is 100.
It shows that the switch receives a VRRP packet on interface VLAN1; the
contents of the packet are displayed in detail.
2. Event debug
VBRP Technology
This chapter describes the VBRP protocol theory and how to realize it.
Main contents:
As shown in the above figure, two devices, each with its own unique IP
address, are in one network. Normally, the user must select one of the
two devices as the default gateway, and the failure rate of the user
network then depends on the failure rate of that device. However, if the
two devices are configured with the VBRP protocol, they generate one
logical device with a separate virtual IP address, which is used as the
default gateway of the hosts. At any given time, one device is the active
device and the other is the standby device. The active device forwards
and processes the user's data flows. When the active device fails, the
standby device takes over all of its work and becomes the new active
device, reducing the failure rate of the network to the concurrent
failure rate of the two devices.
The VBRP packet is one UDP packet. Both the source and destination ports
are 1985.
Hello message: indicates that the router is running and can become the
active or standby device;
Coup message: sent when a device wishes to become the active device;
Resign message: sent when a device no longer wishes to be the active
device;
Hellotime: the Hello interval of the sender of the Hello packet, in
seconds. The field is valid in Hello packets; the router that sends a
Hello packet must fill its own Hellotime into the packet. By default, the
Hellotime is 3 s.
Holdtime: the validity period of the Hello packet, in seconds. The field
is valid in Hello packets; the receiver of a Hello packet regards the
Holdtime in the packet as the packet's validity period. Holdtime should
be at least 3 times the Hellotime.
Priority: The priority field; it is used when selecting the active and standby
device. The one with larger value is preferential. If the devices have the
same priority, the one with larger address is preferential.
VBRP Workflow
To make VBRP work, first create one virtual IP address; in this way, one
virtual device is added to the network, and the hosts on the network can
communicate with the virtual device without knowing any information about
the physical devices. One VBRP device is specified as the active device,
and another physical device serves as the standby in case the active
device fails. The active device responds not only to its own IP address
but also to the virtual IP address.
When a host sends a packet to a network other than the local network, the
host configuration indicates that the next hop of the packet is the
default gateway. The IP address of the default gateway is configured on
the host, but to send Ethernet frames to that device, the host needs to
know its MAC address, so it sends an ARP request to the network to query
the MAC address of the default gateway. No actual host on the network
owns the MAC address of the virtual device, so the active device responds
to the ARP request. The active device then receives all traffic sent to
the virtual IP address and handles it; from the host's point of view, the
traffic is simply routed through the active device.
Devices configured with VBRP use UDP hello packets to advertise their
existence. The advertisements are used to detect device failure and to
negotiate parameters such as the virtual IP address and the
authentication password. The advertisements are also used to elect the
devices: at any time, there can be only one active device and one standby
device on the network. All other devices configured in the standby group
are in the Listen state until the next election, which happens when the
active or standby device becomes unavailable.
VBRP defines three types of packets. The first is the Hello packet, sent
by the active device, the standby device, and routers in the SPEAK state
to inform group members of their existence. The Hello packet also
contains the configuration parameters, such as the IP address and timer
values; a device on which these parameters are not configured can learn
them from the Hello packets.
The second is the Resign packet. When the active device exits from the
VBRP group, for example because the configuration changes or the device
is disabled, it sends the Resign packet.
The third is the Coup packet. This packet is sent when the preempt
configuration command causes one device to replace the active device. If
the device is the standby device with the highest priority, it becomes
the active device.
The VBRP protocol has six states: INITIAL, LEARN, LISTEN, SPEAK, STANDBY,
and ACTIVE.
1. INITIAL state
All devices start from the INITIAL state, which indicates that VBRP is
not running. A device enters this state when its interface is in the DOWN
state or goes down.
2. LEARN state
In the LEARN state, the device waits for Hello packets from the ACTIVE
device in order to learn the virtual IP address. A device enters this
state when it is configured with a virtual device group but not with a
VIP.
3. LISTEN state
In the LISTEN state, the device knows its VIP, but it is neither the
ACTIVE device nor the STANDBY device. It only accepts protocol packets
from the ACTIVE and STANDBY devices. It changes its status to take part
in the election of the ACTIVE or STANDBY device when protocol packets are
not received from those devices within a certain time (all devices except
the ACTIVE and STANDBY devices are in the LISTEN state).
4. SPEAK state
In the SPEAK state, the device sends periodical Hello packets and takes
part in the election of the ACTIVE/STANDBY device. A device cannot enter
the SPEAK state before obtaining the VIP.
5. STANDBY state
In the STANDBY state, the device becomes the candidate device of the
next ACTIVE device and sends the periodical hello packets. In one virtual
device group, there can be only one standby device.
6. ACTIVE state
In the ACTIVE state, the device is responsible for forwarding the packets
that are sent to the virtual MAC address of the virtual device group and
responding to the ARP request whose destination IP is VIP. The active
device sends periodical hello packets. In one virtual device group, there
can be only one active device.
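The six states and the typical transitions described above can be summarized in a minimal sketch (the event names are assumptions for illustration; the actual trigger events are internal to the device):

```python
# Minimal sketch of the six VBRP states and a few of the transitions
# described above. Event names are illustrative assumptions.
from enum import Enum

class VbrpState(Enum):
    INITIAL = 0   # VBRP not running (interface down)
    LEARN = 1     # waiting to learn the VIP from the Active device's hellos
    LISTEN = 2    # knows the VIP, passively listening
    SPEAK = 3     # sending hellos, taking part in the election
    STANDBY = 4   # candidate for the next Active device
    ACTIVE = 5    # handling traffic sent to the virtual MAC/IP

# (state, event) -> next state, following the section above
TRANSITIONS = {
    (VbrpState.INITIAL, "interface_up_no_vip"): VbrpState.LEARN,
    (VbrpState.LEARN, "vip_learned"): VbrpState.LISTEN,
    (VbrpState.LISTEN, "active_standby_timers_expired"): VbrpState.SPEAK,
    (VbrpState.SPEAK, "standby_timer_expired"): VbrpState.STANDBY,
    (VbrpState.STANDBY, "active_timer_expired"): VbrpState.ACTIVE,
    (VbrpState.ACTIVE, "interface_down"): VbrpState.INITIAL,
}

def step(state, event):
    # Unknown events leave the state unchanged.
    return TRANSITIONS.get((state, event), state)
```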
VBRP Functions
1. Gateway backup: Multiple devices share one IP address, preventing
the failure of a single gateway and minimizing the network black
hole. This is the main function of VBRP.
4. Remote login: When the IP address of the virtual device is the same
as the IP address of one interface, you can log in to the device in
the ACTIVE state remotely.
00:28:18: VBRP: vlan1 Grp 0 Hello out 128.255.17.54 Active Pri 100 vIP
128.255.17.1
The above information shows that the Ethernet port vlan1 sends the VBRP
Hello packet. The VBRP group number is 0; the main address of the
Ethernet port is 128.255.17.54 and the current status is Active; the
priority is 100 and the virtual IP address is 128.255.17.1.
00:38:44: VBRP: vlan1 Grp 0 Hello in 128.255.16.3 Standby pri 100 vIP
128.255.17.1
The above information shows that Ethernet port vlan1 receives the VBRP
Hello packet. The VBRP group number is 0; the source address of the
sender is 128.255.16.3; the current status is Standby; the priority is 100;
the virtual IP address is 128.255.17.1.
Only the VBRP devices in the Speak, Standby, and Active state can send
Hello packets.
00:28:18: VBRP: vlan1 Grp 0 Coup out 128.255.17.54 Active Pri 100 vIP
128.255.17.1
The above information shows that Ethernet port vlan1 sends the VBRP
Coup packets. The VBRP group number is 0; the main address of the
Ethernet port is 128.255.17.54; the current status is Active; the priority is
100; the virtual IP address is 128.255.17.1.
02:43:54: VBRP: vlan1 Grp 0 Coup in 128.255.16.3 Active pri 110 vIP
128.255.17.1
The above information shows that Ethernet port vlan1 receives the VBRP
Coup packets. The VBRP group number is 0; the source address of the
sender is 128.255.16.3; the current status is Active; the priority is 110;
the virtual IP address is 128.255.17.1.
2. Event debug
r2(config-if-vlan1)#shutdown
03:08:32: %LINEPROTO-5-UPDOWN: Line protocol on Interface vlan1,
changed state to down
03:08:32: VBRP: vlan1 API Software interface going down
03:08:32: VBRP: vlan1 Grp 0 Active: b/VBRP disabled
03:08:32: VBRP: vlan1 API MAC address update
03:08:32: VBRP: vlan1 Grp 0 Active router is unknown, was local
03:08:32: VBRP: vlan1 Grp 0 Active -> Init
The vlan1 port becomes down, so VBRP turns from Active to Init.
The following debug information shows the converting process from Active
to Standby.
03:11:53: VBRP: vlan1 Grp 0 Active: g/Hello rcvd from higher pri Active
router (110/128.255.16.3)
03:11:53: VBRP: vlan1 API MAC address update
03:11:53: VBRP: vlan1 Grp 0 Active router is 128.255.16.3, was local
03:11:53: VBRP: vlan1 Grp 0 Active -> Speak
The Active device receives a Hello packet with a higher priority from
another device (128.255.16.3). The router is configured with preempt, so
the device enters the Speak state.
03:11:56: VBRP: vlan1 Grp 0 Speak: g/Hello rcvd from higher pri Active
router (110/128.255.16.3)
03:11:59: VBRP: vlan1 Grp 0 Speak: g/Hello rcvd from higher pri Active
router (110/128.255.16.3)
03:12:02: VBRP: vlan1 Grp 0 Speak: g/Hello rcvd from higher pri Active
router (110/128.255.16.3)
03:12:03: VBRP: vlan1 Grp 0 Speak: d/Standby timer expired (unknown)
03:12:03: VBRP: vlan1 Grp 0 Standby router is local, was unknown
03:12:03: VBRP: vlan1 Grp 0 Speak -> Standby
No Hello packet is received from any other Standby device, so the device
turns from Speak to Standby.
The priority of the Standby device is adjusted to 200 and it turns to Active.
IPFIX Technology
Overview
This chapter describes the working principle of IPFIX.
Main contents:
Terms
Terms
IPFIX-IP Flow Information Export
IPFIX packets-The packets sent from the IPFIX module to the IPFIX
workstation; they carry the IP flow statistics monitored by IPFIX on the
network devices. IPFIX packets are UDP packets and are assembled in the
NetFlow v9 format.
IPFIX flow record-a type of IPFIX packet; it records the statistics of
an IP flow.
IPFIX option record-a type of IPFIX packet; it records the content of
statistical options that are not related to a single IP flow in IPFIX.
IPFIX Workflow
1. Determine the ports to monitor traffic. The ports are called observation
points.
3. Configure the address of the IPFIX server and the UDP destination
port number. These are used as the destination address and UDP
destination port of the IPFIX packets.
When a flow times out, the statistical information of the flow is
delivered to the IPFIX server.
IPFIX Restrictions
The restrictions of IPFIX on a switch are as follows:
2. For the statistics of the INGRESS flow, only unicast flows are
counted. For a unicast flow, the chip forwards the packets through a
single port instead of multiple ports (that is, the flow cannot be
flooded). The statistics of the egress flow are not restricted.
Packet Header
System Uptime: the running time of the device, in milliseconds.
FlowSet
FlowSet includes: Template FlowSet and Data FlowSet. One IPFIX packet
can contain multiple FlowSets.
Template FlowSet
The template can be classified into flow record template and option record
template. The flow record template defines how to explain the flow record;
the option record template defines how to explain the option records.
Template ID: used to match data with a template. It starts from 256.
Field Length: the number of bytes of the field defined by the field type.
Template ID: for the matching of data and template; it is greater than 255.
Scope Field Type: the type of the scope field referenced by the relevant
data of the IPFIX process. 0x1: system; 0x2: interface; 0x3: line card;
0x4: IPFIX cache; 0x5: template.
Option Field Type: the type of the option data; the values used are the
same as the field type values described in the flow template.
The types of the fields used in the IPFIX template are as follows:
Data FlowSet
FlowSet ID: the FlowSet ID corresponds to the template ID; IPFIX
interprets the data information according to this correspondence.
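The template/data matching described above can be sketched as follows (a NetFlow v9-style illustration; the function and field names are assumptions, not the product's implementation):

```python
# Sketch of how a collector matches a Data FlowSet to a previously
# received template by ID. Names are illustrative.
templates = {}  # template_id -> list of (field_type, field_length)

def register_template(template_id, fields):
    # Template IDs for data records start from 256 (greater than 255).
    assert template_id > 255, "data template IDs must be greater than 255"
    templates[template_id] = fields

def decode_record(flowset_id, raw):
    """Split one data record into per-field byte strings using the
    template whose template ID equals the FlowSet ID."""
    fields = templates[flowset_id]
    out, offset = {}, 0
    for field_type, length in fields:
        out[field_type] = raw[offset:offset + length]
        offset += length
    return out

register_template(256, [("srcaddr", 4), ("dstaddr", 4), ("octets", 4)])
rec = decode_record(256, bytes([10, 0, 0, 1, 10, 0, 0, 2, 0, 0, 1, 0]))
```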
By default, packets can be forwarded between any two ports in one VLAN of
the switch. To prevent specified ports in one VLAN from communicating,
you can configure isolated ports in the specified port mode, so that a
port configured with port isolation cannot communicate with its specified
isolated ports.
The port isolation feature is independent of the port VLAN. Currently,
the switch supports configuring isolated ports in common port mode and
aggregation port mode; the configured isolated port can be a common port
or an aggregation port. The port isolation function only performs
unidirectional packet dropping. Suppose the isolated ports configured on
port A are ports B, C, and D. If the destination port of a packet
entering from port A is B, C, or D, the packet is dropped directly. But
if the destination port of a packet entering from port B, C, or D is A,
the packet is forwarded normally.
Illustration
Command                                                  Description
switch(config)#port 0/0/1                                Enter the port configuration mode
switch(config-port-0/0/1)#isolate-port port0/0/2-0/0/3   Configure port0/0/1 to be isolated from port0/0/2 and port0/0/3
switch(config-port-0/0/1)#exit                           Exit the port configuration mode
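The unidirectional dropping behavior can be modeled in a few lines (an illustrative sketch, not device code; the port names follow the example above):

```python
# Sketch of the unidirectional isolation rule: frames entering a port
# destined for one of its isolated ports are dropped, while the reverse
# direction is forwarded normally. Illustrative model only.
isolated = {"0/0/1": {"0/0/2", "0/0/3"}}  # ingress port -> isolated set

def forward_allowed(ingress, egress):
    return egress not in isolated.get(ingress, set())

print(forward_allowed("0/0/1", "0/0/2"))  # False: dropped
print(forward_allowed("0/0/2", "0/0/1"))  # True: forwarded normally
```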
Split horizon: learn a route from one interface, but do not advertise
the route back to that interface. It is one measure the IPv6 RIPng
protocol uses to prevent routing loops.
Poisoned reverse: learn a route from one interface and then advertise
the route back to that interface with an unreachable metric (16). It is
also a measure the IPv6 RIPng protocol uses to prevent routing loops, and
it is more active than split horizon.
The advantage of the IPv6 RIPng protocol is that the protocol and its
configuration are simple, but the route information that IPv6 RIPng needs
to advertise is proportional to the number of routes in the routing
table. When there are many routes, many network resources are consumed.
Meanwhile, the IPv6 RIPng protocol defines that a route path can pass at
most 15 route devices. Therefore, the IPv6 RIPng protocol is only used in
simple small and medium-sized networks.
The IPv6 RIPng protocol can be used in most campus networks and regional
networks with a simple, stable structure. Complicated environments
generally do not use the IPv6 RIPng protocol.
(Figure: protocol stack showing IPv6 RIPng encapsulated in UDPv6.)
As shown in the above figure, the IPv6 RIPng protocol is a routing
protocol based on UDP. The protocol packets sent by IPv6 RIPng are
encapsulated in UDPv6 packets. By default, the IPv6 RIPng protocol uses
UDP port 521 to send and receive the protocol packets of remote route
devices, updates the local routing table according to the route
information in the received protocol packets, and then adds 1 to the
metric and advertises the routes to the other adjacent route devices. In
this way, all route devices in the routing domain can learn all routes.
IPv6 RIPng protocol sends the protocol packets in three modes, as follows:
command (1 Byte) | version (1 Byte) | must be zero (2 Bytes) | route table entry (20 Bytes) | route table entry (20 Bytes) ...
The IPv6 RIPng header has two fields: the Command field identifies
whether the packet is a request packet (value 1) or a response packet
(value 2); the Version field is always 1.
Route table entry can have two types, which are described as follows:
Table 34-2 Route table entry types of the IPv6 RIPng protocol
Normal route table entry:
IPv6 prefix (16 Bytes) | Route Tag (2 Bytes) | Prefix Len (1 Byte) | Metric (1 Byte)
Next-hop route table entry:
IPv6 next hop address (16 Bytes) | Must be zero (2 Bytes) | Must be zero (1 Byte) | 0xFF (1 Byte)
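Assuming the field sizes listed above, one RIPng header plus a route table entry could be packed as follows (an illustrative sketch; `pack_header` and `pack_route_entry` are hypothetical helper names, not part of the product):

```python
# Sketch of packing the RIPng header and one route table entry,
# big-endian, per the field sizes in the table above. Illustrative only.
import ipaddress
import struct

def pack_header(command, version=1):
    # command (1 byte), version (1 byte), must-be-zero (2 bytes)
    return struct.pack("!BBH", command, version, 0)

def pack_route_entry(prefix, route_tag, prefix_len, metric):
    # IPv6 prefix (16), route tag (2), prefix length (1), metric (1)
    addr = ipaddress.IPv6Address(prefix).packed
    return addr + struct.pack("!HBB", route_tag, prefix_len, metric)

# A response packet (command 2) carrying one route entry
packet = pack_header(command=2) + pack_route_entry("11::", 0, 24, 1)
print(len(packet))  # 4-byte header + 20-byte entry = 24
```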
(Figure: basic workflow of the IPv6 RIPng protocol. On protocol start, a
request packet is sent; received packets are classified as request or
response packets; when routes change, a triggered update advertises the
changed routing information; otherwise, all routing information is
advertised to neighbors every 30 seconds.)
The basic work flow of the IPv6 RIPng protocol is as shown in the above
figure, including two parts. One is the flow of starting the protocol and the
other is the flow of processing the received packet.
After receiving the response to its request packet, the device updates
the routes in the route database according to the route information in
the packet and then advertises the changed routes to the IPv6 RIPng
instances of the other adjacent route devices (triggered updates).
Meanwhile, the device starts the Update Timer and uses route response
packets to advertise all route information to the IPv6 RIPng instances of
all adjacent route devices, so as to keep the route databases of the
route devices synchronized and to refresh the advertised routes. In this
way, previously advertised routes do not time out and become invalid on
the other route devices.
Route Database
The route database records all route information of the IPv6 RIPng
protocol. Each route entry comprises the following elements:
5. Source IPv6 address: The source IPv6 address of the response packet
that learns the route;
6. Route tag: defined by the user and used to tag one type of route,
for example, to mark routes obtained by redistributing BGP routes.
5. The route learned from IPv6 RIPng of the adjacent route device;
Therefore, for a redistributed route, when the sending interface is the
next-hop interface of the route, the route carries the next-hop address
of the route.
The following instance describes the use of the next-hop address
information of the route information in IPv6 RIPng.
As shown in the above figure, IPv6 RIPng runs on Switch-A; IPv6 RIPng and
IPv6 OSPFv3 run on Switch-B; IPv6 OSPFv3 runs on Switch-C. IPv6 RIPng on
Switch-B redistributes the IPv6 OSPFv3 route 11::/24 learned by the local
device so that Switch-A can learn the route to the subnet 11::/24. When
the route is learned on Switch-A, the next hop is Switch-B, that is,
fe80::0201:7aff:fe4f:73f8 by default. As a result, the packets forwarded
from Switch-A to the destination subnet 11::/24 all first pass Switch-B
and then reach Switch-C.
Route Update
When the IPv6 RIPng of a route device learns a route from an adjacent
route device, it adds 1 to the metric before processing the route, so as
to accumulate the hop count. When the metric is smaller than 16, the
route is reachable; when the metric is larger than or equal to 16, the
route is unreachable.
If the route complies with the following conditions, use the route to update
the routes in the route database:
1. The route does not exist in the route database and the metric of the
route is smaller than 16 hops;
2. The route exists in the database and the source IPv6 address is
consistent with the source IPv6 address of the learned route;
3. The route exists in the database, but its metric is larger than or
equal to the metric of the learned route.
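The three update conditions can be sketched as a single decision function (illustrative only; the route database layout is an assumption, and the metric is assumed to have already been incremented by 1 as described above):

```python
# Sketch of the three route-update conditions listed above.
# The database is modeled as a dict keyed by prefix. Illustrative only.
INFINITY = 16

def should_update(db, prefix, new_metric, source):
    entry = db.get(prefix)
    if entry is None:                      # condition 1: new route
        return new_metric < INFINITY
    if entry["source"] == source:          # condition 2: same source
        return True
    return entry["metric"] >= new_metric   # condition 3: not worse

db = {"11::/24": {"metric": 5, "source": "fe80::1"}}
print(should_update(db, "11::/24", 3, "fe80::2"))   # True: better metric
print(should_update(db, "22::/24", 16, "fe80::2"))  # False: unreachable new route
```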
(Figure: IPv6 RIPng route timer state machine. When the Invalid Timer of
a route expires, the route is marked invalid and the Holddown and Flush
Timers run on it; when the Flush Timer expires, the route is deleted from
the database.)
The IPv6 RIPng protocol has four timers: the Update Timer, the Invalid
Timer, the Holddown Timer, and the Flush Timer.
Counting to Infinity
The IPv6 RIPng protocol permits a maximum metric of 15. A destination
whose metric is larger than 15 is regarded as unreachable. This limits
the network size and prevents unlimited transmission of route
information. As route information is transmitted from one route device to
another, the metric is incremented by 1 at each hop. When the metric
exceeds 15, the route is deleted from the routing table.
Split Horizon
A route learned from one interface is not advertised back to the same
interface. If it were, a routing loop could result.
The Split Horizon rule of the IPv6 RIPng protocol is as follows: If IPv6
RIPng of the route device learns the route information A from one
interface, the response packet sent to the interface cannot contain the
route information A.
Split Horizon has one special case: when an interface receives a request
packet for part of the route information, the response to that packet
does not apply Split Horizon.
Poisoned Reverse
The purpose of the poisoned Reverse is the same as that of Split Horizon,
but there is a little difference as follows.
The Poisoned Reverse rule of the IPv6 RIPng protocol is as follows: if
the IPv6 RIPng of a route device learns route information A from one
interface, the route response packet sent to that interface contains
route information A, but the metric is set to 16 (that is, unreachable).
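The difference between the two rules shows up when building the response packet for an interface; a minimal sketch (the interface and route names are illustrative):

```python
# Sketch contrasting Split Horizon with Poisoned Reverse when building
# the response sent out of one interface. Illustrative model only.
INFINITY = 16

# route -> interface the route was learned from
learned_from = {"11::/24": "vlan1", "22::/24": "vlan2"}
metrics = {"11::/24": 2, "22::/24": 3}

def build_response(out_iface, policy):
    response = {}
    for route, metric in metrics.items():
        if learned_from[route] == out_iface:
            if policy == "split-horizon":
                continue              # omit the route entirely
            if policy == "poisoned-reverse":
                metric = INFINITY     # advertise it as unreachable
        response[route] = metric
    return response

print(build_response("vlan1", "split-horizon"))    # 11::/24 omitted
print(build_response("vlan1", "poisoned-reverse")) # 11::/24 with metric 16
```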
Holddown Timer
The Holddown Timer prevents a route entry from being updated by route
response packets for a period of time after the route becomes
unreachable. It ensures that an unreachable route is not updated by
response packets before every route device has received the
route-unreachable information, because the route entry in a received
response packet may be one that was advertised previously.
Triggered Updates
Triggered updates use route response packets to advertise route change
information to the adjacent route devices immediately when a route
changes.
Poisoned Reverse and Split Horizon break the route loop formed by any two
route devices, but a route loop formed by three or more route devices can
still persist until the metric of the route is transmitted and
accumulated to unreachable (16). Triggered Updates speed up route
convergence and thus shorten the time needed to break such a route loop.
Area: a collection of route devices that share the same topology
database. OSPFv3 divides one AS into multiple areas; the topology of one
area is invisible to another area, which reduces the amount of routing
information in the AS. The area is used to contain link state updates and
enables the administrator to create a hierarchical network.
LSA (link state advertisement): the data unit describing the local route
device or network state. For a route device, it contains the interface
states and adjacency states of the route device. Each link state
advertisement is flooded throughout the area. The route devices use the
collected link state advertisements to form the link state database.
Stub area: an area that has only one interface connected with the
outside. Type 5 LSAs cannot be flooded into the area.
Backbone area: composed of all area border route devices and the links
among them.
OSPFv3 detects the changes of IPv6 links and networks in the AS and
advertises the link state information. After converging for some time,
new routes are formed; the convergence time is short and little link
state information needs to be exchanged. In the OSPFv3 protocol, each
route device maintains a network topology database describing the AS, and
every route device has the same database. Each record of the database is
the local state of a specific route device. The route devices distribute
their local states in the AS through flooding.
All route devices run the same algorithm in parallel. Each route device
uses the link state database to generate a shortest path tree with itself as
the root. The shortest path tree provides the route to each destination in
the AS. The external routing information serves as leaves in the tree.
OSPFv3 advertises IPv6 information, including the IPv6 prefix and the
prefix length. Each calculated IPv6 route includes a prefix and the
prefix length. IPv6 datagrams are forwarded along the best route.
SW1, SW2, SW3, and SW4 comprise area 1; SW3 is the area border router
(ABR).
SW6, SW7, and SW8 comprise area 2; SW6 and SW8 are the area border
routers (ABRs).
SW8, SW9, and SW10 comprise area 3; SW8 is the area border router (ABR).
Process of OSPFv3
The basic idea of OSPFv3 is as follows: in the AS, each route device
running OSPFv3 collects the IPv6 link states and broadcasts them
throughout the system by flooding, so that the entire system maintains a
synchronized link state database. From the database, each route device
calculates a shortest path tree with itself as the root and the other
network nodes as leaves, thus obtaining the best routes to the
destinations in the system.
The route devices running OSPFv3 form an AS. The AS can be divided into
multiple areas. Each route device in an area needs the same AS topology
(link state database).
If the network type is broadcast or NBMA, route device A selects the DR
and BDR from the known neighbors and creates adjacencies with them. As a
result, the data traffic is reduced, because all route devices create
adjacencies only with the DR and BDR.
After the topology is obtained, route device A runs the SPF algorithm to
generate a shortest path tree, with itself as the root, to the other
route devices in the area. It calculates the shortest path of each route
according to the routing information advertised by each route device and
records it in the IPv6 routing table. Routes to destinations are then
obtained from the routing table.
Each route device in the area continuously exchanges link state
information with the specified route devices; the adjacencies on each
point-to-point link exchange link state information in parallel. After
being exchanged, the link state information is also flooded, so the route
devices in the entire area have the same link state database.
The area border router belongs to multiple areas at the same time.
Therefore, the routes of route device A's home area are advertised to
other areas, and the routes of other areas are advertised into the area.
Through the exchange of topology on the border route devices, the home
area of route device A learns the network topology and routes of the
entire AS. In OSPFv3, the border routers form the backbone area.
According to the NSF capability, the route devices are divided as follows:
NSF-Capable route device: a route device with the Non-Stop Forwarding
capability. The device must have dual-control redundancy and the routing
protocol GR capability.
GR-Capable route device: a route device with the graceful restarting
capability.
GR-Aware route device: a route device that can be aware that GR happens
to a neighbor and can help the neighbor complete GR. A GR-Capable route
device is also a GR-Aware route device.
GR-Unaware route device: a route device that cannot be aware that GR
happens to a neighbor and cannot help the neighbor complete GR.
According to the role of the route device in the GR process, the route
device can be divided as follows:
GR-Restarter route device: the route device that performs the protocol
graceful restarting;
GR-Helper route device: the route device that helps the protocol graceful
restarting.
During the graceful restart period, the neighbor plays the role of
Helper (also called Helper mode), which includes entering and exiting
Helper mode.
Graceful period rules: do not generate any type of LSA; do not perform
update processing for received self-generated LSAs, just receive them;
permit route calculation, but do not install the routes into the system
forwarding table. If the device was the DR before restarting, it is still
the DR after restarting.
The features of entering the graceful restarting period are as follows:
after the interface comes up, first generate a Grace-LSA to inform the
neighbors; delay sending Hello packets so as to receive the neighbors'
Hello packets and enter the 2-Way state. After the adjacency becomes
FULL, perform the SPF calculation, but do not install the routes into the
core routing table.
If route device X wants to complete graceful restarting, its neighbor
route device Y must help it. The device that helps to complete the
graceful restarting is the Helper; during this period, the Helper is said
to enter Helper mode. Helper mode applies per segment, that is, per link
with an adjacency relation. During the restarting period, the Helper
still advertises the link of the restarting route device; for a virtual
link, bit V is still set.
When the route device at the Helper end receives the Grace-LSA of the
neighbor, it sets the neighbor restart flag and prepares to enter Helper
mode. The following conditions need to be met: X (the gracefully
restarting route device, the Restarter) and Y (the Helper route device)
are in FULL adjacency, and after X restarts, the related links do not
change.
If any of the following conditions is met, the Helper exits Helper mode:
the Grace-LSA is deleted; the grace period expires; the link database
contents change.
The actions on exiting Helper mode: re-elect the DR of the segment and
regenerate the Router-LSA of the segment; if the device is the DR,
regenerate the Network-LSA; if it is a virtual link, regenerate the
Router-LSA of the virtual link.
The LSDB is composed of link state advertisements (LSA). The LSA can be
divided into 8 categories:
On an area border router, each area uses the calculated intra-area routes
to form Inter-Area-Prefix-LSAs and floods them to the other areas. The
backbone area uses the calculated intra-area and inter-area routes to
form Inter-Area-Prefix-LSAs and floods them to the other areas. All
border routers and the links among them form the backbone area. The
backbone area routers are mutually reachable; they can be connected
physically or through virtual links. When configuring a virtual link, the
transited area must be a transit area, not a stub or NSSA area.
The ASBR of the AS sends the external routing information to all areas
except the stub area in the AS. The route devices in the stub area are
directed to the ASBR through the default route.
The OSPFv3 packet has a standard OSPFv3 header. The length of the packet
header is 16 bytes. The information recorded in the header determines
whether further processing is required.
Type: the type of the packet following the OSPFv3 header. OSPFv3 has five
types of packets: Hello packet, type=1; database description packet,
type=2; link state request packet, type=3; link state update packet,
type=4; link state acknowledgment packet, type=5.
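Assuming the standard 16-byte OSPFv3 header layout (version, type, packet length, router ID, area ID, checksum, instance ID, reserved), the Type field can be decoded as follows (an illustrative sketch, not the product's parser):

```python
# Sketch of parsing the 16-byte OSPFv3 packet header and mapping the
# Type field to the five packet types listed above. Illustrative only.
import struct

PACKET_TYPES = {
    1: "Hello",
    2: "Database Description",
    3: "Link State Request",
    4: "Link State Update",
    5: "Link State Acknowledgment",
}

def parse_header(data):
    (version, ptype, length, router_id,
     area_id, checksum, instance_id, _rsvd) = struct.unpack("!BBHIIHBB", data[:16])
    return {
        "version": version,                 # 3 for OSPFv3
        "type": PACKET_TYPES.get(ptype, "unknown"),
        "length": length,
        "router_id": router_id,
        "area_id": area_id,                 # 0 on packets over a virtual link
        "instance_id": instance_id,
    }

# Build a sample Hello header and decode it
hdr = struct.pack("!BBHIIHBB", 3, 1, 36, 0x01010101, 0, 0, 0, 0)
print(parse_header(hdr)["type"])  # Hello
```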
Area ID: the area where the packet is generated; when the packet passes
the virtual link, area ID is 0.0.0.0.
The Hello packets are used to create and maintain adjacencies. After the
interface comes up, if OSPFv3 is started, Hello packets are sent
periodically to detect neighbors and create adjacencies. After an
adjacency is created, periodic Hello packets are required to maintain it.
Hello packets contain the parameters that must be consistent for
neighbors to set up an adjacency, such as the hello interval and the
neighbor dead time. If they are inconsistent, the Hello packets are
discarded.
Interface ID: a 32-bit number; it identifies the interface sending the hello
packets in the local route devices, such as the IfIndex.
Router Priority: used when selecting the DR and BDR. When the router
priority is 0, the route device cannot be selected as DR or BDR.
Option: The optional capability supported by the route devices. See the
option domain in OSPFv3 packets.
Router Dead Interval: if no hello packets are received in the router dead
interval, the neighbor is considered to be down. Delete the neighbor.
Backup DR: the router ID of the BDR selected by the interface generating
the packets.
Neighbor: the list of the neighbors that can receive hello packets at the
interface generating the packets in the router dead interval.
Interface MTU: the maximum size of IPv6 packets that the interface
generating the packet can transmit without fragmentation. When the
packets are transmitted over a virtual link, the interface MTU is set to
0.
I-bit: initial bit, when the packet is the initial packet of the DD packet
sequence, the bit is 1.
M-bit: when the packet is the last packet of the DD packet sequence, the
bit is 1.
Link State ID: works with link state type and advertising router to identify
a LSA.
Advertising Router: the router ID of the route device generating the LSA
In the process of creating neighbors, when the link state request packets
are received, the LSA in the local database is sent to neighbors through
the update packets. In addition, if the local link state changes, the
changed LSA is sent out through the update packets. The flooding
mechanism is used in the case of sending update packets.
LSA header
Link State ID: works with link state type and advertising router to identify
a LSA.
Advertising Router: the router ID of the route device generating the LSA
V: Virtual Link Endpoint bit; set the bit when the route device
generating the packet is one end of a virtual link.
E: External bit; set the bit when the route device generating the packet
is an ASBR.
B: Border bit; set the bit when the route device generating the packet is
an ABR.
W: Wild-card multicast bit; set the bit when the route device generating
the packet is a wild-card multicast receiving route device.
Neighbor Router ID: the router ID of the neighbor route device; the point-
to-point interface refers to the router ID of the neighbor route device; the
multipoint access interface type refers to the router ID of the DR router.
Link State ID: for the Network LSA, it is the interface ID of the DR
interface
Attached Router: the list of the route devices adjacent to the DR in the
network
Options: the option capability of the route devices described in the LSA.
Metric: the cost for reaching the destination route device described in the
LSA.
Destination Router ID: the router ID information about the described route
devices.
E: External metric bit, the type of the external cost used by the route.
If the E bit is set to 1, the cost type is E2; if the E bit is 0, the
cost type is E1.
T: the tag bit of the route, if it is set to 1, it indicates that the tag value
exists.
Referenced LS Type: the LS type related with the LSA; if the value is set,
the Referenced Link State ID exists; through the LS Type, Link State ID
and the advertised router ID of the LSA, you can find the related LSA.
Each IPv6 link in the route device generates a corresponding link LSA. The
link LSA is advertised only in the local link. The content of the
advertisement contains the IPv6 link-local address and the IPv6 prefix
address in the link. The link ID of the LSA is the interface ID.
Options: the options will be used in the Network LSA where the link
resides.
Referenced LS Type, Link State ID, Advertising Router: the LSA related
with IPv6 prefix advertised by LSA can be router-LSA and network-LSA.
DC: set the bit when a demand circuit is configured.
EA: set the bit when the source route device is capable of
receiving/sending external-attributes LSAs.
N: used only in Hello packets; set to 1 when NSSA external LSAs are
supported and 0 when they are not; when N is set to 1, the E bit must be
0.
P: used only in NSSA external LSA headers. If the P bit is set, the ABR
of the NSSA must convert the type 7 LSA to a type 5 LSA.
MC: set the bit when the source route device can forward multicast
packets.
E: set the bit when the source route device can receive ASE LSA packets.
NU: non-unicast address, if the bit is set to 1, it indicates that the prefix
address cannot be used in the case of calculating routes.
LA: local address bit; indicates that the prefix address is an address of
the route device advertising it.
MC: set the bit when the source route device can forward multicast
packets.
P: the prefix used in NSSA External LSA. If P bit is set, the ABR of NSSA
must convert type 7 LSA to type 5 LSA.
2. Similarities
The basic packet types are the same, including Hello, LS-DD, LS-Req,
LS-Upd, and LS-Ack. The process and principle of neighbor discovery and
adjacency creation are the same. The supported interface network types
are the same, including P2P, P2MP, Broadcast, NBMA, and Virtual. The
flooding mechanism and the aging mechanism of LSAs are the same. The SPF
calculation principles are also the same. The contained LSAs are
basically the same; two types of LSA are added in OSPFv3 to advertise the
IPv6 link-local address and IPv6 prefix addresses. The Router ID, Area
ID, and Link ID still use the IPv4 address format.
3. Differences
OSPFv3 runs on an IPv6 link, where the concept of subnet does not exist; OSPFv2 runs on a subnet.
On one IPv6 link, multiple OSPFv3 processes are allowed, identified through the instance ID, while one IPv4 interface can run only one OSPFv2 process.
The link state ID of an OSPFv2 LSA expresses IPv4 address information, but the link state ID of an OSPFv3 LSA does not express address information; it only distinguishes different LSAs and has no special meaning (a few LSA link state IDs express interface ID information, such as in the network LSA).
OSPFv3 multicast packets are sent to IPv6 multicast addresses, and unicast packets are sent to the IPv6 link-local address.
The flooding scope of an OSPFv2 LSA is judged from the LSA type, while the header of an OSPFv3 LSA carries the flooding scope (flag bits for other capabilities are also carried, for example, how to handle an unrecognized LSA type).
Two LSA types are added in OSPFv3: the link LSA, which advertises the link-local address and is flooded only on the local link, and the intra-area-prefix LSA, which advertises the IPv6 address information of the interfaces.
OSPFv3 Features
9. Equal-cost multiple paths: if there are multiple paths with the same cost to a destination, OSPFv3 finds the paths and load-balances traffic across them.
10. Stub area support: when an area is set as a stub area, external LSAs are not flooded into the stub area; in the stub area, routes to external destinations follow the default route.
Memory of the routing device: the OSPFv3 link state database may become very large, especially when many external link states are advertised. In this case, the routing device needs a large amount of memory; updating and synchronizing the link state database also consumes a large amount of memory.
CPU usage: in OSPFv3, CPU usage is related to the time spent running the SPF algorithm and to the number of routing devices in the OSPF system. In addition, when the link state database is very large, a large number of packets may have to be exchanged during protocol convergence, which occupies a great deal of CPU.
Designated router: the designated router in a multi-access network receives and sends more packets than the other routing devices, and when the designated router fails, a new designated router must be elected. Because of this, the number of routing devices connected to one network should be restricted.
Precautions of OSPFv3
Limiting the size of the OSPFv3 system saves the memory of the routing device.
In an area, the database size can be reduced as follows:
1. The area can use the default route, reducing the external routes that have to be imported;
2. An EGP (external gateway protocol) can carry its own information across the OSPFv3 AS instead of depending on the IGP (such as OSPFv3) to transmit it;
3. The area can be configured as a stub area;
4. If the external network prefixes are regular addresses, the addresses can be summarized; after summarization, the external information in OSPFv3 decreases dramatically.
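The summarization step above (item 4) can be sketched in a few lines. The following Python fragment is purely illustrative: the helper name and the example prefixes are our own, and it uses the standard ipaddress module rather than any device behavior.

```python
import ipaddress

def summarize(prefixes):
    """Collapse contiguous prefixes into the minimal covering set,
    as in the external-route summarization described above."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# Four contiguous /66 prefixes collapse into one /64 summary, so only
# one piece of external information is advertised instead of four.
routes = ["2001:db8:0:1::/66", "2001:db8:0:1:4000::/66",
          "2001:db8:0:1:8000::/66", "2001:db8:0:1:c000::/66"]
print(summarize(routes))  # ['2001:db8:0:1::/64']
```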
IS-IS Multi-Topology
NET (network entity title): identifies the ISO address of one intermediate system. It is similar to an IP address and is divided into an area ID and a system ID;
Area: The route area in the IS-IS protocol, including Level-1 Area and
Level-2 Area;
LSP (Link State PDU): carries the link state information to be advertised, including the adjacency information and reachable subnet information;
LSDB (Link State Database): comprises the LSPs generated by all ISs in the whole area, describing the adjacency topology and related route information of the whole area. The LSDB has an identical copy on each IS, and each IS uses the SPF algorithm to calculate routes from its own LSDB;
PSNP (Partial Sequence Number PDU): It is one kind of the SNP packet,
used to confirm the LSP packet (point-to-point network) and request the
LSP packet (broadcast network);
CSNP (Complete Sequence Number PDU): It is one kind of the SNP packet,
used to advertise the LSDB abstract description information;
DIS (Designated IS): One IS system elected from all ISs on the broadcast
network. It is responsible for simulating one Pseudo-node and maintaining
the synchronization of LSDB of all ISs on the broadcast network.
The IS-IS protocol can support the routes of multiple protocol stacks,
including IPv4, IPv6, and OSI. At first, the IS-IS protocol is applied in the
OSI protocol stack (ISO10589) and then is used in the routes of IPv4
protocol stack (RFC1195) and the IPv6 protocol stack (draft-ietf-isis-ipv6).
Meanwhile, the IS-IS protocol can support the CSPF calculation of MPLS-TE
(RFC3784).
The IS-IS protocol has good compatibility (devices that implement different extension functions can still interoperate) and large network capacity; it supports multiple protocol stacks and can be upgraded smoothly; it is simpler than OSPF and less likely to have problems. Therefore, IS-IS is suitable for large core backbone networks.
To generate routes for the IPv6 address family according to the topology, each routing device advertises the IPv6 reachable subnet information together with its link state information. After the shortest paths (the SPF tree) to all routing devices are calculated, the IPv6 routes are generated from the shortest paths and the IPv6 reachable subnet information advertised by the routing devices.
Meanwhile, the interface address must be checked: for the IPv4 address family, check whether the neighbor's hello packet advertises an IPv4 address on the same subnet as the local interface; for the IPv6 address family, check whether the neighbor's hello packet advertises a link-local address.
First, calculate the shortest path to each routing device, that is, calculate the SPF tree with the SPF algorithm. Then calculate the routes according to the shortest path and the IPv6 reachable subnet information advertised by that routing device.
IS-IS Multi-Topology
Overview
In the earlier IS-IS protocol, the advertised link state database carries only one network topology, which is called single-topology.
Multi-Topology TLV format:
Octets  Field
1       Type (229)
1       Length
2       O |A |R |R | MT ID (one 2-octet entry per topology; the entry may repeat)
Point-to-point Neighbor
When the neighbor shares at least one topology with the interface, the adjacency is set up; otherwise, the adjacency cannot be set up. When calculating single-topology routes, the related flags carried in the LSP header are used.
IBGP: BGP running between peers in the same AS. An IBGP neighbor is a routing device in the same administrative domain.
MP-BGP (Multiprotocol BGP): BGP that carries different kinds of route information is called multiprotocol BGP.
BGP uses TCP as the transport protocol (port 179), which provides reliable data transmission; retransmission and acknowledgement are handled by TCP.
Create a TCP connection between two routing devices running BGP. Then,
the two routing devices are called peers. Once the connection is created,
the two peer routing devices acknowledge the connection parameters by
exchanging open packets. The parameters include the BGP version number, AS number, hold time, BGP identifier, and other optional parameters. After
the two peers negotiate parameters successfully, the BGP exchanges
routes by sending update packets. The update packets contain the list of
reachable destinations passing each AS system (namely NLRI), and the
path attributes of each route. When the route changes, incremental
update packets are used between peers to transmit the information. BGP
does not require refreshing routing information periodically. If the route
does not change, the BGP peers only exchange keepalive packets. The
keepalive packets are sent periodically to ensure the valid connection.
IPv6 BGP4+ is the inter-domain routing protocol that supports IPv6. Based
on BGP4, it reflects the information of the IPv6 network layer protocol to
NLRI and Next_Hop attribute. It brings in two NLRI attributes, that is,
MP_REACH_NLRI (Multiprotocol Reachable NLRI, used to release reachable
IPv6 route and next-hop information) and MP_UNREACH_NLRI
(Multiprotocol Unreachable NLRI, used to cancel the unreachable IPv6
route). The Next_Hop attribute is identified by the IPv6 address, which can
be an IPv6 global unicast address or a link-local next-hop address. IPv6 BGP4+ applies the BGP4 multiprotocol extension attributes to the IPv6 network, while the original message mechanism and routing mechanism of BGP4 remain unchanged, so the application scenarios and working principle of IPv6 BGP4+ are the same as those of BGP4.
BGP Message Header
The BGP message header contains a 16-byte marker field, a 2-byte length field, and a 1-byte type field. The following figure illustrates the format of the BGP message header.
Length: the length field occupies 2 bytes. It indicates the length of the
message. The minimum allowed length is 19 bytes and the maximum is
4096 bytes.
Type: The type field occupies one byte. It indicates the type of the BGP
message. The four types of the BGP message are as follows:
Number Type
1 Open
2 Update
3 Notification
4 Keepalive
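The fixed header described above can be parsed in a few lines. The Python sketch below is illustrative only (the function and constant names are our own, not part of any BGP implementation); it enforces the 19-4096 byte length bounds and the four message types from the table.

```python
import struct

MSG_NAMES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}

def parse_bgp_header(data: bytes):
    """Split the 19-byte fixed header: 16-byte marker, 2-byte length,
    1-byte type, enforcing the 19..4096 length bounds stated above."""
    if len(data) < 19:
        raise ValueError("BGP messages are at least 19 bytes")
    marker = data[:16]
    length, msg_type = struct.unpack("!HB", data[16:19])
    if not 19 <= length <= 4096:
        raise ValueError("length outside the 19..4096 byte range")
    return marker, length, MSG_NAMES.get(msg_type, "UNKNOWN")

# A keepalive is just the header: an all-ones marker, length 19, type 4.
keepalive = b"\xff" * 16 + b"\x00\x13\x04"
print(parse_bgp_header(keepalive)[1:])  # (19, 'KEEPALIVE')
```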
Open Message
After the TCP connection is created, the first packet sent is the open message. The open message contains the BGP version number, AS number, hold time, BGP identifier, and other optional parameters.
If the open message is acceptable, it means that the peer routing devices
agree with the parameters. In this case, the keepalive message is sent to
acknowledge the open message.
Besides the fixed BGP header, the open message contains the following fields:
Version: the version field occupies one byte. It indicates the version
number of the BGP protocol. When the neighbors are negotiating, the peer
routing devices agree on the BGP version numbers. Usually, the latest
version supported by the two routing devices is used.
Hold Time: the field is two bytes. It indicates the maximum time the sender waits between successive keepalive or update messages from the peer. The BGP routing device negotiates with the peer and sets the hold time to the smaller of the two values.
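The negotiation rule above (each side adopts the smaller of the two offered hold times) reduces to a one-line comparison. The sketch below and its sample values are illustrative only.

```python
def negotiate_hold_time(local_offer: int, peer_offer: int) -> int:
    """Each peer sets the session hold time to the smaller of its own
    configured value and the value offered in the peer's open message."""
    return min(local_offer, peer_offer)

# Both sides converge on the same value regardless of who computes it.
print(negotiate_hold_time(180, 90))  # 90
print(negotiate_hold_time(90, 180))  # 90
```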
BGP Identifier: the field is four bytes. It indicates the identifier of the BGP
sending routing devices. The field is the ID of the routing device, namely
the maximum loopback interface address or the maximum IP address of
the physical interfaces. You can set the address of the router-id manually.
Optional parameter Length: the field is one byte. It indicates the total
length of the optional parameter fields (the unit is byte). If there are no
optional parameters, the field is set to 0.
Update Message
The update message is used to exchange routing information between BGP
peers. When you advertise routes to a BGP peer or cancel the routes, the
update message is used. The update message contains the fixed BGP
header and the following optional parts:
Total Path Attribute Length: the field is two bytes; it indicates the total
length of the path attribute field.
Path Attribute: the variable long field contains the BGP attribute list
related with the prefix in the NLRI. The path attribute provides the
attribute information of the advertised prefix, such as the priority or next
hop. The information is for route filtering and route selection. The path
attribute can be classified into the following types:
LOCAL_PREF: the attribute indicates the priority of the route within the local AS; the higher the value, the higher the route priority. LOCAL_PREF is not contained in update messages sent to EBGP neighbors; if the attribute is contained in an update message from an EBGP neighbor, the attribute is ignored.
AGGREGATOR: the attribute marks the BGP peer (IP address) performing
the route aggregation and the AS number.
MULTI_EXIT_DISC (MED): unlike the processing of local priority, the MED lets an external routing device affect the route selection of another AS, whereas the local priority only affects route selection within the AS.
Network Layer Reachability: the variable long field contains the list of
reachable IP address prefix advertised by the sender.
Keepalive Message
The keepalive messages are exchanged between peers periodically to
check whether the peer is reachable.
Notification Message
The notification message is sent when an error is detected; the BGP connection is closed after it is sent.
Error Code: one byte, the field indicates the error type.
ERROR SUBCODE: one byte, the field provides more details about the
error.
DATA: variable length field, the field contains the data related with the
error, for example, invalid message header, illegal AS number. The
following table lists the possible error codes and the error subcodes.
Table 34-4 BGP Notification message error code and error subcode
Event Description
1 BGP starts
2 BGP ends
3 BGP transmission connection opens
4 BGP transmission connection is terminated
5 Fail to open the BGP transmission connection
6 BGP transmission fatal errors
7 Retrying connection timer times out
8 Duration time terminated
9 Keepalive timer terminated
10 Receive Open messages.
11 Receive Keepalive messages.
12 Receive update messages
13 Receive notification messages
Idle: the initial status. BGP stays in the idle status until an operation triggers a startup event. The startup event is usually triggered by the creation or restart of a BGP session.
Connect: in this status, BGP waits for the TCP connection to complete. If the connection fails, BGP moves to the active status. If the connect-retry timer times out, BGP remains in the connect status; the timer is reset and a new transport connection is started. If any other event occurs, BGP returns to the idle status.
Active Status: in the status, BGP attempts to create a TCP connection with
the neighbor. If the connection succeeds, send the Open message, and
move to the status of sending open message. If re-connecting timer times
out, the BGP restarts the connection timer and goes back to the
connection status to monitor the connection from the peers.
OpenSent: in the status, the open message is sent. BGP is waiting for the
open message sent from the peers. Check the received open message. If
any error occurs, the system sends a notification message and goes back
to the idle status. If no error occurs, the BGP sends a keepalive message
to the peer and resets the keepalive timer.
OpenConfirm: in this status, BGP waits for a keepalive or notification message from the peer. When the keepalive message is received, the status moves to established.
Established: the last phase of the neighbor negotiation. In this status, the connection between the BGP peers is established, and update, notification, and keepalive messages can be exchanged between the peers.
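The session states described above can be sketched as a small transition table. The event names below are our own shorthand and the table is a deliberate simplification (for example, every unlisted event simply falls back to idle, mirroring the "returns to the idle status" behavior).

```python
# Simplified BGP session state machine based on the states above.
TRANSITIONS = {
    ("Idle", "start"): "Connect",            # a start event leaves idle
    ("Connect", "tcp_ok"): "OpenSent",       # TCP up: send the open message
    ("Connect", "tcp_fail"): "Active",       # TCP failed: keep trying
    ("Active", "tcp_ok"): "OpenSent",
    ("OpenSent", "open_ok"): "OpenConfirm",  # valid open: send keepalive
    ("OpenConfirm", "keepalive"): "Established",
}

def step(state: str, event: str) -> str:
    """Unlisted (state, event) pairs fall back to Idle."""
    return TRANSITIONS.get((state, event), "Idle")

state = "Idle"
for event in ["start", "tcp_ok", "open_ok", "keepalive"]:
    state = step(state, event)
print(state)  # Established
```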
Well-Known Mandatory;
Well-Known Discretionary;
Optional Transitive;
Optional Non-Transitive.
Optional Transitive: BGP does not need to support the attribute, but it should accept paths carrying the attribute and advertise those paths.
When multiple routes with the same prefix length and to the same destination exist, BGP selects the best route according to the following rules:
9. Preferentially select the route whose next-hop has the minimum IGP
metric;
11. Preferentially select the route with the minimum BGP ROUTER-ID;
13. Preferentially select the route from the lowest neighbor address;
14. If the BGP load balancing is started, rules 10-13 are ignored. All routes
with the same AS_PATH length and MED values will be installed in the
routing table.
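Several of the tie-break rules above reduce to comparing attribute tuples in order. The Python sketch below is an illustrative simplification (the field names and sample routes are our own): higher LOCAL_PREF wins, then shorter AS_PATH, then lower MED, then lower router ID.

```python
def best_route(routes):
    """Pick the best route by an ordered attribute comparison, as a
    simplified stand-in for the BGP selection rules listed above."""
    return min(routes, key=lambda r: (-r["local_pref"],   # higher wins
                                      len(r["as_path"]),  # shorter wins
                                      r["med"],           # lower wins
                                      r["router_id"]))    # lower wins

candidates = [
    {"local_pref": 100, "as_path": [200, 300], "med": 10,
     "router_id": "2.2.2.2"},
    {"local_pref": 200, "as_path": [200, 400, 300], "med": 50,
     "router_id": "1.1.1.1"},
]
# LOCAL_PREF is compared first, so the second route wins despite its
# longer AS_PATH and higher MED.
print(best_route(candidates)["router_id"])  # 1.1.1.1
```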
Figure 34-31 In the same condition, preferentially select the route with
higher LOCAL_PREF value
User AS100 obtains routes from ISP1 and ISP2, but ISP1 is the preferred ISP. When the device connected to ISP1 announces routes to switch-F, it sets a higher LOCAL_PREF value. For the same destination, the routes learned from ISP1 are preferred because their LOCAL_PREF value is higher.
Figure 34-32 In the same condition, preferentially select the route with
lower MED value
A dual-homed structure is used between a user and an ISP. The user prefers LINK2 and uses LINK1 as the backup. When the user publishes routes to the ISP, the update packets carrying the lower MED value are sent on LINK2. If the routes received over the EBGP neighbors created on LINK2 and LINK1 are otherwise identical, the route with the lower MED is preferred. As a result, the traffic from the ISP enters from LINK2.
Route Filtering
Route filtering means that a BGP speaker can control the routes it sends to and receives from any BGP peer. Route filtering is done by defining a route policy. The procedure is as follows:
1. Identify routes
3. Operate on attributes
Route filtering can be completed through an access list, prefix list, or AS path access list. Route maps can also be used to implement filtering and attribute operations.
The route reflector is recommended only in the large scale internal BGP
closed network. The route reflector increases the overhead of the route
reflector server. If the configuration is incorrect, the route may be cyclic or
unstable. Therefore, route reflector is not recommended in every topology.
Alliance
The alliance is another method for processing the sharp increase of IBGP
closed network in the AS. Similar to the route reflector, the alliance is
recommended only in the large scale internal BGP closed network.
The concept of the alliance is put forward because one AS can be divided into multiple sub-AS systems. In each sub-AS, all IBGP rules are applicable; for example, all BGP routing devices in the sub-AS must form a fully meshed network. Each sub-AS has a different AS number, so external BGP must be run between them. Although EBGP is used between sub-AS systems, route selection in the alliance is similar to IBGP route selection in a single AS; namely, when the sub-AS border is crossed, the next-hop, MED, and local priority information is preserved. From the outside, an alliance looks like a single AS.
The defect of the alliance is that, when changing from a non-alliance plan to an alliance, the routing devices must be reconfigured and the logical topology changes. In addition, if the BGP policy is not set manually, the alliance may not select the best route.
Route Damping
Route damping (route attenuation) is a technology that controls route instability. It significantly reduces the instability caused by route oscillation.
Route damping divides routes into well-behaved and ill-behaved ones. A well-behaved route demonstrates long-term stability, while an ill-behaved route demonstrates instability in the short term. An ill-behaved route is penalized in proportion to its expected instability, and unstable routes are suppressed until they become stable.
The recent history of a route is the basis for evaluating its future stability. To know the route history, first count how often the route flaps within a certain period. In route damping, each time the route flaps, it is penalized; when the penalty reaches a predefined limit, the route is suppressed. A suppressed route can continue to accumulate penalties. The more frequently the route flaps, the earlier it is suppressed.
Similar rules are used to un-suppress the route and re-advertise it: an algorithm reduces the penalty according to an exponential decay law, whose parameters are defined by the user.
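The penalty-and-decay scheme described above can be sketched numerically. The parameter values below (penalty per flap, half-life, suppress and reuse thresholds) are illustrative only, not device defaults.

```python
import math

PENALTY_PER_FLAP = 1000
HALF_LIFE = 15 * 60        # seconds; penalty halves every half-life
SUPPRESS = 2000            # suppress the route above this penalty
REUSE = 750                # re-advertise once decayed below this

def decayed(penalty: float, elapsed: float) -> float:
    """Exponential decay: the penalty halves every HALF_LIFE seconds."""
    return penalty * math.pow(0.5, elapsed / HALF_LIFE)

penalty = 0.0
for _ in range(3):          # three flaps in quick succession
    penalty += PENALTY_PER_FLAP
print(penalty >= SUPPRESS)  # True: the route is suppressed
# After two half-lives with no flaps, 3000 decays to 750, which meets
# the reuse threshold, so the route is re-advertised.
print(decayed(penalty, 2 * HALF_LIFE) <= REUSE)  # True
```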
BGP Graceful Restart
Principle of BGP Graceful Restart
After a routing device fails, its neighbors at the BGP routing layer detect that the neighborship goes down and comes back up, which is called BGP neighbor oscillation. The neighborship oscillation finally causes route oscillation. As a result, a route blackhole may occur for a while after the routing device restarts, or the data traffic of the neighbors bypasses the restarted routing device. Consequently, the reliability of the network decreases.
BGP graceful restart prevents route disturbance and accelerates route convergence when a routing device fails, which ensures network reliability.
1. In the BGP OPEN message, the graceful restart capability is added. The
fields are as follows:
2. In the BGP update packets, add the EOR flag to indicate that the
update is complete.
2. When a fault occurs, the forwarding layer of switch A retains the routes and continues forwarding traffic;
5. Delay the route calculation until the EOR flag from the neighbor is
received or the deter-timer times out.
6. Calculate the route, update the core route and advertise the route.
2. After the restarter end becomes faulty, if any TCP error is detected,
run step 3, if no TCP error is detected, run step 4.
4. Re-construct neighbors and delete the restart timer. If the timer exists,
start the stale-path timer.
5. If the restart timer times out before the session is re-established, or the fwd-flag in the corresponding address family of the open message is not 1, or the corresponding address family information is not contained, run step 8.
6. Send routes to the restart routing device. Then, send EOR flag.
7. If the stale-path times out before the EOR is received, run step 8.
8. Delete the reserved route and then enter the normal BGP flow.
GVRP Technology
This chapter describes the GVRP and GARP technology and the application.
Main contents:
Implementation of GVRP
Typical Application
Main contents:
GVRP overview
GARP principle
GVRP Overview
Generic Attribute Registration Protocol (GARP) provides the mechanism of
generic attribute registration, de-registration, and transfer. According to
different attributes of the GARP protocol packets, different upper layer
protocol applications are supported.
GARP Principle
G AR P Message
The information exchange between GARP members is through three types
of messages: join message, leave message, and LeaveAll message.
Join Message
When a GARP application entity wants other entities to register its attribute information, it sends a join message. It also sends a join message when it receives a join message from another entity, or when some attributes are statically configured on the entity and need to be registered by other GARP application entities.
The join message includes JoinEmpty and JoinIn. The differences are as
follows:
Leave Message
When a GARP application entity wants other devices to de-register its attribute information, it sends a leave message. It also sends a leave message when it de-registers attributes after receiving a leave message from another entity, or when attributes are de-registered statically.
The Leave message includes LeaveEmpty and LeaveIn. The differences are
as follows:
LeaveAll Message
When each GARP application entity is started, the LeaveAll timer is started at the same time. When the timer times out, the GARP application entity sends a LeaveAll message. The LeaveAll message is used to de-register all attributes, so that other entities can re-register the attributes they still need.
G AR P Timer
Join Timer
The Join timer controls the sending of the Join message (including JoinIn and JoinEmpty). To ensure reliable transmission of the Join message, the entity waits one Join timer interval after the first Join message is sent. If a JoinIn message is received within that interval, the second Join message is not sent; if not, a Join message is re-sent.
Hold Timer
The hold timer is used to control the sending of Join message (including
JoinIn and JoinEmpty) and Leave message (including LeaveIn and
LeaveEmpty).
The value of Hold timer should be less than or equivalent to half of the
Join timer value.
Leave Timer
The Leave timer will be started after each application entity receives the
Leave or LeaveAll message. If the Join message of the attribute is not
received before the Leave timer times out, the attribute will be de-
registered.
LeaveAll Timer
After each GARP application entity is started, the LeaveAll timer will be
started. If the timer times out, the GARP application entity will send
LeaveAll message. Then, the LeaveAll timer is started to start a new cycle.
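The timer rules above can be condensed into two tiny checks. This is an illustrative sketch of the decision logic only (function names are our own), not a real timer implementation.

```python
def joins_to_send(join_in_received: bool) -> int:
    """Per the Join timer rule above: if a JoinIn arrives within one
    Join timer interval after the first Join, the second Join is
    suppressed; otherwise the Join is re-sent."""
    return 1 if join_in_received else 2

def hold_timer_valid(hold: float, join: float) -> bool:
    """Per the Hold timer rule: Hold must not exceed half of Join."""
    return hold <= join / 2

print(joins_to_send(True))        # 1: a JoinIn arrived in time
print(joins_to_send(False))       # 2: no JoinIn, so re-send
print(hold_timer_valid(10, 20))   # True
print(hold_timer_valid(15, 20))   # False
```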
Field           Description                GVRP value
Protocol ID     Protocol identifier        1
Attribute Type  The type of the attribute  1 (indicates the VLAN ID)
Implementation of GVRP
GVRP is one application of the GARP. It maintains the VLAN dynamic
registration information and transmits the information to other devices
based on the GARP working mechanism. The manually configured VLAN is
called a static VLAN. The VLAN created through the GVRP protocol is called
a dynamic VLAN.
Enable the GVRP function (enable GVRP globally and on the trunk port). The VLAN information allowed by the trunk port is transmitted to the connected network segment through GVRP packets. When a switch on the network segment receives the GVRP packets, it registers or de-registers VLANs according to the parsed packet information, and at the same time transmits the VLAN information out of its other active ports. As a result, the VLAN information is propagated across the entire switching network. When GVRP transmits information, the VLAN information is only transmitted on active ports (ports in the forwarding status). The active status of a port is retrieved from the MSTP module: if, in the instance mapped to the VLAN of a received message, the receiving port is not in the FORWARDING state, the message is dropped directly and not transmitted.
Normal mode: both dynamic and static VLAN registration and de-registration are allowed on the port.
Fixed mode: only static configuration takes effect; dynamic VLAN registration and de-registration are disabled on the port.
Forbidden mode: dynamic VLAN registration and de-registration are disabled, and all VLANs except VLAN 1 are de-registered on the port.
Typical Application
Through the GVRP function, you only need to configure the VLANs of some devices (border devices). The VLAN configuration is then automatically applied across the switching network, which reduces the administrator's work and the possibility of mistakes.
This section describes the Private VLAN protocol technology and the
application. The function is just applicable to MyPower 3400 and
MyPower4100.
Primary VLAN: the primary VLAN represents one sub-domain. All PVLANs in one PVLAN domain share one primary VLAN;
Secondary VLAN: there are two types of secondary VLAN, the Isolated VLAN and the Community VLAN;
Isolated VLAN: the ports in one isolated VLAN cannot perform L2 communication with each other. There is only one isolated VLAN in one PVLAN domain;
Community VLAN: the ports in one community VLAN can perform L2 communication with each other, but cannot perform L2 communication with ports in other community VLANs. There can be multiple community VLANs in one PVLAN domain.
Isolated port: It belongs to the Isolated VLAN and can only communicate
with the promiscuous port.
The primary VLAN and secondary VLANs form one PVLAN domain. One PVLAN domain must contain exactly one primary VLAN (therefore, the primary VLAN represents the PVLAN domain), and it can contain multiple community VLANs and at most one isolated VLAN. The promiscuous port belongs to all PVLANs of the PVLAN domain, while a host port only belongs to its own secondary VLAN and the primary VLAN.
In PVLAN domain, the host port of Isolated VLAN can only communicate
with the promiscuous port of primary VLAN, while the host ports in
Isolated VLAN cannot communicate with each other. The host port of
Community VLAN can communicate with the promiscuous port of primary
VLAN and the other host ports in the Community VLAN.
The above figure is one complete PVLAN domain. VLAN 2 is Primary VLAN;
VLAN 100 is Isolated VLAN; VLAN 101 and VLAN 102 are Community VLAN.
Port 0/0/7 is the promiscuous port; Port 0/0/1 and Port 0/0/2 are isolated ports; Port 0/0/3, Port 0/0/4, Port 0/0/5, and Port 0/0/6 are all community ports.
Port 0/0/7 can communicate with Port 0/0/1-Port 0/0/6; Port 0/0/1 and
Port 0/0/2 can only communicate with Port 0/0/7; Port 0/0/3 and Port
0/0/4 can communicate with each other and with Port 0/0/7. Port 0/0/5
and Port 0/0/6 can communicate with each other and with Port 0/0/7.
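The reachability rules in the example above can be sketched as a small lookup. The port roles and VLAN numbers below follow the example figure; the function is our own illustrative helper, not device behavior.

```python
# Port roles from the example PVLAN domain above: (role, secondary VLAN).
ROLE = {"0/0/7": ("promiscuous", None),
        "0/0/1": ("isolated", 100), "0/0/2": ("isolated", 100),
        "0/0/3": ("community", 101), "0/0/4": ("community", 101),
        "0/0/5": ("community", 102), "0/0/6": ("community", 102)}

def can_talk(a: str, b: str) -> bool:
    """L2 reachability between two ports of the PVLAN domain."""
    (role_a, vlan_a), (role_b, vlan_b) = ROLE[a], ROLE[b]
    if "promiscuous" in (role_a, role_b):
        return True                 # promiscuous reaches every port
    if role_a == role_b == "community":
        return vlan_a == vlan_b     # same community VLAN only
    return False                    # isolated ports reach no host port

print(can_talk("0/0/1", "0/0/7"))  # True:  isolated <-> promiscuous
print(can_talk("0/0/1", "0/0/2"))  # False: isolated <-> isolated
print(can_talk("0/0/3", "0/0/4"))  # True:  same community VLAN
print(can_talk("0/0/3", "0/0/5"))  # False: different community VLANs
```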
This chapter describes the Voice VLAN protocol technology and application.
The function is only applicable to MyPower 3400 and MyPower4100.
OUI address: the address range obtained by performing an AND operation on the MAC address and the address mask, used to identify the packets sent by the VoIP devices of a manufacturer.
The MyPower 3400 and MyPower4100 series switches match the source MAC address of the packet against the OUI address. A packet whose source MAC address matches one of the following OUI addresses is regarded as a voice packet:
Serial No.  OUI address     Manufacturer
1           0003-6b00-0000  Cisco phone
2           000f-e200-0000  H3C Aolynk phone
3           00d0-1e00-0000  Pingtel phone
4           00e0-7500-0000  Polycom phone
5           00e0-bb00-0000  3Com phone
When the source MAC address of the packet matches the OUI address of
the VoIP device, the data is regarded as the VoIP data, the priority of the
packet is automatically modified, and the packet is forwarded to the
corresponding Voice VLAN, ensuring the call quality.
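The OUI match described above is a masked comparison: AND the source MAC with the mask and compare the result against the OUI address. The sketch below is illustrative; it assumes the common mask ff:ff:ff:00:00:00 (the first three octets), which the table's OUI values suggest but the text does not state.

```python
def mac_to_int(mac: str) -> int:
    """Convert a MAC in '0011-2233-4455' or colon form to an integer."""
    return int(mac.replace("-", "").replace(":", ""), 16)

def matches_oui(src_mac: str, oui: str,
                mask: str = "ffff-ff00-0000") -> bool:
    """AND the source MAC with the mask and compare against the OUI."""
    return mac_to_int(src_mac) & mac_to_int(mask) == mac_to_int(oui)

# A handset MAC starting with 00e0-75 matches the Polycom OUI entry.
print(matches_oui("00e0-7512-34ab", "00e0-7500-0000"))  # True
print(matches_oui("0003-6b00-0001", "00e0-7500-0000"))  # False
```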
When configuring the Voice VLAN on the port, the user can choose the
following two application modes:
Auto mode: when a port configured in auto mode receives a VoIP packet, the switch automatically modifies the priority of the packet, forwards it in the corresponding Voice VLAN, and uses the aging mechanism to maintain the ports in the Voice VLAN. If no data from the MAC address is received before the aging time expires, the MAC address automatically exits from the Voice VLAN.
Manual mode: the user needs to manually configure the default VLAN ID (PVID) of the port as the Voice VLAN ID with a command.
A port in auto mode only processes untagged voice flows. The system listens to the untagged packets periodically sent by the VoIP device, learns the source MAC address, and automatically adds the MAC address of the VoIP device to the Voice VLAN; a MAC address that reaches the aging time without matching an OUI address again is automatically deleted from the Voice VLAN.
A port in manual mode processes the voice flow in the configured VLAN. The user needs to use a command to directly add the port connecting the IP telephone to the Voice VLAN, or to remove it.
The system regards a tagged packet as already carrying a priority, so it does not modify the packet priority.
An IP telephone on which the Voice VLAN is configured manually does not need to request an IP address in the default VLAN first; it always sends and receives voice flows with the Voice VLAN tag. That is, an IP telephone configured with an IP address and Voice VLAN directly initiates registration and communication with the voice gateway.
Table 2 The conditions for all types of ports to cooperate with the IP phone
that automatically gets the Voice VLAN information
Note: in the above conditions, if the user configures the Voice VLAN information of the IP phone manually, whether the access port needs to permit the packets of the default VLAN depends on whether a common PC is also connected to the port, because the default VLAN is mainly used to transmit the common service packets of the PC. If no common PC is connected, the port does not need to permit the packets of the default VLAN.
Table 3 The conditions for all types of ports to cooperate with the IP phone
that sends the untagged voice flow in manual mode
If a port enables the Voice VLAN function and is configured in auto mode, the first untagged packet is forwarded using the PVID of the port; later packets are forwarded according to whether the source MAC matches an OUI address. If a tagged packet is received and the tag is the Voice VLAN, the packet is forwarded in the Voice VLAN.
For a port in manual mode whose default VLAN is the Voice VLAN, any untagged packet can be transmitted in the Voice VLAN without an OUI check.
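The forwarding decision for an auto-mode port described above can be sketched as follows. This is our own simplified reading of the rules (a tag value of None stands for an untagged packet; the function and parameter names are illustrative).

```python
def auto_mode_vlan(tag, src_matches_oui, pvid, voice_vlan,
                   first_packet=False):
    """VLAN chosen by an auto-mode port, per the rules above."""
    if tag == voice_vlan:
        return voice_vlan        # tagged with the Voice VLAN: keep it
    if tag is None:
        if first_packet:
            return pvid          # the first untagged packet uses the PVID
        # later untagged packets follow the source-MAC OUI match
        return voice_vlan if src_matches_oui else pvid
    return tag                   # other tagged packets are unchanged

print(auto_mode_vlan(None, True, pvid=1, voice_vlan=200))   # 200
print(auto_mode_vlan(None, False, pvid=1, voice_vlan=200))  # 1
```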
Precautions
Voice VLAN has the following limitations:
Both Voice VLAN and MAC VLAN need to use the hardware resources of
MAC VLAN. When Voice VLAN and MAC VLAN are configured for one MAC
address, only the configuration of Voice VLAN takes effect.
Figure 1 The network diagram when host and IP phone are connected to
switch in series
The manual mode is suitable for the networking (as shown in Figure 2)
where the IP phone is connected to the switch separately (the ports
transmit only voice packets). Statically adding the port dedicates it to
voice traffic, minimizing the influence of service data on voice
transmission.
Neighbor Discovery
Technology
Main contents:
Introduction to NDP
Typical Application
Hello packets: these packets are the basis of maintaining the neighbor
relation. They encapsulate information about the sender for receivers to
learn and update.
Aging time: if the local device fails to receive hello packets from a
neighbor within the aging time, the neighbor is considered nonexistent
and is deleted from the neighbor list.
Introduction to NDSP
The NDSP protocol detects directly connected Maipu devices. NDSP
maintains the neighbor relation via hello messages (NDSP packets)
periodically sent between two directly connected devices. By default,
each Maipu device sends an NDSP packet to the connected peer at an
interval of 60 seconds. If no NDSP packet from the peer is received
within three hello periods (180 seconds, the holdtime or TTL), the local
device deletes the NDSP neighbor from the NDSP neighbor table.
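The hello/holdtime behavior above can be sketched as follows. This is an illustrative model of the timers described in the text, not device firmware; the table structure and function names are assumptions:

```python
# Minimal sketch of NDSP neighbor aging (timer values from the text).
HELLO_INTERVAL = 60              # seconds between hello (NDSP) packets
HOLDTIME = 3 * HELLO_INTERVAL    # 180 s: neighbor aged out after 3 lost hellos

neighbor_table = {}              # neighbor name -> time the last hello was received

def on_hello_received(neighbor: str, now: float) -> None:
    """Refresh the neighbor's timestamp whenever a hello packet arrives."""
    neighbor_table[neighbor] = now

def age_neighbors(now: float) -> None:
    """Delete every neighbor whose last hello is older than the holdtime."""
    for n, last_seen in list(neighbor_table.items()):
        if now - last_seen > HOLDTIME:
            del neighbor_table[n]

on_hello_received("SwitchB", now=0)
age_neighbors(now=120)               # within the holdtime: neighbor kept
print("SwitchB" in neighbor_table)   # True
age_neighbors(now=181)               # 181 s > 180 s holdtime: neighbor deleted
print("SwitchB" in neighbor_table)   # False
```

Note how `ndsp timer` and `ndsp holdtime` in the configuration tables below correspond to `HELLO_INTERVAL` and `HOLDTIME` here.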
Typical Application
Illustration
As shown in the preceding figure, two switches are connected through port
0/0/0.
Configuration of Switch A:
Command Description
SwitchA(config)#ndsp run Enable NDSP globally
SwitchA(config)#ndsp timer 30 Send hello packets of NDSP at an interval
of 30 seconds
SwitchA(config)#ndsp holdtime 150 Set the aging time of NDSP neighbor to
150 seconds
SwitchA(config)# port 0/0/0 Enter the port configuration mode.
SwitchA(config-port-0/0/0)#ndsp enable Enable NDSP on the port
Configuration of Switch B:
Command Description
SwitchB(config)#ndsp run Enable NDSP globally
SwitchB(config)#ndsp timer 35 Send hello packets of NDSP at an interval
of 35 seconds
SwitchB(config)#ndsp holdtime 160 Set the aging time of NDSP neighbor to
160 seconds
SwitchB(config)# port 0/0/0 Enter the port configuration mode.
SwitchB(config-port-0/0/0)#ndsp enable Enable NDSP on the port
MFF Technology
Main contents:
MFF technology
Typical application
MFF Technology
In the traditional Ethernet networking scheme, L2 isolation and L3
intercommunication between different client hosts are realized by
dividing VLANs on the switch. However, when many users need L2 isolation,
this occupies lots of VLAN resources. Meanwhile, to realize L3
intercommunication between clients, you need to assign a different IP
segment to each VLAN and configure an IP address on each VLAN interface.
Therefore, dividing too many VLANs reduces the efficiency of IP address
allocation.
MFF intercepts users' ARP request packets and, via the ARP pick-up
mechanism, replies with ARP responses carrying the gateway MAC address.
In this way, users are forced to send all traffic (including traffic
within one subnet) to the gateway, so the gateway can monitor the data
flow, preventing malicious attacks between users and ensuring the
security of the network deployment.
MFF Terms
Related terms:
AR (access router): the access router of the user terminal, or a switch
with L3 functions; usually the gateway of the subnet where the user is
located;
User port: a port directly connected to a network terminal user;
Network port: a port that connects to other network devices, such as an
access switch, aggregation switch, or gateway.
MFF principle:
1. Get the IP address and MAC address of the AR. In a DHCP environment,
get the AR's IP address via DHCP snooping and its MAC address via ARP;
in a static IP address environment, pre-configure the default IP address
of the AR and then get its MAC address via ARP.
2. Intercept users' ARP request packets and reply with the AR's MAC
address. In this way, the requesting host records the AR's MAC address in
its ARP entries for all other hosts. When an ARP request for a user host
is received from the AR, reply with that user host's MAC address to the
AR.
3. Filter uplink packets and drop all unicast packets except those whose
destination MAC address is the AR's. Because of viruses or other network
faults, unicast packets destined for other hosts' MAC addresses may be
received; these packets must be dropped.
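The steps above can be sketched as follows. This is a hedged illustration of the described behavior, not switch code; the addresses, table layout, and function names are assumptions for the example:

```python
from typing import Optional

# Illustrative sketch of the MFF behavior described above: answer user ARP
# requests with the AR (gateway) MAC, and drop uplink unicast frames not
# destined to the AR.
AR_IP = "10.1.1.254"
AR_MAC = "00:00:5e:00:01:01"     # assumed gateway MAC, learned via ARP

user_arp_table = {}              # user IP -> user MAC, picked up from user ARPs

def handle_user_arp_request(sender_ip: str, sender_mac: str,
                            target_ip: str) -> str:
    """Intercept a user's ARP request and return the MAC to answer with."""
    user_arp_table[sender_ip] = sender_mac   # pick up the user's ARP info
    # Always reply with the AR's MAC, even for hosts in the same subnet,
    # so all user traffic is forced through the gateway.
    return AR_MAC

def handle_ar_arp_request(target_ip: str) -> Optional[str]:
    """When the AR asks for a user host, reply with that host's real MAC."""
    return user_arp_table.get(target_ip)

def filter_uplink_unicast(dst_mac: str) -> bool:
    """Return True if the uplink frame is forwarded, False if dropped."""
    return dst_mac == AR_MAC

print(handle_user_arp_request("10.1.1.1", "aa:aa:aa:aa:aa:01", "10.1.1.2"))
print(filter_uplink_unicast("aa:aa:aa:aa:aa:02"))   # False: dropped
```

The asymmetry is the point: users only ever learn the AR's MAC, while the AR learns each user's real MAC.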
A VLAN in which MFF is enabled has two port roles: user port and network
port. These roles only restrict ingress packets.
1. The user port (the port connected to the user terminal device)
processes different packets as follows:
In a VLAN enabled with the MFF function, all ports are user ports by
default; network ports must be enabled via the command. The packet
restrictions of network ports and user ports apply only in VLANs with
MFF enabled. In a VLAN without the MFF function, user ports and network
ports do not have the above features.
To get the ARP information of the gateway and ensure the availability of
the gateway, the gateway detection function is enabled by default after
the MFF function is enabled in a VLAN. The user can forcibly disable the
gateway detection function via the command. Gateway detection relies on
the user's ARP information: when a user is connected, MFF intercepts the
user's ARP packets and uses that information to detect the gateway. If
the gateway is unavailable, the detection interval is 5 s; if the gateway
is available, the detection interval is 30 s by default. The user can
configure the detection interval (the configured interval takes effect
only when the gateway is available; when the gateway is unavailable, the
detection interval is fixed at 5 s).
After MFF learns the ARP of a connected user from a user port, the user
ARP aging function is enabled by default; you can disable it via the
command. By default, the aging interval is 300 s, and the user can
configure it. If the user's ARP is not received within four successive
aging intervals, the user is regarded as no longer existing and its ARP
information is deleted.
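The user ARP aging rule above can be sketched as follows. A minimal model assuming the default 300 s interval; the data structures and function names are illustrative:

```python
# Sketch of the MFF user ARP aging rule: the user's entry is removed after
# four successive aging intervals with no ARP from that user.
AGING_INTERVAL = 300             # seconds, configurable per the text
MAX_MISSED = 4                   # entry removed after 4 silent intervals

user_last_arp = {}               # user IP -> time its ARP was last seen

def on_user_arp(ip: str, now: float) -> None:
    """Refresh the user's timestamp whenever its ARP is received."""
    user_last_arp[ip] = now

def age_user_arps(now: float) -> None:
    """Delete users silent for MAX_MISSED * AGING_INTERVAL (1200 s)."""
    for ip, last in list(user_last_arp.items()):
        if now - last >= MAX_MISSED * AGING_INTERVAL:
            del user_last_arp[ip]          # user regarded as gone

on_user_arp("10.1.1.1", now=0)
age_user_arps(now=900)                 # 3 intervals of silence: entry kept
print("10.1.1.1" in user_last_arp)     # True
age_user_arps(now=1200)                # 4 intervals of silence: entry deleted
print("10.1.1.1" in user_last_arp)     # False
```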
Typical Application
As shown in the figure, Switch A and Switch B are the access devices of
the user terminals; Switch C is the aggregation device.
Host A, Host B, and Host C are the user hosts, which all belong to VLAN
10; their IP addresses are 10.1.1.1, 10.1.1.2, and 10.1.1.3. The MFF
function is enabled on the user-terminal access devices Switch A and
Switch B. When Host A wants to communicate with Host B, it sends an ARP
request for Host B's MAC address; Switch A intercepts the ARP request and
replies with the gateway's MAC address. As a result, Host A mistakes the
gateway's MAC address for Host B's and sends its data to the gateway.
When the gateway receives the data from Host A and finds that the
destination IP address is Host B's, it queries the route and forwards the
data to Host B. Similarly, data sent from Host B to Host A is forwarded
via the gateway.
Switch A configuration:
Command Description
SwitchA(config)#port 0/1-0/2 Enter port mode
SwitchA(config-port-range)#port access vlan 10 Add port 0/1,0/2 to VLAN 10
SwitchA(config-port-range)#port 0/3 Enter port 0/3
SwitchA(config-port-0/3)#port mode trunk Set port 0/3 as trunk port
SwitchA(config-port-0/3)#port trunk allowed vlan 10 Add port 0/3 to VLAN 10
SwitchA(config-port-0/3)#mac-forced-forwarding network-port Set port 0/3 as the network port
SwitchA(config-port-0/3)#exit Exit the port mode
SwitchA(config)#vlan 10 Enter the VLAN mode
SwitchA(config-vlan10)#mac-forced-forwarding default-gateway 10.1.1.254 Configure the default gateway of VLAN 10 as 10.1.1.254
Switch B configuration:
Command Description
SwitchB(config)#port 0/1 Enter port mode
SwitchB(config-port-0/1)#port access vlan 10 Add port 0/1 to VLAN 10
SwitchB(config-port-0/1)#port 0/2 Enter port 0/2
SwitchB(config-port-0/2)#port mode trunk Set port 0/2 as trunk port
SwitchB(config-port-0/2)#port trunk allowed vlan 10 Add port 0/2 to VLAN 10
PPPoE+ Technology
Main contents:
PPPoE+ principle
PPPoE+ Principle
With the popularity of IP-based network construction and the growing
variety of user services, carriers need to enhance their control over
user service data. Currently, IP DSLAM is the main DSL access device. The
upstream BAS cannot, or can only with difficulty, get user port
information from the Ethernet packet, so it cannot authenticate and
manage user ports in a unified manner or effectively prevent user
accounts from being stolen.
PPPoE+ is short for PPPoE Intermediate Agent. The scheme was first put
forward at the DSL Forum and is defined according to the RFC 3046 user
line ID field. The original idea of PPPoE+ is that, after receiving the
user's PPPoE PADI and PADR packets, the DSLAM adds a PPPoE+ tag
indicating the user's physical port number or PVC to the packet. After
identifying the PPPoE+ Tag, the upstream BRAS extracts the user's
physical location information and carries it to the Radius Server in the
Radius NAS-Port-ID attribute for user identification and user
management.
Figure 41-1
1. The user terminal initiates a PPPoE request and sends the PPPoE PADI
packet;
4. After receiving PADI+VSA, BRAS replies with a PADO packet to the user;
6. DSLAM captures the PADR packet and inserts the PPPoE+ Tag into it;
8. From this point, BRAS can process the PPP flow normally. After the PPP
negotiation completes, BRAS sends the PPPoE+ Tag to the IPTV service
system and the Radius Server via the Radius NAS-Port-ID.
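The tag insertion and extraction described above can be sketched as follows. This is a hedged illustration of the idea only: the packet is modeled as a plain dictionary, and the field names and circuit-id format are assumptions, not the actual PPPoE TLV encoding:

```python
from typing import Optional

# Sketch of the PPPoE+ idea: the DSLAM appends a tag carrying the user's
# physical port (or PVC) to PADI/PADR packets, and the BRAS extracts it to
# build the Radius NAS-Port-ID attribute.
def dslam_add_tag(packet: dict, slot: int, port: int) -> dict:
    """DSLAM side: insert the user-location tag into a PADI/PADR packet."""
    if packet["code"] in ("PADI", "PADR"):
        tagged = dict(packet)
        tagged["circuit_id"] = f"slot {slot} port {port}"  # user location info
        return tagged
    return packet                      # other PPPoE packets pass unchanged

def bras_extract_nas_port_id(packet: dict) -> Optional[str]:
    """BRAS side: read the tag to fill the Radius NAS-Port-ID value."""
    return packet.get("circuit_id")

padi = {"code": "PADI"}
tagged = dslam_add_tag(padi, slot=1, port=24)
print(bras_extract_nas_port_id(tagged))   # slot 1 port 24
```

Because the tag is added per physical port by the DSLAM, the Radius server can tie each session to a line, which is what prevents account theft across ports.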
Figure 41-2